Software Requirements:
Operating System: Ubuntu 16.04 LTS.
Hardware Requirements:
Processor: Pentium Dual-Core or higher
Internet connection
Prerequisites: Basic knowledge of programming and data analysis.
Contents
Module-1
Getting started: The basics of R
– What is R, and why use it?
– Setting up your machine
– RStudio and R-Core
R Programming Constructs
– Basic Syntax, Data Types
– Variables, Operators
– Decision Making
– Loops, Functions
– Strings, Vectors
– Lists, Matrices, Arrays
– Factors, Data Frames
– Data shaping and reshaping
– Installing and using R packages
Module-2
R Data Interfaces
– Loading CSV Files
– Excel Files
– Web data
– XML Data
– Database connectivity: MySQL
– Exchanging and converting data between formats
R Charts and Graphs
– Drawing Pie Charts
– Bar Charts
– Box Plots
– Histograms
– Line Graphs
– Scatter Plots
The ggplot2 and plotrix packages
Module-3
Basic Data Analytics Using R (on real-world datasets)
– Creating data subsets
– Merging data
– Sorting data
– Transposing data
– Melting data to long format
– Casting data to wide format
Data Preprocessing in R (on real-world datasets)
– Data cleaning
– Data integration
– Data transformation
– Error correction
Exploratory Data Analysis (on real-world datasets)
– Training and testing data
– Data cleaning
– Label encoding
– One-hot encoding
Module-4
Forecasting Numeric Data – Regression Methods
– Understanding regression
– Simple linear regression
– Multiple linear regression
– Example – predicting medical expenses using linear regression
– Collecting data
– Exploring and preparing the data
– Exploring relationships among features – the correlation matrix
– Visualizing relationships among features – the scatterplot matrix
– Training a model on the data
– Evaluating model performance
– Improving model performance
Lazy Learning – Classification Using Nearest Neighbors
– Understanding nearest neighbor classification
– The k-NN algorithm
– Measuring similarity with distance
– Choosing an appropriate k
– Preparing data for use with k-NN
– Why is the k-NN algorithm lazy?
– Example – diagnosing breast cancer with the k-NN algorithm
– Collecting data
– Exploring and preparing the data
– Transformation – normalizing numeric data
– Data preparation – creating training and test datasets
– Training a model on the data
– Evaluating model performance
– Improving model performance
Module-5
Divide and Conquer – Classification Using Decision Trees & Rules
– Understanding decision trees
– Divide and conquer
– The C5.0 decision tree algorithm
– Choosing the best split
– Pruning the decision tree
– Example – identifying risky bank loans using C5.0 decision trees
– Collecting data
– Exploring and preparing the data
– Data preparation – creating random training and test datasets
– Training a model on the data
– Evaluating model performance
– Improving model performance
Finding Patterns – Market Basket Analysis with Association Rules
– Understanding association rules
– The Apriori algorithm for association rule learning
– Measuring rule interest – support and confidence
– Building a set of rules with the Apriori principle
– Example – identifying frequently purchased groceries with association rules
– Collecting data
– Exploring and preparing the data
– Data preparation – creating a sparse matrix for transaction data
– Visualizing item support – item frequency plots
– Visualizing the transaction data – plotting the sparse matrix
– Training a model on the data
– Evaluating model performance
– Improving model performance
Finding Groups of Data – Clustering with k-means
– Understanding clustering
– Clustering as a machine learning task
– The k-means clustering algorithm
– Using distance to assign and update clusters
– Choosing the appropriate number of clusters
– Example – finding teen market segments using k-means clustering
– Collecting data
– Exploring and preparing the data
– Data preparation – dummy coding missing values
– Data preparation – imputing the missing values
– Training a model on the data
– Evaluating model performance
– Improving model performance