Software Requirements:
Operating System: Ubuntu 16.04 LTS.

Hardware Requirements:
Processor: Pentium Dual Core or higher
Internet connection

Prerequisites: Basic knowledge of programming and data analysis.

Contents

Module-1

Getting Started: The Basics of R
–   What is R, and why use it?
–   Setting up your machine
–   RStudio and base R (R-Core)
R Programming Constructs
–   Basic Syntax, Data Types
–   Variables, Operators
–   Decision Making
–   Loops, Functions
–   Strings, Vectors
–   Lists, Matrices, Arrays
–   Factors, Data Frames
–   Data shaping and reshaping
–   Installing and using R packages

Module-2

R Data Interfaces
–   Loading CSV files
–   Excel files
–   Web data
–   XML data
–   Database connectivity (MySQL)
–   Data exchange and conversion between formats
R Charts and Graphs
–   Drawing Pie Charts
–   Bar Charts
–   Box plots
–   Histograms
–   Line Graphs
–   Scatter Plots
The ggplot2 and plotrix packages

Module-3

Basic Data Analytics using R (on real-world datasets)
–   Creating data subsets
–   Merging data
–   Sorting data
–   Transposing data
–   Melting data to long format
–   Casting data to wide format
Data Preprocessing in R (on real-world datasets)
–   Data cleaning
–   Data integration
–   Data transformation
–   Error correction
Exploratory Data Analysis (on real-world datasets)
–  Training and testing data
–  Data cleaning
–  Label encoding
–  One-hot encoding

Module-4

Forecasting Numeric Data – Regression Methods    
–  Understanding regression
–  Simple linear regression
–  Multiple linear regression
–  Example – predicting medical expenses using linear regression
–  Collecting data
–  Exploring and preparing the data
–  Exploring relationships among features – the correlation matrix
–  Visualizing relationships among features – the scatterplot matrix
–  Training a model on the data
–  Evaluating model performance
–  Improving model performance
Lazy Learning – Classification Using Nearest Neighbors    
–  Understanding nearest neighbor classification
–  The k-NN algorithm
–  Measuring similarity with distance
–  Choosing an appropriate k
–  Preparing data for use with k-NN
–  Why is the k-NN algorithm lazy?
–  Example – diagnosing breast cancer with the k-NN algorithm
–  Collecting data
–  Exploring and preparing the data
–  Transformation – normalizing numeric data
–  Data preparation – creating training and test datasets
–  Training a model on the data
–  Evaluating model performance
–  Improving model performance

Module-5

Divide and Conquer – Classification Using Decision Trees and Rules
–  Understanding decision trees
–  Divide and conquer
–  The C5.0 decision tree algorithm
–  Choosing the best split
–  Pruning the decision tree
–  Example – identifying risky bank loans using C5.0 decision trees
–  Collecting data
–  Exploring and preparing the data
–  Data preparation – creating random training and test datasets
–  Training a model on the data
–  Evaluating model performance
–  Improving model performance
Finding Patterns – Market Basket Analysis with Association Rules    
–  Understanding association rules
–  The Apriori algorithm for association rule learning
–  Measuring rule interest – support and confidence
–  Building a set of rules with the Apriori principle
–  Example – identifying frequently purchased groceries with association rules
–  Collecting data
–  Exploring and preparing the data
–  Data preparation – creating a sparse matrix for transaction data
–  Visualizing item support – item frequency plots
–  Visualizing the transaction data – plotting the sparse matrix
–  Training a model on the data
–  Evaluating model performance
–  Improving model performance
Finding Groups of Data – Clustering with k-means    
–  Understanding clustering
–  Clustering as a machine learning task
–  The k-means clustering algorithm
–  Using distance to assign and update clusters
–  Choosing the appropriate number of clusters
–  Example – finding teen market segments using k-means clustering
–  Collecting data
–  Exploring and preparing the data
–  Data preparation – dummy coding missing values
–  Data preparation – imputing the missing values
–  Training a model on the data
–  Evaluating model performance
–  Improving model performance