Hadoop and Big Data Sets – MITU Skillologies – Artificial Intelligence, Data Science

Big Data Overview
What is Big Data?

Benefits of Big Data

Big Data Technologies
Big Data Solutions
• Traditional Approach

• Hadoop Architecture

• MapReduce

• Hadoop Distributed File System

• How Does Hadoop Work?
Environment Setup
• Preinstallation Setup

• SSH Setup and Key Generation

• Installing Java

• Downloading Hadoop

• Hadoop Operation Modes

• Installing Hadoop in Standalone Mode

• Installing Hadoop in Pseudo Distributed Mode

• Verifying Hadoop Installation
HDFS Overview
• Features of HDFS

• HDFS Architecture

• Goals of HDFS
HDFS Operations
• Starting HDFS

• Listing Files in HDFS

• Inserting Data into HDFS

• Retrieving Data from HDFS

• Shutting Down HDFS

• Hadoop Commands
MapReduce
• What is MapReduce?

• Inputs and Outputs (Java Perspective)

• Compilation and Execution of Process Units Program

• Important Commands

• How to Interact with MapReduce Jobs
Streaming
• Mapper and Reducer

• Examples Using Python

• How Streaming Works
Multi Node Cluster
• Creating User Account

• Mapping the Nodes

• Configuring Key Based Login

• Installing Hadoop

• Configuring Hadoop

• Installing Hadoop on Slave Servers

• Configuring Hadoop on Master Server

• Starting Hadoop Services

• Adding a New DataNode in the Hadoop Cluster

• Adding User and SSH Access

• Set Hostname of New Node

• Start the DataNode on New Node

Download Presentations

01. Overview of Big Data
02. Big Data Solution
03. Hadoop Installation
04. HDFS Overview
05. HDFS Operations
06. MapReduce
07. Multinode Clusters

08. Hive Installation on Ubuntu

Sample Programs:
The downloadable zip file contains: ProcessUnits Java Program, Mapper Python Program, Reducer Python Program, Stream Program
Statistical Text File