Yashikha.com - One of the Leading Companies for Corporate Training and Workshops in Emerging Technologies in Mumbai and Across India.

Big Data Training Program

Who should attend?

  • CXOs wanting to understand Big Data concepts
  • Startup Founders building disruptive solutions
  • CTOs developing data-driven products
  • Business Owners, SMEs, MSMEs, and Business Decision Makers

Learning Outcomes:

Yashikha's Big Data Course for Corporates provides an overview of the structure and mechanics of Hadoop, HDFS, YARN, Hive, Pig, Impala, Python, RDDs, and the Spark modules.

Decision makers, CXOs and startups will be able to embrace Big Data technology to build disruptive solutions for their clients and, in turn, evolve their own business models.

Course Duration:

15 Hours

Course Pedagogy:

  • Why Apache Hadoop?
  • Problems in Data-Driven Businesses
  • How Hadoop Solves Them, and Why Big Data Solutions
  • What Hadoop Comprises: Subprojects and Ecosystem
  • HDFS
  • HDFS Features
  • HDFS Architecture – Non HA
  • HDFS Architecture – HA
  • Writing and Reading Files in HDFS
  • NameNode Memory and Load Handling
  • Basic HDFS Security
  • HDFS CLI
  • HDFS UIs
  • MapReduce and YARN
  • YARN Architecture
  • MapReduce Architecture and Hands-on
  • Spark Architecture and Hands-on
  • How YARN executes MR and Spark jobs
  • How to see YARN Applications in WEB UIs and Shell
  • YARN Application Logs
  • Hands-on with All the Above
  • Data Ingestion in Hadoop Cluster
  • Data Ingestion using Flume – Architecture and Hands-on
  • Data Ingestion using Sqoop – Architecture and Hands-on
  • Data Ingestion using Kafka – Architecture and Hands-on
  • Hadoop REST APIs
  • Introduction and Hands-on: Hive, Pig, Impala/Tez
  • Introduction to Hadoop Clients and Hue Interface
  • Install and configure Hadoop Clients on Gateway
  • Install and configure Hue
  • Introduction to Apache Oozie
  • Introduction to Python/Scala
  • Features of Functional Paradigm
  • Mutable and Immutable Data
  • First Class Functions
  • Variables, Control structures, Functions and Objects
  • Hadoop Data Formats
  • Introduction to Data Formats
  • Introduction to AVRO
  • Parquet
  • Compressions
  • Overview of Partitions
  • Partitions in Hive and Impala
  • Dealing with Hive Partition Tables
  • Hands-on – Hive partition tables
  • Spark – High Level Entry to Development
  • Directed Acyclic Graph
  • Types of Spark CLI – spark-shell and pyspark
  • Functional Programming in Spark
  • Introduction to Spark RDD
  • Hands-on – Running Spark Applications Using the Spark CLI
  • How RDDs are created from files or data in memory
  • Handling File Formats
  • Additional Operations on RDD
  • Hands-on – Processing Data Files Using Spark RDDs
  • Key Value Pair RDD
  • Other Pair RDD Concepts
  • Pair RDD to join Datasets
  • Hands-on – Using Pair RDDs to Join Datasets in the Spark CLI
  • How to Write a Spark Application – Scala and PySpark
  • Running Spark Applications in YARN
  • Accessing the Spark Application Web UI and Controlling Applications
  • Configuring Application Properties and Logging
  • Hands-on – Writing Spark Applications – PySpark and Scala
  • Hands-on – Configuring Spark Applications
  • Parallel Processing in Spark
  • RDD partitions
  • Partitions of File-Based RDDs
  • HDFS and Data Locality
  • Executing parallel operations
  • Stages and Tasks
  • Hands-on – Viewing Stages and Jobs in the Spark Application UI
  • RDD Persistence
  • Spark Data Processing patterns
  • Spark Dataframes
  • Spark SQLContext
  • Spark ML Libraries
  • Hands On – All the above
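
To give a flavour of the MapReduce module above, here is a minimal, single-machine sketch of the word-count pattern in plain Python. On a real cluster the map and reduce phases run in parallel across YARN containers; this sequential version (with sample lines invented for illustration) only shows the data flow between the phases.

```python
from collections import defaultdict
from functools import reduce

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word occurrence."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: reduce(lambda a, b: a + b, counts)
            for word, counts in grouped.items()}

# Sample input, for illustration only.
lines = ["big data big insights", "big data big value"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])   # 4
print(counts["data"])  # 2
```

The same map-shuffle-reduce shape underlies the Hadoop MapReduce and Spark jobs covered in the hands-on sessions.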
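
The functional-paradigm topics (first-class functions, mutable vs immutable data) can also be previewed in plain Python; the names below are illustrative examples, not part of the course material.

```python
# First-class functions: a function is a value that can be passed around.
def twice(f, x):
    """Apply the function f to x two times."""
    return f(f(x))

increment = lambda n: n + 1
print(twice(increment, 5))  # 7

# Immutable data: a tuple cannot be changed in place, so an "update"
# produces a new value -- the style Spark transformations rely on.
point = (1, 2)
moved = (point[0] + 1, point[1])  # new tuple; point itself is unchanged
print(point, moved)  # (1, 2) (2, 2)
```
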
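
Finally, the "Pair RDD to Join Datasets" module can be sketched without a cluster: in PySpark an inner join of two pair RDDs is a single `left.join(right)` call, and the plain-Python version below spells out what happens key by key. The sample customer and order records are invented for illustration.

```python
# (key, value) datasets, keyed by a shared customer_id.
customers = [(1, "Asha"), (2, "Ravi"), (3, "Meera")]   # (customer_id, name)
orders = [(1, "laptop"), (3, "phone"), (1, "mouse")]   # (customer_id, item)

def pair_join(left, right):
    """Inner join of two (key, value) datasets, like PySpark's RDD.join:
    yields (key, (left_value, right_value)) for every matching pair."""
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)
    return [(key, (left_value, right_value))
            for key, left_value in left
            for right_value in right_by_key.get(key, [])]

joined = sorted(pair_join(orders, customers))
print(joined)
# [(1, ('laptop', 'Asha')), (1, ('mouse', 'Asha')), (3, ('phone', 'Meera'))]
```

Note that customer 2 has no orders, so that key drops out of the result, exactly as in an RDD inner join.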