avvacadotechinfo

Apache Spark and Scala Certification Training

SUPPORT TOLL FREE NO : 1-312-4769-976

Apache Spark and Scala Certification Training is designed to provide the knowledge and skills needed to become a successful Spark Developer and to prepare you for the Cloudera Certified Associate Spark and Hadoop Developer Certification Exam (CCA175). In this course you will gain in-depth knowledge of concepts such as HDFS, Flume, Sqoop, RDDs, Spark Streaming, MLlib, Spark SQL, and the Kafka cluster and API.

128K+ satisfied learners.

Self-Paced Learning

569

449

Training Features

Course Duration

You will undergo self-paced learning and gain in-depth knowledge of the concepts covered in the course.

Real-life Case Studies

Towards the end of the training, you will be working on a project where you will implement the techniques learnt to visualize.

Assignments

Each class has practical assignments, which should be finished before the next class and which help you apply the concepts taught during the class.

24 x 7 Expert Support

Our 24x7 online support team resolves all your technical queries through a ticket-based tracking system, for life.

Forum

We have a community forum for all our customers that further facilitates learning through peer interaction and knowledge sharing.

Course Description

About the Apache Spark course

Apache Spark Certification Training Course is designed to provide knowledge and skills to become a successful Big Data Developer.

You will understand the basics of Big Data and Hadoop. You will learn how Spark enables in-memory data processing and why it can run much faster than Hadoop MapReduce for many workloads. You will also learn about RDDs and the different APIs that Spark offers, such as Spark Streaming, MLlib, and Spark SQL, along with clustering. This Avvacado Tech Info course is an integral part of a Big Data Developer's career path. It also covers fundamental concepts such as data capture using Flume, data loading using Sqoop, Kafka clusters, and the Kafka API.

This course is designed to provide the knowledge and skills needed to become a successful Spark and Hadoop Developer, and it helps you prepare for the CCA Spark and Hadoop Developer (CCA175) Examination.

Why should you go for this training?

The market for Big Data analytics is growing across the world, and this strong growth pattern translates into a great opportunity for IT professionals. Here are a few of the professional IT groups that continue to benefit from moving into the Big Data domain:

Developers and Architects
BI/ETL/DW professionals
Senior IT Professionals
Testing professionals
Mainframe professionals
Freshers
Big Data enthusiasts
Software Architects, Engineers and Developers
Data Scientists and Analytics professionals

What are the pre-requisites for this course?

There are no pre-requisites as such for this course. Knowledge of Scala is a plus for learning Spark, but it is not mandatory.

Curriculum

Introduction to Scala for Apache Spark

Objectives - In this module, you will understand the basics of Scala that are required for programming Spark applications. You will learn about the basic constructs of Scala such as variable types, control structures, collections, and more.

Topics:

o What is Scala?

o Why Scala for Spark?

o Scala in other frameworks

o Introduction to Scala REPL

o Basic Scala operations

o Variable Types in Scala

o Control Structures in Scala

o Foreach loop, Functions and Procedures

o Collections in Scala: Array

o ArrayBuffer, Map, Tuples, Lists, and more

Hands On:

o Scala REPL Detailed Demo
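
As a rough illustration of the constructs listed in this module, the sketch below touches on values, control structures, a foreach loop, and the common collection types. It is not taken from the course material: the object name ScalaBasics and all values (module names, hours) are made up for the example.

```scala
import scala.collection.mutable.ArrayBuffer

object ScalaBasics {
  // val declares an immutable value with an explicit type
  val course: String = "Spark"

  // Fixed-size Array and growable ArrayBuffer
  val modules: Array[String] = Array("Scala", "Hadoop", "Spark")
  val buffer: ArrayBuffer[Int] = ArrayBuffer(1, 2)
  buffer += 3

  // Immutable Map, Tuple, and List (hypothetical sample data)
  val courseHours: Map[String, Int] = Map("Scala" -> 4, "Spark" -> 6)
  val pair: (String, Int) = ("Kafka", 3)
  val evens: List[Int] = List(1, 2, 3, 4).filter(_ % 2 == 0)

  // if/else is an expression that yields a value
  def level(hours: Int): String = if (hours > 5) "long" else "short"

  // var can be reassigned; foreach visits every element of a collection
  def totalHours(): Int = {
    var total = 0
    courseHours.values.foreach(h => total += h)
    total
  }
}
```

For example, ScalaBasics.level(6) takes the first branch of the if expression, and ScalaBasics.totalHours() adds up the values stored in the Map.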

OOP and Functional Programming in Scala

Objectives - In this module, you will learn about object-oriented programming and functional programming techniques in Scala.

Topics:

o Class in Scala

o Getters and Setters

o Custom Getters and Setters

o Properties with only Getters

o Auxiliary Constructor and Primary Constructor

o Singletons

o Extending a Class

o Overriding Methods

o Traits as Interfaces and Layered Traits

o Functional Programming

o Higher Order Functions

o Anonymous Functions, and more

Hands On:

o Case Class Demo

o Layered Traits
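
The topics above can be sketched in a few lines of Scala. This is a minimal illustration under assumed names (OopFpSketch, Learner, Logger, Upper, Tagged, Module, and applyTwice are all hypothetical), not the course's own demo code.

```scala
object OopFpSketch {
  // Primary constructor in the class header; auxiliary constructor via this(...)
  class Learner(val name: String, private var _score: Int) {
    def this(name: String) = this(name, 0)

    // Custom getter and setter: the setter ignores negative scores
    def score: Int = _score
    def score_=(value: Int): Unit = if (value >= 0) _score = value
  }

  // Layered traits: each layer transforms the message, then calls super
  trait Logger { def format(msg: String): String = msg }
  trait Upper extends Logger {
    override def format(msg: String): String = super.format(msg.toUpperCase)
  }
  trait Tagged extends Logger {
    override def format(msg: String): String = super.format("[log] " + msg)
  }
  // Singleton object; the rightmost trait runs first: Tagged, then Upper
  object TaggedUpperLogger extends Logger with Upper with Tagged

  // Case class: equality and toString are generated by the compiler
  case class Module(title: String, hours: Int)

  // Higher-order function: takes another function as an argument
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))
}
```

Calling OopFpSketch.applyTwice(_ + 3, 1) passes an anonymous function, and OopFpSketch.TaggedUpperLogger.format("hi") shows how the order of the mixed-in traits determines the result.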

Introduction to Big Data & Hadoop

Objectives - In this module, you will understand Big Data, the limitations of existing solutions to the Big Data problem, how Hadoop solves that problem, the Hadoop ecosystem components, Hadoop architecture, HDFS, Rack Awareness, and Replication. You will learn about the Hadoop cluster architecture and the important configuration files in a Hadoop cluster. You will also get an overview of Apache Sqoop and how it is used to import tables from an RDBMS into HDFS and export them back.

Topics:

o What is Big Data?

o Big Data Customer Scenarios

o Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case

o How Hadoop Solves the Big Data Problem

o What is Hadoop?

o Hadoop’s Key Characteristics

o Hadoop Ecosystem and HDFS

o Hadoop Core Components

o Rack Awareness and Block Replication

o Avvacado Tech Info’s VM Tour

o YARN and Its Advantage

o Hadoop Cluster and Its Architecture

o Hadoop: Different Cluster Modes

o Data Loading using Sqoop

Hands-On:

o A Tour of Avvacado Tech Info’s Hadoop & Spark VM

o Basic Hadoop Commands

o Importing and Exporting Data Using Sqoop

Apache Spark Framework

Objectives - In this module, you will understand the different frameworks available for Big Data analytics. The module also includes a first-hand introduction to Spark and demos on building and running a Spark application and on the Spark Web UI.

Topics:

o Big Data Analytics with Batch & Real-Time Processing

o Why is Spark Needed?

o What is Spark?

o How Does Spark Differ from Its Competitors?

o Spark at eBay

o Spark’s Place in Hadoop Ecosystem

o Spark Components & Its Architecture

o Running Programs on Scala IDE & Spark Shell

o Spark Web UI

o Configuring Spark Properties

Hands On:

o Building and Running Spark Application

o Spark Application Web UI

o Configuring Spark Properties

Playing with RDDs

Objectives - In this module, you will learn about one of the fundamental building blocks of Spark, RDDs, and the related manipulations used to implement business logic (Transformations, Actions, and Functions performed on RDDs). You will also learn how Spark applications are developed and how to configure Spark properties.

Topics:

o Challenges in Existing Computing Methods

o Probable Solution & How RDD Solves the Problem

o What is an RDD? Its Functions, Transformations & Actions

o Data Loading and Saving Through RDDs

o Key-Value Pair RDDs and Other Pair RDDs

o RDD Lineage

o RDD Persistence

o WordCount Program Using RDD Concepts

o RDD Partitioning & How It Helps Achieve Parallelization

Hands On:

o Loading data in RDDs

o Saving data through RDDs

o RDD Transformations

o RDD Actions and Functions

o RDD Partitions

o WordCount through RDDs
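
The shape of the WordCount program mentioned above can be sketched without a cluster. The example below uses plain Scala collections as a stand-in for an RDD, so it is only an approximation of what the module covers: the object name WordCountSketch is hypothetical, and groupBy here replaces the distributed shuffle that Spark performs for reduceByKey.

```scala
object WordCountSketch {
  // Plain Scala collections stand in for an RDD in this sketch.
  // On an RDD the same pipeline is typically written as
  //   rdd.flatMap(...).map(w => (w, 1)).reduceByKey(_ + _)
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // transformation: line -> words
      .filter(_.nonEmpty)                   // drop empty tokens
      .map(word => (word, 1))               // transformation: word -> (word, 1)
      .groupBy(_._1)                        // local stand-in for the shuffle by key
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
}
```

The flatMap and map steps correspond to RDD transformations, while collecting the final counts plays the role of an action.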

DataFrames and Spark SQL

Objectives - In this module, you will learn about Spark SQL, which is used to process structured data with SQL queries. You will learn about DataFrames and Datasets in Spark SQL and perform SQL operations on DataFrames.

Topics:

o Need for Spark SQL

o What is Spark SQL?

o Spark SQL Architecture

o SQL Context in Spark SQL

o Data Frames & Datasets

o Interoperating with RDDs

o JSON and Parquet File Formats

o Loading Data through Different Sources

Hands On:

o Spark SQL - Creating DataFrames

o Loading and transforming data through different sources

o Stock Market Analysis

Machine learning using Spark MLlib

Objectives - In this module, you will learn why machine learning is needed, the different types of machine learning techniques, clustering, and MLlib (Spark’s machine learning library). You will also learn about the algorithms supported by MLlib and implement K-Means Clustering.

Topics:

o What is Machine Learning?

o Where is Machine Learning Used?

o Different Types of Machine Learning Techniques

o Face Detection: USE CASE

o Understanding MLlib

o Features of MLlib and MLlib Tools

o Various ML algorithms supported by MLlib

o K-Means Clustering & How It Works with MLlib

o Analysis on US Election Data: K-Means MLlib USE CASE

Hands On:

o Machine Learning MLlib

o K-Means Clustering
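
To give a feel for how K-Means works, here is a small single-machine sketch of the basic assign-then-average loop in plain Scala. It is an assumption-laden toy, not MLlib's implementation: MLlib runs distributed and handles initialization differently, and the names KMeansSketch, closest, step, and fit are invented for this example. Initial centroids are passed in explicitly to keep the result deterministic.

```scala
object KMeansSketch {
  type Point = (Double, Double)

  // Squared Euclidean distance between two 2-D points
  private def sqDist(a: Point, b: Point): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  // Index of the centroid nearest to p
  def closest(p: Point, centroids: Vector[Point]): Int =
    centroids.indices.minBy(i => sqDist(p, centroids(i)))

  // One iteration: assign points to centroids, then recompute the means
  def step(points: Seq[Point], centroids: Vector[Point]): Vector[Point] = {
    val byCluster = points.groupBy(p => closest(p, centroids))
    centroids.indices.map { i =>
      byCluster.get(i) match {
        case Some(ps) => (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)
        case None     => centroids(i) // an empty cluster keeps its centroid
      }
    }.toVector
  }

  // Repeat until the centroids stop moving or maxIter is reached
  def fit(points: Seq[Point], init: Vector[Point], maxIter: Int = 20): Vector[Point] = {
    var current = init
    var i = 0
    var moved = true
    while (moved && i < maxIter) {
      val next = step(points, current)
      moved = next != current
      current = next
      i += 1
    }
    current
  }
}
```

Given two well-separated groups of points and one starting centroid near each group, fit settles on the mean of each group after a couple of iterations.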

Understanding Apache Kafka and Kafka Cluster

Objectives - In this module, you will understand Kafka and the Kafka architecture. You will then go through the details of a Kafka cluster and learn how to configure different types of Kafka clusters.

Topics:

o Need for Kafka

o What is Kafka?

o Core Concepts of Kafka

o Kafka Architecture

o Where is Kafka Used?

o Understanding the Components of Kafka Cluster

o Configuring Kafka Cluster

o Producer and Consumer

Hands On:

o Configuring Single Node Single Broker Cluster

o Configuring Single Node Multi Broker Cluster

Capturing Data with Apache Flume and Integration with Kafka

Objectives - In this module, you will get an introduction to Apache Flume and its basic architecture, and learn how it is integrated with Apache Kafka for event processing.

Topics:

o Need for Apache Flume

o What is Apache Flume?

o Basic Flume Architecture

o Flume Sources

o Flume Sinks

o Flume Channels

o Flume Configuration

o Integrating Apache Flume and Apache Kafka

Hands On:

o Flume Commands

o Setting up Flume Agent

o Streaming Twitter Data into HDFS

Apache Spark Streaming

Objectives - In this module, you will work with Spark Streaming, which is used to build scalable, fault-tolerant streaming applications. You will learn about DStreams and the various Transformations performed on them. You will also get to know the main streaming operators, Sliding Window Operators, and Stateful Operators.

Topics:

o Drawbacks in Existing Computing Methods

o Why is Streaming Necessary?

o What is Spark Streaming?

o Spark Streaming Features

o Spark Streaming Workflow

o How Uber Uses Streaming Data

o Streaming Context & DStreams

o Transformations on DStreams

o WordCount Program using Spark Streaming

o Windowed Operators and Why They are Useful

o Important Windowed Operators

o Slice, Window and ReduceByWindow Operators

o Stateful Operators

o Perform Twitter Sentiment Analysis Using Spark Streaming

Hands On:

o Creating DStreams

o Transformations and Actions Performed on DStreams

o Output Operations in DStreams

o Sliding Window Operations

o Stateful Operations

o Twitter Sentiment Analysis
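
The idea behind the sliding window operators listed above can be modelled with ordinary Scala collections. The sketch below is a simplified assumption, not Spark Streaming code: each inner sequence represents one micro-batch, the window length and slide interval are counted in batches rather than durations, and the names WindowSketch and reduceByWindow are chosen here only to echo the operator of the same name.

```scala
object WindowSketch {
  // Each inner Seq models one micro-batch of values from a stream.
  // windowLength and slideInterval are counted in batches here,
  // whereas Spark Streaming expresses them as durations.
  def reduceByWindow(
      batches: Seq[Seq[Int]],
      windowLength: Int,
      slideInterval: Int
  )(reduce: (Int, Int) => Int): List[Option[Int]] =
    batches
      .sliding(windowLength, slideInterval)                // group batches into windows
      .map(window => window.flatten.reduceOption(reduce))  // None if the window is empty
      .toList
}
```

With a window of three batches that slides by one batch, consecutive windows overlap in two batches, which is why a value can contribute to several window results.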