Understanding Spark basics - Overview of Big Data and Spark, Installing Spark, Distributed data processing systems, The Spark shell.
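To ground the "distributed data processing" idea before touching Spark itself, here is a minimal plain-Python sketch of the partition-then-merge pattern that Spark generalizes across a cluster. The function names and the tiny dataset are illustrative only, not Spark APIs.

```python
from collections import Counter

def count_partition(lines):
    """Count words in one partition (what each executor would do locally)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def merge_counts(counters):
    """Merge per-partition results (the reduce step seen by the driver)."""
    total = Counter()
    for c in counters:
        total += c
    return total

# Two "partitions" of a tiny dataset
partitions = [["spark makes big data simple"], ["big data needs spark"]]
result = merge_counts(count_partition(p) for p in partitions)
print(result["spark"])  # -> 2 (one occurrence per partition)
```

The point of the sketch: each partition is processed independently, so the per-partition work can run on different machines, and only the small merged summaries travel over the network.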
Writing Spark applications, Spark algorithms, Spark's core APIs in Scala, Java, or Python, Spark's architecture and developer API, Predictive analytics based on MLlib, Clustering with K-means, Building classifiers, Modeling, Visualization techniques (matplotlib, ggplot2, D3, etc.).
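MLlib's K-means (`KMeans` in `pyspark.ml.clustering`) runs on a cluster; as a conceptual stand-in that runs without Spark, here is the Lloyd iteration that K-means performs, written in plain Python on 1-D data. The sample points and starting centroids are made up for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    for _ in range(iterations):
        clusters = {i: [] for i in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Empty clusters keep their old centroid
        centroids = [sum(v) / len(v) if v else centroids[i]
                     for i, v in clusters.items()]
    return centroids

points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
print(kmeans_1d(points, centroids=[0.0, 5.0]))  # -> [1.5, 10.5]
```

MLlib parallelizes exactly these two steps: the assignment pass is a map over partitions, and the centroid update is an aggregation.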
Streaming architecture: How DStreams break down into RDD batches, Receivers running inside Executor task slots, Kafka, Multiple receivers, Union transformation, Sliding window operations on DStreams, Stateless transformations, Stateful transformations, Window transformations, Output operations, Persistence.
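The sliding-window idea above can be sketched without Spark by modeling a DStream as a sequence of micro-batches and sliding a window over them. The batch contents and window sizes below are invented for illustration; real code would use `DStream.window` or `reduceByKeyAndWindow`.

```python
def sliding_window_sums(batches, window_length, slide_interval):
    """For each slide step, sum events from the last `window_length`
    micro-batches -- the arithmetic a windowed DStream performs."""
    sums = []
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        window = [x for batch in batches[start:end] for x in batch]
        sums.append(sum(window))
    return sums

# Four micro-batches of event counts; window of 3 batches, sliding by 1
batches = [[1, 2], [3], [4, 5], [6]]
print(sliding_window_sums(batches, window_length=3, slide_interval=1))
# -> [3, 6, 15, 18]
```

Note how each window overlaps the previous one: this overlap is why Spark Streaming recommends checkpointing and persistence for windowed and stateful transformations.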
Resilient Distributed Datasets (RDDs) - Narrow vs. wide dependencies, Types of RDDs (HadoopRDD, MappedRDD, FilteredRDD, CassandraRDD, SchemaRDD, etc.), The preservesPartitioning parameter, Broadcast variables, Accumulators, RDD operations - Transformations in RDD, Actions in RDD, Loading data into RDDs, Key-value pairs.
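The transformation/action split can be illustrated in plain Python with generators: like RDD transformations, generator pipelines only build a plan, and nothing executes until an "action" consumes them. The real PySpark calls would be `rdd.map(...).filter(...).collect()`; this is an analogy, not Spark code.

```python
data = range(10)

# "Transformations": lazily composed, no data touched yet
mapped = (x * x for x in data)                # like rdd.map(lambda x: x * x)
filtered = (x for x in mapped if x % 2 == 0)  # like .filter(...)

# "Action": forces evaluation of the whole pipeline, like .collect()
result = list(filtered)
print(result)  # -> [0, 4, 16, 36, 64]
```

Both `map` and `filter` here are narrow dependencies (each output element depends on one input element); a wide dependency such as `groupByKey` has no generator analogue because it must shuffle data between partitions.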
Spark SQL - Combining SQL, machine learning, and streaming into unified pipelines; Data transformation techniques, Loading data, Hive queries through Spark, Spark applications, The SQL library, Support for JSON and Parquet file formats.
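As a stand-in for Spark SQL's "load JSON, register a view, query it" workflow (`spark.read.json(...)` followed by `createOrReplaceTempView(...)` in real PySpark), here is the same shape using only Python's standard library: `json` for loading and an in-memory `sqlite3` database for the SQL step. The table name and records are made up for illustration.

```python
import json
import sqlite3

# JSON records, the kind of semi-structured input Spark SQL reads
raw = '[{"name": "ads", "clicks": 12}, {"name": "search", "clicks": 30}]'
rows = json.loads(raw)

# Register the records as a queryable table (Spark: a temp view)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(r["name"], r["clicks"]) for r in rows])

# The SQL step -- Spark SQL would run this over the temp view
total = conn.execute("SELECT SUM(clicks) FROM events").fetchone()[0]
print(total)  # -> 42
```

The difference in Spark SQL is scale and integration: the same query plan runs distributed across partitions and can feed its result directly into MLlib or a streaming pipeline.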
Who Should Attend?
After completing this course and passing the certification examination, the student will be awarded the "Certificate in Spark" credential.
If a learner chooses not to take the examination, they will still receive a "Participation Certificate".