221 – Developing Big Data Applications

“Developing Big Data Applications” is a comprehensive 4-day course covering the concepts, tools, and technologies for designing, developing, testing, and deploying programs that process Big Data using MapReduce and Spark.

Course Coverage

Developing MapReduce Applications

  • Setting Up a Development Environment
  • Creating and Submitting MapReduce Jobs
  • Unit Testing MapReduce Programs
  • Deploying a MapReduce Job
  • Using Third-Party Libraries

Designing MapReduce Algorithms

  • Parallel Execution
  • Distributed Aggregations
  • Common Relational Algebra Operations
  • Distributed Cache
  • MapReduce Counters

MapReduce I/O

  • Input Formats
  • Output Formats
  • Serialization and Deserialization

MapReduce Design Patterns

  • Customizing Partitioning
  • Customizing Sorting Behavior
  • Customizing Grouping Behavior
  • Total and Secondary Sort
  • Join
  • Text Processing

Introduction to Apache Spark

  • Resilient Distributed Datasets
  • Loading RDDs
  • Common RDD Operations
  • Data Analysis using Spark Shell

Developing Spark Applications – I

  • Spark Application Lifecycle
  • Setting Up a Development Environment
  • Creating and Submitting Spark Jobs
  • Unit Testing Spark Programs
  • Deploying a Spark Job

Spark API

  • RDD Operations
  • Pair RDDs

Developing Spark Applications – II

  • Serialization
  • Passing Functions to Spark Operations
  • Using Third-Party Libraries

Designing Spark Applications

  • Caching
  • Custom Partitioning
  • Broadcast Variables
  • Accumulators
  • Strategies for Writing Effective Spark Programs



Event Category: 3 Days