231 – Big Data Analytics

This course covers the underlying concepts, tools, and technologies for writing scripts and running analytical queries on data residing in Hadoop clusters. Attendees will learn the concepts and techniques for setting up a data warehouse from a Hadoop dataset, working with different storage formats, querying Big Data, and building an analytics pipeline.

Course Coverage

Big Data Analytics

  • Typical Analytics Applications on Big Data
  • Common Data Sources

Ingesting Big Data

  • Collecting Continuous Log Streams (Apache Flume)
  • Collecting from External Databases (Apache Sqoop)

Programming Big Data

  • MapReduce Paradigm
  • Input/Output Formats
  • MapReduce Algorithms for Relational Operations

Apache Pig

  • Data Analysis with Pig
  • Pig Latin
  • Loading/Storing Data
  • Pig Data Types
  • Relational Operations
  • Built-in Functions

Apache Pig – II

  • Configuring Pig
  • Controlling Parallelism
  • User-Defined Functions
  • Join Strategies
  • Optimizations

Apache Hive Overview

  • Hive Architecture
  • Hive Tables
  • HiveQL
  • Hive Data Types and Operators
  • Built-in Functions

Hive Tables

  • Creating Hive Tables
  • Partitioned Tables
  • Bucketed Sorted Tables
  • Skewed Tables
  • Storage and Row Formats
  • Populating Hive Tables and Partitions

Querying Hive Tables

  • Writing Hive Queries
  • Views
  • User-Defined Functions
  • Join Optimizations
  • Window (Analytic) Functions

Event Category: 3 Days