


211 – Big Data Essentials

August 8 - August 10


“Big Data Essentials” is a 3-day introductory course covering the paradigms for storing and processing Big Data, as well as the breadth of the open source Big Data technologies: from the core components YARN and HDFS to the ecosystem tools for collecting, analyzing, and learning from Big Data. Attendees will gain a solid understanding of the motivation and use cases for Big Data, and of the components that make up a typical Hadoop v2 cluster, including:



– MapReduce

– Apache Spark

– Apache Flume

– Apache Sqoop

– Apache Pig

– Apache Hive

Through hands-on labs and demonstrations, participants will become familiar with the ecosystem tools by implementing solutions on their own laptops and running them on pre-installed Hadoop clusters provided by Analytics Center.

Hands-on Labs

Guided by the instructor, attendees will complete hands-on labs and observe demonstrations in order to:

– Understand the HDFS block storage

– Understand how MapReduce and Spark work

– Collect log streams into HDFS using Apache Flume

– Import data into HDFS from relational databases using Apache Sqoop

– Write Pig Scripts to process Big Data

– Create and populate Hive tables

– Run SQL queries on Hive tables
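The map/shuffle/reduce execution pattern behind the MapReduce lab can be sketched in a few lines of plain Python. This is a local simulation for illustration only, not actual Hadoop code; the function names and the word-count example are our own.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values per key (here, sum the counts)."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big clusters", "big data tools"]
word_counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(word_counts)  # {'big': 3, 'data': 2, 'clusters': 1, 'tools': 1}
```

In a real cluster the map and reduce tasks run in parallel on different nodes and the shuffle moves data over the network partitioned by key, but the data flow is the same as in this single-process sketch.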

Course Prerequisites

Big Data Essentials is the initial, common course for all of our learning tracks (Big Data Developer, Big Data Analyst, Data Scientist), and its content is a prerequisite for further courses (Developing Big Data Applications, Architecting Big Data Solutions, Big Data Analytics, Big Data Science). The course is tailored for both software developers and data analysis professionals.

Basic knowledge of SQL is the only prerequisite for taking this essential course; the ability to read Java and Scala code, as well as basic familiarity with Linux, is preferable.

Course Coverage

Big Data Analysis Stack

Big Data Ecosystem
Big Data Use Cases
Open Source Big Data Analysis Stack

Big Data Infrastructures

Hadoop Distributed File System (HDFS)
Cluster Resources Management (YARN)

Programming Big Data

Distributed Execution Patterns
Parallel Execution
Working with Records of Pairs
Partitioning (Shuffling)

Distributed Execution Engines

Apache Hadoop MapReduce
Apache Spark

Ingesting Big Data

Collecting Continuous Log Streams (Apache Flume)
Collecting from External Databases (Apache Sqoop)

Big Data Analytics

High Level Big Data Programming Abstractions (Apache Pig)
Big Data Warehouse (Apache Hive)

*Please note that our training sessions are delivered in Turkish.








Istanbul Venue
Istanbul, 34345 Turkey


Analytics Center


12 Hours