Big Data Essentials

November 29, 2019 - December 1, 2019

“Big Data Essentials” is a 3-day introductory course covering the paradigms for Big Data Storage, Big Data Computation Models & Distributed Execution Engines, Structured Big Data, and Streaming Analytics, utilizing open source Big Data technologies: from the core components YARN and HDFS to the ecosystem tools for collecting, analyzing and learning from Big Data. Attendees will gain a solid understanding of the motivation and use cases for Big Data, and of the components that make up a typical Big Data cluster, including:

– HDFS
– MapReduce
– Apache Spark (Spark Core, Spark SQL, Spark Streaming)
– Apache Hive
– Apache Flume
– Apache Sqoop

Through hands-on labs and demonstrations, participants will become familiar with the ecosystem tools by implementing solutions on their own laptops and running them on pre-installed clusters provided by Analytics Center.


Hands-on Labs:


Guided by the instructor, the attendees will implement hands-on labs and observe demonstrations, in order to:

– Understand HDFS block storage
– Understand how MapReduce and Spark work
– Collect log streams into HDFS using Apache Flume
– Import data into HDFS from relational databases using Apache Sqoop
– Understand Spark DataFrames
– Create and populate Hive tables
– Run SQL queries on Hive tables & DataFrames
– Run stream computations with Spark Streaming


Course Prerequisites:


Big Data Essentials is the initial, common course for all of our learning tracks (Big Data Developer, Big Data Analyst, Data Scientist), and its content is a prerequisite for further courses (Developing Big Data Applications, Architecting Big Data Solutions, Big Data Analytics, Big Data Science). The course is tailored for both software developers and data analysis professionals.

A basic knowledge of SQL is the only prerequisite for this course; the ability to read Java and Scala code and a basic knowledge of Linux usage are preferable.


Course Coverage:


Big Data Revolution & Ecosystem

I. Big Data Origins
II. Characteristics of Big Data
III. Open Source Big Data Analysis Stack
IV. Typical Big Data Clusters
V. Big Data Applications


Programming Big Data

I. Distributed Execution Patterns
II. Parallel Execution
III. Aggregations
IV. Working with Records of Pairs
V. Partitioning (Shuffling)


Ingesting Big Data

I. Collecting Continuous Log Streams (Apache Flume)
II. Collecting from External Databases (Apache Sqoop)


Streaming Big Data

I. Spark Streaming: Distributed (and Discretized) Stream Representation
II. Stream Computations
III. SQL on Streams


Big Data Infrastructures

I. Hadoop Distributed File System (HDFS)
II. Cluster Resource Management (YARN)


Distributed Execution Engines

I. Apache Hadoop MapReduce
II. Apache Spark


Structured Big Data

I. Big Data Warehousing: Apache Hive
II. Apache Spark DataFrames & SQL on DataFrames



Istanbul Venue
Istanbul, 34345 Turkey
+90 212 217 63 88


Analytics Center
+90 212 217 63 88