211 – Big Data Essentials

December 12 - December 14

$1250

“Big Data Essentials” is a 3-day introductory course covering the paradigms for Big Data Storage, Big Data Computation Models & Distributed Execution Engines, Structured Big Data, and Streaming Analytics, using open source Big Data technologies: from the core components YARN and HDFS to the ecosystem tools for collecting, analyzing and learning from Big Data. Attendees will gain a solid understanding of the motivation and use cases for Big Data, and of the components that make up a typical Big Data cluster, including:

– HDFS
– YARN
– MapReduce
– Apache Spark (Spark Core, Spark SQL, Spark Streaming)
– Apache Hive
– Apache Flume
– Apache Sqoop

Through hands-on labs and demonstrations, participants will become familiar with the ecosystem tools by implementing solutions on their own laptops and running them on pre-installed clusters provided by Analytics Center.

Hands-on Labs:

Guided by the instructor, the attendees will implement hands-on labs and observe demonstrations, in order to:

– Understand the HDFS block storage
– Understand how MapReduce and Spark work
– Collect log streams into HDFS using Apache Flume
– Import data into HDFS from relational databases using Apache Sqoop
– Understand Spark DataFrames
– Create and populate Hive tables
– Run SQL queries on Hive tables & DataFrames
– Run Stream Computations with Spark Streaming

Course Prerequisites:

Big Data Essentials is the initial, common course for all of our learning tracks (Big Data Developer, Big Data Analyst, Data Scientist), and its content is a prerequisite for further courses (Developing Big Data Applications, Architecting Big Data Solutions, Big Data Analytics, Big Data Science). The course is tailored for both software developers and data analysis professionals.

Basic knowledge of SQL is the only prerequisite for this course. The ability to read Java and Scala code and basic familiarity with Linux are preferable.

Course Coverage:

Big Data Revolution & Ecosystem

I. Big Data Origins
II. Characteristics of Big Data
III. Open Source Big Data Analysis Stack
IV. Typical Big Data Clusters
V. Big Data Applications

Programming Big Data

I. Distributed Execution Patterns
II. Parallel Execution
III. Aggregations
IV. Working with Records of Pairs
V. Partitioning (Shuffling)

Ingesting Big Data

I. Collecting Continuous Log Streams (Apache Flume)
II. Collecting from External Databases (Apache Sqoop)

Streaming Big Data

I. Spark Streaming: Distributed (and discretized) Stream Representation

II. Stream Computations
III. SQL on Streams

Big Data Infrastructures

I. Hadoop Distributed File System (HDFS)

II. Cluster Resources Management (YARN)

Distributed Execution Engines

I. Apache Hadoop MapReduce

II. Apache Spark

Structured Big Data

I. Big Data Warehousing: Apache Hive
II. Apache Spark DataFrames & SQL on DataFrames

 


Details

Start:
December 12
End:
December 14
Cost:
$1250

Venue

Istanbul Venue
Address
Istanbul, 34345 Turkey
Phone:
+90 212 217 63 88

Organizer

Analytics Center
Phone:
+90 212 217 63 88

Other

Duration
12 Hours