331 – Machine Learning on Big Data

November 14, 2018 - November 16, 2018


From data munging to model evaluation, “Machine Learning on Big Data” is a 3-day course covering the entire Data Science pipeline: converting collected Big Data into mathematical data structures, learning distributed regression, classification, and recommender system models, implementing those models with Apache Mahout and Apache Spark, and assessing how well the models perform. The course covers several data mining and machine learning models, together with scalable algorithms for learning them, including:

  • Generalized Linear Models: Linear Regression and Logistic Regression
  • Decision Trees
  • Clustering
  • Mixed Membership Models: Latent Dirichlet Allocation
  • Similarity Analysis
  • Matrix Factorization
  • Learning Ensembles: Random Forests

Participants will become familiar with cutting-edge open source Machine Learning libraries and will run Machine Learning pipelines on pre-installed Hadoop/Spark clusters provided by Analytics Center.
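
As a rough illustration of what makes a learning algorithm scalable, the sketch below fits a one-variable linear regression by computing small summaries on each data partition and merging them, which is the same map-and-reduce pattern that distributed engines rely on. It is plain Python rather than Spark or Mahout code, it is not taken from the course material, and the function names and toy data are assumptions made for the example.

```python
from functools import reduce

def partition_stats(rows):
    """Map step: summarise one partition of (x, y) pairs as sufficient statistics."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)

def merge_stats(a, b):
    """Reduce step: statistics from two partitions combine by element-wise addition."""
    return tuple(u + v for u, v in zip(a, b))

def fit_line(stats):
    """Closed-form least-squares slope and intercept from the merged statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Hypothetical toy data following y = 2x + 1, split into three "partitions".
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]

total = reduce(merge_stats, map(partition_stats, partitions))
slope, intercept = fit_line(total)
print(slope, intercept)
```

Because each partition is reduced to five numbers before anything is combined, the amount of data exchanged between workers does not grow with the number of rows.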

Hands-on Labs

With hands-on labs and demonstrations, the participants will utilize ecosystem tools to:

  • Prepare a prototyping (interactive notebook) environment for large scale data analysis
  • Transform Big Data into Machine Learning data structures
  • Understand scalable Machine Learning algorithms
  • Write programs to learn and evaluate supervised learning models
  • Write programs for clustering data
  • Write programs for finding mixed memberships to clusters (topic modeling)
  • Write programs to learn and evaluate recommender system models
  • Design a typical analytics/machine learning pipeline
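
To give a feel for the kind of program the clustering lab refers to, here is a minimal k-means sketch in plain Python. It is an illustration only: the labs themselves use the cluster tooling named in this outline, and the function name, starting centroids, and toy points below are assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
```

On a cluster, the assignment step runs independently on each partition and only per-cluster sums and counts need to be merged, which is why the algorithm scales well.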

Course Prerequisites

“Machine Learning on Big Data” is the core course for the Data Scientist learning track. In addition to an appreciation of what Machine Learning is capable of, the attendees are expected to have an understanding of how Big Data Processing technologies work in general.

The attendees should be able to write simple programs in either Scala or Python, though the amount of programming required is minimal.

Course Coverage


Machine Learning Essentials & Big Learning

  1. Learning from Data
  2. Common Machine Learning Tasks
  3. Example Use Cases
  4. Machine Learning in the Big Data Era
  5. Machine Learning on Big Data: Challenges
  6. Machine Learning Pipelines:
    1. Data Preparation/Transformation
    2. Learning
    3. Evaluation
    4. Deployment
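
The four pipeline stages named above can be sketched end to end in a few lines. The example below is a hedged, plain-Python illustration rather than the course's own code: the nearest-centroid classifier, the "label,f1,f2" record format, the spam/ham labels, and all function names are assumptions chosen to keep the sketch self-contained.

```python
def prepare(raw_lines):
    """Data preparation: parse 'label,f1,f2' strings into (features, label) pairs."""
    data = []
    for line in raw_lines:
        label, *values = line.strip().split(",")
        data.append(([float(v) for v in values], label))
    return data

def learn(train):
    """Learning: a nearest-centroid model holding one mean vector per label."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Deployment: the function a serving layer would call for a new record."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))

def evaluate(model, test):
    """Evaluation: fraction of held-out records classified correctly."""
    correct = sum(predict(model, features) == label for features, label in test)
    return correct / len(test)

# Hypothetical raw records; the first four are used for training, the rest for testing.
raw = ["spam,5.0,1.0", "spam,6.0,0.5", "ham,1.0,4.0", "ham,0.5,5.0",
       "spam,5.5,1.5", "ham,1.5,4.5"]
data = prepare(raw)
train, test = data[:4], data[4:]
model = learn(train)
accuracy = evaluate(model, test)
print(accuracy)
```

Keeping each stage as a separate function mirrors how pipeline frameworks chain transformers and estimators, so a stage can be swapped without touching the others.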

Recap of Big Data Essentials

  1. HDFS & YARN
  2. Big Data Processing Patterns
  3. Big Data Processing Engines


Big Data Science Ecosystem

  1. Apache Spark
    1. Apache Spark Basics
    2. RDD APIs: Scala API and PySpark
    3. Spark DataFrames (and Datasets)
    4. Spark Streaming
    5. Spark ML Pipelines
  2. Apache Mahout
    1. Mahout Distributed Matrices
    2. Mahout Matrix DSL
  3. Apache Zeppelin for Interactive Data Analysis Notebooks



Istanbul Venue
Istanbul, 34345 Turkey
+90 212 217 63 88


Analytics Center
+90 212 217 63 88


24 Hours