Machine Learning on Big Data

March 28 - March 29

$690

From data munging to model evaluation, “Machine Learning on Big Data” is a two-day course covering the entire data science pipeline: converting collected Big Data into mathematical data structures; learning distributed regression, classification, and recommender system models; implementing such models with Apache Mahout and Apache Spark; and assessing how well the models perform. The course covers several data mining and machine learning models and scalable learning algorithms, including:

  • Generalized Linear Models: Linear Regression and Logistic Regression
  • Decision Trees
  • Clustering
  • Mixed Membership Models: Latent Dirichlet Allocation
  • Similarity Analysis
  • Matrix Factorization
  • Learning Ensembles: Random Forests

Participants will become familiar with cutting-edge open source Machine Learning libraries and run Machine Learning pipelines on pre-installed Hadoop/Spark clusters provided by Analytics Center.

Hands-on Labs

Through hands-on labs and demonstrations, participants will use ecosystem tools to:

  • Prepare a prototyping (interactive notebook) environment for large scale data analysis
  • Transform Big Data into Machine Learning data structures
  • Understand scalable Machine Learning algorithms
  • Write programs to learn and evaluate supervised learning models
  • Write programs for clustering data
  • Write programs for finding mixed memberships to clusters (topic modeling)
  • Write programs to learn and evaluate recommender system models
  • Design a typical analytics/machine learning pipeline

Course Prerequisites

“Machine Learning on Big Data” is the core course for the Data Scientist learning track. Attendees are expected to appreciate what Machine Learning is capable of and to understand how Big Data processing technologies work in general.

Attendees should be able to write simple programs in either Scala or Python, although the amount of programming required is minimal.

Course Coverage

PART I – ESSENTIALS & ECOSYSTEM

Machine Learning Essentials & Big Learning

  1. Learning from Data
  2. Common Machine Learning Tasks
  3. Example Use Cases
  4. Machine Learning in the Big Data Era
  5. Machine Learning on Big Data: Challenges
  6. Machine Learning Pipelines:
    1. Data Preparation/Transformation
    2. Learning
    3. Evaluation
    4. Deployment
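
As a rough illustration of the four pipeline stages listed above, here is a minimal sketch in plain Python. It is not course material and does not use Spark: the toy data, the single-feature least squares model, and the function names (`fit`, `rmse`, `make_predictor`) are all assumptions chosen only to keep the example small and runnable on one machine.

```python
import math
import random

# Hypothetical toy data (an assumption for illustration): y = 3x + 2 plus small noise.
random.seed(0)
raw = [(x, 3.0 * x + 2.0 + random.gauss(0, 0.1)) for x in range(100)]

# 1. Data preparation/transformation: shuffle, then split into train and test sets.
random.shuffle(raw)
train, test = raw[:80], raw[80:]

# 2. Learning: closed-form least squares for a single feature.
def fit(rows):
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# 3. Evaluation: root mean squared error on held-out data.
def rmse(model, rows):
    slope, intercept = model
    squared = sum((slope * x + intercept - y) ** 2 for x, y in rows)
    return math.sqrt(squared / len(rows))

# 4. Deployment: expose the learned model as a prediction function.
def make_predictor(model):
    slope, intercept = model
    return lambda x: slope * x + intercept

model = fit(train)
test_rmse = rmse(model, test)
predict = make_predictor(model)
print(f"slope={model[0]:.3f} intercept={model[1]:.3f} test RMSE={test_rmse:.3f}")
```

On a cluster, each stage would be replaced by its distributed counterpart, but the order of the stages stays the same.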

Big Data Science Ecosystem

  1. Apache Spark
    1. Apache Spark Basics
    2. RDD APIs: Scala API and PySpark
    3. Spark DataFrames (and Datasets)
    4. Spark ML Pipelines
  2. Apache Zeppelin for Interactive Data Analysis Notebooks

PART II – ALGORITHMS & INTERNALS

Data Munging

  1. Summarizing Large Datasets
  2. Common Data Transformation Tasks
  3. Data Structures for Machine Learning
  4. Working with Text Data

Supervised Learning

  1. Learning & Evaluation using Spark APIs:
    1. Linear Regression
    2. Logistic Regression
    3. Naive Bayes
    4. Decision & Regression Trees
    5. Tree Ensembles: Random Forests
  2. Evaluation
  3. Making Predictions

Unsupervised Learning

  1. Clustering:
    1. K-Means
    2. Gaussian Mixture
  2. Mixed Membership: Latent Dirichlet Allocation
  3. Online Clustering from Data Streams: Streaming K-Means

Recommender Systems

  1. Similarity Based Collaborative Filtering
  2. Matrix Factorization Based Collaborative Filtering
  3. Evaluating Recommender Systems

Large Scale Machine Learning Internals

  1. Distributed Optimization for Large Scale Supervised Learning
  2. K-Means/Streaming K-Means at Large Scale
  3. Variational EM for Learning & Inference in LDA
  4. Alternating Least Squares (ALS) and Implicit ALS for Matrix Factorization based Collaborative Filtering
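
To give a flavor of the last topic, the following is a minimal single-machine sketch of Alternating Least Squares with one latent factor (rank 1). It is an illustration under stated assumptions, not the Spark implementation and not course material: the ratings are synthetic, and the function name `als_rank1` and its parameters `reg` and `iters` are hypothetical.

```python
import random

# Hypothetical ratings generated from a known rank-1 model, with three
# (user, item) cells deliberately left unobserved: (0, 0), (2, 2) and (3, 1).
true_user = [1.0, 2.0, 3.0, 4.0]
true_item = [0.5, 1.0, 1.5]
ratings = {(u, i): true_user[u] * true_item[i]
           for u in range(4) for i in range(3) if (u + i) % 4 != 0}

def als_rank1(ratings, n_users, n_items, reg=1e-3, iters=50, seed=0):
    """Alternate between user and item factors, solving a regularized
    least squares problem for one side while the other is held fixed."""
    rng = random.Random(seed)
    x = [rng.uniform(0.5, 1.5) for _ in range(n_users)]  # user factors
    y = [rng.uniform(0.5, 1.5) for _ in range(n_items)]  # item factors
    for _ in range(iters):
        # Item factors fixed: closed-form update for each user.
        for u in range(n_users):
            obs = [(i, r) for (uu, i), r in ratings.items() if uu == u]
            x[u] = sum(r * y[i] for i, r in obs) / (reg + sum(y[i] ** 2 for i, _ in obs))
        # User factors fixed: closed-form update for each item.
        for i in range(n_items):
            obs = [(u, r) for (u, ii), r in ratings.items() if ii == i]
            y[i] = sum(r * x[u] for u, r in obs) / (reg + sum(x[u] ** 2 for u, _ in obs))
    return x, y

user_f, item_f = als_rank1(ratings, n_users=4, n_items=3)
# Predict a cell that was never observed during training.
held_out = user_f[2] * item_f[2]
print(f"predicted rating for user 2, item 2: {held_out:.2f}")
```

Because each user update depends only on the fixed item factors (and vice versa), the per-user and per-item solves are independent of one another, which is what makes the method amenable to distribution.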

Venue

Istanbul Venue
Address:
Istanbul, 34345 Turkey
Phone:
+90 212 217 63 88

Organizer

Analytics Center
Phone:
+90 212 217 63 88