From data munging to model evaluation, “Machine Learning on Big Data” is a 3-day course covering the entire Data Science pipeline: converting collected Big Data into mathematical data structures, applying algorithms for learning distributed regression, classification, and recommender system models, implementing such models with Apache Mahout and Apache Spark, and assessing how well the models perform. The course covers several data mining and machine learning models and scalable learning algorithms.
The participants will become familiar with cutting-edge open source Machine Learning libraries and run Machine Learning pipelines on pre-installed Hadoop/Spark clusters provided by Analytics Center.
Through hands-on labs and demonstrations, the participants will work with the ecosystem tools directly.
“Machine Learning on Big Data” is the core course for the Data Scientist learning track. In addition to an appreciation of what Machine Learning is capable of, the attendees are expected to have an understanding of how Big Data Processing technologies work in general.
The attendees should be able to write simple programs in either Scala or Python, but the amount of programming is minimal.
PART I – ESSENTIALS & ECOSYSTEM
Machine Learning Essentials & Big Learning
Recap of Big Data Essentials
Big Data Science Ecosystem