CDOSS Certificate
Big Data Machine Learning with Apache Spark
Profiles suited to preparing for this certification:
Data scientist, Data engineer, Full stack developer, Web developer, Business Intelligence Consultant, Big Data Consultant, and others with adequate algorithmic skills
General knowledge required to pass this certification:
+ Understand Apache Spark architecture and data management
+ Use basic Apache Spark functionality with Python:
– Extract, Transform, Load (ETL) with PySpark
– Spark SQL
– Scalable Data Science
– Machine learning (basic notions) with MLlib and ML
Detailed preparation plan:
+ Hadoop Architecture and MapReduce
+ Apache Spark scalability
+ Apache Spark architecture
+ Resilient Distributed Dataset (RDD) and DataFrame
+ Spark SQL
+ Extract, Transform, Load (ETL) with Spark
+ Basic notions of machine learning: supervised learning (example: decision tree) and unsupervised learning (example: K-means)
– Machine Learning with RDD (MLlib with PySpark)
– Machine Learning with DataFrame (ML with PySpark)
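
As an illustration of the unsupervised learning example named in the plan, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It does not use the Spark API; in PySpark the DataFrame-based equivalent is the `KMeans` estimator in `pyspark.ml.clustering`. The function names `assign` and `kmeans`, the sample points, and the choice of initial centroids are assumptions made for this example only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def kmeans(points, initial_centroids, max_iter=100):
    """Illustrative K-means: alternate assignment and update steps
    until the centroids stop moving or max_iter is reached."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # New centroid = coordinate-wise mean of the cluster members.
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    return centroids, assign(points, centroids)


# Hypothetical sample data: two visually separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The initial centroids are passed in explicitly so the run is deterministic; production implementations usually choose them at random or with a seeding scheme such as k-means++.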