About Course
This is the last module in the Data Science Track.
- This course will teach you the core concepts, processes, and tools of data engineering.
- You will learn about the modern data ecosystem and the roles of data engineers, data scientists, and data analysts.
- The data engineering ecosystem includes data pipelines, data repositories, and data integration platforms.
- You will learn about each of these components and about Big Data and Big Data processing tools.
Here is a breakdown of what you will cover in this course:
- Week 1: Big Data Introduction
- Week 2: Hadoop, HDFS, and MapReduce Fundamentals
- Week 3: Apache Spark and PySpark
- Week 4: Hive and Kafka
- Week 5: Capstone and Conclusion
Acknowledgements and Attribution
This course is adapted from 1) IBM's Introduction to Data Engineering, taught by Rav Ahuja, and 2) the Spark and PySpark Udemy course by Jose Portilla. We have added videos to the course to make the harder concepts easier to understand. Finally, there are notes by Chris Aloo and the Zindua technical team, shared on Slack and in the course resources.
Course Content
1.0 Introduction to Big Data
- Foundations of Big Data (05:22)
- Roles in Data Engineering (05:36)
- Skills in Data Engineering (08:20)
- The Modern Data Ecosystem (04:51)
1.1 Storing Big Data – Data Formats
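To give a taste of this module, here is a minimal sketch contrasting three common on-disk formats, assuming pandas with pyarrow installed (file names and data are illustrative):

```python
import pandas as pd

# A tiny dataset to compare on-disk formats.
df = pd.DataFrame({"id": [1, 2, 3], "city": ["Nairobi", "Mombasa", "Kisumu"]})

df.to_csv("cities.csv", index=False)         # row-oriented, human-readable text
df.to_json("cities.json", orient="records")  # semi-structured text
df.to_parquet("cities.parquet")              # columnar, compressed binary (needs pyarrow)
```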
1.2 Databases
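As a quick refresher on the relational side, a self-contained sketch using Python's built-in sqlite3 module (the table and rows are illustrative):

```python
import sqlite3

# In-memory database; swap ":memory:" for a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Grace",)])
for row in conn.execute("SELECT id, name FROM users"):
    print(row)
conn.close()
```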
1.3 Big Data Characteristics
1.4 Week 1 ETL Project
2.0 Moving Big Data – Data Pipelines
2.1 Data Streaming – Apache Kafka
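A minimal producer/consumer sketch of what this module covers, assuming the kafka-python package and a broker on localhost:9092 (the topic name is illustrative):

```python
from kafka import KafkaProducer, KafkaConsumer

# Produce a few messages to a topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(3):
    producer.send("demo-events", f"event {i}".encode("utf-8"))
producer.flush()

# Read them back from the beginning of the topic.
consumer = KafkaConsumer(
    "demo-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if no new messages arrive
)
for msg in consumer:
    print(msg.value.decode("utf-8"))
```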
2.2 Workflow Orchestration – Apache Airflow
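A minimal DAG sketch of the kind built in this module, assuming Airflow 2.x (the dag id and task logic are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling data from the source...")

def load():
    print("writing data to the warehouse...")

# Two tasks run daily: extract, then load.
with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2
```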
2.3 Week 2 Project
3.0 Processing Big Data 1 – Introduction to the Hadoop Ecosystem
3.1 HDFS Architecture and Features
3.2 MapReduce
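The classic MapReduce example is word count; a sketch written as two standalone Python scripts in the Hadoop Streaming style (file names are illustrative), where Hadoop pipes the mapper's sorted output into the reducer:

```python
# mapper.py -- emit "word<TAB>1" for every word on stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py -- input arrives sorted by key, so equal words are adjacent
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word == current_word:
        count += int(value)
    else:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```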
3.3 Week 3 Project
4.0 Processing Big Data 2 – Fundamentals of Spark and PySpark
4.1 Entry Points, RDDs and DataFrames
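A sketch of the two core abstractions, assuming a local PySpark install (the app name and data are placeholders):

```python
from pyspark.sql import SparkSession

# SparkSession is the single entry point in Spark 2+; it wraps the SparkContext.
spark = SparkSession.builder.appName("entry-points-demo").getOrCreate()

# Low-level API: an RDD of Python tuples.
rdd = spark.sparkContext.parallelize([("spark", 1), ("kafka", 2)])

# High-level API: the same data as a DataFrame with named columns.
df = spark.createDataFrame(rdd, ["tool", "count"])
df.show()

spark.stop()
```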
4.2 SparkSQL
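SparkSQL lets you query a DataFrame with plain SQL once it is registered as a view; a minimal sketch (the view and column names are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparksql-demo").getOrCreate()
df = spark.createDataFrame(
    [("Nairobi", 2024, 10), ("Mombasa", 2024, 7)],
    ["city", "year", "sales"],
)

# Register the DataFrame as a temporary view, then query it with SQL.
df.createOrReplaceTempView("sales")
spark.sql("SELECT city, SUM(sales) AS total FROM sales GROUP BY city").show()
spark.stop()
```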
4.3 PySpark – Data Transformations
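Typical DataFrame transformations chained together; a sketch (column names are illustrative), keeping in mind that transformations are lazy and nothing executes until an action such as show() is called:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("transformations-demo").getOrCreate()
df = spark.createDataFrame(
    [("a", 10), ("b", 25), ("a", 5)],
    ["key", "value"],
)

# Transformations are lazy: filter, derive a column, then aggregate.
result = (
    df.filter(F.col("value") > 4)
      .withColumn("doubled", F.col("value") * 2)
      .groupBy("key")
      .agg(F.sum("doubled").alias("total"))
)
result.show()  # the action that triggers execution
spark.stop()
```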
4.4 Optimising Spark
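Two everyday optimisations covered here are caching a reused DataFrame and broadcasting a small table in a join; a sketch under the same local-PySpark assumption, with explain() used to inspect the query plan:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("optimising-demo").getOrCreate()
big = spark.range(1_000_000).withColumnRenamed("id", "user_id")
small = spark.createDataFrame([(0, "free"), (1, "paid")], ["user_id", "tier"])

big.cache()  # keep a reused DataFrame in memory across actions

# Hint Spark to broadcast the small side instead of shuffling both tables.
joined = big.join(broadcast(small), "user_id")
joined.explain()  # print the physical plan to verify the broadcast join
print(joined.count())
spark.stop()
```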
4.5 Test Your Understanding