Big Data Engineering

Categories: Data Science

About Course

This is the last module in the Data Science Track.

  • This course will teach you the core concepts, processes, and tools of data engineering.
  • You will learn about the modern data ecosystem and the roles of data engineers, data scientists, and data analysts.
  • The data engineering ecosystem includes data pipelines, data repositories, and data integration platforms.
  • You will learn about each of these components and about Big Data and Big Data processing tools.

Here is a breakdown of what you will cover in this course:

  1. Week 1: Big Data Introduction
  2. Week 2: Hadoop, HDFS and MapReduce Fundamentals
  3. Week 3: Apache Spark and PySpark
  4. Week 4: Hive and Kafka
  5. Week 5: Capstone and Conclusion

Acknowledgements and Attribution

This course draws on two sources: 1) IBM’s Introduction to Data Engineering, taught by Rav Ahuja, and 2) the Spark and PySpark Udemy course by Jose Portilla. We have added videos to the course to make harder concepts simpler to understand. Finally, notes by Chris Aloo and the Zindua technical team are shared on Slack or in the resources.


Course Content

1.0 Introduction to Big Data
Here you will learn the following concepts:

  • Data engineer role, technologies, and responsibility
  • Evolution of Big Data, Examples, Characteristics, Challenges
  • Big Data Characteristics, Sources, OLTP and OLAP, Operational vs Analytical Big Data
  • Scaling
  • Types of Databases: RDBMS, Data Lakes, Data Warehouse

  • Foundations of Big Data
    05:22
  • Roles in Data Engineering
    05:36
  • Skills in Data Engineering
    08:20
  • The Modern Data Ecosystem
    04:51

1.1 Storing Big Data – Data Formats

1.2 Databases

1.3 Big Data Characteristics

1.4 Week 1 ETL Project

2.0 Moving Big Data – Data Pipelines

2.1 Data Streaming – Apache Kafka

2.2 Workflow Orchestration – Apache Airflow

2.3 Week 2 Project

3.0 Processing Big Data 1 – Introduction to the Hadoop Ecosystem

3.1 HDFS Architecture and Features

3.2 MapReduce

3.3 Week 3 Project

4.0 Processing Big Data 2 – Fundamentals of Spark and PySpark

4.1 Entry Points, RDDs and DataFrames

4.2 SparkSQL

4.3 PySpark – Data Transformations

4.4 Optimising Spark

4.5 Test Your Understanding
