Modernizing Data Lakes and Data Warehouses with Google Cloud (MDLDW) – Outline

Detailed Course Outline

Module 1 - Introduction to Data Engineering

Topics:

  • The role of a data engineer
  • Data engineering challenges
  • Introduction to BigQuery
  • Data lakes and data warehouses
  • Transactional databases versus data warehouses
  • Partnering effectively with other data teams
  • Managing data access and governance
  • Building production-ready pipelines
  • Google Cloud customer case study

Objectives:

  • Discuss the role of a data engineer.
  • Discuss benefits of doing data engineering in the cloud.
  • Discuss the challenges of data engineering practice and how building data pipelines in the cloud helps address them.
  • Review and understand the purpose of a data lake versus a data warehouse, and when to use each.

Module 2 - Building a Data Lake

Topics:

  • Introduction to data lakes
  • Data storage and ETL options on Google Cloud
  • Building a data lake by using Cloud Storage
  • Securing Cloud Storage
  • Storing all sorts of data types
  • Cloud SQL as your OLTP system

Objectives:

  • Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
  • Explain how to use Cloud SQL for a relational data lake.

Module 3 - Building a Data Warehouse

Topics:

  • The modern data warehouse
  • Introduction to BigQuery
  • Getting started with BigQuery
  • Loading data into BigQuery
  • Exploring schemas
  • Schema design
  • Nested and repeated fields
  • Optimizing with partitioning and clustering

Objectives:

  • Discuss the requirements of a modern data warehouse.
  • Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
  • Discuss the core concepts of BigQuery and review the options for loading data into BigQuery.