Detailed Course Outline
Module 1 - Introduction to Building Batch Data Pipelines
Topics:
- EL, ELT, ETL
- Quality considerations
- How to conduct operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
 
Objectives:
- Review different methods of loading data into your data lakes and warehouses: EL, ELT, and ETL.
 
Module 2 - Executing Spark on Dataproc
Topics:
- The Hadoop ecosystem
- Run Hadoop on Dataproc
- Cloud Storage instead of HDFS
- Optimizing Dataproc
 
Objectives:
- Review the Hadoop ecosystem.
- Discuss how to lift and shift your existing Hadoop workloads to the cloud using Dataproc.
- Explain when to use Cloud Storage instead of HDFS storage.
- Explain how to optimize your Dataproc jobs.
 
Module 3 - Serverless Data Processing with Dataflow
Topics:
- Introduction to Dataflow
- Why customers value Dataflow
- Dataflow pipelines
- Aggregate with GroupByKey and Combine
- Side inputs and windows
- Dataflow templates
 
Objectives:
- Identify the features that customers value in Dataflow.
- Discuss core concepts in Dataflow.
- Review the use of Dataflow templates and SQL.
- Write a simple Dataflow pipeline and run it both locally and on the cloud.
- Identify map and reduce operations, execute the pipeline, and use command line parameters.
- Read data from BigQuery into Dataflow and use the output of a pipeline as a side input to another pipeline.
 
Module 4 - Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Topics:
- Building batch data pipelines visually with Cloud Data Fusion
  - Components
  - UI overview
  - Building a pipeline
  - Exploring data using Wrangler
- Orchestrating work between Google Cloud services with Cloud Composer
  - Apache Airflow environment
  - DAGs and operators
  - Workflow scheduling
  - Monitoring and logging
 
Objectives:
- Discuss how to manage your data pipelines with Cloud Data Fusion and Cloud Composer.
- Summarize how Cloud Data Fusion allows data analysts and ETL developers to wrangle data and build pipelines in a visual way.
- Describe how Cloud Composer can help to orchestrate the work across multiple Google Cloud services.