OpenHack – DevOps for Data Science (OHDTSC) – Outline

Detailed Course Outline

Challenge 1: Local Model Build

  • Understand how the model build process works on a local machine/notebook for training and evaluation
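
A minimal local train-and-evaluate loop in the spirit of this challenge might look like the sketch below; the dataset, model choice, and output path are placeholders rather than the OpenHack's prescribed ones.

```python
# Minimal local training and evaluation loop (placeholder data and model).
import os

import joblib
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)                      # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("mse:", mean_squared_error(y_test, preds))
print("r2 :", r2_score(y_test, preds))

# Persist the trained model so later challenges can pick it up.
os.makedirs("outputs", exist_ok=True)
joblib.dump(model, "outputs/model.pkl")
```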

Challenge 2: Portable Execution

  • Refactor the experimentation notebook to run as an experiment in the Azure ML service
  • Add metric and parameter logging (see the sketch below)
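
The refactored training script can log parameters and metrics through the Azure ML run context; the sketch below assumes the v1 azureml-core SDK, and the metric names and placeholder dataset are illustrative.

```python
# train.py: log parameters and metrics to the Azure ML run. Run.get_context()
# returns an offline run when executed locally and the submitted run when the
# script is run as an experiment in the workspace.
from azureml.core import Run
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

run = Run.get_context()

n_estimators = 100
run.log("n_estimators", n_estimators)        # parameter logging

X, y = load_diabetes(return_X_y=True)        # placeholder dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestRegressor(n_estimators=n_estimators).fit(X_train, y_train)

mse = mean_squared_error(y_test, model.predict(X_test))
run.log("mse", mse)                          # metric logging
```

Submitting the same script with `ScriptRunConfig` and `Experiment.submit()` makes these logged values appear in the workspace run history.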

Challenge 3: Scripted ML Pipeline Creation

  • Extend the notebook from Challenge 2 to use the Azure ML API to set up an ML pipeline that trains and registers the model
  • Retrieve the registered model, deploy it as an inference service to Azure Container Instances (ACI), and test the deployed service (see the sketch below)
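
A condensed sketch covering both bullets, using the v1 Azure ML SDK; the compute target, script names, model name, and environment file are assumptions.

```python
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.runconfig import RunConfiguration
from azureml.core.webservice import AciWebservice
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Pipeline with a single step that trains the model and registers it.
train_step = PythonScriptStep(name="train_and_register",
                              script_name="train_register.py",
                              source_directory=".",
                              compute_target="cpu-cluster",          # assumed cluster name
                              runconfig=RunConfiguration(),
                              allow_reuse=False)
pipeline_run = Pipeline(ws, steps=[train_step]).submit(experiment_name="scripted-ml-pipeline")
pipeline_run.wait_for_completion(show_output=True)

# Retrieve the registered model and deploy it as an ACI web service.
model = Model(ws, name="ohdtsc-model")                               # assumed model name
inference_config = InferenceConfig(
    entry_script="score.py",
    environment=Environment.from_conda_specification("serve-env", "environment.yml"))
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(ws, "ohdtsc-aci-svc", [model], inference_config, aci_config)
service.wait_for_deployment(show_output=True)

# Smoke-test the deployed service (payload shape depends on the scoring script).
print(service.run('{"data": [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]}'))
```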

Challenge 4: Automated ML Pipeline Creation

  • DevOps pipeline integration incorporating MLOpsPython
  • Train, evaluate, and register a model via Azure DevOps and Azure ML Service Pipelines
  • Deploy and serve the model automatically after it is registered in the model registry
  • Implement basic acceptance testing against the API endpoint in the pipeline after model deployment (see the sketch below)
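
The acceptance-test stage can be as small as the sketch below, run (for example, with pytest) after the release pipeline deploys the service; the environment variable name and sample payload are assumptions, typically passed in as pipeline variables.

```python
# Basic acceptance test executed as a pipeline stage after deployment.
import os

import requests

SCORING_URI = os.environ["SCORING_URI"]          # injected by the release pipeline
SAMPLE_PAYLOAD = {"data": [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]}   # placeholder input


def test_endpoint_returns_prediction():
    resp = requests.post(SCORING_URI, json=SAMPLE_PAYLOAD,
                         headers={"Content-Type": "application/json"}, timeout=30)
    assert resp.status_code == 200
    body = resp.json()
    assert body, "empty response from scoring service"


if __name__ == "__main__":
    test_endpoint_returns_prediction()
    print("acceptance test passed")
```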

Challenge 5: Observability: ML Training

  • Introduce a change that breaks the model during training in the pipeline, then work out why it broke using centralized, instrumented logging (see the sketch below)
  • Build a dashboard that reports on CPU, memory, or GPU utilization and fires an alert when utilization goes above or below a set threshold, to ensure the hardware is used optimally for cost and performance
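
One way to get centralized, instrumented logging out of the training code is the opencensus Azure exporter writing to Application Insights, as sketched below; the connection-string variable, logger name, and injected failure are assumptions.

```python
# Send training logs to a central store (Application Insights via the
# opencensus-ext-azure handler) so a broken training run can be diagnosed
# from one place.
import logging
import os

from opencensus.ext.azure.log_exporter import AzureLogHandler

logger = logging.getLogger("ohdtsc.training")
logger.setLevel(logging.INFO)
logger.addHandler(AzureLogHandler(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]))


def train():
    logger.info("training started", extra={"custom_dimensions": {"stage": "train"}})
    try:
        raise ValueError("label column missing")   # stand-in for the injected break
    except Exception:
        # The full stack trace lands in the central log before the run fails.
        logger.exception("training failed", extra={"custom_dimensions": {"stage": "train"}})
        raise


if __name__ == "__main__":
    train()
```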

Challenge 6: Observability: ML Inference / Serving

  • Implement a custom metric in the model code that outputs the score, and build a dashboard showing the results of this metric over time (see the sketch below)
  • Build a dashboard that reports on CPU, memory, or GPU utilization and fires an alert when utilization goes above or below a set threshold, to ensure the hardware is used optimally for cost and performance
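
The scoring script can emit the model's score as a custom metric to Application Insights, which a dashboard can then chart over time; in the sketch below the model file name, payload shape, and connection-string variable are assumptions.

```python
# score.py: emit each prediction's score as a custom metric from the scoring code.
import json
import logging
import os

import joblib
from opencensus.ext.azure.log_exporter import AzureLogHandler

logger = logging.getLogger("ohdtsc.scoring")
logger.setLevel(logging.INFO)
logger.addHandler(AzureLogHandler(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]))

model = None


def init():
    global model
    # AZUREML_MODEL_DIR is set by the Azure ML serving container.
    model = joblib.load(os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl"))


def run(raw_data):
    data = json.loads(raw_data)["data"]
    scores = model.predict(data).tolist()
    for s in scores:
        # Custom metric: one record per prediction, queryable on a dashboard.
        logger.info("prediction_score", extra={"custom_dimensions": {"score": float(s)}})
    return scores
```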

Challenge 7: Data Ops for ML: Data Ingestion & Pre-processing

  • Understand the pros and cons of the various data ingestion options available across Azure services
  • Implement a solution in which a new data drop (e.g. a file landing in a blob store) automatically triggers a data ingestion pipeline that runs a Python notebook to process the data, writes the output file to a different folder in the blob store, and invokes the model training pipeline (see the sketch below)
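
One possible shape for the trigger is an Azure Function with a blob trigger, as sketched below; Azure Data Factory or Event Grid with Logic Apps are equally valid choices. The container and folder names, the published-pipeline REST endpoint, and the token source are assumptions.

```python
# Blob-triggered Azure Function: pre-process the new file, write the result to a
# "processed" folder, then kick off the published training pipeline via REST.
import io
import os

import azure.functions as func
import pandas as pd
import requests
from azure.storage.blob import BlobServiceClient


def main(newblob: func.InputStream):
    # Pre-process the incoming file (placeholder transformation).
    df = pd.read_csv(io.BytesIO(newblob.read()))
    df = df.dropna()

    # Write the processed data to a different folder in the blob store.
    blob_service = BlobServiceClient.from_connection_string(
        os.environ["STORAGE_CONNECTION_STRING"])
    out_name = "processed/" + os.path.basename(newblob.name)
    blob_service.get_blob_client("data", out_name).upload_blob(
        df.to_csv(index=False), overwrite=True)

    # Invoke the published Azure ML training pipeline through its REST endpoint.
    requests.post(
        os.environ["TRAINING_PIPELINE_ENDPOINT"],
        headers={"Authorization": f"Bearer {os.environ['AAD_TOKEN']}"},
        json={"ExperimentName": "triggered-training"},
        timeout=30,
    ).raise_for_status()
```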

Challenge 8: Data Ops for ML: CI/CD Pipelines as Code

  • Store the source code of the data ingestion solution in an SCM repository
  • Implement a branching policy for the data pipeline and the notebooks
  • Implement CI/CD pipelines that deploy the data ingestion pipeline and the notebooks to the target environment; parametrize the solution so it can be deployed to multiple environments
  • Store the CI/CD pipeline definitions in the SCM repository

Challenge 9: Advanced: Canary Deployment

  • Set up a canary deployment pipeline for the model being served to production users (see the sketch below)
  • Analyze the score results on a dashboard for the canary deployment versus production
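
One way to split live traffic for the canary is with Azure ML managed online endpoints in the v2 azure-ai-ml SDK (a different deployment target than the ACI service used earlier); endpoint, deployment, and workspace names below are placeholders.

```python
# Route a small percentage of live traffic to the canary deployment.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(DefaultAzureCredential(),
                     subscription_id="<subscription-id>",
                     resource_group_name="<resource-group>",
                     workspace_name="<workspace>")

endpoint = ml_client.online_endpoints.get("ohdtsc-endpoint")
# 10% of requests go to the canary deployment, 90% to production.
endpoint.traffic = {"production": 90, "canary": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

If the per-prediction score metric from Challenge 6 is tagged with the deployment name, the dashboard can then compare the canary's scores against production side by side.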