techiescamp/mlops-for-devops

MLOps for DevOps Engineers

A hands-on, project-based guide to Machine Learning Operations built specifically for DevOps, Platform, and SRE engineers.

No ML background required. Every concept is explained through DevOps analogies you already understand.

If you are completely new to MLOps, read our DevOps to MLOps guide first.


Who This Is For

Most MLOps resources are written for data scientists learning infrastructure. This repo flips that.

You do not need to become a data scientist. But just like understanding how a Java application is built makes you a better DevOps engineer, understanding how an ML model is built, trained, and served makes you effective at operating ML workloads in production.


What We Build

| Track | What You Learn |
|---|---|
| 🤖 Traditional ML | Train, serve, automate, and monitor a real ML model on Kubernetes |
| 🧠 Foundational Models | Serve LLMs in production using vLLM, TGI, and Ollama |
| ⚙️ LLM-Powered DevOps | Monitor K8s clusters, build RAG pipelines and agents with LLMs |

Everything runs on Kubernetes, Docker, and tools you already use.


Prerequisites

| Skill | Level |
|---|---|
| Linux CLI | Intermediate |
| Docker | Intermediate |
| Kubernetes | Intermediate |
| AWS | Basic to Intermediate |
| Python | Basic (read and run scripts) |
| Git | Intermediate |

No ML experience needed. That is what this repo teaches.


Learning Path

| Phase | Track | Title | Status |
|---|---|---|---|
| 1 | 🤖 Traditional ML | Local Dev & Pipelines | ✅ Done |
| 1 | 🤖 Traditional ML | K8s Deploy & Model Serving | ✅ Done |
| 3 | 🤖 Traditional ML | Enterprise Orchestration | 🔄 In Progress |
| 4 | 🤖 Traditional ML | Monitor & Observe | 🔜 Planned |
| 5 | 🧠 Foundational Models | Foundational Models | 🔜 Planned |
| 6 | 🧠 Foundational Models | LLM Serving & Scaling | 🔜 Planned |
| 7 | ⚙️ LLM-Powered DevOps | LLM-Powered DevOps | 🔜 Planned |
| 8 | ⚙️ LLM-Powered DevOps | Emerging AI Ops | 🔜 Planned |

Phase 1: Local Development & Data Pipelines

Goal: Build the full ML foundation on your local machine, from raw data to a trained, tested model.

Use case throughout: employee attrition prediction for a large organisation (~500,000 employees). Working on one problem end to end keeps the focus on infrastructure and operations, not data science theory.

| Step | Title | Guide |
|---|---|---|
| 1 | Project Dataset Pipeline | Read the Guide |
| 2 | Data Preparation Stages | Read the Guide |
| 3 | Training & Building the Prediction Model | Read the Guide |
| 4 | From Model to Live API with KServe | Read the Guide |
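For readers coming from CI/CD, the data preparation steps above behave much like build stages: each one takes an input artifact and emits a cleaner one. The sketch below only illustrates that idea. The column names (`age`, `department`, `left`), the sample rows, and the stage functions are hypothetical and not taken from the guides, and it uses the Python standard library instead of Pandas to stay self-contained.

```python
import random

# Hypothetical raw records; the real dataset's columns may differ.
RAW_ROWS = [
    {"age": "34", "department": "sales", "left": "yes"},
    {"age": "", "department": "engineering", "left": "no"},  # missing age
    {"age": "45", "department": "engineering", "left": "no"},
    {"age": "29", "department": "sales", "left": "yes"},
    {"age": "51", "department": "hr", "left": "no"},
]


def clean(rows):
    """Stage 1: drop rows that have any empty field."""
    return [row for row in rows if all(value != "" for value in row.values())]


def encode(rows):
    """Stage 2: turn strings into numbers a model can consume."""
    departments = sorted({row["department"] for row in rows})
    dept_index = {name: i for i, name in enumerate(departments)}
    return [
        {
            "age": int(row["age"]),
            "department": dept_index[row["department"]],
            "left": 1 if row["left"] == "yes" else 0,
        }
        for row in rows
    ]


def split(rows, test_fraction=0.25, seed=42):
    """Stage 3: shuffle deterministically, then split into train and test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


def run_pipeline(rows):
    """Chain the stages, like jobs in a CI pipeline."""
    return split(encode(clean(rows)))


train_rows, test_rows = run_pipeline(RAW_ROWS)
```

Fixing the shuffle seed makes the split reproducible, which matters for the same reason pinned dependencies matter in a build: two runs on the same input should produce the same artifacts.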

Code: phase-1-local-dev/
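Once a model is served through KServe (step 4), clients typically reach it over plain HTTP. As a rough sketch, assuming the service exposes KServe's V1 inference protocol, a prediction call is a POST to `/v1/models/<name>:predict` with an `instances` list in the JSON body. The host, the model name `attrition`, and the feature vector below are hypothetical; the repo's guide may use different values or a different protocol version.

```python
import json
import urllib.request


def build_predict_request(base_url, model_name, instances):
    """Build (but do not send) a V1-protocol predict request."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_predictions(response_body):
    """Extract the predictions list from a V1-protocol response body."""
    return json.loads(response_body)["predictions"]


# Hypothetical host, model name, and feature vector.
request = build_predict_request(
    "http://localhost:8080", "attrition", [[34, 2, 5.5]]
)
# Sending it would look like: urllib.request.urlopen(request).read()
```

Keeping request construction separate from the network call makes the payload easy to inspect and unit test before a cluster is available.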

Phase 2: Enterprise Orchestration for ML

Goal: Replace local, manual ML workflows with production-grade orchestration. Versioned data, automated pipelines, experiment tracking, and scalable training.

| Step | Title | What It Covers | Guide |
|---|---|---|---|
| 1 | Data Versioning Fundamentals | Understanding data drift, model decay, and dataset versioning | Read the Guide |
| 2 | Hands-On Data Version Control with AWS S3 | Working with DVC and AWS S3 to version the dataset required for ML | Read the Guide |
| 3 | Data Versioning Using Airflow on Kubernetes | ETL pipeline that produces a fresh employee_attrition.csv dataset and versions it on S3 using DVC | 🔜 Coming This Saturday |
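The core idea behind dataset versioning can be sketched without any tooling: identify a dataset by a hash of its content, commit a small pointer file to Git, and keep the data itself in object storage. The example below is a simplified illustration of that idea, not DVC's actual implementation or file format. The pointer file layout, the function names, and the choice of SHA-256 are assumptions made for the sketch.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def content_hash(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path):
    """Write a small pointer file that Git can track instead of the data."""
    data_path = Path(data_path)
    pointer = {
        "path": data_path.name,
        "sha256": content_hash(data_path),
        "size": data_path.stat().st_size,
    }
    # Hypothetical naming scheme for the pointer file.
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer


def has_changed(data_path, pointer):
    """Report whether the data no longer matches the recorded version."""
    return content_hash(data_path) != pointer["sha256"]


with tempfile.TemporaryDirectory() as workdir:
    dataset = Path(workdir) / "employee_attrition.csv"
    dataset.write_text("age,department,left\n34,sales,yes\n")
    v1 = write_pointer(dataset)
    unchanged = has_changed(dataset, v1)

    # A fresh ETL run appends a row, so the content hash no longer matches.
    dataset.write_text("age,department,left\n34,sales,yes\n45,hr,no\n")
    changed = has_changed(dataset, v1)
```

Because the pointer is tiny and deterministic, a Git commit can pin an exact dataset version in the same way a lock file pins dependency versions.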

Tech Stack

| Category | Tools |
|---|---|
| Data Pipeline | Python, Pandas |
| Model Training | scikit-learn, XGBoost |
| API / Serving | FastAPI, Flask, Docker, KServe |
| Orchestration | Airflow, Kubeflow, MLflow Pipelines |
| Monitoring | Prometheus, Grafana, Evidently AI |
| Infrastructure | Kubernetes, Helm, GitHub Actions |
| LLM Serving | vLLM, TGI, Ollama |

Recommended Reading

Certifications


Tools

  • Ray: Open-source distributed computing framework for Python and AI workloads.
  • rtk: High-performance CLI proxy that reduces LLM token consumption.

License

Dual licensed:

  • Code (scripts, configs, manifests): Apache 2.0
  • Content (README, guides, docs): All Rights Reserved

For commercial licensing: contact@devopscube.com
