🚀 Hands-On MLOps: Deploying a Production-Ready Diabetes Prediction API on AWS
End-to-End Guide with FastAPI, Docker, Kubernetes, and Real-Time AWS Deployment Tips

🧩 1. Introduction: Bridging the Gap with MLOps
Machine Learning models, however sophisticated, are useless if they remain confined to a Jupyter Notebook. This is where MLOps steps in. MLOps is the critical bridge connecting the agility of software development (DevOps) with the iterative nature of machine learning, ensuring models are not just trained, but deployed, managed, and monitored reliably at scale.
Deploying ML models in a real-world, automated environment requires robust infrastructure, continuous integration/delivery (CI/CD), version control, and comprehensive monitoring.
This blog post is a deep dive into my hands-on project: building an end-to-end MLOps pipeline on AWS for a diabetes prediction system. I’ll walk you through how I leveraged powerful cloud and open-source tools to create a scalable, automated, and cost-efficient ML workflow.
The tools involved are Docker, Kubernetes (EKS), MLflow, GitHub Actions, Prometheus, and Grafana, connected so that model training, deployment, and monitoring run without manual steps.
Highlights:
Full MLOps Pipeline – from ML model training → Docker container → local Kubernetes → AWS EKS → HTTPS production deployment.
Hands-on with Industry Tools – FastAPI, Docker, Kind, EKS, ALB, ACM, Route 53.
Security & Best Practices – HTTPS, IAM roles, least privilege, Kubernetes secrets, health checks.
Portfolio-Ready – Shows your ability to deploy ML models at scale in a production-grade environment.
Beginner-Friendly – Step-by-step guide, all scripts, and manifests included.
☁️ 2. Project Overview: Scalability Meets Automation
The goal:
Design a fully automated MLOps pipeline that predicts whether a person is diabetic based on health parameters, and deploy it on AWS as a scalable, self-healing workflow that can handle production traffic rather than as a simple static deployment.
The core challenge was orchestrating a multi-stage pipeline—from model training and experiment tracking to containerization and deployment on a dynamic Kubernetes cluster—all managed by an automated CI/CD system.
Challenges:
Orchestrating multi-stage pipeline: model training → containerization → deployment → CI/CD → monitoring
Handling real-time traffic, scalability, and cloud cost optimization
Technologies Used:
| Category | Tools & Services | Purpose |
| Cloud | AWS (EKS, EC2, S3, Lambda, ECR, IAM, CloudWatch) | Infrastructure, Container Registry, Compute, Cost Management |
| Orchestration | Kubernetes (EKS), Docker | Container Orchestration and Runtime |
| ML & Tracking | MLflow | Experiment Tracking, Model Registry |
| CI/CD | GitHub Actions | Automated Workflow Triggers |
| Monitoring | Prometheus, Grafana | Metrics Collection and Visualization |
| Application | FastAPI | Serving the Prediction Model via API |
🔧 Tools & Technologies
AWS Services: EC2, EKS, S3, ECR, Lambda, IAM, CloudWatch
DevOps Tools: Docker, Kubernetes, GitHub Actions
ML Tools: MLflow, Scikit-learn
Monitoring: Prometheus, Grafana
Web Framework: FastAPI
🚀 Phases Covered
Local Development – Clone repo, setup venv, train model, run API locally.
Containerization – Dockerfile creation, multi-stage build, run container locally.
Local Kubernetes – Kind cluster, deploy with deployment.yaml, service.yaml, ingress.yaml.
AWS Infrastructure – Configure CLI, create EKS cluster, VPC, IAM roles, ALB controller.
Production Deployment – Push Docker image to ECR, deploy on EKS, configure LoadBalancer.
Domain & HTTPS – ACM SSL certificate, Route 53 A record, validate HTTPS endpoint.
📚 What You'll Learn
By following this guide, you'll gain hands-on experience in:
| Category | Skills |
| Machine Learning | Model training, evaluation, scikit-learn, feature engineering |
| API Development | FastAPI, RESTful APIs, async programming, data validation |
| Containerization | Docker, multi-stage builds, image optimization, ECR |
| Kubernetes | Pods, Deployments, Services, Ingress, scaling, health checks |
| Cloud Infrastructure | AWS EKS, VPC, IAM, security groups, networking |
| DevOps | CI/CD concepts, GitOps, infrastructure as code |
| Networking | DNS (Route 53), load balancing (ALB), SSL/TLS (ACM) |
| Security | HTTPS, IAM roles, secrets management, least privilege |
⚙️ 3. Architecture Diagram (Text Description)
The end-to-end ML pipeline is structured as follows, embodying the principles of DevOps for machine learning:
Data Ingestion: The raw Pima Indians Diabetes dataset is fetched from a secure source (e.g., S3 or a local file) as the starting point.
Model Training & Tracking: The ML model (Logistic Regression/Random Forest) is trained. MLflow tracks all experiments, parameters, and the final serialized model file.
Containerization: The FastAPI-based inference application, which loads the best MLflow-tracked model, is packaged into a Docker image.
Continuous Integration (CI): On every code push to the main branch, GitHub Actions automatically builds the Docker image and pushes the newly tagged image to AWS ECR (Elastic Container Registry).
Continuous Delivery (CD): GitHub Actions then triggers the deployment by applying updated Kubernetes manifests (YAML files) to the AWS EKS (Elastic Kubernetes Service) cluster.
Model Deployment: EKS pulls the new image from ECR, updates the Deployment, and serves the prediction API via a Kubernetes Service (LoadBalancer).
Monitoring & Alerting: Prometheus scrapes metrics (like request latency, error rates, etc.) from the deployed application, and Grafana provides customizable dashboards for real-time visibility.
Cost Optimization (Auto Shutdown): To prevent running up high EKS costs during idle periods, an AWS Lambda function is scheduled to automatically scale down the EKS worker nodes to zero, adhering to cost optimization best practices.
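The auto-shutdown step can be sketched as a small Lambda handler. This is a minimal sketch, not the project's actual function: it assumes the worker nodes belong to an EKS managed node group, and the environment variable names CLUSTER_NAME and NODEGROUP_NAME, the helper name, and the max_size default are all illustrative. Scaling nodes to zero stops the EC2 charges for the workers; the EKS control plane is still billed while the cluster exists.

```python
"""Hypothetical Lambda handler that scales an EKS managed node group to zero."""
import os


def scale_nodegroup_to_zero(eks_client, cluster_name, nodegroup_name, max_size=2):
    """Ask EKS to run zero worker nodes in the given managed node group.

    maxSize has to stay at 1 or more, so only minSize and desiredSize go to 0.
    """
    return eks_client.update_nodegroup_config(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name,
        scalingConfig={"minSize": 0, "maxSize": max_size, "desiredSize": 0},
    )


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be exercised locally
    # with a stub client and no AWS dependencies installed.
    import boto3

    cluster = os.environ["CLUSTER_NAME"]      # assumed environment variable
    nodegroup = os.environ["NODEGROUP_NAME"]  # assumed environment variable
    scale_nodegroup_to_zero(boto3.client("eks"), cluster, nodegroup)
    return {"cluster": cluster, "nodegroup": nodegroup, "desiredSize": 0}
```

Passing the client in as an argument keeps the scaling logic testable without touching a real cluster. The Lambda execution role would need permission for eks:UpdateNodegroupConfig on the node group.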
⚙️ Architecture Workflow
Data Source (Kaggle / S3)
↓
Data Preprocessing & Training (MLflow)
↓
Containerization (Docker)
↓
Image Push to AWS ECR
↓
Deployment on AWS EKS Cluster
↓
CI/CD Automation via GitHub Actions
↓
Monitoring using Prometheus + Grafana
↓
Auto Shutdown via AWS Lambda
This architecture ensures:
Continuous integration and delivery
Centralized model tracking
Real-time monitoring
Cost optimization

🛠️ Tech Stack
| Category | Technology | Purpose |
| Language | Python 3.9+ | ML model and API development |
| ML Libraries | scikit-learn, pandas, numpy | Model training & data processing |
| API Framework | FastAPI, Uvicorn | REST API and async server |
| Containerization | Docker | Application packaging |
| Container Registry | AWS ECR | Private Docker image storage |
| Orchestration | Kubernetes (Kind, EKS) | Container orchestration |
| Package Manager | Helm 3 | Kubernetes app management |
| Cloud Provider | AWS | Infrastructure & services |
| Load Balancing | AWS ALB | Application load balancing |
| DNS | Route 53 | Domain management |
| SSL/TLS | ACM | Free SSL certificates |
🧩 Step-by-Step Implementation
Let’s understand how every script and configuration file works together to create a fully automated ML deployment system. Each component in this repository has a specific role — from data preprocessing to cloud deployment.
Let’s explore them one by one 👇
1️⃣ main.py — FastAPI API
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np
import os

app = FastAPI(title="Diabetes Prediction API")

# Load the trained model once at startup; fail fast if it is missing.
MODEL_PATH = "diabetes_model.pkl"
if not os.path.exists(MODEL_PATH):
    raise FileNotFoundError("❌ Model file not found! Train it first using train.py")
model = joblib.load(MODEL_PATH)


# Request schema: FastAPI validates incoming JSON against these fields.
class DiabetesInput(BaseModel):
    Pregnancies: int
    Glucose: float
    BloodPressure: float
    SkinThickness: float
    Insulin: float
    BMI: float
    DiabetesPedigreeFunction: float
    Age: int


@app.get("/")
def read_root():
    return {"message": "Diabetes Prediction API is live 🎯"}


# Used by the Kubernetes readiness probe.
@app.get("/health")
def health_check():
    return {"status": "healthy"}


@app.post("/predict")
def predict(data: DiabetesInput):
    # Values must be in the same column order used for training in train.py.
    input_data = np.array([[
        data.Pregnancies,
        data.Glucose,
        data.BloodPressure,
        data.SkinThickness,
        data.Insulin,
        data.BMI,
        data.DiabetesPedigreeFunction,
        data.Age
    ]])
    prediction = model.predict(input_data)[0]
    return {"diabetic": bool(prediction)}
🧩 Explanation
This file is the core of your API service — it turns your trained ML model into a real-time, production-ready web API using FastAPI.
When you send patient data (like glucose level, insulin, BMI, etc.) through a POST request to /predict, the model evaluates it and returns whether the patient is diabetic (True) or not diabetic (False).
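One detail worth making explicit: the model receives a bare array with no column names, so the values have to arrive in the same order as the columns selected in train.py. A minimal sketch of pinning that order down in one place follows; FEATURE_ORDER and to_feature_row are illustrative names and are not part of the original main.py.

```python
# Column order used when the model was trained in train.py.
FEATURE_ORDER = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def to_feature_row(payload):
    """Turn a request payload (a dict) into one row in training column order."""
    missing = [name for name in FEATURE_ORDER if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return [payload[name] for name in FEATURE_ORDER]


if __name__ == "__main__":
    # Keys deliberately shuffled: the output order is still the training order.
    sample = {
        "Age": 25, "BMI": 26.6, "Glucose": 85, "Pregnancies": 1,
        "BloodPressure": 66, "SkinThickness": 29, "Insulin": 80,
        "DiabetesPedigreeFunction": 0.351,
    }
    print(to_feature_row(sample))
```

Keeping the order in a single list means a change to the training columns only needs to be mirrored in one place on the serving side.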
2️⃣ train.py — Model Training
Trains a Random Forest classifier on the Pima Indians Diabetes dataset and saves it as diabetes_model.pkl, the file that main.py loads at startup.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import joblib

# Load dataset (replace with your local or Kaggle path)
df = pd.read_csv("/app/data/diabetes.csv")
print("✅ Columns:", df.columns.tolist())

# Prepare data
X = df[[
    "Pregnancies", "Glucose", "BloodPressure",
    "SkinThickness", "Insulin", "BMI",
    "DiabetesPedigreeFunction", "Age"
]]
y = df["Outcome"]

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Save model
joblib.dump(model, "diabetes_model.pkl")
print("✅ Model saved as diabetes_model.pkl")
🧩 Explanation
This script teaches the AI model how to detect diabetes using the dataset.
Once trained, it exports a .pkl file that acts as the brain for your prediction API.
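The script holds out 20% of the data but, as shown, never scores the model on it. A minimal sketch of what that scoring could look like is below. It is written in plain Python so it runs without extra dependencies; binary_metrics is an illustrative helper, not part of the original train.py, and scikit-learn's sklearn.metrics module offers the same measures ready-made.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels, where 1 means diabetic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diabetic, predicted diabetic
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, predicted healthy
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, predicted diabetic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diabetic, predicted healthy
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the dataset.
    print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

In train.py this could be called as binary_metrics(y_test, model.predict(X_test)) right after model.fit. For a medical screening model, recall on the diabetic class is usually the number to watch, since a missed case costs more than a false alarm.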
3️⃣ requirements.txt — Requirements
Lists all the Python packages required to run the project. Ensures consistent setup in local, Docker, or cloud environments.
fastapi
uvicorn[standard]
scikit-learn
pandas
joblib
🧩 Explanation
FastAPI: Web framework used to create the prediction API.
Uvicorn [standard]: ASGI server to run FastAPI applications.
Scikit-learn: Provides machine learning algorithms (Random Forest used here).
Pandas: Handles and processes tabular data from the diabetes dataset.
Joblib: Used for saving (dump) and loading (load) the trained ML model efficiently.
4️⃣ download.sh — Dataset Download Script
Automates downloading the Pima Indians Diabetes dataset from Kaggle and places it in the project’s /app/data folder.
This ensures reproducibility for training the model across different environments, including Docker or cloud.
#!/bin/bash
set -e
echo "📥 Downloading Pima Indians Diabetes dataset from Kaggle..."
# Create a data folder if it doesn't exist
mkdir -p /app/data
cd /app/data
# Download from Kaggle
kaggle datasets download -d uciml/pima-indians-diabetes-database -p .
# Unzip and clean up
unzip -o pima-indians-diabetes-database.zip
rm pima-indians-diabetes-database.zip
echo "✅ Dataset downloaded to /app/data"
🧩 Explanation
This script allows one-command dataset setup, making the project ready for ML training on any machine or Docker container.
5️⃣ Dockerfile — Containerizing the Application
Defines how to package your FastAPI diabetes prediction app and all dependencies into a Docker container.
This makes the project portable, consistent, and ready for deployment anywhere — locally, on Kubernetes, or in the cloud.
FROM python:3.10-slim
# Working directory
WORKDIR /app
# Copy files
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Optional: Run training before starting API (if needed)
# RUN python train.py
# Expose port
EXPOSE 8000
# Start FastAPI
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
🧩 Explanation
This Dockerfile allows you to package your ML API in a container that can run consistently across different environments, making deployment easy and scalable.
6️⃣ deployment.yaml — Kubernetes Deployment Configuration
Defines how your FastAPI diabetes prediction app runs inside a Kubernetes cluster.
It specifies the number of replicas, container image, ports, health checks, and resource limits to ensure a scalable and reliable deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: diabetes-api
  labels:
    app: diabetes-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: diabetes-api
  template:
    metadata:
      labels:
        app: diabetes-api
    spec:
      containers:
        - name: diabetes-api
          image: 323997748732.dkr.ecr.ap-south-1.amazonaws.com/mlops-project:latest
          ports:
            - containerPort: 8000
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
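🧩 Explanation:
It keeps two replicas of the API running, sends traffic only to pods that pass the /health readiness probe, and bounds the CPU and memory of each container. Automatic scaling would need a HorizontalPodAutoscaler, which this manifest does not define.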
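The resource values in the manifest use Kubernetes quantity notation, which is easy to misread. The sketch below converts the values from deployment.yaml into plain numbers; parse_cpu and parse_memory are illustrative helpers written for this post and handle only the suffixes used here, not the full Kubernetes quantity grammar.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as "100m" or "0.5" to a number of cores."""
    if quantity.endswith("m"):
        # "m" means millicores: 1000m equals one full core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_memory(quantity):
    """Convert a memory quantity such as "128Mi" to bytes (binary suffixes only)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


if __name__ == "__main__":
    print("cpu request/limit (cores):", parse_cpu("100m"), parse_cpu("500m"))
    print("memory request/limit (bytes):", parse_memory("128Mi"), parse_memory("512Mi"))
```

So each pod is guaranteed a tenth of a core and 128 MiB when it is scheduled, and may use up to half a core and 512 MiB. A container that exceeds its memory limit is killed, while one that hits its CPU limit is throttled.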
7️⃣ service.yaml — Kubernetes Service Configuration
Exposes your FastAPI diabetes prediction app to the outside world.
It defines how the pods created by the deployment can be accessed via a LoadBalancer in AWS.
apiVersion: v1
kind: Service
metadata:
  name: diabetes-api-service
  labels:
    app: diabetes-api
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  selector:
    app: diabetes-api
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer
🧩 Explanation:
It makes your Kubernetes Deployment reachable from the internet through a managed AWS NLB, which forwards traffic on port 80 to port 8000 on the FastAPI pods.
8️⃣ k.yaml — Combined Kubernetes Deployment & Service
This single YAML file combines the Deployment and Service definitions for your FastAPI diabetes prediction app.
It simplifies deployment by allowing Kubernetes to create pods and expose them through a LoadBalancer using a single command:
kubectl apply -f k.yaml
📜 Code:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: diabetes-api
  labels:
    app: diabetes-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: diabetes-api
  template:
    metadata:
      labels:
        app: diabetes-api
    spec:
      containers:
        - name: diabetes-api
          image: abhishek8056/mlops-project:latest
          ports:
            - containerPort: 8000
          imagePullPolicy: Always
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: diabetes-api-service
spec:
  selector:
    app: diabetes-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer
🧩 Explanation:
It provides a one-file deployment solution for your API, making it easier to deploy, update, and scale your application on Kubernetes.
9️⃣ ingress.yaml — Kubernetes Ingress for External Routing
Manages external access to your FastAPI diabetes prediction API.
Routes HTTP/HTTPS traffic from a friendly domain to your Kubernetes service.
This allows users to access the API via a custom domain with SSL encryption.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: diabetes-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: "arn:aws:acm:ap-south-1:323997748732:certificate/fe8d7e6e-7e47-46b8-bcf8-bc6a353b787a"
    alb.ingress.kubernetes.io/ssl-redirect: '443'
spec:
  ingressClassName: alb
  rules:
    - host: mlops.abhimishra-devops.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: diabetes-api-service
                port:
                  number: 80
🧩 Explanation:
It makes your API securely accessible via a custom domain with HTTPS, handled by AWS ALB.
It is the entry point for all external users, providing SSL security, routing, and scalability.
Sample Test Cases
Test Case 1 — Healthy Individual
Input JSON:
{
  "Pregnancies": 1,
  "Glucose": 85,
  "BloodPressure": 66,
  "SkinThickness": 29,
  "Insulin": 80,
  "BMI": 26.6,
  "DiabetesPedigreeFunction": 0.351,
  "Age": 25
}
Expected Output:
{"diabetic": false}
✅ Explanation: Typical healthy profile, should be classified as non-diabetic.
Test Case 2 — Diabetic Individual
Input JSON:
{
  "Pregnancies": 4,
  "Glucose": 180,
  "BloodPressure": 85,
  "SkinThickness": 35,
  "Insulin": 140,
  "BMI": 35.0,
  "DiabetesPedigreeFunction": 0.627,
  "Age": 45
}
Expected Output:
{"diabetic": true}
✅ Explanation: High glucose, BMI, and age — should be classified as diabetic.
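Both test cases can be turned into a small smoke test that runs against a deployed endpoint. This is a sketch using only the standard library; the function names, the CASES list layout, and the localhost URL are assumptions for illustration. The expected values are the ones stated above, and the actual output depends on the model you trained.

```python
import json
import urllib.request

# (payload, expected "diabetic" value) pairs taken from the two test cases.
CASES = [
    ({"Pregnancies": 1, "Glucose": 85, "BloodPressure": 66, "SkinThickness": 29,
      "Insulin": 80, "BMI": 26.6, "DiabetesPedigreeFunction": 0.351, "Age": 25}, False),
    ({"Pregnancies": 4, "Glucose": 180, "BloodPressure": 85, "SkinThickness": 35,
      "Insulin": 140, "BMI": 35.0, "DiabetesPedigreeFunction": 0.627, "Age": 45}, True),
]


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def run_cases(url, cases=CASES, send=post_json):
    """Return (payload, expected, actual) for every case that does not match."""
    failures = []
    for payload, expected in cases:
        actual = send(url, payload).get("diabetic")
        if actual != expected:
            failures.append((payload, expected, actual))
    return failures


if __name__ == "__main__":
    # Assumes the API is running locally on port 8000.
    print(run_cases("http://localhost:8000/predict") or "all cases passed")
```

The send parameter exists so the comparison logic can be checked without a running server. Pointing the URL at the HTTPS domain instead of localhost turns the same script into a post-deployment check.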
📥 Download All Project Files
For convenience, you can download all the project files, scripts, and configurations directly from Google Drive:
Download MLOps Diabetes Prediction Project Files
requirements.txt— Python dependencies
🚀 Real-Time Production Tip: Handling .pem Key Issues in AWS
While deploying your MLOps project on AWS, you might face a situation where your EC2 instance’s .pem key is lost or not working. Instead of terminating the instance and losing all your setup, here’s a way to recover access:
Problem
You try to SSH into your EC2 instance using the old .pem file.
AWS denies access because the key is missing or invalid.
You risk losing work if you recreate the instance.
Solution: Replace the Key Without Rebuilding the Instance
Stop the instance (do not terminate).
Detach the root EBS volume and attach it to a temporary instance.
Access the filesystem of the detached volume.
Replace the old public key in /home/ec2-user/.ssh/authorized_keys with a new public key.
Detach and reattach the volume to the original instance.
Start the instance and SSH with your new .pem file.
💡 This method avoids data loss and preserves your setup. The instance is unavailable while it is stopped, so plan for a short outage.
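The key-replacement step itself is a small file operation once the volume is mounted on the temporary instance. A minimal sketch, assuming the volume is mounted at a path you pass in (for example a hypothetical /mnt/recovery) and the login user is ec2-user; replace_authorized_key is an illustrative helper, not an AWS tool.

```python
import os


def replace_authorized_key(mount_point, new_public_key, user="ec2-user"):
    """Overwrite authorized_keys for `user` on a volume mounted at mount_point."""
    ssh_dir = os.path.join(mount_point, "home", user, ".ssh")
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    path = os.path.join(ssh_dir, "authorized_keys")
    # Opening an existing file with "w" truncates it and keeps its owner.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(new_public_key.strip() + "\n")
    # Keep the file private; sshd's default strict mode checks refuse
    # key files that other users can write to.
    os.chmod(path, 0o600)
    return path


if __name__ == "__main__":
    # Placeholder key text; paste the contents of your new .pub file here.
    print(replace_authorized_key("/mnt/recovery", "ssh-ed25519 AAAA... new-key"))
```

If the .ssh directory or the file had to be created from scratch, check that both are still owned by the login user before detaching the volume, since files created as root on the recovery instance will belong to root.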
Why This Matters
In production MLOps workflows, instances often handle critical workloads and live traffic.
Losing SSH access shouldn’t stop your model predictions or autoscaling tests.
This tip is part of real-world cloud reliability best practices.
🧰 5. Challenges & Learnings
The journey was challenging, especially regarding cloud governance and security:
IAM Permissions: Debugging the IAM permissions required by EKS to create the LoadBalancer (via the service account role) was intensive. The solution solidified the importance of using least privilege IAM roles and IRSA (IAM Roles for Service Accounts) within Kubernetes.
EKS Cost Management: Realizing the true cost of running the EKS control plane and worker nodes 24/7 during development led directly to the Lambda automation, making this a truly cost-aware AWS MLOps project.
YAML Configuration: Troubleshooting deployment failures often boiled down to tiny errors in Kubernetes YAML files. The fix was adopting rigorous practices: always validate YAML syntax and check the kubectl describe pod output first.
🔐 Key AWS & DevOps Best Practices
This project successfully implemented several critical production principles:
IaC Principles: All deployment configurations are versioned in Git, moving towards complete Infrastructure as Code.
Security by Default: Reliance on IAM OIDC and IRSA over static credentials for all cloud interactions.
Continuous Monitoring: Prometheus collects metrics and Grafana displays them on real-time dashboards, in line with logging & monitoring best practices.
Cost Efficiency: The Lambda function is a powerful example of applying automation for financial governance.
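For the monitoring point, it helps to see what Prometheus actually scrapes. The main.py shown earlier does not expose a metrics endpoint, so the following is only a hand-rolled sketch of the Prometheus text exposition format for a request counter. The metric name api_requests_total and both function names are made up for illustration; in a real service the prometheus_client library is the usual way to produce this output.

```python
from collections import Counter

# (endpoint, status) -> number of requests seen since the process started.
REQUESTS = Counter()


def record_request(endpoint, status):
    """Count one handled request for the given endpoint and HTTP status."""
    REQUESTS[(endpoint, str(status))] += 1


def render_metrics():
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP api_requests_total Requests handled, by endpoint and status.",
        "# TYPE api_requests_total counter",
    ]
    for (endpoint, status), count in sorted(REQUESTS.items()):
        lines.append(
            f'api_requests_total{{endpoint="{endpoint}",status="{status}"}} {count}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    record_request("/predict", 200)
    record_request("/predict", 500)
    print(render_metrics(), end="")
```

Serving this text at /metrics, the default path Prometheus scrapes, is enough for a Grafana panel to chart an error rate from the ratio of non-200 counts to the total.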
🚀 Results & Impact
The final result was a resounding success:
Model Deployed: The best-performing model was successfully serving predictions via a public LoadBalancer endpoint.
Automated Pipeline: The end-to-end pipeline was fully automated. Any code change in the main branch triggers a complete re-build, containerization, and deployment to EKS—achieving near real-time model updates.
Rapid Deployment: The time-to-deployment, or the "training-to-serving" loop, was reduced from a manual, hour-long process to an automated, 5-minute GitHub Actions workflow.
This project validated the entire end-to-end ML pipeline concept in a production-like environment.
🌟 Future Improvements
To achieve MLOps Level 2 maturity, the next steps include:
Continuous Training (CT): Implement a scheduled GitHub Action or use an orchestrator (like Airflow) to automatically trigger model retraining and version promotion in MLflow based on a weekly schedule or a data drift alert.
Full Infrastructure as Code (IaC): Migrate all AWS resource creation (VPC, EKS cluster, ECR) to Terraform to ensure the entire environment is version-controlled and instantly reproducible.
Advanced Canary Deployments: Use a service mesh (like Istio) or an Ingress controller (like NGINX) to enable canary deployments or A/B testing before shifting 100% of traffic to a new model version.
🏁 Conclusion
Building this project helped me understand how DevOps meets Machine Learning in real-world systems.
By automating every step — from data to deployment — I built a production-ready MLOps pipeline that’s scalable, efficient, and cost-optimized.
The depth of understanding gained in AWS cloud skills, Kubernetes orchestration, and the overall MLOps methodology has made me significantly more confident and job-ready.
If you are a junior-to-mid level engineer looking to truly master DevOps and MLOps, my advice is simple: Don't just read about it. Build it. Clone the repository, try to deploy it, and experience the satisfaction of seeing your code go from a local file to a production-ready, highly-available service.
Pro Tip: Don’t just read about MLOps. Build it, deploy it, and learn from real traffic.
🔗 GitHub Repo: mlops-diabetes-prediction-aws
✍️ About the Author
Abhishek Mishra
DevOps & AI Engineer | Building Production-Ready Systems | Passionate About Human-Centered Intelligence
Abhishek is a hands-on DevOps and MLOps engineer who loves turning ideas into scalable, automated, and cloud-native systems. With experience in AWS, Kubernetes, CI/CD, and ML deployment frameworks, he focuses on building practical, production-grade solutions that solve real-world problems.
He believes deeply in learning by doing — breaking things, fixing them, and pushing boundaries to understand how modern systems operate at scale. His projects reflect a blend of strong engineering fundamentals, cost-efficient design, and automation-driven thinking.
When he’s not deploying applications or optimizing pipelines, Abhishek enjoys exploring the intersection of AI and human reasoning, sharing learnings with the community, and helping engineers grow in their DevOps journey.
🌐 Connect With Abhishek
🔗 Portfolio: abhimishra-devops.com
📝 Blog: blog.abhimishra-devops.com
💻 GitHub: github.com/Abhi-mishra998
💼 LinkedIn: linkedin.com/in/abhishek-mishra-49888123b



