🚀 Hands-On MLOps: Deploying a Production-Ready Diabetes Prediction API on AWS

End-to-End Guide with FastAPI, Docker, Kubernetes, and Real-Time AWS Deployment Tips

Hi, I’m Abhishek Mishra, a passionate Cloud & DevOps Engineer in the making, certified by GUVI (IIT-M), with 28+ IIT, Oracle, and AWS certifications. I specialize in automating and securing cloud infrastructure using AWS, Terraform, Jenkins, Docker, and Kubernetes, with a strong focus on DevSecOps and real-world cloud deployment projects. 🧠 My mission is to bridge DevOps and Cybersecurity to build reliable, scalable, and secure cloud systems. I share hands-on projects, cloud architecture guides, and DevOps insights to help others learn, grow, and build reliable systems. 📬 Let’s collaborate or connect: abhishekmishra09896@gmail.com

🧩 1. Introduction: Bridging the Gap with MLOps

Machine Learning models, however sophisticated, are useless if they remain confined to a Jupyter Notebook. This is where MLOps steps in. MLOps is the critical bridge connecting the agility of software development (DevOps) with the iterative nature of machine learning, ensuring models are not just trained, but deployed, managed, and monitored reliably at scale.

Deploying ML models in a real-world, automated environment requires robust infrastructure, continuous integration/delivery (CI/CD), version control, and comprehensive monitoring.

This blog post is a deep dive into my hands-on project: building an end-to-end MLOps pipeline on AWS for a diabetes prediction system. I’ll walk you through how I leveraged powerful cloud and open-source tools to create a scalable, automated, and cost-efficient ML workflow.

In practice, MLOps (Machine Learning Operations) combines machine learning with DevOps practices to automate model training, deployment, and monitoring.
This post walks through how I connected Docker, Kubernetes (EKS), MLflow, GitHub Actions, Prometheus, and Grafana to automate the ML lifecycle.

Highlights:

  1. Full MLOps Pipeline – from ML model training → Docker container → local Kubernetes → AWS EKS → HTTPS production deployment.

  2. Hands-on with Industry Tools – FastAPI, Docker, Kind, EKS, ALB, ACM, Route 53.

  3. Security & Best Practices – HTTPS, IAM roles, least privilege, Kubernetes secrets, health checks.

  4. Portfolio-Ready – Shows your ability to deploy ML models at scale in a production-grade environment.

  5. Beginner-Friendly – Step-by-step guide, all scripts, and manifests included.


☁️ 2. Project Overview: Scalability Meets Automation

The goal:

To design a fully automated MLOps pipeline that predicts whether a person is diabetic based on health parameters, and to deploy it on AWS as a scalable, self-healing workflow that can handle production traffic rather than as a simple static deployment.

The core challenge was orchestrating a multi-stage pipeline—from model training and experiment tracking to containerization and deployment on a dynamic Kubernetes cluster—all managed by an automated CI/CD system.

Challenges:

  • Orchestrating multi-stage pipeline: model training → containerization → deployment → CI/CD → monitoring

  • Handling real-time traffic, scalability, and cloud cost optimization

Technologies Used:

| Category | Tools & Services | Purpose |
| --- | --- | --- |
| Cloud | AWS (EKS, EC2, S3, Lambda, ECR, IAM, CloudWatch) | Infrastructure, Container Registry, Compute, Cost Management |
| Orchestration | Kubernetes (EKS), Docker | Container Orchestration and Runtime |
| ML & Tracking | MLflow | Experiment Tracking, Model Registry |
| CI/CD | GitHub Actions | Automated Workflow Triggers |
| Monitoring | Prometheus, Grafana | Metrics Collection and Visualization |
| Application | FastAPI | Serving the Prediction Model via API |

🔧 Tools & Technologies

  • AWS Services: EC2, EKS, S3, ECR, Lambda, IAM, CloudWatch

  • DevOps Tools: Docker, Kubernetes, GitHub Actions

  • ML Tools: MLflow, Scikit-learn

  • Monitoring: Prometheus, Grafana

  • Web Framework: FastAPI


🚀 Phases Covered

  1. Local Development – Clone repo, setup venv, train model, run API locally.

  2. Containerization – Dockerfile creation, multi-stage build, run container locally.

  3. Local Kubernetes – Kind cluster, deploy with deployment.yaml, service.yaml, ingress.yaml.

  4. AWS Infrastructure – Configure CLI, create EKS cluster, VPC, IAM roles, ALB controller.

  5. Production Deployment – Push Docker image to ECR, deploy on EKS, configure LoadBalancer.

  6. Domain & HTTPS – ACM SSL certificate, Route 53 A record, validate HTTPS endpoint.


📚 What You'll Learn

By following this guide, you'll gain hands-on experience in:

| Category | Skills |
| --- | --- |
| Machine Learning | Model training, evaluation, scikit-learn, feature engineering |
| API Development | FastAPI, RESTful APIs, async programming, data validation |
| Containerization | Docker, multi-stage builds, image optimization, ECR |
| Kubernetes | Pods, Deployments, Services, Ingress, scaling, health checks |
| Cloud Infrastructure | AWS EKS, VPC, IAM, security groups, networking |
| DevOps | CI/CD concepts, GitOps, infrastructure as code |
| Networking | DNS (Route 53), load balancing (ALB), SSL/TLS (ACM) |
| Security | HTTPS, IAM roles, secrets management, least privilege |

⚙️ 3. Architecture Diagram (Text Description)

The end-to-end ML pipeline is structured as follows, embodying the principles of DevOps for machine learning:

  1. Data Ingestion: The raw Pima Indians Diabetes dataset is fetched from a secure source (e.g., S3 or a local file) as the starting point.

  2. Model Training & Tracking: The ML model (Logistic Regression/Random Forest) is trained. MLflow is used meticulously to track all experiments, parameters, and the final serialized model file.

  3. Containerization: The FastAPI-based inference application, which loads the best MLflow-tracked model, is packaged into a Docker image.

  4. Continuous Integration (CI): On every code push to the main branch, GitHub Actions automatically builds the Docker image and pushes the newly tagged image to AWS ECR (Elastic Container Registry).

  5. Continuous Delivery (CD): GitHub Actions then triggers the deployment by applying updated Kubernetes manifests (YAML files) to the AWS EKS (Elastic Kubernetes Service) cluster.

  6. Model Deployment: EKS pulls the new image from ECR, updates the Deployment, and serves the prediction API via a Kubernetes Service (LoadBalancer).

  7. Monitoring & Alerting: Prometheus scrapes metrics (like request latency, error rates, etc.) from the deployed application, and Grafana provides customizable dashboards for real-time visibility.

  8. Cost Optimization (Auto Shutdown): To prevent running up high EKS costs during idle periods, an AWS Lambda function is scheduled to automatically scale down the EKS worker nodes to zero, adhering to cost optimization best practices.
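
The auto-shutdown step in item 8 could be sketched roughly as follows. The post does not show the actual Lambda function, so this is only an illustration under stated assumptions: the worker nodes are assumed to live in an EKS managed node group, the environment variable names CLUSTER_NAME and NODEGROUP_NAME are hypothetical, and the max_size default is a placeholder. The one real API call is boto3's EKS update_nodegroup_config.

```python
import os


def scale_nodegroup_to_zero(eks_client, cluster_name, nodegroup_name, max_size=2):
    """Ask EKS to shrink a managed node group to zero worker nodes.

    max_size is a hypothetical placeholder; EKS requires it to be at least 1,
    so only the minimum and desired sizes are set to zero.
    """
    return eks_client.update_nodegroup_config(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name,
        scalingConfig={"minSize": 0, "maxSize": max_size, "desiredSize": 0},
    )


def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be exercised locally
    # with a stub client instead of real AWS credentials.
    import boto3

    cluster = os.environ["CLUSTER_NAME"]      # hypothetical variable name
    nodegroup = os.environ["NODEGROUP_NAME"]  # hypothetical variable name
    response = scale_nodegroup_to_zero(boto3.client("eks"), cluster, nodegroup)
    return {"update_id": response["update"]["id"]}
```

Scaling the node group to zero stops the EC2 charges for the worker nodes only; the EKS control plane continues to be billed while the cluster exists.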

⚙️ Architecture Workflow

Data Source (Kaggle / S3)
        ↓
Data Preprocessing & Training (MLflow)
        ↓
Containerization (Docker)
        ↓
Image Push to AWS ECR
        ↓
Deployment on AWS EKS Cluster
        ↓
CI/CD Automation via GitHub Actions
        ↓
Monitoring using Prometheus + Grafana
        ↓
Auto Shutdown via AWS Lambda

This architecture ensures:

  • Continuous integration and delivery

  • Centralized model tracking

  • Real-time monitoring

  • Cost optimization


🛠️ Tech Stack

| Category | Technology | Purpose |
| --- | --- | --- |
| Language | Python 3.9+ | ML model and API development |
| ML Libraries | scikit-learn, pandas, numpy | Model training & data processing |
| API Framework | FastAPI, Uvicorn | REST API and async server |
| Containerization | Docker | Application packaging |
| Container Registry | AWS ECR | Private Docker image storage |
| Orchestration | Kubernetes (Kind, EKS) | Container orchestration |
| Package Manager | Helm 3 | Kubernetes app management |
| Cloud Provider | AWS | Infrastructure & services |
| Load Balancing | AWS ALB | Application load balancing |
| DNS | Route 53 | Domain management |
| SSL/TLS | ACM | Free SSL certificates |

🧩 Step-by-Step Implementation

Let’s understand how every script and configuration file works together to create a fully automated ML deployment system. Each component in this repository has a specific role — from data preprocessing to cloud deployment.
Let’s explore them one by one 👇

1️⃣ main.py — FastAPI API

from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np
import os

app = FastAPI(title="Diabetes Prediction API")

MODEL_PATH = "diabetes_model.pkl"
if not os.path.exists(MODEL_PATH):
    raise FileNotFoundError("❌ Model file not found! Train it first using train.py")

model = joblib.load(MODEL_PATH)

class DiabetesInput(BaseModel):
    Pregnancies: int
    Glucose: float
    BloodPressure: float
    SkinThickness: float
    Insulin: float
    BMI: float
    DiabetesPedigreeFunction: float
    Age: int

@app.get("/")
def read_root():
    return {"message": "Diabetes Prediction API is live 🎯"}

@app.get("/health")
def health_check():
    return {"status": "healthy"}

@app.post("/predict")
def predict(data: DiabetesInput):
    input_data = np.array([[ 
        data.Pregnancies,
        data.Glucose,
        data.BloodPressure,
        data.SkinThickness,
        data.Insulin,
        data.BMI,
        data.DiabetesPedigreeFunction,
        data.Age
    ]])
    prediction = model.predict(input_data)[0]
    return {"diabetic": bool(prediction)}

🧩 Explanation

This file is the core of your API service — it turns your trained ML model into a real-time, production-ready web API using FastAPI.

When you send patient data (like glucose level, insulin, BMI, etc.) through a POST request to /predict,
the model evaluates it and returns whether the patient is diabetic (True) or not diabetic (False).
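
One detail worth spelling out: the array built in /predict has to list the features in the same order as the columns used in train.py, because the model only sees positions, not names. A minimal sketch of a helper that makes this order explicit is below. FEATURE_ORDER and to_feature_row are hypothetical names introduced for illustration and are not part of the original main.py.

```python
# Column order used when the model was trained (copied from train.py).
FEATURE_ORDER = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def to_feature_row(payload):
    """Turn a /predict JSON payload into one row in training column order."""
    missing = [name for name in FEATURE_ORDER if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return [float(payload[name]) for name in FEATURE_ORDER]


if __name__ == "__main__":
    # Keys are deliberately shuffled; the output order is still fixed.
    sample = {
        "Age": 25, "BMI": 26.6, "Glucose": 85, "Pregnancies": 1,
        "BloodPressure": 66, "SkinThickness": 29, "Insulin": 80,
        "DiabetesPedigreeFunction": 0.351,
    }
    print(to_feature_row(sample))
    # [1.0, 85.0, 66.0, 29.0, 80.0, 26.6, 0.351, 25.0]
```

Because train.py fits on a pandas DataFrame while main.py predicts on a plain NumPy array, recent scikit-learn versions also emit a warning about missing feature names at prediction time. It is harmless as long as the order matches.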


2️⃣ train.py — Model Training

Trains a Random Forest classifier on the Pima Indians Diabetes dataset and saves it as diabetes_model.pkl, the file that main.py loads at startup.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import joblib

# Load dataset (replace with your local or Kaggle path)
df = pd.read_csv("/app/data/diabetes.csv")

print("✅ Columns:", df.columns.tolist())

# Prepare data
X = df[[
    "Pregnancies", "Glucose", "BloodPressure",
    "SkinThickness", "Insulin", "BMI",
    "DiabetesPedigreeFunction", "Age"
]]
y = df["Outcome"]

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Save model
joblib.dump(model, "diabetes_model.pkl")
print("✅ Model saved as diabetes_model.pkl")

🧩 Explanation

This script trains a Random Forest classifier on the diabetes dataset, using an 80/20 train/test split with a fixed random seed.
Once trained, it exports a .pkl file that the prediction API loads to serve predictions.
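
The script above creates X_test and y_test but never scores the model on them. A minimal sketch of that evaluation step follows. To stay runnable without the Kaggle file, it uses a synthetic stand-in dataset from make_classification (768 rows and 8 features, matching the shape of the Pima dataset), so the printed numbers say nothing about the real model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for diabetes.csv (assumption, not the real data).
X, y = make_classification(n_samples=768, n_features=8, random_state=42)

# Same split and model settings as train.py.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Score the held-out split that the model never saw during training.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)  # rows: actual, columns: predicted

print(f"accuracy: {accuracy:.3f}")
print(matrix)
```

In train.py itself, the same three evaluation lines could be placed right after model.fit, before the model is saved.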


3️⃣ requirements.txt — Requirements

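Lists the Python packages required to run the project so that the same dependencies are installed locally, in Docker, and in the cloud. The versions are not pinned here; pinning them (scikit-learn in particular, since a pickled model is tied to the library version that produced it) would make the setup reproducible.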

fastapi
uvicorn[standard]
scikit-learn
pandas
numpy
joblib

🧩 Explanation

  • FastAPI: Web framework used to create the prediction API.

  • Uvicorn[standard]: ASGI server that runs the FastAPI application; the [standard] extra installs optional performance dependencies.

  • Scikit-learn: Provides machine learning algorithms (Random Forest used here).

  • Pandas: Handles and processes tabular data from the diabetes dataset.

  • NumPy: Builds the input array in main.py; listed explicitly because main.py imports it directly.

  • Joblib: Used for saving (dump) and loading (load) the trained ML model efficiently.


4️⃣ download.sh — Dataset download script

Automates downloading the Pima Indians Diabetes dataset from Kaggle and places it in the project’s /app/data folder.
This ensures reproducibility for training the model across different environments, including Docker or cloud.

#!/bin/bash
set -e

echo "📥 Downloading Pima Indians Diabetes dataset from Kaggle..."

# Create a data folder if it doesn't exist
mkdir -p /app/data
cd /app/data

# Download from Kaggle
kaggle datasets download -d uciml/pima-indians-diabetes-database -p .

# Unzip and clean up
unzip -o pima-indians-diabetes-database.zip
rm pima-indians-diabetes-database.zip

echo "✅ Dataset downloaded to /app/data"

🧩 Explanation

This script allows one-command dataset setup on any machine or Docker container. It assumes the Kaggle CLI is installed and configured with API credentials, and that unzip is available.
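
After the download, a quick sanity check that the CSV has the columns train.py expects can save a confusing KeyError later. The following is a minimal sketch using only the standard library. EXPECTED_COLUMNS and missing_columns are hypothetical names, and the column list is taken from train.py.

```python
import csv

# Columns that train.py selects from the CSV, plus the target column.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def missing_columns(csv_path):
    """Return the expected columns that are absent from the CSV header."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return [name for name in EXPECTED_COLUMNS if name not in header]
```

For example, missing_columns("/app/data/diabetes.csv") returns an empty list when every expected column is present in the header.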


5️⃣ Dockerfile

Containerizing the Application

Defines how to package your FastAPI diabetes prediction app and all dependencies into a Docker container.
This makes the project portable, consistent, and ready for deployment anywhere — locally, on Kubernetes, or in the cloud.

FROM python:3.10-slim

# Working directory
WORKDIR /app

# Copy files
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Optional: Run training before starting API (if needed)
# RUN python train.py

# Expose port
EXPOSE 8000

# Start FastAPI
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

🧩 Explanation

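This Dockerfile packages the ML API in a container that runs consistently across environments. Because main.py raises FileNotFoundError at startup when diabetes_model.pkl is missing, the model file must either be present in the build context when COPY . . runs, or be produced during the build by uncommenting RUN python train.py (which in turn needs the dataset at /app/data/diabetes.csv).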


6️⃣ deployment.yaml — Kubernetes Deployment Configuration

Defines how your FastAPI diabetes prediction app runs inside a Kubernetes cluster.
It specifies the number of replicas, container image, ports, health checks, and resource limits to ensure a scalable and reliable deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: diabetes-api
  labels:
    app: diabetes-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: diabetes-api
  template:
    metadata:
      labels:
        app: diabetes-api
    spec:
      containers:
      - name: diabetes-api
        image: 323997748732.dkr.ecr.ap-south-1.amazonaws.com/mlops-project:latest
        ports:
        - containerPort: 8000
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

🧩 Explanation:

It runs two replicas of the API, routes traffic to a pod only after /health responds, and caps CPU and memory per container. The replica count is fixed at 2; automatic scaling would require adding a HorizontalPodAutoscaler.


7️⃣ service.yaml — Kubernetes Service Configuration

Exposes your FastAPI diabetes prediction app to the outside world.
It defines how the pods created by the deployment can be accessed via a LoadBalancer in AWS.

apiVersion: v1
kind: Service
metadata:
  name: diabetes-api-service
  labels:
    app: diabetes-api
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb   
spec:
  selector:
    app: diabetes-api
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer

🧩 Explanation:

It makes your Kubernetes deployment reachable from the internet through a managed AWS NLB, which forwards traffic on port 80 to port 8000 on your FastAPI pods.


8️⃣ k.yaml — Combined Kubernetes Deployment & Service

This single YAML file combines the Deployment and Service definitions for your FastAPI diabetes prediction app.
It simplifies deployment by allowing Kubernetes to create pods and expose them through a LoadBalancer using a single command:

kubectl apply -f k.yaml

📜 Code:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: diabetes-api
  labels:
    app: diabetes-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: diabetes-api
  template:
    metadata:
      labels:
        app: diabetes-api
    spec:
      containers:
      - name: diabetes-api
        image: abhishek8056/mlops-project:latest   
        ports:
        - containerPort: 8000
        imagePullPolicy: Always
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: diabetes-api-service
spec:
  selector:
    app: diabetes-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer

🧩 Explanation:

It provides a one-file deployment solution for your API, making it easier to deploy, update, and scale your application on Kubernetes.


9️⃣ ingress.yaml — Kubernetes Ingress for External Routing

Manages external access to your FastAPI diabetes prediction API.
Routes HTTP/HTTPS traffic from a friendly domain to your Kubernetes service.
This allows users to access the API via a custom domain with SSL encryption.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: diabetes-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: "arn:aws:acm:ap-south-1:323997748732:certificate/fe8d7e6e-7e47-46b8-bcf8-bc6a353b787a"
    alb.ingress.kubernetes.io/ssl-redirect: '443'
spec:
  ingressClassName: alb
  rules:
    - host: mlops.abhimishra-devops.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: diabetes-api-service
                port:
                  number: 80

🧩 Explanation:

It makes your API securely accessible via a custom domain with HTTPS, handled by AWS ALB.
It is the entry point for all external users, providing SSL security, routing, and scalability.


Sample Test Cases

Test Case 1 — Healthy Individual

Input JSON:

{
  "Pregnancies": 1,
  "Glucose": 85,
  "BloodPressure": 66,
  "SkinThickness": 29,
  "Insulin": 80,
  "BMI": 26.6,
  "DiabetesPedigreeFunction": 0.351,
  "Age": 25
}

Expected Output:

{"diabetic": false}

✅ Explanation: Typical healthy profile, should be classified as non-diabetic.


Test Case 2 — Diabetic Individual

Input JSON:

{
  "Pregnancies": 4,
  "Glucose": 180,
  "BloodPressure": 85,
  "SkinThickness": 35,
  "Insulin": 140,
  "BMI": 35.0,
  "DiabetesPedigreeFunction": 0.627,
  "Age": 45
}

Expected Output:

{"diabetic": true}

✅ Explanation: High glucose, BMI, and age — should be classified as diabetic.
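
The two payloads above can be sent to the API with a short standard-library client. This is only a sketch: BASE_URL is a hypothetical placeholder for wherever the API is running (a local uvicorn process, the LoadBalancer address, or the custom domain), and the helper names are introduced here for illustration.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the address the API is served from.
BASE_URL = "http://localhost:8000"

TEST_CASE_1 = {
    "Pregnancies": 1, "Glucose": 85, "BloodPressure": 66, "SkinThickness": 29,
    "Insulin": 80, "BMI": 26.6, "DiabetesPedigreeFunction": 0.351, "Age": 25,
}


def build_predict_request(payload, base_url=BASE_URL):
    """Build a JSON POST request for the /predict endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, base_url=BASE_URL, timeout=5):
    """Send the payload and return the decoded JSON response."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the API running, predict(TEST_CASE_1) returns a dictionary with a single "diabetic" key. Whether the value matches the expected output depends on the trained model, so the expected outputs above are best read as plausible outcomes rather than guarantees.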


📥 Download All Project Files

For convenience, you can download all the project files, scripts, and configurations directly from Google Drive:

Download MLOps Diabetes Prediction Project Files



🚀 Real-Time Production Tip: Handling .pem Key Issues in AWS

While deploying your MLOps project on AWS, you might face a situation where your EC2 instance’s .pem key is lost or not working. Instead of terminating the instance and losing all your setup, here’s a method to recover access while keeping the instance and its data:

Problem

  • You try to SSH into your EC2 instance using the old .pem file.

  • AWS denies access because the key is missing or invalid.

  • You risk losing work if you recreate the instance.

Solution: Replace the Key Without Rebuilding the Instance

  1. Stop the instance (do not terminate).

  2. Detach the root EBS volume and attach it to a temporary instance in the same Availability Zone.

  3. Mount the volume on the temporary instance to access its filesystem.

  4. Replace the old public key in /home/ec2-user/.ssh/authorized_keys (the home directory depends on the AMI's default user) with a new public key.

  5. Unmount and detach the volume, then reattach it to the original instance as its root device.

  6. Start the instance and SSH with your new .pem file.

💡 This method preserves the instance and its data. The instance is offline while it is stopped, so plan for a short outage.

Why This Matters

  • In production MLOps workflows, instances often handle critical workloads and live traffic.

  • Losing SSH access shouldn’t stop your model predictions or autoscaling tests.

  • This tip is part of real-world cloud reliability best practices.


🧰 5. Challenges & Learnings

The journey was challenging, especially regarding cloud governance and security:

  • IAM Permissions: Debugging the IAM permissions required by EKS to create the LoadBalancer (via the service account role) was intensive. The solution solidified the importance of using least privilege IAM roles and IRSA (IAM Roles for Service Accounts) within Kubernetes.

  • EKS Cost Management: Realizing the true cost of running the EKS control plane and worker nodes 24/7 during development led directly to the Lambda automation, making this a truly cost-aware AWS MLOps project.

  • YAML Configuration: Troubleshooting deployment failures often boiled down to tiny errors in Kubernetes YAML files. The fix was adopting rigorous practices: always validate YAML syntax and check kubectl describe pod output first.
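
The "validate first" habit from the YAML bullet can be scripted. The sketch below wraps kubectl's client-side dry run for each manifest; it assumes kubectl is installed and configured for a cluster, and the function names are hypothetical.

```python
import subprocess


def dry_run_command(manifest_path):
    """Build a kubectl command that validates a manifest without applying it."""
    return ["kubectl", "apply", "--dry-run=client", "-f", manifest_path]


def validate_manifests(paths):
    """Dry-run each manifest; return {path: error text} for the ones that fail."""
    failures = {}
    for path in paths:
        result = subprocess.run(
            dry_run_command(path), capture_output=True, text=True
        )
        if result.returncode != 0:
            failures[path] = result.stderr.strip()
    return failures
```

Calling validate_manifests(["deployment.yaml", "service.yaml", "ingress.yaml"]) before a real kubectl apply surfaces indentation and schema mistakes early. Runtime problems such as image pull errors still need kubectl describe pod.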


🔐 Key AWS & DevOps Best Practices

This project successfully implemented several critical production principles:

  • IaC Principles: All deployment configurations are versioned in Git, moving towards complete Infrastructure as Code.

  • Security by Default: Reliance on IAM OIDC and IRSA over static credentials for all cloud interactions.

  • Continuous Monitoring: Prometheus and Grafana provide real-time metrics dashboards, the baseline for monitoring a production service.

  • Cost Efficiency: The Lambda function is a powerful example of applying automation for financial governance.


🚀 Results & Impact

The final result was a resounding success:

  • Model Deployed: The best-performing model was successfully serving predictions via a public LoadBalancer endpoint.

  • Automated Pipeline: The end-to-end pipeline was fully automated. Any code change in the main branch triggers a complete re-build, containerization, and deployment to EKS—achieving near real-time model updates.

  • Rapid Deployment: The time-to-deployment, or the "training-to-serving" loop, was reduced from a manual, hour-long process to an automated, 5-minute GitHub Actions workflow.

This project validated the entire end-to-end ML pipeline concept in a production-like environment.


🌟 Future Improvements

To achieve MLOps Level 2 maturity, the next steps include:

  • Continuous Training (CT): Implement a scheduled GitHub Action or use an orchestrator (like Airflow) to automatically trigger model retraining and version promotion in MLflow based on a weekly schedule or a data drift alert.

  • Full Infrastructure as Code (IaC): Migrate all AWS resource creation (VPC, EKS cluster, ECR) to Terraform to ensure the entire environment is version-controlled and instantly reproducible.

  • Advanced Canary Deployments: Use an ingress controller or service mesh (such as NGINX Ingress or Istio) to enable canary deployments or A/B testing before shifting 100% of traffic to a new model version.


🏁 Conclusion

Building this project helped me understand how DevOps meets Machine Learning in real-world systems.
By automating every step, from data to deployment, I built a production-ready MLOps pipeline that’s scalable, efficient, and cost-optimized.

The depth of understanding gained in AWS cloud skills, Kubernetes orchestration, and the overall MLOps methodology has made me significantly more confident and job-ready.

If you are a junior-to-mid level engineer looking to truly master DevOps and MLOps, my advice is simple: Don't just read about it. Build it. Clone the repository, try to deploy it, and experience the satisfaction of seeing your code go from a local file to a production-ready, highly-available service.

This project helped me bridge ML and DevOps practically.
By building a production-ready MLOps pipeline, I gained real-world AWS, Kubernetes, and ML deployment skills.

Pro Tip: Don’t just read about MLOps — build it, deploy it, and learn from real traffic.

🔗 GitHub Repo: mlops-diabetes-prediction-aws


✍️ About the Author

Abhishek Mishra
DevOps & AI Engineer | Building Production-Ready Systems | Passionate About Human-Centered Intelligence

Abhishek is a hands-on DevOps and MLOps engineer who loves turning ideas into scalable, automated, and cloud-native systems. With experience in AWS, Kubernetes, CI/CD, and ML deployment frameworks, he focuses on building practical, production-grade solutions that solve real-world problems.

He believes deeply in learning by doing — breaking things, fixing them, and pushing boundaries to understand how modern systems operate at scale. His projects reflect a blend of strong engineering fundamentals, cost-efficient design, and automation-driven thinking.

When he’s not deploying applications or optimizing pipelines, Abhishek enjoys exploring the intersection of AI and human reasoning, sharing learnings with the community, and helping engineers grow in their DevOps journey.


🌐 Connect With Abhishek

🔗 Portfolio: abhimishra-devops.com
📝 Blog: blog.abhimishra-devops.com
💻 GitHub: github.com/Abhi-mishra998
💼 LinkedIn: linkedin.com/in/abhishek-mishra-49888123b