Hyperion101010/kubeflow_mlops

Citrus Leaf Disease Classification with Kubeflow

This project demonstrates a complete MLOps pipeline for citrus leaf disease classification using Kubeflow on GCP. The system covers model training, evaluation, and deployment via a Flask-based inference service running in a Kubernetes cluster.


Project Structure

.
├── README.md                    # Project overview and documentation
├── inference/                   # Flask app for inference (Dockerized)
│   ├── Dockerfile
│   ├── app.py
│   ├── requirements.txt
│   └── templates/
├── jupiter_notebooks/           # Jupyter notebooks used for prototyping
│   ├── CNN.ipynb                # Jupyter notebook for the CNN model
│   └── MobileNetV2.ipynb        # Jupyter notebook for the MobileNetV2 model
├── manifests/                   # Kubeflow manifests for installation
│   ├── LICENSE, README.md, etc.
│   └── apps/, common/, scripts/, etc.
├── pipeline_code/               # Kubeflow pipeline components and compiler code
│   ├── kubeflow-pipeline-cnn.py        # Pipeline that trains a CNN model
│   ├── kubeflow-pipeline-save.py       # Extended pipeline with model saving
│   ├── citrus_pipeline.yaml             # Compiled pipeline yaml file
│   ├── citrus_pipeline-save.yaml        # Compiled pipeline with model saving yaml file
│   └── requirements.txt
└── service_yaml_files/          # Kubernetes service configurations
    ├── lb.yaml                         # Load balancer for inference service
    ├── inference-deployment.yaml       # Deployment for Flask inference pod
    ├── svc-disk.yaml                   # Persistent Volume Claim (PVC) for model artifacts
    ├── citrus-shell.yaml               # Shell job/service
    └── default_pod_config.yaml         # Pod config to mount PVC in Kubeflow

How to Run

1. Kubeflow Setup

Use the files under manifests/ to install Kubeflow on a Standard-mode GKE cluster on GCP.

2. Compile Pipeline

Navigate to pipeline_code/ and compile the pipeline:

python3 kubeflow-pipeline-cnn.py            # For training-only version
python3 kubeflow-pipeline-save.py           # For version with model saving

Upload the resulting YAML files (citrus_pipeline.yaml and citrus_pipeline-save.yaml) to the Kubeflow Pipelines UI.
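Before uploading, it can help to confirm that each compile script actually wrote a fresh YAML file. The helper below is a minimal sketch, not part of this repository: the script-to-output pairing is inferred from the file names in the project tree, and the function name compile_pipelines is hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Assumed pairing of compile script -> compiled YAML, inferred from the
# file names in pipeline_code/. Adjust if the scripts write elsewhere.
PIPELINES = {
    "kubeflow-pipeline-cnn.py": "citrus_pipeline.yaml",
    "kubeflow-pipeline-save.py": "citrus_pipeline-save.yaml",
}


def compile_pipelines(workdir, pipelines=PIPELINES):
    """Run each compile script in `workdir` and return the YAML paths it wrote."""
    workdir = Path(workdir)
    compiled = []
    for script, output in pipelines.items():
        output_path = workdir / output
        # Remember the old timestamp so a stale, committed YAML is not
        # mistaken for a fresh compile.
        before = output_path.stat().st_mtime_ns if output_path.exists() else None
        subprocess.run([sys.executable, script], cwd=workdir, check=True)
        if not output_path.is_file() or output_path.stat().st_mtime_ns == before:
            raise FileNotFoundError(f"{script} did not write {output}")
        compiled.append(output_path)
    return compiled


# Usage, from the repository root:
#   compile_pipelines("pipeline_code")
```

A failing script raises subprocess.CalledProcessError, so a broken pipeline definition stops the run before anything is uploaded.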

3. Deploy Inference Service

Build the Docker image for the Flask-based inference service, replacing image_name with your own image tag:

cd inference/
docker build -t image_name .

Make sure the image is available in a registry the GKE cluster can pull from, then apply the Kubernetes configs from inside service_yaml_files/:

kubectl apply -f inference-deployment.yaml
kubectl apply -f lb.yaml
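Once the LoadBalancer has an external IP, a quick reachability check confirms that the Flask pod is serving. The following is a minimal sketch using only the standard library. It assumes the app answers GET requests on its root path with HTTP 200, which this README does not state, and wait_until_ready is a hypothetical helper name.

```python
import time
import urllib.request


def wait_until_ready(url, attempts=10, delay_s=3.0):
    """Poll `url` until it answers with HTTP 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and HTTP error codes
            # (urllib's URLError and HTTPError both derive from OSError).
            pass
        if attempt < attempts:
            time.sleep(delay_s)
    return False


# Usage: replace EXTERNAL_IP with the address reported by `kubectl get svc`.
#   wait_until_ready("http://EXTERNAL_IP/")
```

Provisioning an external IP on GKE can take a minute or two, which is why the helper retries instead of failing on the first refused connection.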

Jupyter Notebooks

The jupiter_notebooks/ directory contains exploratory and prototyping notebooks:

  • CNN.ipynb: Implements a basic convolutional neural network for citrus leaf disease classification.
  • MobileNetV2.ipynb: Applies transfer learning with the MobileNetV2 architecture, aiming for better accuracy and faster training than the basic CNN.

These notebooks were used to prototype models before integrating them into the Kubeflow pipeline.


Notes

  • default_pod_config.yaml ensures all Kubeflow components have access to the mounted PVC.
  • svc-disk.yaml defines the Persistent Volume Claim for saving model artifacts.
  • lb.yaml exposes the Flask app externally using a LoadBalancer service.
  • The trained model is stored in the PVC and read by the inference pod at runtime.
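As an illustration of the last note, the inference pod needs some way to locate the model file on the mounted volume. The snippet below is a sketch under stated assumptions rather than the code in inference/app.py: the MODEL_DIR environment variable, the /mnt/models default mount path, the file suffixes, and the find_latest_model helper are all hypothetical.

```python
import os
from pathlib import Path

# Hypothetical mount point of the PVC inside the inference pod; the real
# path is whatever inference-deployment.yaml mounts.
DEFAULT_MODEL_DIR = os.environ.get("MODEL_DIR", "/mnt/models")


def find_latest_model(model_dir=DEFAULT_MODEL_DIR, suffixes=(".h5", ".keras")):
    """Return the most recently modified model file in `model_dir`."""
    candidates = [
        path
        for path in Path(model_dir).iterdir()
        if path.is_file() and path.suffix in suffixes
    ]
    if not candidates:
        raise FileNotFoundError(f"no model file found in {model_dir}")
    # The newest file wins, so a fresh pipeline run replaces the served model
    # the next time the pod starts.
    return max(candidates, key=lambda path: path.stat().st_mtime)


# In the Flask app the result could then be passed to
# tf.keras.models.load_model(find_latest_model()) at startup.
```

Raising FileNotFoundError when the volume is empty makes the pod fail fast at startup, which is easier to diagnose than an error on the first prediction request.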

Dependencies

  • Python 3.12.0
  • TensorFlow 2.11
  • Kubeflow Pipelines SDK V2 (2.7.0)
  • Kubernetes (GKE Standard Cluster)
  • Flask
