Leukemia Diagnosis System

Inspiration

This project was inspired by a desire to use technology to directly help people in medicine, especially in highly skilled and demanding fields like radiology and pathology. Diagnosing leukemia from blood smear images is time-intensive and requires significant expertise, and we wanted to explore how machine learning could act as a reliable decision-support tool to reduce cognitive load and improve consistency in early triage.

What I Learned

Through this project, I gained hands-on experience with machine learning and deep learning, particularly neural networks for image classification. I learned how model architecture, data preprocessing, and training strategies affect performance, as well as how to evaluate models using probabilistic outputs rather than just raw accuracy. I also learned how to deploy models in a way that makes them usable outside of a notebook environment.
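
To illustrate why probabilistic evaluation says more than raw accuracy, here is a minimal sketch in plain Python. The two "models" and their predicted probabilities are made up for illustration and are not results from this project. Both get every example right, yet log loss and the Brier score still separate the confident model from the hesitant one.

```python
import math

def accuracy(y_true, probs):
    """Fraction of examples whose highest-probability class is the true class."""
    hits = sum(1 for label, p in zip(y_true, probs) if p.index(max(p)) == label)
    return hits / len(y_true)

def log_loss(y_true, probs, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, p in zip(y_true, probs):
        total += -math.log(max(p[label], eps))  # clamp to avoid log(0)
    return total / len(y_true)

def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and one-hot labels."""
    total = 0.0
    for label, p in zip(y_true, probs):
        total += sum((p_k - (1.0 if k == label else 0.0)) ** 2
                     for k, p_k in enumerate(p))
    return total / len(y_true)

# Hypothetical binary predictions: same argmax, different confidence.
y_true = [0, 1, 1]
confident = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
hesitant = [[0.55, 0.45], [0.45, 0.55], [0.4, 0.6]]

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, accuracy(y_true, probs), log_loss(y_true, probs), brier_score(y_true, probs))
```

Lower is better for both log loss and the Brier score, so a model that is right but unsure is penalized relative to one that is right and well calibrated.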

How We Built It

We built an end-to-end ML pipeline that analyzes blood smear images to diagnose leukemia subtypes and forecast disease progression. The system uses a convolutional neural network trained on labeled microscopy images, with preprocessing steps to standardize image size and normalize inputs. After training, the model outputs class probabilities and confidence scores, which are exposed through a backend API so the system can be used as a practical diagnostic aid rather than just a research prototype.
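
The preprocessing and output steps described above can be sketched roughly as follows. This is not our actual code: the class names, the nearest-neighbor resize, the [0, 1] scaling, and the response fields are all assumptions chosen for illustration, and the CNN itself is replaced by a hard-coded list of raw scores (logits).

```python
import math

# Assumed label set for illustration; the real classes depend on the dataset.
CLASS_NAMES = ["healthy", "ALL", "AML", "CLL", "CML"]

def resize_nearest(image, out_h, out_w):
    """Resize a grayscale image (list of rows) with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normalize(image):
    """Scale 8-bit pixel values into the range [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]

def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def to_response(logits):
    """Build a JSON-like payload a backend endpoint could return (hypothetical shape)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "label": CLASS_NAMES[best],
        "confidence": probs[best],
        "probabilities": dict(zip(CLASS_NAMES, probs)),
    }

# Tiny 2x2 stand-in for a blood smear image, resized to a fixed 4x4 input.
model_input = normalize(resize_nearest([[0, 255], [128, 64]], 4, 4))

# Placeholder logits standing in for the CNN's output on model_input.
response = to_response([0.1, 2.0, 0.3, -1.0, 0.0])
print(response["label"], round(response["confidence"], 3))
```

Returning the full probability distribution alongside the top label lets a reviewer see how close the runner-up classes were instead of relying on a single answer.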

Challenges We Faced

One major challenge was finding high-quality, well-labeled medical image datasets. Early data sources were limited and inconsistent, but we eventually identified reliable datasets on Kaggle that met our needs. Training time was another issue because of the computational cost of deep learning models; we addressed this by using virtual GPUs to speed up experimentation and iteration.

We also faced challenges related to team collaboration, including coordinating responsibilities, resolving differing technical opinions, and managing timelines. Additional challenges included avoiding overfitting on limited medical data, tuning hyperparameters efficiently, and ensuring the model's outputs were interpretable enough to be useful in a clinical context rather than behaving as a black box.
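
One common guard against overfitting on small datasets is early stopping on validation loss. The sketch below shows the idea in plain Python. It is offered as an example of the technique, not a record of what we used; the function name, the `patience` and `min_delta` parameters, and the loss values are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, waited = loss, epoch, 0  # new best: reset counter
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch  # give up and keep the best weights
    return best_epoch, len(val_losses) - 1  # ran out of epochs without stopping

# Made-up validation curve: improves, then drifts upward as the model overfits.
losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.5]
best_epoch, stop_epoch = early_stopping_epoch(losses, patience=3)
print(best_epoch, stop_epoch)
```

In this made-up curve, training stops before the late improvement at the final epoch, which is the usual trade-off: a larger `patience` tolerates longer plateaus at the cost of more compute.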