CardioGraph AI is an interactive machine learning system that predicts cardiovascular disease risk using patient health data. The system combines XGBoost prediction, SHAP explainability, and interactive visualizations to help users understand what factors contribute to heart disease risk and how lifestyle changes can reduce that risk.
This project was developed for the Hack4Health – Byte 2 Beat Hackathon, which focuses on building machine learning solutions using biomedical datasets to improve cardiovascular disease detection, understanding, and interpretability.
CardioGraph AI goes beyond simple prediction by integrating explainable AI, lifestyle simulation, and risk factor visualization, making the system easier to interpret and more useful for understanding cardiovascular health risks.
Cardiovascular disease (CVD) is the leading cause of death worldwide. Risk factors such as high blood pressure, elevated cholesterol, obesity, smoking, diabetes, and lack of physical activity significantly increase the likelihood of developing heart disease.
Although machine learning models can predict disease risk, many operate as black boxes, meaning users cannot easily understand the reasoning behind a prediction.
The goal of this project is to build a system that:
- Predicts cardiovascular disease risk using machine learning
- Explains which health factors influence the prediction
- Provides lifestyle recommendations to reduce risk
- Visualizes relationships between cardiovascular risk factors
This allows users to not only see their predicted risk but also understand why the prediction was made and how they can reduce that risk.
This project uses a cardiovascular disease dataset containing approximately 70,000 patient records with demographic, medical, and lifestyle indicators.
The dataset is de-identified and contains no personally identifiable information.
| Feature | Description |
|---|---|
| age | Age of the patient |
| gender | Biological sex |
| height | Height of the patient (cm) |
| weight | Weight of the patient (kg) |
| BMI | Body Mass Index (derived feature) |
| ap_hi | Systolic blood pressure (mmHg) |
| ap_lo | Diastolic blood pressure (mmHg) |
| cholesterol | Cholesterol level |
| gluc | Glucose level |
| smoke | Smoking status |
| alco | Alcohol consumption |
| active | Physical activity level |

The target variable is cardio:
- 0 = No cardiovascular disease
- 1 = Cardiovascular disease
The dataset originally includes height and weight values. These are converted into Body Mass Index (BMI), which is a more meaningful indicator of obesity-related cardiovascular risk.
BMI is calculated from weight in kilograms and height in centimeters:

$$\text{BMI} = \frac{\text{weight}}{(\text{height} / 100)^2}$$
BMI is widely used in medical research to evaluate obesity and cardiovascular risk.
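A minimal pandas sketch of the BMI feature, assuming the raw columns are named `height` (cm) and `weight` (kg) as in the feature table. The helper name `add_bmi` is hypothetical, not taken from the project code.

```python
import pandas as pd


def add_bmi(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a BMI column computed from height (cm) and weight (kg)."""
    out = df.copy()                      # leave the caller's DataFrame untouched
    height_m = out["height"] / 100       # convert centimeters to meters
    out["BMI"] = out["weight"] / height_m ** 2
    return out


patients = pd.DataFrame({"height": [175, 160], "weight": [70, 60]})
print(add_bmi(patients)["BMI"].round(1).tolist())  # [22.9, 23.4]
```

Returning a copy keeps the raw dataset unchanged, so the same function can be applied to training data and to new user input.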
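The system uses XGBoost (Extreme Gradient Boosting), a gradient-boosted decision tree library that performs strongly on structured, tabular data such as patient records.
XGBoost builds decision trees sequentially: each new tree is fit to correct the errors of the trees before it, using the gradient (and second derivative) of the loss, and the final prediction is the sum of all tree outputs, converted to a probability for binary classification.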
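To make "sequential trees" concrete, here is a from-scratch sketch of the idea on a single feature. It is a deliberate simplification, not XGBoost itself: depth-1 trees (stumps), exact greedy splits, and the second-order leaf weight w = -G / (H + λ) and split gain described in the XGBoost paper, without the γ pruning term, deeper trees, or subsampling. Each round computes the gradient and Hessian of the log-loss at the current predictions and fits one stump to them. The function names and the toy blood-pressure data are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, grads, hess, lam):
    """Find the split on one feature that maximizes the XGBoost-style gain.

    Returns (threshold, left_weight, right_weight); x < threshold goes left.
    """
    G, H = sum(grads), sum(hess)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None  # (gain, threshold, w_left, w_right)
    g_left = h_left = 0.0
    for k in range(len(order) - 1):
        i = order[k]
        g_left += grads[i]
        h_left += hess[i]
        if xs[i] == xs[order[k + 1]]:
            continue  # can only split between distinct feature values
        g_right, h_right = G - g_left, H - h_left
        gain = (g_left ** 2 / (h_left + lam)
                + g_right ** 2 / (h_right + lam)
                - G ** 2 / (H + lam))
        if best is None or gain > best[0]:
            threshold = (xs[i] + xs[order[k + 1]]) / 2
            best = (gain, threshold,
                    -g_left / (h_left + lam), -g_right / (h_right + lam))
    if best is None:  # every x is identical: fall back to a single leaf
        w = -G / (H + lam)
        return math.inf, w, w
    return best[1], best[2], best[3]


def boost(xs, ys, n_rounds=20, eta=0.3, lam=1.0):
    """Boost stumps for binary log-loss; returns (base_margin, trees)."""
    p = sum(ys) / len(ys)              # assumes both classes are present
    base = math.log(p / (1 - p))       # start from the log-odds of the positive rate
    margins = [base] * len(xs)
    trees = []
    for _ in range(n_rounds):
        probs = [sigmoid(m) for m in margins]
        grads = [pr - y for pr, y in zip(probs, ys)]   # d(log-loss)/d(margin)
        hess = [pr * (1 - pr) for pr in probs]         # second derivative
        thr, w_left, w_right = fit_stump(xs, grads, hess, lam)
        trees.append((thr, w_left, w_right))
        # Shrink each new tree by the learning rate eta before adding it.
        margins = [m + eta * (w_left if x < thr else w_right)
                   for m, x in zip(margins, xs)]
    return base, trees


def predict_proba(x, base, trees, eta=0.3):
    """eta must match the value used in boost()."""
    margin = base + eta * sum(wl if x < thr else wr for thr, wl, wr in trees)
    return sigmoid(margin)


def log_loss(ys, ps):
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(ys, ps)) / len(ys)


# Toy data: systolic blood pressure (mmHg) and a 0/1 disease label (illustrative).
ap_hi = [110, 115, 120, 125, 130, 140, 150, 160, 170, 180]
cardio = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
base, trees = boost(ap_hi, cardio)
print(f"{predict_proba(110, base, trees):.2f}", f"{predict_proba(180, base, trees):.2f}")
```

Each stump only nudges the log-odds by a small, shrunken amount, which is why many trees are combined; the real library adds deeper trees, regularization, and fast histogram-based split finding on top of this core loop.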
The training pipeline runs in this order:

Dataset → Feature Engineering (BMI) → StandardScaler Normalization → Train/Test Split → XGBoost Model Training → Prediction → Explainability with SHAP
The model is evaluated using two standard classification metrics.
Accuracy measures the proportion of correct predictions:

$$\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}$$
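ROC-AUC (area under the receiver operating characteristic curve) measures how well the model distinguishes between patients with and without cardiovascular disease across all classification thresholds:
- ROC-AUC closer to 1 → better classification
- ROC-AUC closer to 0.5 → random guessing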
Example model performance:
- Model Accuracy: 0.73
- Model ROC-AUC: 0.79
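Both metrics can be written out directly. This pure-Python sketch assumes a 0.5 probability threshold for turning risk scores into class labels (the threshold is an assumption). ROC-AUC is computed as the probability that a randomly chosen patient with the disease receives a higher score than a randomly chosen patient without it, with ties counting half. In practice, `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score` compute the same quantities far more efficiently.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of patients whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(p == t for p, t in zip(preds, y_true))
    return correct / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = 0.0
    for p in pos:          # O(n_pos * n_neg): fine for illustration only
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(accuracy(y_true, y_score), roc_auc(y_true, y_score))  # 0.75 0.75
```

Accuracy depends on the chosen threshold, while ROC-AUC looks only at how the scores rank patients, which is why the two numbers can differ.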
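To make predictions interpretable, the system uses SHAP (SHapley Additive exPlanations).
SHAP assigns each feature a contribution (its Shapley value) to an individual prediction; the contributions add up to the difference between that prediction and the model's baseline (average) prediction.
For each patient prediction, the system shows: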
- which features increase cardiovascular risk
- which features decrease cardiovascular risk
- the magnitude of each feature’s influence
Example explanation output:

| Feature | Impact on Prediction |
|---|---|
| Blood Pressure | +0.28 |
| Cholesterol | +0.19 |
| Smoking | +0.11 |
| BMI | +0.07 |
| Physical Activity | -0.05 |

This helps users understand why the model predicted a certain risk level.
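SHAP values are Shapley values from cooperative game theory. For a model with only a few features they can be computed exactly by brute force, which makes the "contributions add up to the prediction" property easy to check. The sketch below uses a hypothetical hand-written risk score (not the project's XGBoost model) and replaces "absent" features with the values of a single baseline patient. For tree ensembles, the `shap` library's `TreeExplainer` computes Shapley-value explanations efficiently instead of enumerating every feature subset, and defines absent features using the training data rather than one baseline patient.

```python
import math
from itertools import combinations


def risk_score(x):
    """Hypothetical log-odds risk score with one interaction (illustration only)."""
    score = -1.0 + 0.04 * (x["ap_hi"] - 120) + 0.5 * (x["cholesterol"] - 1)
    score += 0.6 * x["smoke"] - 0.3 * x["active"]
    if x["smoke"] and x["ap_hi"] > 140:  # smoking matters more with high blood pressure
        score += 0.4
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values: features outside a coalition take baseline values."""
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return model(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution of f
        phi[f] = total
    return phi


patient = {"ap_hi": 160, "cholesterol": 3, "smoke": 1, "active": 0}
baseline = {"ap_hi": 120, "cholesterol": 1, "smoke": 0, "active": 1}
phi = shapley_values(risk_score, patient, baseline)
for name, v in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} {v:+.2f}")
# Local accuracy: contributions sum to prediction minus baseline prediction.
print(round(sum(phi.values()), 6), round(risk_score(patient) - risk_score(baseline), 6))
```

The interaction bonus is split evenly between `smoke` and `ap_hi`, and `active` gets a positive contribution because this patient is inactive while the baseline patient is active; positive values raise the predicted risk and negative values lower it, exactly as in the example output.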
The system also provides lifestyle improvement suggestions and recalculates the predicted risk after applying those changes.
Example:

Current Risk: 26.41%

Suggested Improvements:
- Stop smoking
- Increase physical activity
- Reduce BMI

New Predicted Risk: 12.17%
This turns the model into a decision-support tool that shows how the predicted cardiovascular risk responds when modifiable factors such as smoking, physical activity, and BMI change.
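A sketch of the "what-if" step: copy the patient's inputs, apply the suggested changes, and score both versions with the same model. The function `simulate_lifestyle`, the target BMI of 25, and the toy `toy_risk` model are hypothetical; in the real system the `predict_risk` callable would wrap the trained XGBoost model together with the same preprocessing used in training. BMI is lowered by adjusting weight so that height, weight, and BMI stay consistent with each other.

```python
import math
from typing import Callable, Dict

Patient = Dict[str, float]


def simulate_lifestyle(patient: Patient, predict_risk: Callable[[Patient], float],
                       target_bmi: float = 25.0):
    """Return (current_risk, new_risk, modified_patient) after hypothetical changes."""
    changed = dict(patient)  # never mutate the caller's record
    changed["smoke"] = 0     # stop smoking
    changed["active"] = 1    # become physically active
    height_m = changed["height"] / 100
    if changed["weight"] / height_m ** 2 > target_bmi:
        changed["weight"] = target_bmi * height_m ** 2  # lose weight down to the target BMI
    changed["BMI"] = changed["weight"] / height_m ** 2
    return predict_risk(patient), predict_risk(changed), changed


def toy_risk(p: Patient) -> float:
    """Stand-in logistic model; NOT the project's trained model."""
    z = (-3.0 + 0.03 * (p["ap_hi"] - 120) + 0.08 * (p["BMI"] - 25)
         + 0.7 * p["smoke"] - 0.4 * p["active"])
    return 1 / (1 + math.exp(-z))


patient = {"height": 170, "weight": 90, "ap_hi": 150, "smoke": 1, "active": 0}
patient["BMI"] = patient["weight"] / (patient["height"] / 100) ** 2
before, after, new_patient = simulate_lifestyle(patient, toy_risk)
print(f"Current Risk: {before:.2%}  New Predicted Risk: {after:.2%}")
```

Passing the model in as a callable keeps the simulation logic independent of how the model is loaded, and makes it easy to test with a stand-in like `toy_risk`.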
CardioGraph AI includes two visualization systems to improve interpretability.
An interactive Plotly bar chart visualizes how each feature contributes to the prediction.
Users can quickly identify the most influential health factors affecting cardiovascular risk.
A NetworkX + PyVis graph visualizes relationships between cardiovascular risk factors.
Example relationships:
- Smoking → Blood Pressure → Heart Disease
- Physical Activity → BMI → Blood Pressure
- Cholesterol → Heart Disease
- Alcohol → Heart Disease


Node colors in the graph:

| Color | Meaning |
|---|---|
| 🔴 Red | Major active risk factor |
| 🟠 Orange | Moderate risk |
| 🟢 Green | Protective factor |
| 🔵 Blue | Neutral factor |
This visualization helps users understand how different health variables interact and contribute to disease risk.
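A sketch of how the graph could be assembled with NetworkX and exported with PyVis. The edges are the example relationships listed above, and each node is colored by its contribution for the current patient. The color thresholds (at least +0.15 for red, any other positive value for orange, negative for green, no contribution for blue) and all function names are hypothetical. PyVis is imported only inside the export function, so the graph can be built and inspected without it.

```python
import networkx as nx

# Edges from the example relationships (cause -> effect).
EDGES = [
    ("Smoking", "Blood Pressure"),
    ("Blood Pressure", "Heart Disease"),
    ("Physical Activity", "BMI"),
    ("BMI", "Blood Pressure"),
    ("Cholesterol", "Heart Disease"),
    ("Alcohol", "Heart Disease"),
]


def node_color(contribution):
    """Map a feature's contribution to the legend colors (thresholds are hypothetical)."""
    if contribution is None or contribution == 0:
        return "blue"      # neutral / not scored (includes the outcome node)
    if contribution >= 0.15:
        return "red"       # major active risk factor
    if contribution > 0:
        return "orange"    # moderate risk
    return "green"         # protective factor


def build_risk_graph(contributions: dict) -> nx.DiGraph:
    graph = nx.DiGraph()
    graph.add_edges_from(EDGES)
    for node in graph.nodes:
        value = contributions.get(node)
        graph.nodes[node]["color"] = node_color(value)
        # 'title' becomes the hover tooltip in PyVis.
        graph.nodes[node]["title"] = f"{node}: {value:+.2f}" if value is not None else node
    return graph


def export_html(graph: nx.DiGraph, path: str = "risk_graph.html") -> None:
    """Render with PyVis; node 'color' and 'title' attributes carry over via from_nx."""
    from pyvis.network import Network  # optional dependency, imported on demand
    net = Network(directed=True)
    net.from_nx(graph)
    net.write_html(path)


contributions = {"Blood Pressure": 0.28, "Cholesterol": 0.19, "Smoking": 0.11,
                 "BMI": 0.07, "Physical Activity": -0.05}
graph = build_risk_graph(contributions)
print(nx.get_node_attributes(graph, "color"))
```

Keeping the graph as plain NetworkX data separates the risk logic from rendering, so the same graph could be drawn with another tool if needed.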
The end-to-end application flow:

User Health Input → Data Preprocessing → XGBoost Prediction → Risk Probability → SHAP Explanation → Lifestyle Recommendations → Interactive Visualizations
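The "Data Preprocessing" step at prediction time has to reproduce training exactly: the same derived BMI, the same column order, and the scaler statistics learned from the training split. A minimal sketch, assuming the input keys follow the feature table and that `FEATURE_ORDER` matches the training columns (both are assumptions); the function name `preprocess_input` and the tiny training frame are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Column order assumed to match the training data (taken from the feature table).
FEATURE_ORDER = ["age", "gender", "height", "weight", "BMI", "ap_hi", "ap_lo",
                 "cholesterol", "gluc", "smoke", "alco", "active"]


def preprocess_input(user_input: dict, scaler: StandardScaler) -> pd.DataFrame:
    """Turn one form submission into a scaled row in the training column order."""
    row = dict(user_input)
    row["BMI"] = row["weight"] / (row["height"] / 100) ** 2  # same derivation as training
    frame = pd.DataFrame([row])[FEATURE_ORDER]  # reorder; KeyError if a field is missing
    # Reuse the training-time scaler: transform only, never fit on user input.
    return pd.DataFrame(scaler.transform(frame), columns=FEATURE_ORDER)


# Usage with a scaler fitted on a tiny, made-up training frame (illustration only).
train = pd.DataFrame({
    "age": [50, 60], "gender": [1, 2], "height": [165, 175], "weight": [60, 90],
    "ap_hi": [120, 150], "ap_lo": [80, 95], "cholesterol": [1, 3], "gluc": [1, 2],
    "smoke": [0, 1], "alco": [0, 0], "active": [1, 0],
})
train["BMI"] = train["weight"] / (train["height"] / 100) ** 2
scaler = StandardScaler().fit(train[FEATURE_ORDER])

x = preprocess_input({"age": 55, "gender": 1, "height": 170, "weight": 75,
                      "ap_hi": 135, "ap_lo": 85, "cholesterol": 2, "gluc": 1,
                      "smoke": 0, "alco": 0, "active": 1}, scaler)
# risk = model.predict_proba(x)[:, 1]  # with the trained XGBoost model
```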
The entire system is deployed as an interactive dashboard using Streamlit.
Built with Python and the following libraries:
- XGBoost
- Scikit-Learn
- SHAP
- Plotly
- PyVis
- NetworkX
- Pandas
- NumPy
- Streamlit