Inspiration

According to the World Health Organization (WHO), cardiovascular diseases, including heart disease and stroke, are the leading cause of death globally, accounting for approximately 31% of all deaths. Heart disease is also a major health concern in India: according to the Indian Heart Association, it is the leading cause of death in the country, accounting for over 2.4 million deaths annually. To help address this problem, we decided to use machine learning to predict heart health from clinical values such as cholesterol level, age, and chest pain type.

How we built CardioInsight

  1. The first step was collecting a dataset for predicting heart disease. Raw data is often not suitable for modeling as-is, and many algorithms cannot work directly with categorical data, so once the dataset was selected we performed preprocessing steps such as removing null values and label encoding the categorical features.

  2. Next, we cleaned and transformed the data: handling missing values, converting categorical variables to numerical ones, and normalizing the features to prepare them for the machine learning models.

  3. The third step was feature selection, a critical part of building a predictive model because it identifies the most relevant and informative features. We used several feature selection methods, namely SelectKBest, XGBoost, and Fisher's score, to select the best features from the dataset. With the processed data and selected features, we built an ensemble classifier that combines multiple models to improve performance. The ensemble uses two machine learning models: a Support Vector Machine (SVM) classifier and an XGBoost classifier.

  4. After building the model, we deployed it with Streamlit, a popular Python library, as a web application that lets users enter their data and get a heart disease prediction. With the model connected to the Streamlit interface, we can share the application with others for practical use.
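The preprocessing in steps 1 and 2 could be sketched roughly as follows with pandas and scikit-learn. This is a minimal illustration, not the project's actual code: the helper name `preprocess`, the column names (`age`, `chol`, `chest_pain`, `target`), and the toy rows are hypothetical, and `StandardScaler` is assumed as the normalization method.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(df: pd.DataFrame, target: str) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: drop rows with nulls, label-encode categorical
    columns, and standardize features to zero mean and unit variance."""
    df = df.dropna().copy()

    # Label-encode every non-numeric column (e.g. chest pain type).
    for col in df.select_dtypes(exclude="number").columns:
        df[col] = LabelEncoder().fit_transform(df[col])

    # Split off the label, then scale the remaining feature columns.
    y = df.pop(target)
    X = pd.DataFrame(
        StandardScaler().fit_transform(df), columns=df.columns, index=df.index
    )
    return X, y


# Illustrative toy data; column names are assumptions, not the real dataset.
raw = pd.DataFrame({
    "age": [63, 37, 41, None, 56],
    "chol": [233, 250, 204, 236, 294],
    "chest_pain": ["typical", "atypical", "non-anginal", "typical", "atypical"],
    "target": [1, 1, 0, 0, 1],
})
X, y = preprocess(raw, target="target")
print(X.shape)  # (4, 3): the row with a missing age is dropped
```

In a real evaluation, the scaler should be fitted on the training split only, so that test-set statistics do not leak into training.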
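The ensemble feature selection in step 3 might look roughly like this sketch, in which three selectors vote on the top-k features. The voting rule (keep a feature if at least two of the three selectors rank it in their top k), the helper names, and the synthetic data are assumptions for illustration. To keep the sketch dependent only on scikit-learn, it uses `GradientBoostingClassifier.feature_importances_` as a stand-in for XGBoost importances. Fisher's score is computed directly with NumPy as between-class scatter divided by within-class scatter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif


def fisher_score(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-feature Fisher score: sum_c n_c (mu_c - mu)^2 / sum_c n_c var_c."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Guard against division by zero for constant-within-class features.
    return between / np.maximum(within, 1e-12)


def top_k(scores: np.ndarray, k: int) -> set[int]:
    """Indices of the k highest scores."""
    return set(np.argsort(scores)[::-1][:k].tolist())


def ensemble_select(X: np.ndarray, y: np.ndarray, k: int, min_votes: int = 2) -> list[int]:
    """Hypothetical voting rule: keep features in the top k of >= min_votes selectors."""
    kbest = SelectKBest(f_classif, k=k).fit(X, y)
    votes_kbest = set(np.flatnonzero(kbest.get_support()).tolist())
    votes_fisher = top_k(fisher_score(X, y), k)
    # Stand-in for XGBoost importances so the sketch needs only scikit-learn.
    booster = GradientBoostingClassifier(random_state=0).fit(X, y)
    votes_tree = top_k(booster.feature_importances_, k)

    counts = np.zeros(X.shape[1], dtype=int)
    for chosen in (votes_kbest, votes_fisher, votes_tree):
        for j in chosen:
            counts[j] += 1
    return np.flatnonzero(counts >= min_votes).tolist()


# Synthetic data with 13 features, standing in for the real clinical dataset.
X, y = make_classification(n_samples=300, n_features=13, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=0)
selected = ensemble_select(X, y, k=6)
print("selected feature indices:", selected)
```

Voting across selectors is one common way to combine them; averaging normalized ranks would be a reasonable alternative.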
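An ensemble of an SVM and a boosted tree model, as described in step 3, could be sketched with scikit-learn's `VotingClassifier`. The write-up does not say how the two models are combined, so soft voting (averaging predicted probabilities) is an assumption here, as are the hyperparameters, the helper name `build_ensemble`, and the synthetic data. `GradientBoostingClassifier` stands in for XGBoost.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_ensemble(random_state: int = 0) -> VotingClassifier:
    """Hypothetical soft-voting ensemble of an SVM and a gradient-boosted model."""
    # SVMs are scale-sensitive, so scaling is bundled into the SVM pipeline.
    svm = make_pipeline(
        StandardScaler(),
        SVC(kernel="rbf", probability=True, random_state=random_state),
    )
    # Stand-in for xgboost.XGBClassifier (assumption, to avoid the extra dependency).
    booster = GradientBoostingClassifier(random_state=random_state)
    return VotingClassifier(
        estimators=[("svm", svm), ("boost", booster)], voting="soft"
    )


# Synthetic binary data standing in for the selected clinical features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)
scores = cross_val_score(build_ensemble(), X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.3f}")
```

`SVC` needs `probability=True` for soft voting, because otherwise it does not expose `predict_proba`. Since `xgboost.XGBClassifier` follows the scikit-learn estimator interface, it could replace the stand-in booster if it is installed.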

Challenges we ran into

These were the main challenges we faced while developing the project:

  1. Finding the best feature selection and classification algorithms beyond the legacy or traditional methods.
  2. Finding the right combination of classifiers for the ensemble to improve accuracy.

Accomplishments that we're proud of

Simply feeding the data to the algorithm without feature selection gave an accuracy between 70% and 80%. After applying the ensemble feature selection algorithm together with the ensemble classifier, the accuracy rose to between 84% and 90%. It was an exciting moment for us when our approach gave roughly 5% to 8% higher accuracy than previous methods.

What we learned

While developing this project, we learned about various feature selection algorithms along with classifiers such as XGBoost, SVM, and AdaBoost. Our main focus was developing an efficient approach that improves on the accuracy of the existing algorithms, i.e., those that use neither feature selection nor ensembling. We used Streamlit to build the project's front end.

Built With

  • data-preprocessing
  • ensemble-classification
  • ensemble-feature-selection
  • label-encoding
  • python
  • sklearn
  • streamlit