Predicting Heart Disease with Machine Learning

Inspiration

The inspiration behind this project came from our common desire to build something that helps the people around us and utilizes our current computer science skills.

What it does

The program takes the data set, cleans it, splits it into training and testing sets, trains several models on the training set using various algorithms, and creates visualizations of the data before and after modeling.
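The clean, split, and train steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual notebook: the data is synthetic, the column names (`age`, `resting_bp`, `cholesterol`, `heart_disease`) are hypothetical, and logistic regression stands in for whichever algorithms were really used. The visualization step is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the column names are hypothetical, not the project's.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(30, 80, size=n).astype(float),
    "resting_bp": rng.normal(130, 15, size=n),
    "cholesterol": rng.normal(240, 40, size=n),
})
# Label loosely tied to the features so the model has something to learn.
risk = 0.05 * (df["age"] - 55) + 0.02 * (df["resting_bp"] - 130)
df["heart_disease"] = (risk + rng.normal(0, 1, size=n) > 0).astype(int)

# Simulate messy input: a few missing values and some duplicated rows.
df.loc[df.sample(10, random_state=0).index, "cholesterol"] = np.nan
df = pd.concat([df, df.head(5)], ignore_index=True)

# 1. Clean: drop duplicate rows, then rows with missing values.
clean = df.drop_duplicates().dropna().reset_index(drop=True)

X = clean.drop(columns="heart_disease")
y = clean["heart_disease"]

# 2. Split: hold out 20% for testing, keeping the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 3. Train: scale the features, then fit a simple classifier.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# 4. Evaluate on data the model has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline means the scaling statistics are computed from the training set only, which avoids leaking information from the test set.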

How we built it

The project was built in Google Colab using Python libraries such as Matplotlib, pandas, NumPy, and scikit-learn.

Challenges we ran into

We were eager to learn, but machine learning is a broad field, and the amount of information we found while researching was sometimes overwhelming. Some of the resampling methods and models we wanted to try, such as SMOTETomek and GradientBoostingClassifier, took extremely long to run, so we chose other methods that better fit our time constraints. Our goal was also ambitious given how little background knowledge we had, and three weeks was not much time. We finished the project on time, but meeting our own deadlines sometimes required a lot of extra work, and we did not get to everything we had planned.
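To illustrate what a cheap resampling method can look like, here is a sketch of random oversampling of the minority class with scikit-learn's `resample` utility. This is an assumption chosen for illustration; the writeup does not say which methods replaced SMOTETomek. The arrays are synthetic and the 180/20 class split is made up.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic, imbalanced training set: 180 negatives and 20 positives.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = np.array([0] * 180 + [1] * 20)

X_majority = X_train[y_train == 0]
X_minority = X_train[y_train == 1]

# Draw minority rows with replacement until both classes are the same size.
X_minority_up = resample(
    X_minority, replace=True, n_samples=len(X_majority), random_state=0
)

X_balanced = np.vstack([X_majority, X_minority_up])
y_balanced = np.concatenate([
    np.zeros(len(X_majority), dtype=int),
    np.ones(len(X_minority_up), dtype=int),
])

print(np.bincount(y_balanced))  # prints [180 180]
```

Resampling should be applied to the training set only, after the train/test split, so that the test set still reflects the original class balance.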

Accomplishments that we're proud of

Within three weeks, we chose a project idea and topic, dove into machine learning, and gained exposure to the field. Along the way, we tried several resampling methods and completed many different types of models in time for the expo, combining our results into a project that explores many approaches to analyzing the data.

What we learned

Through tutorials and a lot of research, we learned how to conduct exploratory data analysis to better understand our dataset, preprocess the data, and build a variety of models. We learned that there are many ways to solve a given problem and that there often isn't a single right one: each approach has its own advantages and disadvantages, and it is important to explain them in terms of the dataset. We also learned the basic steps for training a model on a dataset and about the many algorithms available for doing so. Having had very little experience with the subject, we learned much of the content from scratch as we went.
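As a rough idea of what a first pass of exploratory data analysis might involve, the sketch below summarizes a small table with pandas. The data is synthetic and the column names are hypothetical; the checks shown (summary statistics, class balance, missing values, correlation with the target) are common starting points, not necessarily the ones used in this project.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data with hypothetical column names.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "age": rng.integers(30, 80, size=n),
    "cholesterol": rng.normal(240, 40, size=n).round(1),
    "heart_disease": rng.choice([0, 1], size=n, p=[0.8, 0.2]),
})

# Summary statistics (count, mean, std, quartiles) for each numeric column.
summary = df.describe()

# Share of each class; a lopsided split suggests resampling may be worth trying.
class_share = df["heart_disease"].value_counts(normalize=True)

# Number of missing values per column.
missing = df.isna().sum()

# Linear correlation of each feature with the target.
correlations = df.corr()["heart_disease"].drop("heart_disease")

print(summary)
print(class_share)
print(missing)
print(correlations)
```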

What's next for Predicting Heart Disease with Machine Learning

Create a user interface where users can input their information and a prediction can be generated based on our model
Include data from other diseases to create similar predictions
Improve models (e.g. tune hyperparameters, try other models and neural networks)
Potentially look into quantum computing and how that might help us
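For the model-improvement item above, hyperparameter tuning could be sketched with scikit-learn's `GridSearchCV`. This is only an illustration under assumptions: the data comes from `make_classification`, and the choice of logistic regression and the grid of `C` values are arbitrary, not taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data standing in for the real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate values for the regularization strength (an arbitrary grid).
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Try every candidate with 5-fold cross-validation on the training set.
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# The best model is refit on the full training set and scored on held-out data.
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

Keeping the test set out of the search gives an honest estimate of how the tuned model performs on unseen data.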