Inspiration

After reviewing the topics available for the data project, our team chose the challenge of analyzing the sepsis dataset. A preliminary review showed that the dataset contained a large amount of information on sepsis and the patients who develop it. Many of us were not well versed in sepsis, either the condition itself or the risk factors behind it, so we were excited to dive in and gain a better understanding of the topic. We also felt that this case study could have a strong impact: sepsis can affect anyone, and the work may inspire people to help fund further research into sepsis prevention or into medications that could help in a sepsis emergency.

What it does

The project presents visualizations of key patient factors (such as age, race, and gender) and compares them with whether the patient had sepsis. We then developed a machine learning model that attempts to predict the likelihood that an individual will develop sepsis, given information on those factors.

How we built it

We mainly used SQL to clean and process the dataset. Once the data was processed, we used Python to develop the machine learning model and create the visualizations. For the model, we used scikit-learn classification models, along with libraries such as pandas for one-hot encoding and train/test splits. We also used Tableau to create simpler visualizations for the presentation. The graphs and data tables were then placed in a PowerPoint presentation showcasing the final results, including the code and other tools used to produce them.
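As a rough illustration of the modeling steps described above, the sketch below one-hot encodes categorical columns with pandas, splits the data into training and test sets, and fits a scikit-learn classifier. The column names, the synthetic values, and the choice of logistic regression are all assumptions made for the example; they are not taken from the actual dataset or from our final model.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Tiny synthetic table. Column names and values are illustrative
# assumptions, not rows from the real sepsis dataset.
patients = pd.DataFrame({
    "age": [34, 71, 58, 45, 80, 29, 66, 52, 77, 40, 63, 38],
    "gender": ["F", "M", "M", "F", "F", "M", "F", "M", "M", "F", "M", "F"],
    "race": ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "sepsis": [0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# One-hot encode the categorical columns so the classifier sees numbers.
features = pd.get_dummies(
    patients.drop(columns="sepsis"), columns=["gender", "race"], dtype=int
)
labels = patients["sepsis"]

# Hold out a quarter of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

# Logistic regression stands in here for "a classification model".
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predicted probability of the positive (sepsis) class for each test row.
sepsis_probability = model.predict_proba(X_test)[:, 1]
```

With a dataset this small the predictions are meaningless; the point is only the order of the steps: encode, split, fit, then predict probabilities on held-out rows.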

Challenges we ran into

We ran into several challenges when pre-processing the data. Much of the data we were given was very difficult to clean. In addition, some tables in the dataset were effectively "empty", so some variables that might have been useful either had to be replaced with a similar variable that may not have told the whole story or had to be removed from the analysis entirely. Processing power was also a challenge when digging deeper into the tables and analyzing more complex relationships between key factors and sepsis: the more complicated the relationship, the longer it took to compute and display.
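One possible way to decide which sparsely populated variables to drop is to measure the share of missing values per column and keep only the columns below a cutoff. The sketch below shows that idea in pandas. Our actual cleaning was done mostly in SQL, so the table, the column names, and the 50% cutoff here are hypothetical and only meant to illustrate the kind of decision involved.

```python
import pandas as pd

# Hypothetical table with gaps. Names and values are illustrative only.
vitals = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "heart_rate": [88, None, 102, 95],
    "lactate": [None, None, None, 2.1],
})

# Fraction of missing values in each column.
missing_share = vitals.isna().mean()

# Assumed cutoff: drop any column that is more than half empty.
MAX_MISSING = 0.5
kept = vitals.loc[:, missing_share <= MAX_MISSING]

print(missing_share)
print(list(kept.columns))
```

A column dropped this way could still be replaced by a related, better-populated variable, with the caveat noted above that the substitute may not tell the whole story.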

Accomplishments that we're proud of

Pre-processing and processing the data made up the majority of the project. While the work was extensive, the outcome was rewarding: the final tables and graphs showed the links between the many tables and pieces of information we were given at the start. The machine learning model was another major achievement of the project. This was our first submission to a datathon, so everything we developed, from the visualizations to the machine learning model, was an achievement in itself.

What we learned

We gained a better understanding of SQL and of Python libraries for graphs and other statistical visualizations, of machine learning processes and how to train machine learning models, and of the field of data science as a whole.

What's next for Sepsis: A Case Study

Given more time, we would like to investigate further variables, such as changes in blood pressure and body temperature. There are also qualitative variables we would like to explore further, such as the patients' diseases and symptoms.
