Data Cleanser

This is a simple data cleansing tool that parses a hospital's CSV file and converts it into data usable by machine learning algorithms that predict the probability of a patient dying.

Unnecessary Columns

First, the tool removes unnecessary columns that should not affect the outcome of the prediction. These columns include (but are not limited to):

Cost associated with administrator time (dollars)
Size of dose of medicine, since this data was the same for every patient
Education and income of patient
Day of announcement of important updates
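
As an illustration of this step, here is a minimal pandas sketch. The column names (`admin_cost`, `dose_size`, and so on) are hypothetical, since the real CSV headers are not listed here, and the constant-column check is one assumed way to spot fields such as the dose size that are identical for every patient.

```python
import pandas as pd

# Hypothetical column names; substitute the real headers from the hospital CSV.
UNNECESSARY_COLUMNS = ["admin_cost", "dose_size", "education", "income", "announcement_day"]


def find_constant_columns(df: pd.DataFrame) -> list:
    """Return the names of columns that hold a single value for every row."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]


def drop_unnecessary_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the listed columns; absent names are ignored."""
    return df.drop(columns=UNNECESSARY_COLUMNS, errors="ignore")


# Tiny made-up frame standing in for the hospital data.
sample = pd.DataFrame({
    "age": [54, 71],
    "dose_size": [10, 10],
    "income": ["low", "high"],
    "death": [0, 1],
})

constant_columns = find_constant_columns(sample)
cleaned = drop_unnecessary_columns(sample)
print(constant_columns)
print(list(cleaned.columns))
```

Passing `errors="ignore"` keeps the function usable on files that are missing some of the listed columns.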

Reformatting and Parsing

Then, the tool reformats some columns, such as the sex column, which contains mixed values (male, Male, M, female, and 1).
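
A possible way to normalize such a column is sketched below. The mapping is an assumption: this README does not say what the value `1` encodes, so the sketch leaves it unmapped and turns it into a missing value rather than guessing.

```python
import pandas as pd

# Assumed mapping. The meaning of "1" is not documented, so it is
# intentionally absent here and ends up as NaN after mapping.
SEX_MAP = {"male": "male", "m": "male", "female": "female", "f": "female"}


def normalize_sex(series: pd.Series) -> pd.Series:
    """Lower-case and trim each value, then map known spellings to one label."""
    cleaned = series.astype(str).str.strip().str.lower()
    return cleaned.map(SEX_MAP)  # values not in SEX_MAP become NaN


raw = pd.Series(["male", "Male", "M", "female", "1"])
normalized = normalize_sex(raw)
print(normalized.tolist())
```

Unmapped values surface as NaN, which makes undocumented codes easy to count and review before deciding how to treat them.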

The tool also flags values that fall outside the expected range, such as a heart rate in the thousands, and removes the invalid data.
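
The range check could look roughly like the following sketch. The column name `heart_rate` and the bounds are hypothetical placeholders, not values taken from the tool, and dropping the whole row is only one assumed way to remove invalid data.

```python
import pandas as pd

# Hypothetical plausibility bounds (inclusive); not taken from the actual tool.
VALID_RANGES = {"heart_rate": (20, 300)}


def remove_out_of_range(df: pd.DataFrame, ranges: dict) -> pd.DataFrame:
    """Keep only rows whose values lie inside every configured range."""
    mask = pd.Series(True, index=df.index)
    for column, (low, high) in ranges.items():
        if column in df.columns:
            # between() is inclusive; missing values evaluate to False,
            # so rows with a missing value in a checked column are dropped too.
            mask &= df[column].between(low, high)
    return df[mask]


vitals = pd.DataFrame({"heart_rate": [72, 4500, 88], "death": [0, 1, 0]})
filtered = remove_out_of_range(vitals, VALID_RANGES)
print(filtered)
```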

It also converts categorical data into numeric formats. The numeric values come from the R^2 test performed on the specific data against the death of a patient.
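
For readers unfamiliar with categorical encoding, the sketch below shows the simplest variant: each distinct label gets an integer code. This is a generic technique, not the R^2-based scheme described above, and the column content is made up.

```python
import pandas as pd


def encode_categories(series: pd.Series):
    """Return integer codes for each label plus the code-to-label mapping."""
    categorical = series.astype("category")
    mapping = dict(enumerate(categorical.cat.categories))
    return categorical.cat.codes, mapping


# Made-up example values; categories are ordered alphabetically by pandas.
diagnosis = pd.Series(["diabetes", "cancer", "diabetes", "none"])
codes, mapping = encode_categories(diagnosis)
print(codes.tolist())
print(mapping)
```

Keeping the mapping alongside the codes makes it possible to translate model inputs back into readable labels later.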

Statistical Analysis

Finally, the tool performs statistical analysis to check whether specific columns correlate with the death of a patient. This step is manual: a human must review each column to decide whether it is relevant to the prediction. Columns with weak correlation can then be removed in the first step described above.
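
To support that manual review, a ranking of columns by their squared Pearson correlation with the outcome might be computed as in this sketch. The target column name `death`, the sample data, and the assumption that the numeric columns contain no missing values are all illustrative.

```python
import numpy as np
import pandas as pd


def r_squared_with_target(df: pd.DataFrame, target: str) -> pd.Series:
    """Squared Pearson correlation of each numeric column with the target."""
    y = df[target].to_numpy(dtype=float)
    scores = {}
    for column in df.select_dtypes(include="number").columns:
        if column == target:
            continue
        x = df[column].to_numpy(dtype=float)
        if np.std(x) == 0:
            # Correlation is undefined for a constant column; report 0.
            scores[column] = 0.0
            continue
        r = np.corrcoef(x, y)[0, 1]
        scores[column] = r ** 2
    return pd.Series(scores).sort_values(ascending=False)


# Made-up data: "age" tracks the outcome closely, "noise" does not.
patients = pd.DataFrame({
    "age": [40, 50, 60, 70, 80, 90],
    "noise": [1, 2, 1, 2, 1, 2],
    "death": [0, 0, 0, 1, 1, 1],
})
scores = r_squared_with_target(patients, "death")
print(scores)
```

The output is only a starting point for the human reviewer; a low score on a linear measure does not by itself prove that a column is irrelevant.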

For example, when faced with data that has a wide range, we used a hyperbolic tangent function with some transformations to compress the values before passing them to our machine learning model. We also used R-squared values to measure how strongly each column correlates with patient death, and removed weakly correlated columns to give the model better data.
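
The exact transformations are not spelled out here, so the following is only a sketch of the general idea: center the values, divide by a spread estimate, and apply `tanh` so every result lies between -1 and 1. Using the median as the center and the standard deviation as the scale are assumptions.

```python
import numpy as np


def tanh_squash(values, center=None, scale=None):
    """Compress values into [-1, 1] using tanh((x - center) / scale)."""
    values = np.asarray(values, dtype=float)
    # Assumed defaults: median as center, standard deviation as scale.
    center = np.median(values) if center is None else center
    scale = np.std(values) if scale is None else scale
    if scale == 0:
        scale = 1.0  # avoid division by zero for constant input
    return np.tanh((values - center) / scale)


wide_range = [1, 5, 10, 50, 5000]
squashed = tanh_squash(wide_range)
print(squashed)
```

Because `tanh` is monotonic, the ordering of the values is preserved while extreme outliers are pulled toward the ends of the interval.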

Usage

This tool is written in Python and requires the following libraries:

pandas
numpy
matplotlib
scikit-learn
