Modeling with patient biomarker data

This repository is a self-contained demonstration of my approach to exploring a dataset and building a machine learning model for a binary classification task with missing data.
Author: Zachary Levonian
A good entry point to this analysis is the Jupyter notebook that trains and evaluates models to predict the binary outcome. The initial exploration and description of the data are in a separate Jupyter notebook.
The synthetic patient data was provided by Tempus. I don't have permission to share the data, although you can see excerpts in the analysis notebooks.
The data is assumed to be present in the data folder.
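Because the task involves missing data, a per-column missingness summary is a natural first check once the files are in place. The README doesn't describe the file format, so the sketch below assumes a CSV file in which an empty cell marks a missing value, and uses only the standard-library csv module. The function name missing_fraction, the column names, and the toy values are hypothetical illustrations, not the Tempus data or the method used in the notebooks.

```python
import csv
import io
from collections import Counter


def missing_fraction(rows):
    """Return the fraction of missing cells per column.

    Assumes `rows` is an iterable of dicts (as produced by csv.DictReader)
    and that an empty or whitespace-only string, or a missing trailing
    field (None), marks a missing value.
    """
    missing = Counter()
    total = 0
    columns = None
    for row in rows:
        if columns is None:
            columns = list(row.keys())
        total += 1
        for col in columns:
            value = row.get(col)
            if value is None or value.strip() == "":
                missing[col] += 1
    if total == 0:
        return {}
    return {col: missing[col] / total for col in columns}


# Hypothetical toy CSV standing in for a file in the data folder.
sample = io.StringIO(
    "patient_id,marker_a,marker_b,outcome\n"
    "1,0.5,,1\n"
    "2,,1.2,0\n"
    "3,0.7,0.9,1\n"
)
summary = missing_fraction(csv.DictReader(sample))
print(summary)
```

For real input, replace the io.StringIO object with an open file handle (opened with newline="" as the csv documentation recommends).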
To install, just run make install. This requires Python 3.10 or greater.
Poetry is used to manage Python dependencies and will be installed if it isn't already available.
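If make install fails early, it can help to confirm that the interpreter meets the Python 3.10 requirement. This is a minimal, hypothetical helper (meets_min_version is not part of the repository), and I'm not claiming the Makefile performs this check itself.

```python
import sys

# Minimum interpreter version stated in the README.
MIN_VERSION = (3, 10)


def meets_min_version(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_min_version() else "too old"
    print(f"Python {sys.version.split()[0]}: {status} (need >= 3.10)")
```

Comparing (major, minor) tuples avoids string comparisons, which would wrongly rank "3.9" above "3.10".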
The directory layout is:
- notebook contains the analysis notebooks.
- src contains the bcs Python package with helper functions and classes to support the analysis.
- data is presumed to be the location of the input data; see the Data section for more details.
- figures contains any images produced within the analysis notebooks.
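Since the data isn't distributed with the repository, a quick check that the expected top-level folders exist can catch a missing data folder before running the notebooks. The sketch below is a hypothetical helper (missing_dirs is not part of the bcs package) that assumes it is run from the repository root and takes the folder names from the layout above.

```python
from pathlib import Path

# Top-level folders described in the directory layout.
EXPECTED_DIRS = ("notebook", "src", "data", "figures")


def missing_dirs(repo_root="."):
    """Return the expected top-level folders that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    absent = missing_dirs()
    if absent:
        print("Missing folders:", ", ".join(absent))
    else:
        print("Directory layout looks complete.")
```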