Context
Human cardiomyocytes have potential for use in cell therapy and high-throughput drug screening. We developed a machine learning model to predict the outcome of human induced pluripotent stem cell cardiac differentiation (sufficient vs. insufficient).
Approach
In this approach, unlike the reference article (https://www.frontiersin.org/articles/10.3389/fbioe.2020.00851/full), we do not split the dataset into "up to dd7" and "up to dd5" subsets. Let's try a different approach! Instead, we add inferred data to the dataset and remove the initial raw data, reducing the dataset's complexity in order to improve model performance. To simplify the dataset, we used only the summary statistics of the 7 days. From data exploration, including bar and violin plots, we know that the sample size is small, the variance is high, and the dataset contains non-linear data with missing interpolation. For that reason, we use SMOTETomek resampling and a combination of classifiers (RandomForestClassifier, LogisticRegression, KNN and GradientBoostingClassifier), tuned with a randomized search with cross-validation to optimise hyperparameters.
WIP
We are also working on building up domain knowledge to reduce the initial dataset, as waves are present in the data, before starting an improved feature selection!
Performance
Our final performance is:
Accuracy: 0.8333
Precision: 0.7500
Recall: 0.6000
MCC: 0.5635
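To show how these four figures relate to each other, here is a small worked sketch in plain Python. The confusion-matrix counts are not something we reported: they are an inference, namely the smallest set of whole-number counts (TP=3, FP=1, FN=2, TN=12) that reproduces the scores above. Any common multiple of these counts would give the same scores, so treat them as an assumption.

```python
import math

# Assumed confusion-matrix counts (inferred from the reported scores,
# not taken from the original results).
tp, fp, fn, tn = 3, 1, 2, 12

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Matthews correlation coefficient: uses all four cells, so it stays
# informative when the classes are imbalanced.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"Accuracy : {accuracy:.4f}")   # 0.8333
print(f"Precision: {precision:.4f}")  # 0.7500
print(f"Recall   : {recall:.4f}")     # 0.6000
print(f"MCC      : {mcc:.4f}")        # 0.5635
```

Under these assumed counts, the gap between accuracy and recall comes from the class imbalance: most correct predictions are true negatives, while 2 of the 5 positive samples are missed.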
