Results (if run locally)
Correlation Heatmap

Context

Human cardiomyocytes have potential for use in therapeutic cell therapy and high-throughput drug screening. To predict the outcome of human induced pluripotent stem cell cardiac differentiation (sufficient vs. insufficient), we developed a machine learning model.

Approach

In contrast to the reference article (https://www.frontiersin.org/articles/10.3389/fbioe.2020.00851/full), we do not split the dataset into "up to dd7" and "up to dd5" subsets. Instead, we add inferred data to the dataset and remove the initial raw data, aiming to improve model performance by changing the dataset's complexity. To simplify the dataset, we use only the summary statistics of the 7 days. Data exploration, including bar and violin plots, showed that the sample size is small, the variance is high, and the dataset contains non-linear data with missing values that were not interpolated. For that reason, we use SMOTETomek together with a combination of classifiers (RandomForestClassifier, LogisticRegression, KNeighborsClassifier (KNN) and GradientBoostingClassifier), tuned with a randomized search (with cross-validation) to optimise hyperparameters.

WIP

We are also building up the domain knowledge needed to reduce the initial dataset, since the data contain waves, before starting an improved feature selection.