Challenge 0

Abstract:

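We use Random Forest regression with recursive feature elimination (RFE) for feature selection. We use 40-fold Monte Carlo (MC) cross-validation rather than leave-one-out (LOO) cross-validation, because Grid-Search CV tuning under LOO is too slow, and 40 MC splits should give a sufficiently reliable performance estimate.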
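A minimal sketch of how RFE and Monte Carlo cross-validation can be combined in scikit-learn. The synthetic data, the number of selected features, the forest sizes, and the 20% hold-out fraction are assumptions chosen for illustration, not the settings used in this project. Placing RFE inside the pipeline means features are re-selected on each training split, so no information leaks from the held-out part.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedShuffleSplit, cross_validate
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the challenge data (assumption).
X, y = make_classification(
    n_samples=80, n_features=12, n_informative=4, random_state=0
)

pipeline = Pipeline([
    # RFE ranks features with a Random Forest regressor and drops 2 per round.
    ("select", RFE(
        RandomForestRegressor(n_estimators=10, random_state=0),
        n_features_to_select=4,
        step=2,
    )),
    ("clf", RandomForestClassifier(n_estimators=25, random_state=0)),
])

# Monte Carlo CV: 40 independent random train/test splits.
mc_cv = StratifiedShuffleSplit(n_splits=40, test_size=0.2, random_state=0)

scoring = {
    "acc": "accuracy",
    "precision": "precision",
    "recall": "recall",
    "mcc": make_scorer(matthews_corrcoef),
}
scores = cross_validate(pipeline, X, y, cv=mc_cv, scoring=scoring)

# Average each metric over the 40 splits.
mean_scores = {name: float(np.mean(scores[f"test_{name}"])) for name in scoring}
print(mean_scores)
```

Unlike LOO, the number of splits here is independent of the number of samples, which keeps the cost of an outer hyperparameter search bounded.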

We employ a Random Forest classifier or an XGBoost classifier, depending on the dataset. Test performance is higher than cross-validation performance; we attribute this to the increased amount of data and, for dd5, partly to luck, but the performance is better either way.

We outperform the models developed in the paper in classification up to dd5, and we match their already excellent performance up to dd7. We also use more flexible and general feature selection methods, as well as SMOTE to handle unbalanced datasets.
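To illustrate what SMOTE does to an unbalanced dataset, here is a simplified NumPy sketch of its core idea: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. This is not imblearn's implementation, and the function name `smote_sketch`, its parameters, and the toy data are hypothetical. In practice the `SMOTE` class from imblearn would be used instead.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic points from minority samples X_min (needs k < len(X_min))."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a random base point and one of its k nearest neighbours.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    # Interpolate a random fraction of the way from base to neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy example: 5 minority samples, 8 synthetic ones to balance a 13-vs-5 split.
minority = np.random.default_rng(1).normal(size=(5, 2))
synthetic = smote_sketch(minority, n_new=8)
print(synthetic.shape)
```

Oversampling should only be applied to the training part of each split, so that synthetic points never end up in the data used for evaluation.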

We tried adding the log of the data as extra features, with no appreciable improvement, as well as a kNN dynamic-time-warping time series classifier (knn-dtw.py). We also tried SVM classifiers, which performed worse. We believe this is due to the poor extrapolation performance of SVMs in high-dimensional, low-data situations.
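For readers unfamiliar with kNN with dynamic time warping, the sketch below shows the general technique on toy data. It is not the content of knn-dtw.py; the function names `dtw_distance` and `knn_dtw_predict` and the example series are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def knn_dtw_predict(train_series, train_labels, query, k=1):
    """Majority vote among the k training series closest to query under DTW."""
    dists = np.array([dtw_distance(s, query) for s in train_series])
    nearest = np.argsort(dists)[:k]
    values, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Toy data: a rising series labelled True and a falling one labelled False.
train = [[0, 0, 1, 2, 3], [3, 2, 1, 0, 0]]
labels = [True, False]
prediction = knn_dtw_predict(train, labels, query=[0, 1, 2, 3, 3])
print(prediction)
```

Because DTW allows the time axis to stretch, the shifted query above still matches the rising series exactly.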

Feature selection can occasionally (though rarely) produce a very poor model. When that happens, simply rerunning the pipeline on the data is sufficient.

RF, up to dd7:

Cross-validation (Monte Carlo, 40-fold):
Average cross-val acc: 0.90
Average cross-val precision: 0.87
Average cross-val recall: 0.93
Average cross-val MCC: 0.81

Test:
              precision    recall  f1-score   support

       False       0.87      1.00      0.93        13
        True       1.00      0.60      0.75         5

    accuracy                           0.89        18
   macro avg       0.93      0.80      0.84        18
weighted avg       0.90      0.89      0.88        18
MCC: 0.72
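
As a worked check of the reported MCC, the confusion-matrix counts can be inferred from the report above: recall 1.00 on 13 False samples gives 13 true negatives and 0 false positives, and recall 0.60 on 5 True samples gives 3 true positives and 2 false negatives. These counts are an inference from the rounded report, not numbers taken from the original run.

```python
import math

# Counts inferred from the classification report (assumption, see above).
tn, fp = 13, 0
fn, tp = 2, 3

accuracy = (tp + tn) / (tp + tn + fp + fn)
# Matthews correlation coefficient from the four confusion-matrix cells.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
print(round(accuracy, 2), round(mcc, 2))  # prints 0.89 0.72
```

Both values agree with the accuracy and MCC listed in the report.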

XGB, up to dd5:

Cross-validation (Monte Carlo, 40-fold):
Average cross-val acc: 0.82
Average cross-val precision: 0.81
Average cross-val recall: 0.89
Average cross-val MCC: 0.66

Test:
              precision    recall  f1-score   support

       False       0.87      1.00      0.93        13
        True       1.00      0.60      0.75         5

    accuracy                           0.89        18
   macro avg       0.93      0.80      0.84        18
weighted avg       0.90      0.89      0.88        18

MCC: 0.72

Built With

imblearn
numpy
python
scikit-learn
xgboost

Updates

Rasmy Samy started this project — Mar 26, 2023 12:24 PM EDT
