Joyanne Ma, Akhila Raj, Dorra Tray
We predict rat oral LD50 from molecular SMILES strings using:
- Morgan fingerprints + physicochemical descriptors (RDKit)
- XGBoost + Random Forest ensemble
- SHAP analysis for interpretability
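
The featurization step listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `featurize`, the fingerprint radius (2), the bit length (2048), and the five descriptors chosen are all assumptions made for the example.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors, rdFingerprintGenerator

# Assumed settings: radius 2 and 2048 bits are common choices,
# not values confirmed by this project.
_MORGAN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def featurize(smiles):
    """Return fingerprint bits followed by descriptors, or None if parsing fails.

    `featurize` is a hypothetical helper written for this sketch.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        return None
    fingerprint = _MORGAN.GetFingerprintAsNumPy(mol).astype(np.float32)
    # Illustrative subset of physicochemical descriptors (an assumption).
    descriptors = np.array(
        [
            Descriptors.MolWt(mol),
            Descriptors.MolLogP(mol),
            Descriptors.TPSA(mol),
            Descriptors.NumHDonors(mol),
            Descriptors.NumHAcceptors(mol),
        ],
        dtype=np.float32,
    )
    return np.concatenate([fingerprint, descriptors])
```

Calling `featurize("CCO")` would give one feature vector for ethanol; stacking such vectors for every molecule yields the matrix a tree-based regressor can be trained on.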
Results on the test set:
- R² = 0.6368
- MAE = 0.4241
- RMSE = 0.5697
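
For reference, the three reported metrics follow the standard definitions below. This is a dependency-free sketch; the function name `regression_metrics` is hypothetical, and the notebook may well use the equivalent scikit-learn functions (`r2_score`, `mean_absolute_error`, `mean_squared_error`) instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R², MAE, and RMSE for two equal-length sequences of numbers."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}
```

RMSE is never smaller than MAE on the same predictions, which is consistent with the figures reported here.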
To reproduce:
- Open the notebook in Google Colab
- Run all cells from top to bottom
- The final test evaluation is in the last section
Requirements: pyTDC, rdkit, xgboost, scikit-learn, shap, pandas, numpy, matplotlib, seaborn, pillow
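
Several of these packages are imported under a different name than the one they are installed by. A small check like the following, run in the first notebook cell, can report anything that is missing. The helper `missing_packages` and the `REQUIREMENTS` mapping are written for this sketch and are not part of the project.

```python
from importlib.util import find_spec

# Install name -> import name. Three of them differ:
# pyTDC is imported as `tdc`, scikit-learn as `sklearn`, pillow as `PIL`.
REQUIREMENTS = {
    "pyTDC": "tdc",
    "rdkit": "rdkit",
    "xgboost": "xgboost",
    "scikit-learn": "sklearn",
    "shap": "shap",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "pillow": "PIL",
}


def missing_packages(requirements=REQUIREMENTS):
    """Return the install names whose import module cannot be found."""
    return [
        install_name
        for install_name, module_name in requirements.items()
        if find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```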