PatWalters/resources_2025

Machine Learning in Drug Discovery Resources 2025

Books

Drug Design: From Structure and Mode-of-Action to Rational Design Concepts As cheminformatics practitioners, we need to understand the drug design process. This book, written by Prof. Gerhard Klebe, a pioneer in the field, provides an excellent overview of numerous drug design approaches.

Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter Programming and data science are critical elements of cheminformatics. This book, written by Wes McKinney, the author of the widely used Pandas library, provides a great starting point for learning Python and applying it in data science.

Data Science from Scratch: First Principles with Python This book provides another great introduction to data science. It covers several critical topics, including Python, statistics, probability, machine learning, clustering, and databases.

Statistics in a Nutshell: A Desktop Quick Reference To effectively apply cheminformatics, one needs a solid grasp of statistics. This book provides a good overview with code examples.

Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python Machine learning (ML) has become an integral component of cheminformatics. This book provides a fantastic introduction to more traditional ML approaches and recent advances in deep learning.

Datasets

You'll notice the conspicuous absence of two widely used datasets, MoleculeNet and the Therapeutic Data Commons (TDC), from this list. Both of these datasets are highly flawed and should not be used. For more on the reasons why, please consult this blog post.

OpenADMET seeks to proactively characterize the chemical space accessible to ADMET-associated proteins (“anti-targets”). By applying recent advances in experimental and computational techniques, a comprehensive open library of experimental and structural datasets will be generated. It's early days for OpenADMET, but knowing the folks involved, I'm highly optimistic.

AIRCHECK is a platform that provides access to a large collection of high-quality datasets for drug discovery and development. The datasets are curated from various sources and are available in a standardized format. The current focus appears to be on DNA-encoded library (DEL) data.

Polaris aims to improve the state of benchmarking so ML can have a more significant impact on real-world drug discovery scenarios. To start, Polaris hopes to provide a single source of truth that aggregates and provides simple access to datasets & benchmarks.

PLINDER is an academic-industry collaboration to collect and organize protein-ligand interaction data. The effort is driven by VantAI, NVIDIA, the Computational Structural Biology group at the University of Basel & SIB Swiss Institute of Bioinformatics (co-organizers of CASP), and MIT. PLINDER aims to provide a gold standard dataset and evaluations to push the field of computational protein-ligand interaction prediction forward.

Blogs

Eric J Ma's Website Eric's blog provides an excellent introduction to the application of cutting-edge informatics in drug discovery.

Oxford Protein Informatics Group (OPIG) This blog contains a lot of great [Bio|Chem]informatics content, chock-full of code.

Charlie’s Substack Charlie Harris writes about applications of AI in drug discovery. Most recently, his posts have focused on efforts to reproduce AlphaFold3.

Morgan Thomas' Cheminformatics Blog This one is new, but it looks promising based on the first post.

Jon Swain's Blog Jon Swain, a second-generation cheminformatics blogger, has a great set of Jupyter notebooks demonstrating key concepts.

Practical Cheminformatics This is a blog where I post once a month or so. These posts typically contain code demonstrating various aspects of cheminformatics: clustering, machine learning, data visualization, etc. I occasionally post opinions on things like AI and getting a job.

Is Life Worth Living A great blog from Iwatobipen (aka pen), whose posts are chock-full of great code examples. Pen always seems to be up on the latest methods and posts interesting examples on various topics ranging from quantum chemistry to machine learning.

The RDKit Blog Greg Landrum is the primary contributor to and BDFL of the RDKit. In addition to the latest and greatest features in the RDKit, Greg's posts also touch on a number of key issues in Cheminformatics, such as dealing with unbalanced datasets and the impact of fingerprint folding on similarity searching.

Models to molecules A new blog by Dries Van Rompaey that is off to a great start.

Tutorials

Practical Cheminformatics Tutorials I put together this collection of Jupyter notebooks to demonstrate various aspects of cheminformatics and machine learning. The notebooks illustrate a range of topics from cheminformatics basics to more advanced machine learning. The tutorials all use open source software and can run on Google Colab without installing software locally.

TeachOpenCADD A great set of tutorials from Andrea Volkamer's group that use open-source software to teach Computer-Aided Drug Design concepts, including molecular similarity, applications of machine learning, and pharmacophore analysis.

The RDKit Cookbook A terrific resource that provides "recipes" for a number of common tasks.

Vina Colab Tutorials A set of tutorials showing how to run AutoDock Vina and the associated protein and ligand setup utilities on Google Colab.

GNNs for Chemists A great introduction to graph neural networks (GNNs) by Hosein Fooladi.

PDB-101 from the RCSB PDB The Protein Data Bank (PDB) has a wide range of tutorials available. The Python scripting tutorials are very good.
