Artificial Intelligence (AI)-driven discovery of antimicrobial peptides (AMPs) has significant potential in combating multidrug-resistant organisms (MDROs). However, existing approaches often overlook the three-dimensional (3D) structural characteristics, species-specific antimicrobial activities, and underlying mechanisms of AMPs.
M3-CAD (Multimodal, Multitask, Multilabel, and Conditionally Controlled AMP Discovery) is a cutting-edge pipeline designed to address these challenges. Leveraging the comprehensive QLAPD database, which comprises the sequences, structures, and antimicrobial properties of 12,914 AMPs, M3-CAD facilitates the de novo design of multi-mechanism AMPs with enhanced efficacy and reduced toxicity. M3CAD is a deep learning framework for the design and identification of Antimicrobial Peptides (AMPs). It leverages multimodal learning by combining 1D sequence information with 3D structural (voxel) data to improve the generation of novel peptides and the prediction of their properties.
The repository is structured into two main modules:
- Generation: A Generative Model (VAE/WAE) to design novel AMP sequences conditioned on specific properties.
- Identification: A Classification/Regression Model to predict properties (e.g., antimicrobial activity, toxicity) of peptide sequences.
By integrating these modules, M3-CAD significantly enhances the structural characterization and functional prediction of AMPs, facilitating the discovery of candidates like QLX-3DV-1 TO 20, which exhibits four antimicrobial mechanisms with low toxicity and high efficacy against MDROs.
All project dependencies are specified in the requirement.yml file. To create the Conda environment, execute the following command:
conda env create -f requirement.yml
conda activate torch
To train the generation model and generate novel AMP sequences:
-
Training the Generation Model
Navigate to the
generationfolder and execute thetrain.shscript -
Generating Sequences
Once training is complete, use the sample.sh script to generate sequences
The Generation module focuses on creating new peptide sequences. It supports Variational Autoencoders (VAE) and Wasserstein Autoencoders (WAE) with support for both sequence-only and multimodal (sequence + voxel) inputs.
train.py: Main script for training the generative models.eval.py: Script for generating sequences using trained models.model.py: Definitions of model architectures (SEQVAE,MMVAE,SEQWAE,MMWAE).dataset.py: Data loading utilities for generation tasks.inference.ipynb: Jupyter Notebook for interactive generation and analysis.
To train a new generative model, use train.py.
Example: Train a Sequence-based VAE
cd generation
python train.py --gen_model vae --model seq --epoch 200Arguments:
--gen_model: Type of generative model (vaeorwae).--model: Architecture type (seqfor sequence-only,mm_unetormm_mtfor multimodal).--epoch: Number of training epochs.
To generate new peptides using a pre-trained model:
Option 1: Command Line
cd generation
python eval.py --gen_model vae --model seq --weight_path runs/checkpoints/seq1/weights/best.pthOption 2: Interactive Notebook
Open generation/inference.ipynb in Jupyter. This notebook allows you to:
- Load a trained model.
- Define target conditions (e.g., specific property values).
- Generate sequences and visualize the latent space.
The Identification module predicts the properties of peptides, such as whether they are antimicrobial or toxic. It uses 3D ResNets for structural data and RNNs/Transformers for sequence data.
train.py: Script to train classification or regression models.inference.py: Script for running predictions on new datasets.network.py: Definitions of prediction models (MMPeptide,VoxPeptide,ResNet3D).dataset.py: Data loading for identification tasks.inference.ipynb: Jupyter Notebook for interactive prediction.
To train an identification model (e.g., for antimicrobial activity prediction):
cd identification
python train.py --task anti --epoch 100Arguments:
--task: The target property to predict (e.g.,antifor antimicrobial,toxfor toxicity).--epoch: Number of training epochs.- (Check
train.pyarguments for more options like learning rate, batch size, etc.)
To predict properties for a list of sequences:
Option 1: Interactive Notebook
Open identification/inference.ipynb. This notebook demonstrates how to:
- Load the model and tokenizer.
- Input a list of peptide sequences.
- Obtain prediction scores (probabilities or regression values).
Option 2: Script
(If available) Use inference.py to process a CSV or FASTA file of sequences.
M3CAD/
├── README.md # This file
├── generation/ # Code for designing novel AMPs
│ ├── model.py # VAE/WAE architectures
│ ├── train.py # Training loop for generation
│ ├── eval.py # Generation/Evaluation script
│ └── ...
└── identification/ # Code for property prediction
├── network.py # Classification/Regression architectures
├── train.py # Training loop for identification
├── inference.py # Inference script
└── ...
Pre-trained model checkpoints are available for download to facilitate quick experimentation and deployment. Access them via the following Baidu Drive link:
- Download Link: Baidu Drive
- Extraction Code:
m3cd
Note: A Baidu account is required to access and download the files.
The M3-CAD pipeline has successfully designed QLX-3DV-1 to QLX-3DV-20 , a group of AMP with four distinct antimicrobial mechanisms. This peptide demonstrates:
- Low Toxicity: Minimal adverse effects on host cells.
- High Efficacy: Significant activity against MDROs.
- In Vivo Validation: In a skin wound infection model, QL-AMP-1 exhibited considerable antimicrobial effects with negligible toxicity, underscoring its therapeutic potential.
If you find this project useful, please consider cite the following paper:
@article{wang2024novo,
title={De novo multi-mechanism antimicrobial peptide design via multimodal deep learning},
author={Wang, Yue and Gong, Haifan and Li, Xiaojuan and Li, Lixiang and Zhao, Yinuo and Bao, Peijing and Kong, Qingzhou and Wan, Boyao and Zhang, Yumeng and Zhang, Jinghui and others},
journal={bioRxiv},
pages={2024--01},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}