This repository contains the code, data, and model weights for the ICML 2025 paper "PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design".
The overall model architecture is shown below:
The dependencies can be set up using the following commands:
conda env create -f PPDiff.yml
conda activate PPDiff
bash setup.sh
We provide our curated protein-protein complex dataset, PPBench, at PPBench.
Please download the dataset and put it in the data folder.
mkdir data
cd data
wget "https://drive.google.com/file/d/1DmvVKvZIVxT4-bxIIZ4bRQtIrQJ2QwJN/view?usp=drive_link"
wget "https://drive.google.com/file/d/1LVP7j1KhmgnotB_N7WyrKY8eBU61d_Ht/view?usp=sharing"
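The links above point to Google Drive viewer pages, so a plain wget may save an HTML page rather than the dataset file. The following is a minimal sketch of one workaround, not part of the repository: it extracts the file ID from each share link and hands a direct URL to the third-party gdown package. The helper names are hypothetical, and it assumes gdown has been installed separately (for example with pip install gdown).

```python
import os
import re

# Share links copied from the download commands above.
PPBENCH_LINKS = [
    "https://drive.google.com/file/d/1DmvVKvZIVxT4-bxIIZ4bRQtIrQJ2QwJN/view?usp=drive_link",
    "https://drive.google.com/file/d/1LVP7j1KhmgnotB_N7WyrKY8eBU61d_Ht/view?usp=sharing",
]


def drive_file_id(share_url):
    """Extract the file ID from a '.../file/d/<id>/view' share link."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_url)
    if match is None:
        raise ValueError(f"not a Google Drive file link: {share_url}")
    return match.group(1)


def direct_url(share_url):
    """Build the 'uc?id=' form of the link, which gdown accepts."""
    return f"https://drive.google.com/uc?id={drive_file_id(share_url)}"


def download_all(links, out_dir="data"):
    """Download every link into out_dir (hypothetical helper; needs gdown)."""
    import gdown  # imported lazily so the helpers above work without it

    os.makedirs(out_dir, exist_ok=True)
    for link in links:
        # A trailing separator asks gdown to keep the remote file name.
        gdown.download(direct_url(link), output=out_dir + os.sep, quiet=False)
```

Calling download_all(PPBENCH_LINKS) from the repository root should place the files in the data folder. The same helpers can be reused for the other Drive links in this README.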
We provide the checkpoint for the general protein-protein complex design task used in the paper at Model.
Please download the checkpoints and put them in the models/PPDiff folder.
If you want to train your own model, please follow the training guidance below.
To train a model from scratch, run:
bash train_complex_data_diffusion.sh
To design binders with a trained model, run:
bash generation.sh
There are three items in the output directory:
- target.txt refers to the target protein sequences
- binder.true.txt refers to the input binder sequences
- binder.gen.txt refers to the designed binder sequences
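As a rough illustration of how these three files can be consumed together, the sketch below loads them side by side and computes a simple position-wise identity between each designed binder and its reference. It assumes each file holds one sequence per line with matching line order across files, which the README does not state explicitly. The function names are hypothetical, and the identity score is a quick sanity check, not necessarily the metric reported in the paper.

```python
from pathlib import Path


def read_sequences(path):
    """Read one sequence per line, removing any whitespace inside a line.

    Blank lines are kept as empty strings so line order stays aligned.
    """
    text = Path(path).read_text()
    return ["".join(line.split()) for line in text.splitlines()]


def load_designs(output_dir):
    """Return (target, true_binder, generated_binder) triples."""
    out = Path(output_dir)
    targets = read_sequences(out / "target.txt")
    true_binders = read_sequences(out / "binder.true.txt")
    gen_binders = read_sequences(out / "binder.gen.txt")
    if not (len(targets) == len(true_binders) == len(gen_binders)):
        raise ValueError("output files contain different numbers of lines")
    return list(zip(targets, true_binders, gen_binders))


def sequence_identity(generated, reference):
    """Fraction of matching positions, normalized by the longer sequence."""
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 0.0
    matches = sum(a == b for a, b in zip(generated, reference))
    return matches / longest
```

For example, the mean identity over a run could be obtained by averaging sequence_identity(gen, true) over the triples returned by load_designs.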
We provide our curated target protein-mini binder complex dataset at Binder_Design_Data.
Please download the dataset and put it in the data folder.
cd data
wget "https://drive.google.com/file/d/1-SWyf7WQz0UCilXjUgAU-rPIBYT-U8ZD/view?usp=drive_link"
We provide the checkpoint of the target protein-mini binder complex design task used in the paper at Binder_Design_Model
Please download the checkpoints and put them in the models/binder_design folder.
If you want to finetune your own model using our curated data, please follow the finetuning guidance below.
To finetune a binder design model, run:
bash fineune_target_protein_mini_binder_complex.sh
To design binders with the finetuned model, run:
bash generation.sh
There are three items in the output directory:
- target.txt refers to the target protein sequences
- binder.true.txt refers to the input binder sequences
- binder.gen.txt refers to the designed binder sequences
The target protein categories in the test set appear in the order ["EGFR", "FGFR2", "H3", "IL7Ra", "InsulinR", "PDGFR", "TGFb", "Tie2", "TrkA", "VirB8"], with sizes {"EGFR": 4, "FGFR2": 57, "H3": 38, "IL7Ra": 7, "InsulinR": 23, "PDGFR": 26, "TGFb": 9, "Tie2": 2, "TrkA": 4, "VirB8": 7}. You can get the category of each designed binder by mapping it to the corresponding ground-truth target and binder proteins.
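One way to attach a category to each line of the output is sketched below. It assumes the test set is stored as contiguous blocks in exactly the order listed, so that the first 4 entries are EGFR, the next 57 are FGFR2, and so on. That block layout is an assumption drawn from the ordering and sizes above, and the helper names are hypothetical rather than taken from the repository.

```python
# Order and sizes copied from the text above.
CATEGORY_ORDER = ["EGFR", "FGFR2", "H3", "IL7Ra", "InsulinR",
                  "PDGFR", "TGFb", "Tie2", "TrkA", "VirB8"]
CATEGORY_SIZES = {"EGFR": 4, "FGFR2": 57, "H3": 38, "IL7Ra": 7, "InsulinR": 23,
                  "PDGFR": 26, "TGFb": 9, "Tie2": 2, "TrkA": 4, "VirB8": 7}


def category_labels():
    """Return one category label per test example, in test-set order."""
    labels = []
    for name in CATEGORY_ORDER:
        labels.extend([name] * CATEGORY_SIZES[name])
    return labels


def group_by_category(sequences):
    """Group per-example items (e.g. designed binders) by target category."""
    labels = category_labels()
    if len(sequences) != len(labels):
        raise ValueError(
            f"expected {len(labels)} sequences, got {len(sequences)}"
        )
    groups = {name: [] for name in CATEGORY_ORDER}
    for label, sequence in zip(labels, sequences):
        groups[label].append(sequence)
    return groups
```

Passing the lines of binder.gen.txt to group_by_category would then give the designed binders per target, provided the block-layout assumption holds for your copy of the test set.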
We provide our curated antigen-antibody complex dataset, clustered by CDR-H1, CDR-H2, and CDR-H3, at CDR-H1 cluster, CDR-H2 cluster, and CDR-H3 cluster.
Please download the datasets and put them in the data folder.
cd data
wget "https://drive.google.com/file/d/1a5tIcoVfY95CpKnBnewQj96xu651vev_/view?usp=drive_link"
wget "https://drive.google.com/file/d/1yvIr4dkK2xWzKYm2qHidrE8eh-VO5ab2/view?usp=drive_link"
wget "https://drive.google.com/file/d/12CAB2eXSrg-8yYfoVyFOYde8HCwzUHQT/view?usp=drive_link"
We provide the checkpoints for the antigen-antibody complex design task, obtained by training on the data split according to CDR-H1 cluster, CDR-H2 cluster, and CDR-H3 cluster.
Please download the checkpoints and put them in the models/antibody_design_cdrh1, models/antibody_design_cdrh2, and models/antibody_design_cdrh3 folders, respectively.
If you want to finetune your own model using our curated data, please follow the finetuning guidance below. Here we take the data clustered by CDR-H1 as an example.
To finetune an antibody design model, run:
bash finetune_antigen_antibody_complex_design_cdrh1.sh
To design antibodies with the finetuned model, run:
bash design_antibody.sh
There are five items in the output directory:
- antigen.true.txt refers to the antigen sequences
- heavy.chain.true.txt refers to the ground truth heavy chain sequences
- heavy.chain.gen.txt refers to the designed heavy chain sequences
- light.chain.true.txt refers to the ground truth light chain sequences
- light.chain.gen.txt refers to the designed light chain sequences
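If the designed chains are to be passed to a downstream tool, it can help to collect each antigen with its designed heavy and light chains in FASTA form. The sketch below shows one possible conversion. It assumes one sequence per line and the same line order in every file, which the README does not state explicitly. The function names and the design_<index> record IDs are hypothetical conventions, not something the repository defines.

```python
from pathlib import Path

# Output files used to build each record (names from the list above).
FILES = {
    "antigen": "antigen.true.txt",
    "heavy": "heavy.chain.gen.txt",
    "light": "light.chain.gen.txt",
}


def read_sequences(path):
    """Read one sequence per line, removing any whitespace inside a line."""
    text = Path(path).read_text()
    return ["".join(line.split()) for line in text.splitlines()]


def designs_to_fasta(output_dir):
    """Return a FASTA string with antigen, heavy, and light chain per design."""
    out = Path(output_dir)
    columns = {key: read_sequences(out / name) for key, name in FILES.items()}
    if len({len(seqs) for seqs in columns.values()}) != 1:
        raise ValueError("output files contain different numbers of lines")
    records = []
    rows = zip(columns["antigen"], columns["heavy"], columns["light"])
    for index, (antigen, heavy, light) in enumerate(rows):
        records.append(f">design_{index}|antigen\n{antigen}")
        records.append(f">design_{index}|heavy\n{heavy}")
        records.append(f">design_{index}|light\n{light}")
    return "\n".join(records) + "\n"
```

The returned string can be written to a file with Path("designs.fasta").write_text(...). Swapping the two .gen.txt names for their .true.txt counterparts gives the ground-truth complexes in the same layout.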
@article{song2025ppdiff,
title={PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design},
author={Song, Zhenqiao and Li, Tiaoxiao and Li, Lei and Min, Martin Renqiang},
journal={arXiv preprint arXiv:2506.11420},
year={2025}
}
