PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design

Model Architecture

This repository contains the code, data, and model weights for the ICML 2025 paper "PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design".

The overall model architecture is shown below:

[Figure: overall PPDiff model architecture]

Environment

The dependencies can be set up using the following commands:
conda env create -f PPDiff.yml 
conda activate PPDiff 
bash setup.sh 

General Protein-Protein Complex Design

Download Data

We provide our curated protein-protein complex dataset PPBench at PPBench.

Please download the dataset files and put them in the data folder:

mkdir -p data
cd data
wget "https://drive.google.com/file/d/1DmvVKvZIVxT4-bxIIZ4bRQtIrQJ2QwJN/view?usp=drive_link"
wget "https://drive.google.com/file/d/1LVP7j1KhmgnotB_N7WyrKY8eBU61d_Ht/view?usp=sharing"
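A note on the wget commands above: a Google Drive `/view` link generally serves an HTML preview page, so wget may save that page rather than the dataset file. The following is a minimal sketch of one possible workaround, not part of the original repository. It assumes the third-party gdown tool is installed (`pip install gdown`; it is not confirmed to be included in PPDiff.yml), and the helper names `drive_file_id` and `fetch_drive_file` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the file ID from a Google Drive share link
# of the form https://drive.google.com/file/d/<ID>/view?...
drive_file_id() {
  local url="$1"
  case "$url" in
    */file/d/*) ;;
    *) echo "not a /file/d/ link: $url" >&2; return 1 ;;
  esac
  local id="${url#*/file/d/}"   # drop everything up to and including "/file/d/"
  id="${id%%/*}"                # drop everything from the next "/" onward
  printf '%s\n' "$id"
}

# Hypothetical helper: download one shared file into the current directory.
# Assumption: gdown is installed and the file is publicly shared.
fetch_drive_file() {
  local id
  id="$(drive_file_id "$1")" || return 1
  gdown "https://drive.google.com/uc?id=${id}"
}

# Usage (run from inside data/):
#   fetch_drive_file "https://drive.google.com/file/d/1DmvVKvZIVxT4-bxIIZ4bRQtIrQJ2QwJN/view?usp=drive_link"
```

If gdown is not an option, downloading the files through a browser and moving them into the data folder achieves the same result.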

Download Model

We provide the checkpoint for the general protein-protein complex design task used in the paper at Model.

Please download the checkpoints and put them in the models/PPDiff folder.

If you want to train your own model, please follow the training guidance below.

Training

To train a model from scratch, run the script below:
bash train_complex_data_diffusion.sh

Inference

To design general protein-protein complexes, run the following script:
bash generation.sh

There are three items in the output directory:

  1. target.txt refers to the target protein sequences
  2. binder.true.txt refers to the input binder sequences
  3. binder.gen.txt refers to the designed binder sequences
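To inspect the three output files together, a minimal sketch such as the following may help. It assumes each file holds one sequence per line and that line i of every file describes the same complex; the README does not state the file layout explicitly, so verify this against your own output. The helper name `show_designs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print target, input binder, and designed binder
# as three tab-separated columns, one complex per row.
# Assumption: one sequence per line, aligned across the three files.
show_designs() {
  local out_dir="$1"
  paste "$out_dir/target.txt" "$out_dir/binder.true.txt" "$out_dir/binder.gen.txt"
}

# Usage:
#   show_designs path/to/output_dir | head -n 5
```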

Target Protein-Mini Binder Complex Design

Download Data

We provide our curated target protein-mini binder complex dataset at Binder_Design_Data.

Please download the dataset and put it in the data folder:

cd data
wget "https://drive.google.com/file/d/1-SWyf7WQz0UCilXjUgAU-rPIBYT-U8ZD/view?usp=drive_link"

Download Model

We provide the checkpoint for the target protein-mini binder complex design task used in the paper at Binder_Design_Model.

Please download the checkpoints and put them in the models/binder_design folder.

If you want to finetune your own model using our curated data, please follow the finetuning guidance below:

Finetuning

To finetune a binder design model, run the script below:
bash fineune_target_protein_mini_binder_complex.sh

Inference

To design target protein-mini binder complexes, run the following script:
bash generation.sh

There are three items in the output directory:

  1. target.txt refers to the target protein sequences
  2. binder.true.txt refers to the input binder sequences
  3. binder.gen.txt refers to the designed binder sequences

The target protein categories in the test set are ordered as ["EGFR", "FGFR2", "H3", "IL7Ra", "InsulinR", "PDGFR", "TGFb", "Tie2", "TrkA", "VirB8"], with sizes {"EGFR": 4, "FGFR2": 57, "H3": 38, "IL7Ra": 7, "InsulinR": 23, "PDGFR": 26, "TGFb": 9, "Tie2": 2, "TrkA": 4, "VirB8": 7}. You can determine the category of each designed binder by matching it to its corresponding ground-truth target and binder proteins.
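The category order and sizes above sum to 177 test examples. As a rough sketch, the mapping from an output line to its category could look like the following. It assumes the test examples are written contiguously, category by category, in the listed order, and that each output file has one sequence per line; neither assumption is confirmed by the repository, and the helper name `category_of_line` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a 1-based line number in binder.gen.txt to a
# target category, using the order and sizes listed in this README.
# Assumption: examples are grouped contiguously in that order.
category_of_line() {
  local line="$1" upper=0 entry name size
  for entry in EGFR:4 FGFR2:57 H3:38 IL7Ra:7 InsulinR:23 \
               PDGFR:26 TGFb:9 Tie2:2 TrkA:4 VirB8:7; do
    name="${entry%%:*}"          # text before the colon
    size="${entry#*:}"           # text after the colon
    upper=$(( upper + size ))    # last line number belonging to this category
    if [ "$line" -ge 1 ] && [ "$line" -le "$upper" ]; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  echo "line $line is outside the test set" >&2
  return 1
}

# Usage:
#   category_of_line 62
```

Under these assumptions, lines 1 to 4 map to EGFR, lines 5 to 61 to FGFR2, and the final seven lines (171 to 177) to VirB8.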

Antigen-Antibody Complex Design

Download Data

We provide our curated antigen-antibody complex datasets, split by CDR-H1 cluster, CDR-H2 cluster, and CDR-H3 cluster.

Please download the datasets and put them in the data folder:

cd data
wget "https://drive.google.com/file/d/1a5tIcoVfY95CpKnBnewQj96xu651vev_/view?usp=drive_link"
wget "https://drive.google.com/file/d/1yvIr4dkK2xWzKYm2qHidrE8eh-VO5ab2/view?usp=drive_link"
wget "https://drive.google.com/file/d/12CAB2eXSrg-8yYfoVyFOYde8HCwzUHQT/view?usp=drive_link"

Download Model

We provide checkpoints for the antigen-antibody complex design task, obtained by training on data split according to CDR-H1 cluster, CDR-H2 cluster, and CDR-H3 cluster.

Please download the checkpoints and put them in the models/antibody_design_cdrh1, models/antibody_design_cdrh2, and models/antibody_design_cdrh3 folders, respectively.

If you want to finetune your own model using our curated data, please follow the finetuning guidance below. Here we take the data split by CDR-H1 clustering as an example:

Finetuning

To finetune an antibody design model, run the script below:
bash finetune_antigen_antibody_complex_design_cdrh1.sh

Inference

To design antigen-antibody complexes, run the following script:
bash design_antibody.sh

There are five items in the output directory:

  1. antigen.true.txt refers to the antigen sequences
  2. heavy.chain.true.txt refers to the ground truth heavy chain sequences
  3. heavy.chain.gen.txt refers to the designed heavy chain sequences
  4. light.chain.true.txt refers to the ground truth light chain sequences
  5. light.chain.gen.txt refers to the designed light chain sequences
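Before comparing designed and ground-truth chains, it can be useful to confirm that the five files line up. The sketch below checks that they all have the same number of lines. It assumes one sequence per line with rows aligned across files, which the README does not state explicitly; the helper name `check_antibody_outputs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that all five antibody output files exist
# and contain the same number of lines.
# Assumption: one sequence per line, aligned across files.
check_antibody_outputs() {
  local out_dir="$1" expected="" n f
  for f in antigen.true.txt heavy.chain.true.txt heavy.chain.gen.txt \
           light.chain.true.txt light.chain.gen.txt; do
    if [ ! -f "$out_dir/$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
    # wc may pad its output with spaces on some systems, so strip them
    n="$(wc -l < "$out_dir/$f" | tr -d ' ')"
    if [ -z "$expected" ]; then
      expected="$n"
    elif [ "$n" -ne "$expected" ]; then
      echo "$f has $n lines, expected $expected" >&2
      return 1
    fi
  done
  echo "all files have $expected lines"
}

# Usage:
#   check_antibody_outputs path/to/output_dir
```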

Citation

If you find our work helpful, please consider citing our paper.
@article{song2025ppdiff,
  title={PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design},
  author={Song, Zhenqiao and Li, Tianxiao and Li, Lei and Min, Martin Renqiang},
  journal={arXiv preprint arXiv:2506.11420},
  year={2025}
}
