Skip to content

haddocking/shape-restrained-haddocking

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Shape restrained haddocking

Introduction

This repository contains all the code and data associated with our publication titled:

"Shape-restrained modelling of protein-small molecule complexes with HADDOCK"

immediately available online as a preprint and peer-reviewed publication.

A secondary addendum to the publication is the sbgrid deposition which in addition to the contents of this repository also contains all the models that were generated for the benchmarking of these protocols and are impossible to include in GitHub due to repository size limitations.

We have made our best efforts to organise the data as intuitively as possible so as to make our code and data as assesible to as many interested parties as possible. An explanation of the various folders and files in this repository follows.

Data layout overview [^]

This is the top-level git directory

LICENSE
README.md
code/
conformers/
data/
manuscript/
restraints/
results/
runs/
scripts/
setup/
shapes/
structures/
templates/
toppar/

Details [^]

LICENSE contains the license under which we are making our code and data available. README.md is this document.

Code [^]


This folder contains general-purpose scripts that can be applied beyond the specific confines of this project. In most cases the name of the code should suffice in terms of explanation as to what it does. In case it doesn't though, most of these files are very well documented.

The contents of the folder are:

add_atom_features.py
add_his_protonation.py
calc_mcs.py*
dist_to_izone.py*
generate_conformers.py*
get_similar_by_ligand.py*
get_similar_by_protein.py*
ligdist.py*
pdb_element.py*
pharm2D_Tc.py
plot_helpers.R
restrain_ion.py*

In summary, add_atom_features.py can be used to define the features on which the pharmacophore-based protocol depends on, add_his_protonation.py makes use of the output of molprobity to set the protonation state of HIS residues (this was only used for the pharmacophore-based protocol), calc_mcs.py calculates the Maximum Common Substructure-based similarity between a single reference and multiple template compounds using the Tanimoto and Tversky coefficients (the latter was used for the shape-based protocol), dist_to_izone.py converts atom-based distances as they are calculated by the ligdist.py script to the format that is accepted ProFit, generate_conformers.py generates 3D conformers while accepting multiple arguments that determine the process that is used for the torsional sampling, get_similar_by_ligand.py and get_similar_by_protein.py were used to fetch data from the RCSB while making use of the REST- and graphql-based APIs, ligdist.py calculates distances between a compound and its cognate receptor, pdb_element.py is a modified version of the repsective tool from the PDB-tools suite used for the analysis of the results, pharm2D_Tc.py calculates the Tanimoto coeeficient between compounds using the aforementioned pharmacophore fingerprints, plot_helpers.R is a ggplot-based library of graphs and plots and restrain_ion.py produces tbl-formatted (accepted by HADDOCK) restraints that maintain the correct geometry for ions and their coordinating sidechains throughout the simulation.

Details regarding the software requirements for this project can be found in a later section.

Conformers [^]


This is where individual PDB files for each target compound can be found. The conformers have been generated with RDKit using the settings described in the paper. There are at most 50 PDB files per directory as that is the number of conformers that was found to provide the best balance of docking results and sampling efficiency. Files are grouped by target.

Data [^]


This folder hosts two files, one per protocol and they are appropriately named as to reflect the protocol to which they correspond. The file for the shape protocol is in JSON format because of the presence of the low_sim templates whereas the file for the pharm protocol is in csv format. These files contain all the data pertaining to the targets, for example, template PDB ids, RMSD of the binding site, etc...

Manuscript [^]


This folder contains all the code, data and analysis that was undertaken in preparation for the submission of the manuscript. Additionally it contains all the figures of the main text as well as the tables and figures of the SI.

Restraints [^]


This is where the restraints used during the two protocols can be found. This is broken down by protocol and grouped by target.

Results [^]


This is where the results for both protocols are stored in 5-column, space-separated txt files with columns 1-5 corresponding to target, stage, model number, IL-RMSD and model rank, respectively, broken down by protocol and grouped by target.

Runs [^]


This folder is not version-controlled since it would be impossible to have the run folders under git and is therefore absent from the repository. It is the default location for the runs created by the scripts under scripts.

Scripts [^]


Broken down by protocol, this is where the scripts that orchestrate setting up, starting and analysing runs can be found. The names should be self-explanatory. They all assume that the runs are in the aforementioned runs folder and the run directories have already been created. The crontab that is running everything looks something like this:

$ crontab -l
*/5 * * * * cd .../runs; ../scripts/shape/start.sh >> .start.log 2>> .start.err
*/1 * * * * cd .../runs; ../scripts/shape/analyse.sh >> .analyse.log 2>> .analyse.err
1,16,31,46 * * * * cd .../runs; ../scripts/shape/check.sh &> .check.log

where ... is the path to the aforementioned runs directory.

Setup [^]


Broken down by protocol, this is where the run.param and run.cns used by the scripts under scripts are located. These files control the protocol that is being benchmarked.

Shapes [^]


Broken down by protocol, this is where the shapes extracted from the PDB files of the templates can be found. For the shape protocol, some targets also contain the low_sim shapes. The shapes are in the coordinate system of their respective template receptor, and by extension the reference as well.

Structures [^]


These are the reference structures of the benchmark. In addition to the reference complex each target folder also contains the reference compound, the original complex as it was downloaded from the PDB and a README file detailing all the modifications that were made in preparing the structure for docking/analysis.

Templates [^]


Broken down by protocol, this is where the template receptors and compounds (in separate files) can be found. The templates have been superimposed on their respective reference receptor.

Toppar [^]


This is where the topologies and parameters for all target compounds can be found.

Software requirements [^]

Regarding the software side of things, the code that can be found in scripts only depends on bash meaning there are no dependencies. The code under code is mostly python and does have a few dependencies. Namely:

requests
pandas
rdkit
openbabel

along with a reasonably recent version of Python. This code and all work on this project was performed with Python 3.8 installed managed by conda. Conda was chosen as it is the preferred way to install RDKit. All of the aforementioned modules can be found on the conda-forge channel and installed in one go with a command that looks like this:

conda install -c conda-forge requests pandas rkdit openbabel

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors