add automated toppar generation for unknown small molecules#1476

Merged
rvhonorato merged 53 commits into main from
1475-automated-toppar-generation-for-unknown-ligands
Feb 27, 2026

Conversation

@rvhonorato
Member

@rvhonorato rvhonorato commented Feb 24, 2026

You are about to submit a new Pull Request. Before continuing make sure you read the contributing guidelines.

Checklist

Summary of the Pull Request

This PR adds a new libligand with the purpose of handling small molecules, in this case specifically the creation of .top and .param files using PRODRG.

Currently, when the user does not provide ligand_param/top_fname and the input contains any unknown HETATMs, they are simply deleted during sanitization. I am not sure whether this is the desired behaviour, but assuming it is, I added a new boolean parameter called autotoppar so that this pathway of removing unknown atoms is preserved.

So to trigger the automated generation, the user needs to set autotoppar=true. If this parameter is defined together with ligand_param/top_fname, ligand_param/top_fname takes priority.
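
The precedence described above can be sketched as follows. This is only an illustration and not the code in this PR: the function name `resolve_toppar_source` is hypothetical, and it assumes that `ligand_param/top_fname` expands to the two parameters `ligand_top_fname` and `ligand_param_fname`.

```python
from typing import Optional


def resolve_toppar_source(
    autotoppar: bool,
    ligand_top_fname: Optional[str] = None,
    ligand_param_fname: Optional[str] = None,
) -> str:
    """Decide where the ligand toppar files come from (hypothetical helper).

    Returns "user" when the user supplied files, "auto" when they should be
    generated automatically, and "none" when unknown HETATMs are simply
    removed during sanitization (the pre-existing behaviour).
    """
    # User-provided files always win over automatic generation.
    if ligand_top_fname or ligand_param_fname:
        return "user"
    # Only generate automatically when explicitly requested.
    if autotoppar:
        return "auto"
    return "none"
```

With these assumptions, `resolve_toppar_source(True, "lig.top", "lig.param")` selects the user files, while `resolve_toppar_source(True)` selects automatic generation.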

The automated topology/param generation of small molecules is done via prodrg - shipped as a binary with the code after this PR. I also added two sanitization methods to remove NBONds statements from the parameters generated by prodrg, as they can interfere with the haddock ones. I also found that prodrg will sometimes add a : to the topology, which breaks CNS syntax - so there is another method to clean it.
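
A rough sketch of what such clean-up steps could look like is given below. It is not the implementation in this PR: both function names are hypothetical, and it assumes that an NBONds statement starts on a line whose first word begins with `NBON` and runs until a line ending in `END`, and that stray colons can simply be dropped.

```python
def strip_nbonds_block(param_text: str) -> str:
    """Remove NBONds statements from a parameter file (hypothetical helper).

    Assumption: the statement starts on a line whose first token begins with
    "NBON" and finishes on the first line whose last token is "END".
    """
    kept = []
    inside = False
    for line in param_text.splitlines():
        tokens = line.upper().split()
        if not inside and tokens and tokens[0].startswith("NBON"):
            inside = True
        if inside:
            # Skip every line of the statement, including its closing END.
            if tokens and tokens[-1] == "END":
                inside = False
            continue
        kept.append(line)
    return "\n".join(kept)


def strip_colons(top_text: str) -> str:
    """Drop stray ':' characters from a topology file (hypothetical helper)."""
    return top_text.replace(":", "")
```

Per-atom `NONBonded` lines are left untouched by this sketch, since their first word does not begin with `NBON`.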

And needless to say, this only applies to molecules that are not known by haddock, as defined in haddock.core.supported_molecules.

When we automate the toppar generation, what this means is that we are effectively setting ligand_param/top_fname for the user. This is easily done in the module that generates the files, but the value needs to be propagated so that modules downstream in the workflow can also access it. To deal with this I added new logic in libworkflow to handle the propagation.
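
The idea of the propagation can be illustrated roughly as below. This is a simplified sketch and not the logic added to libworkflow: the function `propagate_toppar` is hypothetical, the workflow is modelled as a plain list of per-module parameter dictionaries, and it assumes that values already set by the user on a downstream module should not be overwritten.

```python
from typing import Any, Dict, List


def propagate_toppar(
    steps: List[Dict[str, Any]],
    generated_at: int,
    top_fname: str,
    param_fname: str,
) -> List[Dict[str, Any]]:
    """Copy generated toppar paths to downstream modules (hypothetical helper).

    `steps` holds one parameter dict per module, in workflow order, and
    `generated_at` is the index of the module that produced the files.
    """
    generated = {
        "ligand_top_fname": top_fname,
        "ligand_param_fname": param_fname,
    }
    for params in steps[generated_at + 1:]:
        for key, value in generated.items():
            # Assumption: keep any value the user set explicitly.
            if not params.get(key):
                params[key] = value
    return steps
```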

Related Issue

#1475

Additional Info

We currently only have prodrg binaries for x86_64-linux and arm64-darwin - when we have others they should also be added here, but we need to handle this elsewhere, please look here: https://github.com/haddocking/prodrg

run_prodrg will fail gracefully if autotoppar=true on an unsupported architecture.
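
For illustration, a platform check of this kind could look like the sketch below. It is not the actual run_prodrg code: the function name `prodrg_platform_key` and the `SUPPORTED` set are hypothetical, and only the two platforms named above are listed.

```python
import platform
from typing import Optional

# Assumption: (system, machine) pairs for which a prodrg binary is shipped.
SUPPORTED = {("linux", "x86_64"), ("darwin", "arm64")}


def prodrg_platform_key(
    system: Optional[str] = None,
    machine: Optional[str] = None,
) -> Optional[str]:
    """Return a key such as "x86_64-linux", or None if unsupported.

    The arguments default to the values reported by the running interpreter;
    they can be passed explicitly to test other platforms.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    if (system, machine) not in SUPPORTED:
        # The caller can log a message and skip generation instead of crashing.
        return None
    return f"{machine}-{system}"
```

Returning `None` rather than raising lets the caller report the unsupported platform and stop cleanly.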

@rvhonorato rvhonorato linked an issue Feb 24, 2026 that may be closed by this pull request
@rvhonorato rvhonorato self-assigned this Feb 24, 2026
@rvhonorato rvhonorato added enhancement Improving something in the codebase m|topoaa topoaa module labels Feb 24, 2026
@rvhonorato rvhonorato marked this pull request as ready for review February 24, 2026 17:37
@rvhonorato
Member Author

rvhonorato commented Feb 25, 2026

I just realized that each module expects ligand_top/param_fname to be defined when we are using unknown atoms. Marking this back as draft while I figure out a way to propagate the automatic toppar.

Sorted it out (:

@rvhonorato rvhonorato marked this pull request as draft February 25, 2026 13:01
auto-merge was automatically disabled February 25, 2026 13:01

Pull request was converted to draft

@rvhonorato rvhonorato added the workflow All the general parts of HADDOCK3 not related to any module in particular label Feb 25, 2026
@rvhonorato rvhonorato marked this pull request as ready for review February 25, 2026 14:02
@rvhonorato rvhonorato enabled auto-merge February 25, 2026 14:07
amjjbonvin
amjjbonvin previously approved these changes Feb 26, 2026
Member

@amjjbonvin amjjbonvin left a comment

Very nice! Works perfectly on my side

Would it be possible to also compile the prodrg exec for aarch64-linux?

@rvhonorato
Member Author

Added the binary for aarch64-linux and also updated the CI to run the test suite on a Linux ARM machine.

Member

@amjjbonvin amjjbonvin left a comment

A very nice addition!
Works fine on my side - we should do some benchmarking on a protein-ligand dataset to check the performance (the shape docking set is a good one for this).

@rvhonorato rvhonorato merged commit 22d121f into main Feb 27, 2026
10 checks passed
@rvhonorato rvhonorato deleted the 1475-automated-toppar-generation-for-unknown-ligands branch February 27, 2026 10:44

Labels

enhancement Improving something in the codebase m|topoaa topoaa module workflow All the general parts of HADDOCK3 not related to any module in particular

Development

Successfully merging this pull request may close these issues.

automated toppar generation for unknown ligands

2 participants