CDK RGroups #1260

Merged: egonw merged 5 commits into main from cdk-rgroups on Mar 2, 2026
@johnmay johnmay commented Mar 2, 2026

Following on from some patches last year that made things easier to work with in JCP (JChemPaint), this adds better RGroup support to the core library. Specifically, as with reactions, we now have the ability to flatten/unflatten an RGroupQuery structure so you can work with it and pass it around like a normal molecule. This is handled by the new RGroupQueryManipulator. Ideally this would be located along with the other manipulators, but the use of the Query APIs makes this tricky; perhaps something to target for CDK 3.0.

Two of the existing RGroup test files are not actually valid (checked with BIOVIA tools), so I have added a verification step to strict reading mode that reports these errors.

I have also added support for reading/writing CXSMILES and for the layout/depiction logic; this is already live on SiMolecule / cdkdepict. It requires handling implied attachment points and disambiguating nested definitions. For example, consider this structure:

*c1ncccc1 |$R1$,RG:_R1={C(=O)* |$;;R2$,RG:_R2={OC},{N}|},{C(=N)* |$;;R2$,RG:_R2={OCC},{C(F)(F)F}|}|

Note there are 2 conflicting definitions for R2 depending on which substituent is loaded. Without care we would accidentally load the structure like this:

image
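The conflict described above can be spotted mechanically. The following is a minimal sketch and not CDK code: it scans a CXSMILES string, tracks brace depth, and records every `_Rn=` definition it meets. The names `NestedRGroupScan`, `Definition`, `scan` and `ambiguous` are hypothetical, and the sketch assumes that braces appear only as R-group member delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class NestedRGroupScan {

    /** Hypothetical holder for one "_Rn=" definition: R number and brace nesting depth. */
    static final class Definition {
        final int rnum;
        final int depth;
        Definition(int rnum, int depth) { this.rnum = rnum; this.depth = depth; }
    }

    /** Collects every "_Rn=" definition, tracking '{' / '}' depth along the way. */
    static List<Definition> scan(String cxsmiles) {
        List<Definition> found = new ArrayList<>();
        int depth = 0;
        for (int i = 0; i < cxsmiles.length(); i++) {
            char c = cxsmiles.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            } else if (cxsmiles.startsWith("_R", i)) {
                // read the digits after "_R" and require a trailing '='
                int j = i + 2;
                while (j < cxsmiles.length() && Character.isDigit(cxsmiles.charAt(j))) j++;
                if (j > i + 2 && j < cxsmiles.length() && cxsmiles.charAt(j) == '=') {
                    int rnum = Integer.parseInt(cxsmiles.substring(i + 2, j));
                    found.add(new Definition(rnum, depth));
                }
            }
        }
        return found;
    }

    /** R numbers that are defined more than once anywhere in the string. */
    static Set<Integer> ambiguous(List<Definition> defs) {
        Set<Integer> seen = new TreeSet<>();
        Set<Integer> duplicated = new TreeSet<>();
        for (Definition d : defs) {
            if (!seen.add(d.rnum)) duplicated.add(d.rnum);
        }
        return duplicated;
    }

    public static void main(String[] args) {
        String cx = "*c1ncccc1 |$R1$,RG:_R1={C(=O)* |$;;R2$,RG:_R2={OC},{N}|},"
                  + "{C(=N)* |$;;R2$,RG:_R2={OCC},{C(F)(F)F}|}|";
        List<Definition> defs = scan(cx);
        for (Definition d : defs) {
            System.out.println("R" + d.rnum + " defined at depth " + d.depth);
        }
        System.out.println("ambiguous: " + ambiguous(defs));
    }
}
```

For the structure above, the scan finds R1 once at depth 0 and R2 twice at depth 1, so R2 is the only label reported as ambiguous.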

Instead what we do is relabel one of the R2's to allow an unambiguous depiction:

image
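The relabelling idea can be sketched on a toy data model. This is an illustration under stated assumptions, not the logic used in the CDK: `RelabelSketch` and `Fragment` are hypothetical stand-ins, the first definition encountered keeps its number, and each later conflicting definition is given the lowest R number not used anywhere in the structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RelabelSketch {

    /** Hypothetical structure: R numbers on its atoms plus its own nested definitions. */
    static final class Fragment {
        final String smiles;
        final List<Integer> rAtoms = new ArrayList<>();
        final Map<Integer, List<Fragment>> definitions = new LinkedHashMap<>();

        Fragment(String smiles, Integer... rAtoms) {
            this.smiles = smiles;
            Collections.addAll(this.rAtoms, rAtoms);
        }

        Fragment define(int rnum, Fragment... members) {
            definitions.put(rnum, new ArrayList<>(Arrays.asList(members)));
            return this;
        }
    }

    /** Renumbers conflicting nested definitions so every R number is defined once. */
    static void relabel(Fragment root) {
        Set<Integer> used = new HashSet<>();
        collect(root, used);
        relabel(root, new HashSet<>(), used);
    }

    private static void collect(Fragment f, Set<Integer> used) {
        used.addAll(f.rAtoms);
        used.addAll(f.definitions.keySet());
        for (List<Fragment> members : f.definitions.values())
            for (Fragment m : members) collect(m, used);
    }

    private static int nextFree(Set<Integer> used) {
        int n = 1;
        while (!used.add(n)) n++;   // add() returns false while n is already taken
        return n;
    }

    private static void relabel(Fragment f, Set<Integer> defined, Set<Integer> used) {
        Map<Integer, List<Fragment>> renamed = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<Fragment>> e : f.definitions.entrySet()) {
            int rnum = e.getKey();
            if (!defined.add(rnum)) {                      // number already defined elsewhere
                int fresh = nextFree(used);
                defined.add(fresh);
                Collections.replaceAll(f.rAtoms, rnum, fresh); // keep atom labels in sync
                rnum = fresh;
            }
            renamed.put(rnum, e.getValue());
        }
        f.definitions.clear();
        f.definitions.putAll(renamed);
        for (List<Fragment> members : f.definitions.values())
            for (Fragment m : members) relabel(m, defined, used);
    }

    public static void main(String[] args) {
        Fragment acyl = new Fragment("C(=O)*", 2).define(2, new Fragment("OC"), new Fragment("N"));
        Fragment imine = new Fragment("C(=N)*", 2).define(2, new Fragment("OCC"), new Fragment("C(F)(F)F"));
        Fragment root = new Fragment("*c1ncccc1", 1).define(1, acyl, imine);
        relabel(root);
        System.out.println(acyl.smiles + " uses R" + acyl.rAtoms + ", defines " + acyl.definitions.keySet());
        System.out.println(imine.smiles + " uses R" + imine.rAtoms + ", defines " + imine.definitions.keySet());
    }
}
```

With the example structure, the first substituent keeps R2 and the second substituent's R2 becomes R3, in both its atom label and its definition key. Which label the real implementation picks may differ.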

The existing RGroupQuery does not really support nesting like this but the depiction/layout is at least correct.

One slight quirk: due to dependency/packaging constraints I can't currently access the RGroupQueryManipulator from the SMILES writer, so you need to convert the query manually before generating the SMILES. I think that is acceptable for now:

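import java.io.File;
import java.io.FileReader;
import org.openscience.cdk.io.RGroupQueryReader;
import org.openscience.cdk.isomorphism.matchers.RGroupQuery;
import org.openscience.cdk.silent.SilentChemObjectBuilder;
import org.openscience.cdk.smiles.SmiFlavor;
import org.openscience.cdk.smiles.SmilesGenerator;

// Directory holding the RGfile test resources (local checkout path)
String dir = "/Users/john/Workspace/GitHub/cdk/storage/ctab/src/test/resources/org/openscience/cdk/io/";
SmilesGenerator smigen = new SmilesGenerator(SmiFlavor.Default | SmiFlavor.UseAromaticSymbols);
for (String fname : new String[]{"rgfile.1.mol", "rgfile.2.mol",
                                 "rgfile.3.mol", "rgfile.4.mol",
                                 "rgfile.5.mol", "rgfile.6.mol",
                                 "rgfile.7.mol"}) {
    try (RGroupQueryReader rdr = new RGroupQueryReader(new FileReader(new File(dir, fname)))) {
        // Uncomment to enable the strict-mode verification of the RGfile
        // rdr.setReaderMode(IChemObjectReader.Mode.STRICT);
        RGroupQuery query = new RGroupQuery(SilentChemObjectBuilder.getInstance());
        query = rdr.read(query);
        // Print the CXSMILES followed by the source file name
        System.out.println(smigen.create(query) + "\t" + fname);
    }
}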

For reference here is the CXSMILES for our 7 rgfile tests:

C1CCCP(*)C1 |$;;;;;R1$,RG:_R1={CN* |$;;_AP1$|},{C(Br)O* |$;;;_AP1$|},{C(C)S* |$;;;_AP1$|}|	rgfile.1.mol
*[PbH]1C(C(C(C1)*)*N2C=CC=C2)* |$R11;;;;;;R2;R1;;;;;;R2$,LO:7:8.3,RG:_R1={C(SO*)* |$;;;_AP2;_AP1$|},{N(C)(O*)* |$;;;_AP2;_AP1$|},_R2={C1(CCS1)* |$;;;;_AP1$|},{[SiH]1(CC1)* |$;;;_AP1$|},_R11={[Pt](C)* |$;;_AP1$|}|	rgfile.2.mol
[Po]1*[Te]C2C1C(S*O2)* |$;R1;;;;;;R1;;R$,LO:1:2.0,7:6.8,RG:_R1={C(C(=O)*)(=O)* |$;;;_AP2;;_AP1$|},{C(N(CC)*)(=O)* |$;;;;_AP2;;_AP1$|}|	rgfile.3.mol
C1C=CC=C1.* |$;;;;;R1$,RG:_R1={[NH+]#[C-]},{N(=O)=O}|	rgfile.4.mol
N1([P@@]([Si@H]([N+]([Ge@@H]([GeH](N([Pb](P([SiH]1*)*)=*)*)*)*)=*)*)*)* |$;;;;;;;;;;R1;R1;R3;R2;R1;R2;R3;R4;R2;R1$,r,RG:_R1={CF},{CO},{[Pb]=O},_R2={Br},{P},{S},_R3={[SiH4]},{[SnH2]},{[BiH3]},_R4={O},{Cl}|	rgfile.5.mol (SOFT ERROR - attachments not defined in RGfile)
*C1C(C(C(C1)*)*N2C=CC=C2)* |$R11;;;;;;R2;R1;;;;;;R2$,LO:7:8.3,RG:_R1=_R2=_R11={[Pt](C)* |$;;_AP1$|}|	rgfile.6.mol
C1=S([Dy]*)*[Bi]1*(Cl)N |$;;;R32;R32;;R32$,LO:4:1.5,6:5.7.8,RG:_R32={[P](Br)(O)(*)* |$;;;_AP1;_AP2$|},{[GeH](F)(*)* |$;;_AP1;_AP2$|}|	rgfile.7.mol (ERROR - R group has 1,2, or 3 connections)

johnmay added 5 commits March 2, 2026 11:59
…ccepted by BIOVIA's tools - both of these have inconsistencies between the parent/root structure and defined attachments. I agree these are wrong. Here I have added a verify check to strict reading mode which checks and reports on these errors.
…rtain operations easier and will greatly simplify JChemPaint since we work in flattened mode and then turn it into a real RgroupQuery as needed.
…ent points where needed and disambiguating nested definitions.
The StructureDiagramGenerator has been rewritten to have a convenience function for depicting a grid of molecules. For Markush structures we lay out the root structure(s) first, then the definitions below.
@egonw egonw merged commit 046a9c3 into main Mar 2, 2026
12 of 13 checks passed

egonw commented Mar 2, 2026

Nice!

@johnmay johnmay deleted the cdk-rgroups branch March 3, 2026 13:48