Skip to content

Add InChI AuxInfo to structure functionality;#1208

Merged
egonw merged 6 commits intocdk:mainfrom
fbaensch-beilstein:auxInfoToStructure
Aug 20, 2025
Merged

Add InChI AuxInfo to structure functionality;#1208
egonw merged 6 commits intocdk:mainfrom
fbaensch-beilstein:auxInfoToStructure

Conversation

@fbaensch-beilstein
Copy link
Copy Markdown
Contributor

I noticed that the ‘AuxInfo to structure’ function was missing. So I implemented it, adapted the existing code to avoid duplicates, and added tests.
Please let me know what you think.

@johnmay
Copy link
Copy Markdown
Member

johnmay commented Aug 7, 2025

Please can you remove the add H option and MDL valence setting. Unless they are there or implied by the input they should not be added. Specifically you should not be use MDLValence it is non-public for a reason.

@fbaensch-beilstein
Copy link
Copy Markdown
Contributor Author

Yes, of course, I used the parameters that ared used and available in the jna-inchi library.
Could you please explain why MDLValence is not public?
In some cases, the ‘implicit hydrogen parameter’ returned by the jna-inchi library and the InChI API is ‘-1’, which is described as "has the special meaning of deriving the hydrogen number from the valence". That is why I used MDLValence, as this value is also used internally in InChI. What would you recommend in such cases?

@johnmay
Copy link
Copy Markdown
Member

johnmay commented Aug 8, 2025

It is not public because it should not be used outside of where it is.

There are multiple valence models that depends on the format being used. e.g. SMILES has an implicit valence model which is different to MDL and MOL2 (Tripos) a different one again. Even worse is MDL changed their valence model a few years ago (e.g. Na used to have 1 hydrogen by default but is now 0). Basically these things are unstable and so when you get implicit hydrogens like from InChI all you can do is guess. It is not correct apply the MDL valence model blindly.

I think the correct behaviour is for -1 you set null. Then if a user wants to set hydrogen counts they can use the CDK atom type matcher as is intended. @egonw sound ok?

@fbaensch-beilstein
Copy link
Copy Markdown
Contributor Author

I completely understand your concerns. I decided to use the MDL valence model, since the 'native' InChI input format is the mol file.

Just to avoid any ambiguity: do you mean 'null' as in null or 0?

@johnmay
Copy link
Copy Markdown
Member

johnmay commented Aug 9, 2025

Null as in null, atom.setImplicitHydrogenCount(null); this indicates unknown information vs knowing the h count is 0 like how InChI uses -1

@johnmay
Copy link
Copy Markdown
Member

johnmay commented Aug 9, 2025

I should also clarify my view and that of the original InChI autors (Igor V. Pletnev and Dmitrii V. Tchekhovskoi) that you should not read InChI's as input. It is a method for linking information and not to be used for primary storage. It is an identifier not a representation. Not all InChI's can be read back in.

Alas the project now has new directions and that has probably been forgotten/ignored so users will do it/want to do it anyways but it is strictly bad practice and as all sorts of issues - for example unknown hydrogen counts.

@egonw egonw self-assigned this Aug 11, 2025
@fbaensch-beilstein
Copy link
Copy Markdown
Contributor Author

You're preaching to the choir with that opinion. An identifier is an identifier and not a representation. This reverse conversion does not make the further development of InChI any easier, as care must always be taken to ensure that the structure can be regenerated from an InChI. With further development towards molecular inorganics, this will become even more interesting.

I will add a passage to the Javadoc along the lines of: ‘Use at your own risk’.

@johnmay
Copy link
Copy Markdown
Member

johnmay commented Aug 11, 2025

Looks good, happy to merge but just musing if it makes more sense to fold this into the InChIToStructure class as a method? i.e. avoid adding the new public Class AuxInfoToStructure.

Might need a bit of rewiring but the code paths should be the same just if you come from the InChI or the AuxInfo?

@@ -0,0 +1,125 @@
package org.openscience.cdk.inchi;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a copyright header here.

* and Implicit Hydrogens taking liberties with valence</a>
*/
static int implicitValence(int elem, int q, int val) {
static int implicitValence(int elem, int q, int val) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, remove this space.

@@ -0,0 +1,355 @@
package org.openscience.cdk.inchi;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the missing copyright header.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure to also copy the original copyright info from the file this was copied from, Copyright (C) 2006-2007 Sam Adams <sea36@users.sf.net>, I believe.

import java.util.List;
import java.util.Map;

final class InChIOutputToStructure {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not have class JavaDoc?

@@ -0,0 +1,109 @@
package org.openscience.cdk.inchi;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the missing copyright header.

/**
* Test case for {@link AuxInfoToStructure} class.
*/
public class AuxInfoToStructureTest extends CDKTestCase {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extends CDKTestCase is historic. If you do not use any of the functionality of the class, you can leave this out.

void testGetAtomContainer_IChemObjectBuilder() throws CDKException {
InChIToStructure parser = new InChIToStructure("InChI=1S/CH5/h1H4", DefaultChemObjectBuilder.getInstance());
parser.generateAtomContainerFromInchi(SilentChemObjectBuilder.getInstance());
InChIToStructure parser = new InChIToStructure("InChI=1S/CH5/h1H4", SilentChemObjectBuilder.getInstance());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@egonw
Copy link
Copy Markdown
Member

egonw commented Aug 11, 2025

@egonw sound ok?

Yeah, that sounds right. Assumptions should always be explicit and as much as possible left to the user, instead of hardcode that in the CDK. That has indeed caused problems.

@egonw egonw assigned fbaensch-beilstein and unassigned egonw Aug 11, 2025
@fbaensch-beilstein
Copy link
Copy Markdown
Contributor Author

Looks good, happy to merge but just musing if it makes more sense to fold this into the InChIToStructure class as a method? i.e. avoid adding the new public Class AuxInfoToStructure.

Might need a bit of rewiring but the code paths should be the same just if you come from the InChI or the AuxInfo?

@johnmay: I had the same idea. From the InChI side it would make sense, but coming from the jna-inchi lib the methods are separated. And I did not want to touch that much in the exisiting code base. But if you like, I can give it a try, so there is only one class that converts both.

@egonw: Thanks for the review.

@johnmay
Copy link
Copy Markdown
Member

johnmay commented Aug 12, 2025

Looks good, happy to merge but just musing if it makes more sense to fold this into the InChIToStructure class as a method? i.e. avoid adding the new public Class AuxInfoToStructure.
Might need a bit of rewiring but the code paths should be the same just if you come from the InChI or the AuxInfo?

@johnmay: I had the same idea. From the InChI side it would make sense, but coming from the jna-inchi lib the methods are separated. And I did not want to touch that much in the exisiting code base. But if you like, I can give it a try, so there is only one class that converts both.

@egonw: Thanks for the review.

Happy with whatever you think makes most sense. Personally I think adding a new method to InChIToStructure is cleaner but will need a little more refactoring. Now you have splitout the JNA output processing it shouldn't be too bad - I am happy to do it after merging if needed.

I always thought it was a bit odd you use the InChI generator to get the inchi to structure,
InChIGeneratorFactory. It doesn't really need a factory class and this would be cleaner:

public static InChIToStructure InChIToStructure::fromInChI(String inchi, IChemObjectBuilder builder, InChIOptions opts);
public static InChIToStructure InChIToStructure::fromAuxInfo(String auxinfo, IChemObjectBuilder builder, InChIOptions opts);

@fbaensch-beilstein
Copy link
Copy Markdown
Contributor Author

fbaensch-beilstein commented Aug 13, 2025

@johnmay If you are fine with it, I will refactor it the way you've described it above. I also think it is cleaner and more intuitive.

I was wondering if you've got any plans for when a new release is scheduled?

@fbaensch-beilstein
Copy link
Copy Markdown
Contributor Author

fbaensch-beilstein commented Aug 19, 2025

@johnmay Should I remove the methods in the InChIGeneratorFactory or mark as deprecated?

@johnmay
Copy link
Copy Markdown
Member

johnmay commented Aug 19, 2025

Super, do you use the chiral flag anywhere? The way I (and BIOVIA) handle this is to read in the molecule with "AND" stereo groups set. Might be worth considering but I am OK to merge this - @egonw all good?

@egonw
Copy link
Copy Markdown
Member

egonw commented Aug 19, 2025

Yes. I think we had code to read it in the past too. So, let's see what happens

@egonw egonw self-requested a review August 20, 2025 04:37
@egonw egonw merged commit bbda1cd into cdk:main Aug 20, 2025
6 of 7 checks passed
@fbaensch-beilstein
Copy link
Copy Markdown
Contributor Author

The chiral flag is as far as I know just a flag to show if the structure was marked as chiral or not in the InChI. The corresponding line of code from the InChI code:
int bChiral; /* 1 => the structure was marked as chiral, 2=> not chiral, 0=> not marked */

So far the enhanced stereochemistry is not implemented in the InChI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants