Add InChI AuxInfo to structure functionality;#1208
Conversation
|
Please can you remove the add H option and MDL valence setting. Unless they are there or implied by the input they should not be added. Specifically you should not be use MDLValence it is non-public for a reason. |
|
Yes, of course, I used the parameters that ared used and available in the jna-inchi library. |
|
It is not public because it should not be used outside of where it is. There are multiple valence models that depends on the format being used. e.g. SMILES has an implicit valence model which is different to MDL and MOL2 (Tripos) a different one again. Even worse is MDL changed their valence model a few years ago (e.g. Na used to have 1 hydrogen by default but is now 0). Basically these things are unstable and so when you get implicit hydrogens like from InChI all you can do is guess. It is not correct apply the MDL valence model blindly. I think the correct behaviour is for -1 you set null. Then if a user wants to set hydrogen counts they can use the CDK atom type matcher as is intended. @egonw sound ok? |
|
I completely understand your concerns. I decided to use the MDL valence model, since the 'native' InChI input format is the mol file. Just to avoid any ambiguity: do you mean 'null' as in |
|
Null as in null, |
|
I should also clarify my view and that of the original InChI autors (Igor V. Pletnev and Dmitrii V. Tchekhovskoi) that you should not read InChI's as input. It is a method for linking information and not to be used for primary storage. It is an identifier not a representation. Not all InChI's can be read back in. Alas the project now has new directions and that has probably been forgotten/ignored so users will do it/want to do it anyways but it is strictly bad practice and as all sorts of issues - for example unknown hydrogen counts. |
|
You're preaching to the choir with that opinion. An identifier is an identifier and not a representation. This reverse conversion does not make the further development of InChI any easier, as care must always be taken to ensure that the structure can be regenerated from an InChI. With further development towards molecular inorganics, this will become even more interesting. I will add a passage to the Javadoc along the lines of: ‘Use at your own risk’. |
|
Looks good, happy to merge but just musing if it makes more sense to fold this into the InChIToStructure class as a method? i.e. avoid adding the new public Class AuxInfoToStructure. Might need a bit of rewiring but the code paths should be the same just if you come from the InChI or the AuxInfo? |
| @@ -0,0 +1,125 @@ | |||
| package org.openscience.cdk.inchi; | |||
There was a problem hiding this comment.
Please add a copyright header here.
| * and Implicit Hydrogens taking liberties with valence</a> | ||
| */ | ||
| static int implicitValence(int elem, int q, int val) { | ||
| static int implicitValence(int elem, int q, int val) { |
| @@ -0,0 +1,355 @@ | |||
| package org.openscience.cdk.inchi; | |||
There was a problem hiding this comment.
Please add the missing copyright header.
There was a problem hiding this comment.
Please make sure to also copy the original copyright info from the file this was copied from, Copyright (C) 2006-2007 Sam Adams <sea36@users.sf.net>, I believe.
| import java.util.List; | ||
| import java.util.Map; | ||
|
|
||
| final class InChIOutputToStructure { |
There was a problem hiding this comment.
Should this not have class JavaDoc?
| @@ -0,0 +1,109 @@ | |||
| package org.openscience.cdk.inchi; | |||
There was a problem hiding this comment.
Please add the missing copyright header.
| /** | ||
| * Test case for {@link AuxInfoToStructure} class. | ||
| */ | ||
| public class AuxInfoToStructureTest extends CDKTestCase { |
There was a problem hiding this comment.
The extends CDKTestCase is historic. If you do not use any of the functionality of the class, you can leave this out.
| void testGetAtomContainer_IChemObjectBuilder() throws CDKException { | ||
| InChIToStructure parser = new InChIToStructure("InChI=1S/CH5/h1H4", DefaultChemObjectBuilder.getInstance()); | ||
| parser.generateAtomContainerFromInchi(SilentChemObjectBuilder.getInstance()); | ||
| InChIToStructure parser = new InChIToStructure("InChI=1S/CH5/h1H4", SilentChemObjectBuilder.getInstance()); |
Yeah, that sounds right. Assumptions should always be explicit and as much as possible left to the user, instead of hardcode that in the CDK. That has indeed caused problems. |
@johnmay: I had the same idea. From the InChI side it would make sense, but coming from the jna-inchi lib the methods are separated. And I did not want to touch that much in the exisiting code base. But if you like, I can give it a try, so there is only one class that converts both. @egonw: Thanks for the review. |
Happy with whatever you think makes most sense. Personally I think adding a new method to InChIToStructure is cleaner but will need a little more refactoring. Now you have splitout the JNA output processing it shouldn't be too bad - I am happy to do it after merging if needed. I always thought it was a bit odd you use the InChI generator to get the inchi to structure, |
|
@johnmay If you are fine with it, I will refactor it the way you've described it above. I also think it is cleaner and more intuitive. I was wondering if you've got any plans for when a new release is scheduled? |
|
@johnmay Should I remove the methods in the |
|
Super, do you use the chiral flag anywhere? The way I (and BIOVIA) handle this is to read in the molecule with "AND" stereo groups set. Might be worth considering but I am OK to merge this - @egonw all good? |
|
Yes. I think we had code to read it in the past too. So, let's see what happens |
|
The chiral flag is as far as I know just a flag to show if the structure was marked as chiral or not in the InChI. The corresponding line of code from the InChI code: So far the enhanced stereochemistry is not implemented in the InChI. |
I noticed that the ‘AuxInfo to structure’ function was missing. So I implemented it, adapted the existing code to avoid duplicates, and added tests.
Please let me know what you think.