Skip to content

Utilities for selecting a substructure of a molecule.#889

Merged
egonw merged 15 commits intomasterfrom
extraSubstructure
Sep 10, 2022
Merged

Utilities for selecting a substructure of a molecule.#889
egonw merged 15 commits intomasterfrom
extraSubstructure

Conversation

@johnmay
Copy link
Copy Markdown
Member

@johnmay johnmay commented Sep 5, 2022

Draft: needs tests.

@uli-f asked on the mailing list about this, makes sense to have the util functions to do it.

@uli-f
Copy link
Copy Markdown
Member

uli-f commented Sep 5, 2022

I haven't worked (enough) with stereo chemistry in the CDK to say anything useful about your change to base/core/src/main/java/org/openscience/cdk/stereo/AbstractStereo.java.

When it comes to your changes in base/standard/src/main/java/org/openscience/cdk/tools/manipulator/AtomContainerManipulator.java:

  • Overall: love it. Especially all the method references, very easy to read :)
  • I don't know the method signature of other copy operations in CDK, but I somehow find the order dest, source counter-intuitive. Java itself does source, dest - at least in System.arraycopy.
  • There are a few typos in the javadocs (dest'ination).

@johnmay Do you want me to provide test cases for this?

@johnmay
Copy link
Copy Markdown
Member Author

johnmay commented Sep 5, 2022

Ah it's because I'm a C programmer :-) memcpy(dest, src) makes most sense to me because it's like an assignment, dest = src.

Yeah I was putting the implementation out there to show how it could be, feel free to write some tests if not I'll do it at some point.

@uli-f
Copy link
Copy Markdown
Member

uli-f commented Sep 7, 2022

Just started on writing up a test case or two for the added methods and came across the method

public static IAtomContainer extractSubstructure(IAtomContainer atomContainer, int... atomIndices)

in AtomContainerManipulator which seems a bit sub-optimal when looking at the implementation:

  • I don't overly trust the clone method from Java, it is quite easy to get this wrong (shallow clone, deep clone)
  • it seems quite inefficient on several levels (removing bonds one by one instead of all at once, binary search for every single atom index...)

Given the new method copy it might be best to remove the implementation and just call the method copy instead.

@uli-f
Copy link
Copy Markdown
Member

uli-f commented Sep 7, 2022

@johnmay Regarding your implementation of copy:

How meaningful is it to carry over the bond properties to the subgraph:

destBond.setIsInRing(bond.isInRing());
destBond.setIsAromatic(bond.isAromatic());
destBond.setProperties(bond.getProperties());

The properties isInRing and isAromatic might not be true for the subgraph anymore.

It probably depends on the use case, i.e., query substructure or just a regular, non-query substructure.

uli-f and others added 2 commits September 7, 2022 17:15
- added a first very simply test for the copy method in AtomContainerManipulatorTest
- improved javadocs for new copy method in AtomContainerManipulator
@johnmay
Copy link
Copy Markdown
Member Author

johnmay commented Sep 7, 2022

The existing extractSubstructure can be deprecated, I would prefer to not break downstream code. It is technically possible to reimplement in terms of the existing method which would be a little more efficient.

The properties isInRing and isAromatic might not be true for the subgraph anymore.

Correct but the job here is to copy everything and make no assumptions/changes. If the uses wants to clear aromaticity/ring membership they can do that. I think it is useful to keep these in place and mirrors the addition of atoms (i.e. the atoms carry over these properties). I think technically the isInRing/Aromatic are implemented by the flags (which we should also copy) but this may not be the case in future.

@uli-f
Copy link
Copy Markdown
Member

uli-f commented Sep 7, 2022

Wrote a first simple test, PR submitted.

It tried testing it with AtomContainer and with AtomContainer2, both fail.

AtomContainer

AtomContainer has an issue with getting the indices of the atoms that participate in a bond in this line:

dest.addBond(beg.getIndex(), end.getIndex(), bond.getOrder(), bond.getStereo());

I fully understand why this is happening but wasn't quite sure how much work we should put in getting this working with the old AtomContainer 🤷🏼

AtomContainer2

Interestingly enough, AtomContainer2 gives a different exception:

java.lang.ClassCastException: class org.openscience.cdk.Bond cannot be cast to class org.openscience.cdk.interfaces.IAtom (org.openscience.cdk.Bond and org.openscience.cdk.interfaces.IAtom are in unnamed module of loader 'app')

on this line

IAtom beg = (IAtom) remap.get(bond.getBegin());

The reason for this is the put operation within the for loop:

remap.put(beg, destBond);

which - depending on the atom container - may replace (atom, atom) pairs with (atom, bond) pairs in the Map<IChemObject, IChemObject>.

A possible solution might be to put the (atom, bond) pairs in a separate map and then fuse the (atom,atom) and (atom,bond) maps before calling base/core/src/main/java/org/openscience/cdk/stereo/AbstractStereo.java#mapStrict(remap).

…bstructure(IAtomContainer,Collection) instead

- put an ignore on the method to test extractSubstructure(IAtomContainer,int[]) in AtomContainerManipulatorTest
@uli-f
Copy link
Copy Markdown
Member

uli-f commented Sep 7, 2022

The existing extractSubstructure can be deprecated, I would prefer to not break downstream code. It is technically possible to reimplement in terms of the existing method which would be a little more efficient.

I'd second both marking the existing method extractSubstructure as deprecated and re-implement it in terms of the existing method. Opened PR #891

Correct but the job here is to copy everything and make no assumptions/changes. If the uses wants to clear aromaticity/ring membership they can do that. I think it is useful to keep these in place and mirrors the addition of atoms (i.e. the atoms carry over these properties). I think technically the isInRing/Aromatic are implemented by the flags (which we should also copy) but this may not be the case in future.

Agreed. Just wanted to raise the issue, but I am fine with that.

@johnmay
Copy link
Copy Markdown
Member Author

johnmay commented Sep 7, 2022

Thanks should be fixed now, AtomContainer2 is going to become AtomContainer - it just to give people time to adjust we have the option to use the old one but the builder (correct) always gives you a new one. It is probably time to make that rename, given your code the other day I expect a lot of people are creating them wrong and so haven't really had a chance to see the changes that happens (i.e. enforcing you can't add a bond to atoms which don't belong to the molecule).

@johnmay
Copy link
Copy Markdown
Member Author

johnmay commented Sep 7, 2022

Tweaked some things, maybe do want to keep the redundant exception as it is part of the signature.

@uli-f
Copy link
Copy Markdown
Member

uli-f commented Sep 8, 2022

Tweaked some things, maybe do want to keep the redundant exception as it is part of the signature.

Yes, I kept the throws CloneNotSupportedException on purpose as removing it would require changes to downstream code.

…ntainer,Predicate,Predicate)

- added more tests for the copy method to AtomContainerManipulatorTest
@uli-f
Copy link
Copy Markdown
Member

uli-f commented Sep 8, 2022

Added more tests, fixed a bug, all in #893

Thanks for fixing the order of actual, expected in the tests.
I haven't used Junit Asserts in quite a while, switched to AssertJ and am really happy with it.

@johnmay Happy for you to make the decision whether to bring back the CloneNotSupportedException to extractSubstructure(IAtomContainer, int[]); I'd opt for it.

- fixed a bug in AtomContainerManipulator.copy(IAtomContainer,IAtomCo…
@johnmay
Copy link
Copy Markdown
Member Author

johnmay commented Sep 8, 2022

I quite like the assertThat (blog post I wrote on it: http://efficientbits.blogspot.com/2013/04/assert-that-assert-this.html) but they've deprecated and you now have to use the one that comes in from JUnit's Hamcrest dependency.

@uli-f
Copy link
Copy Markdown
Member

uli-f commented Sep 9, 2022

I quite like the assertThat (blog post I wrote on it: http://efficientbits.blogspot.com/2013/04/assert-that-assert-this.html) but they've deprecated and you now have to use the one that comes in from JUnit's Hamcrest dependency.

Thanks for pointing that out.

I spend a few mins to migrate the tests I've written to Hamcrest's assertThat to get a first impression of how that lib works.

PR #895

re-wrote tests for methods copy and extractSubstructure based on Hamc…
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Sep 9, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

0.0% 0.0% Coverage
0.0% 0.0% Duplication

@johnmay johnmay marked this pull request as ready for review September 9, 2022 16:05
@johnmay
Copy link
Copy Markdown
Member Author

johnmay commented Sep 9, 2022

OK to merge @egonw ?

@egonw
Copy link
Copy Markdown
Member

egonw commented Sep 9, 2022

Yes, I was waiting for it to be finished, but it looked good

@egonw egonw merged commit c7578d1 into master Sep 10, 2022
@johnmay johnmay deleted the extraSubstructure branch September 10, 2022 06:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants