A new API has been added to simplify hydrogen representation normalisation. In the past we had the ability to suppress all hydrogens, make them all explicit, or remove all non-chiral hydrogens. A single function now does all of this: the caller specifies the representation they want and it adds/removes hydrogens as needed, keeping all internal state correct and in sync.
We already had some basic support for R Group Query files, but this was not tied in nicely to the main toolkit. This release makes these files easier to work with and adds support in CXSMILES. It is now possible to load a CXSMILES with R groups and depict it correctly.
Note: the link node is now also parsed and stored.
Variable Attachment Matching
Added support for matching variable attachments specified via CXSMARTS.
Query:
n1ccccc1.*Cl.*Br |m:6:0.1.2.3.4.5,8:0.1.2.3.4.5|
Hits:
Atropisomers (via RDKit extension)
We have supported atropisomers for some time, but only via 2D coordinates, with layouts, CIP naming, etc. working as expected. RDKit has an extension to allow loading from CXSMILES. Although it is not perfect, I was at a loss to think of something better, so it makes sense to support it. Currently this is read-only and mainly intended for depiction.
A new API has been added to allow setting wedge bonds on a structure where the coordinates are already known. Note: this function is called automatically when a layout is generated.
The new AtomContainer implementation is now the default after a gradual introduction. You can still use the old implementation, but you must explicitly create an AtomContainerLegacy. This should be a seamless change for most users, but please let us know if you encounter an unexpected error.
SMIRKS support with the ability to approximate other implementations (inc. Daylight and RDKit Reaction SMARTS). It includes convenience APIs for applying a transform to all places at once (i.e. dt_xapply) and efficient support for hydrogen handling (explicit hydrogens are not required on the input). Overall the speed is good and a transform can be run over all of ChEMBL 35 in only ~30 seconds (see Appendix A1).
Faster ring membership and aromaticity assignment. The move to AtomContainer2 (see above) allows additional optimizations to these algorithms. The APIs will run faster; however, for aromaticity you must use Cycles.all() on its own. There is also a new static method for convenience and improved aromatic model encoding.
// new way, no checked exception
Cycles.markRingAtomsAndBonds(molecule); // prerequisite
if (!Aromaticity.apply(Aromaticity.Model.Daylight, molecule)) {
    // return false = too many cycles to check
}

// old way (will still be faster)
Aromaticity aromaticity = new Aromaticity(ElectronDonation.daylight(),
                                          Cycles.all());
IAtomContainer molecule = ...;
try {
    if (aromaticity.apply(molecule)) {
        //
    }
} catch (CDKException e) {
    // cycle computation was intractable
}
Improved inorganic stereochemistry
It is now possible to represent degenerate inorganic stereochemistry where one or more neighbours are missing/implicit. For example, we can describe a square pyramidal structure as an octahedral with a missing ligand. Support for implicit/explicit hydrogens around these atoms has also been improved.
The API allows you to generate the functional groups as fragments or, my favorite, to fill an array with identifier numbers, which is then very easy to depict.
IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
SmilesParser smipar = new SmilesParser(bldr);
String smiles = "C2C(NC)=NC3=C(C(C1=CC=CC=C1)=N2=O)C=C(Cl)C=C3";
IAtomContainer mol = smipar.parseSmiles(smiles);
FunctionalGroupsFinder fgFinder = FunctionalGroupsFinder.withNoEnvironment();
Cycles.markRingAtomsAndBonds(mol);
Aromaticity.apply(Aromaticity.Model.Daylight, mol);
// extract the groups as new fragments
List<IAtomContainer> functionalGroupsList = fgFinder.extract(mol);
// fill an array with numbers that indicate which functional group something belongs to
int[] fgrps = new int[mol.getAtomCount()];
fgFinder.find(fgrps, mol);
// set the group as the atom map/class in SMILES
for (IAtom atom : mol.atoms())
    atom.setMapIdx(1 + fgrps[atom.getIndex()]);
System.out.println(new SmilesGenerator(SmiFlavor.AtomAtomMap).create(mol));
The Sugar Removal Utility (SRU) implements a generalized algorithm for automated detection of circular and linear sugars in molecular structures and their removal.
Creating atoms/bonds in the context of molecules with: mol.newAtom() and mol.newBond() and others.
Better IO error handling
Contributors
55 Egon Willighagen
8 Felix Bänsch
3 Jean Marois
245 John Mayfield
43 Jonas Schaub
2 Matthias Mailänder
5 Tyler Peryea
123 Uli Fechner
3 Valentyn Kolesnikov
3 Stefan Kuhn
Overview of Pull Requests
SonarCloud is not reporting test coverage correctly because it was no… by @johnmay in #1000
Improved the abbreviation handling over atom sets, this is useful for… by @johnmay in #996
Fix - avoid placing a wedge on the right-angled bond when a centre is… by @johnmay in #998
Quality of life API interfaces. The IAtomContainerSet and IReaction c… by @johnmay in #997
Sonar settings for aggregated test coverage. by @johnmay in #1001
A new entry point to the SMILES parser has been added to parse into a "multi-step" reaction, whereby the product of one step is the reactant of the next. The basic idea is to allow more than two '>'. Parts at even positions are reactants/products and parts at odd positions are agents/catalysts/solvents.
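The even/odd rule can be illustrated with a small sketch. This is not the CDK parser: the class MultiStepSplit and its methods are hypothetical, the input string is a made-up example, and the split is a naive one on '>' that ignores anything else a real parser has to handle (e.g. CXSMILES layers).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of CDK: splits a multi-step reaction SMILES on '>'. */
final class MultiStepSplit {

    /** Parts at even positions (0, 2, 4, ...): reactants/products. */
    static List<String> molecules(String smi) {
        return parts(smi, 0);
    }

    /** Parts at odd positions (1, 3, 5, ...): agents/catalysts/solvents. */
    static List<String> agents(String smi) {
        return parts(smi, 1);
    }

    private static List<String> parts(String smi, int parity) {
        // limit -1 keeps empty parts, e.g. "CCO>>CC=O" has one empty agent part
        String[] fields = smi.split(">", -1);
        List<String> out = new ArrayList<>();
        for (int i = parity; i < fields.length; i += 2)
            out.add(fields[i]);
        return out;
    }

    public static void main(String[] args) {
        String smi = "CCO>[Cr]>CC=O>[O]>CC(=O)O"; // made-up two-step example
        System.out.println(molecules(smi)); // [CCO, CC=O, CC(=O)O]
        System.out.println(agents(smi));    // [[Cr], [O]]
    }
}
```

Under this reading, step i has molecules.get(i) as its reactant, agents.get(i) as its agent and molecules.get(i + 1) as its product, so each intermediate is shared between two consecutive steps.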
The DepictionGenerator has been extended to depict reaction sets. If the product of the previous reaction is the same as the reactant in the next (object identity) it is omitted for a terser depiction:
More correct PubChemFingerprinter
Explicit hydrogens are no longer required and there is an option to use a more correct ring set definition that more closely matches the original CACTVS substructure keys. This is now on by default:
IChemObjectBuilder builder = SilentChemObjectBuilder.getInstance();
new PubchemFingerprinter(builder); // new - default is to use "ESSSR-like" ring set
new PubchemFingerprinter(builder, false); // old - for backwards compatibility with fingerprints generated with older CDK versions
The InChI now supports > 999 atoms. Since we have the option to generate a SMILES using the InChI canonical labelling, it makes sense to use the larger molecules flag and support more.
Maygen is pure Java; if you need more speed, consider Surge by the same author.
New Smallest Ring utilities for single atom/bond
if (Cycles.smallRingSize(atom, 7) != 0) {
// atom is in a ring 7 or smaller
}
if (Cycles.smallRingSize(bond, 7) != 0) {
// bond is in a ring 7 or smaller
}
Where possible "Re-inflate" convex rings on cyclophanes:
Before: now:
New substructure/copy utility that allows a whole or part of a structure to be copied. Atoms and bonds are selected by providing a predicate:
IAtomContainer dst = builder.newAtomContainer();
// select the cyclic part of a molecule
AtomContainerManipulator.copy(dst, src, a -> a.isInRing(), b -> b.isInRing());
// select atoms in a set, the bonds will also be selected
Set<IAtom> subset = ...
AtomContainerManipulator.copy(dst, src, a -> subset.contains(a));
New exclusive atoms filter that provides non-overlapping substructure matches; note that the input order can determine which matches are selected.
for (int[] mapping : Pattern.findSubstructure(query).matchAll(mol).exclusiveAtoms()) {
// ...
}
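Why the input order matters can be seen in a minimal sketch of a greedy non-overlapping filter. This is an assumption about how such a filter can work, not the CDK implementation; the class ExclusiveAtomsSketch and the atom indices are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch, not CDK code: greedy filter for non-overlapping matches. */
final class ExclusiveAtomsSketch {

    /** Keeps a mapping only if none of its atom indices was used by an earlier kept mapping. */
    static List<int[]> exclusive(List<int[]> mappings) {
        BitSet visited = new BitSet();
        List<int[]> kept = new ArrayList<>();
        for (int[] mapping : mappings) {
            boolean overlaps = false;
            for (int atomIdx : mapping) {
                if (visited.get(atomIdx)) {
                    overlaps = true;
                    break;
                }
            }
            if (overlaps)
                continue; // shares an atom with a match we already accepted
            for (int atomIdx : mapping)
                visited.set(atomIdx);
            kept.add(mapping);
        }
        return kept;
    }

    public static void main(String[] args) {
        // three matches of a two-atom query on a chain of atoms 0-1-2-3
        List<int[]> inOrder = Arrays.asList(new int[]{0, 1}, new int[]{1, 2}, new int[]{2, 3});
        List<int[]> middleFirst = Arrays.asList(new int[]{1, 2}, new int[]{0, 1}, new int[]{2, 3});
        System.out.println(exclusive(inOrder).size());     // 2: {0,1} and {2,3}
        System.out.println(exclusive(middleFirst).size()); // 1: only {1,2}
    }
}
```

The same three matches give two exclusive hits in one order and one in the other, because the first accepted match blocks every later match that shares an atom with it.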
Stereo perception corner-cases (figure: two rejected examples, one accepted).
Summary
Merged all PRs and resolved all open issues related to bugs
InChINumbersTools: Use JNA InChI options by @johnmay in #799
Fix issue with hose code nesting by @johnmay in #828
Authors
278 John Mayfield
13 Egon Willighagen
11 Uli Fechner
5 Mark Williamson
3 Valentyn Kolesnikov
2 MehmetAzizYirik
2 Marco Foscato
1 dependabot[bot]
1 Tim Dudgeon
1 Otto Brinkhaus
1 Christoph Steinbeck
1 Alex
This page documents the changes for CDK v2.7 and v2.7.1. The patch version was made after some minor issues with how the new InChI code was organised were discovered by downstream projects.
There are two main technologies for calling native code: JNI (Java Native Interface) and JNA (Java Native Access). JNI requires writing a custom native wrapper which is then bound to Java code, whereas JNA allows you to call the native methods of an existing SO/DYLIB directly. Essentially, this means that to expose the native InChI library in Java via JNI one first needs to write (and maintain) a native wrapper; with JNA we can just drop the InChI SO in directly. JNI InChI exposed InChI v1.03 and worked well for many years. Unfortunately this project was no longer maintained and, as newer more stable versions of InChI were released (now v1.06), an alternative was needed. A few years ago Daniel Lowe started JNA InChI and recently made it feature complete and released v1.0.
ChemAxon have also independently used the JNA path to integrate newer InChI libraries into their tools: (slides). It is not clear if this was made available; it is not listed on GitHub/ChemAxon.
Build on Java 17
The Maven plugins were updated to allow building on Java 17
Verify declared dependencies
The maven modules were checked for unused declared dependencies and used undeclared dependencies (mvn dependency:analyze).
Organise and restructure test-jar and testdata
CDK was originally built with the ant build tool; under this scheme there was a jar for the main/ code and one for the test/ code. Test modules could share and inherit dependencies. To replicate this in Maven we install and deploy "test-jar" artefacts. The project test code was restructured to put all common test code in the "cdk-test" module.
All test data was stored in a cdk-testdata module; this data has now been relocated to the test/resources of each module where it is used. This meant some data was duplicated, but it means the ~18MB test-jar no longer needs to be uploaded to Maven Central.
Remove Guava dependency
We have removed the use of Guava; the functionality could mostly be directly replaced with newer JDK idioms (Function/Predicate/Stream) which were not available in the past.
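As an illustration of the kind of replacement meant here, the following sketch uses only java.util.function and java.util.stream. Which Guava classes CDK actually used is not stated in these notes; the class JdkIdioms and its method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical example, not CDK code: JDK-only filter/transform pipeline. */
final class JdkIdioms {

    /** Returns the lengths of all non-empty labels, in input order. */
    static List<Integer> lengthsOfNonEmpty(List<String> labels) {
        // java.util.function types stand in for com.google.common.base.Predicate/Function
        Predicate<String> nonEmpty = s -> !s.isEmpty();
        Function<String, Integer> length = String::length;
        return labels.stream()
                     .filter(nonEmpty)
                     .map(length)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengthsOfNonEmpty(Arrays.asList("C", "", "Cl"))); // [1, 2]
    }
}
```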
Use XorShift PRNG in ShortestPathFingerprinter (different fingerprint)
Commons Math3 was used in a single place to hash paths (Mersenne Twister) in the ShortestPathFingerprinter. Since this fingerprint method is not widely used and the hashes do not need to be cryptographically secure, a simple Xorshift (https://en.wikipedia.org/wiki/Xorshift) random generator is now used instead. This allows us to remove the dependency on Commons Math3. This does mean the fingerprint bits have changed; note the CDK version description is accessible via the Fingerprinter.getVersionDescription() method.
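For readers unfamiliar with Xorshift, here is a minimal sketch of a 64-bit generator using Marsaglia's 13/7/17 shift triple. The exact variant, seeding, and bit mapping used by the ShortestPathFingerprinter are not described in these notes, so the class XorShift64 and its nextBit method are assumptions for illustration only.

```java
/** Minimal xorshift64 sketch (shift triple 13/7/17); not the CDK's actual code. */
final class XorShift64 {
    private long state;

    XorShift64(long seed) {
        // the all-zero state is a fixed point of xorshift, so substitute a non-zero constant
        this.state = seed != 0 ? seed : 0x9E3779B97F4A7C15L;
    }

    long nextLong() {
        long x = state;
        x ^= x << 13;
        x ^= x >>> 7;
        x ^= x << 17;
        state = x;
        return x;
    }

    /** Maps the next value to a bit index in [0, size); hypothetical use for setting a fingerprint bit. */
    int nextBit(int size) {
        return (int) Math.floorMod(nextLong(), (long) size);
    }

    public static void main(String[] args) {
        // seeding with a path's hash code makes the chosen bit deterministic for that path
        XorShift64 rng = new XorShift64("C-C-O".hashCode());
        System.out.println(rng.nextBit(1024));
    }
}
```

The generator is only a handful of shifts and XORs, which is why it can replace a library dependency; it is deterministic for a given seed but makes no claim to cryptographic quality.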
Authors
137 John Mayfield
6 Egon Willighagen
1 dependabot[bot]
This is the most questionable change but is believed to be a bug in InChI 1.03. Using JNI INCHI, setting the chiral flag = on or off we get "rA:9n...", without it we get "rA:9...". In JNA INCHI we always get "rA:9n..." - this molecule is not chiral so it seems odd that the setting would change anything. Since this is only a change in AuxInfo this is acceptable. John Mayfield on 2021-12-18
Some minor version/scope cleanups. Hamcrest should be a test dependency. Make sure we pull in Log4j2 2.17.0. In QSARCML log4j-core should only be in the tests John Mayfield on 2021-12-22
We don't need the Log4J 1.2 API in these locations - unfortunately it still comes in via CMLXOM and JENA but better for now. John Mayfield on 2021-12-22
Cleanup of the cdk/base modules - using dependency analyze to ensure all used undeclared dependencies are included and unused declared are removed. John Mayfield on 2021-12-22
JENA-CORE pulls in a very specific version of XML-APIS. There may well be a conflict but a fix should be to leave it as a transitive dependency in cdk-io John Mayfield on 2021-12-22