Skip to content

merge MDL V2000/V3000 readers #952

@adalke

Description

@adalke

The CDK has a MDLV2000Reader and a MDLV3000Reader. If I want to parse a generic "MDL" record, I either have to figure out if it's in v2000 or v3000 format and dispatch to the appropriate reader, or I need to use IteratingSDFReader and let it handle the dispatching for me, though that then requires using the iterator API to get the first of what should only be one record.

Neither is hard, but it seems there should be just a single MDLReader which handles both formats.

Well, there is an MDLReader, but it says:

 * @deprecated This reader is only for molfiles without a version tag, typically the most
 *             common molfile now encountered is V2000 and the {@link MDLV2000Reader} should be used
 *             instead. The V2000 reader can actually read files missing the version tag when
 *             in relaxed mode.

and IteratingSDFReader says it only uses MDLV2000Reader or MDLV3000Reader:

For parsing the molecules in the
 * SD file, it uses the <code>MDLV2000Reader</code> or
 * <code>MDLV3000Reader</code> reader; it does <b>not</b> work
 * for SDF files with MDL formats prior to the V2000 format.

Though IteratingSDFReader's getReader if given a MDLFormat instead of MDLV2000Format or MDLV3000Format will return a MDLReader? And it only copies settings for V2000 formats? With commit bf07702 in Feb 2022 the V3000 format now has settings.

There's also an inconsistency with the V2000 and V3000 readers. The V2000 reader has a customizeJob(), which as I understand it was used to propagate settings back in the ancient days, while the V3000 reader doesn't have that method.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions