Convert specification to schema format #540
Description
A long-term goal of the specification could be to convert almost all of its content into a machine-readable schema, to facilitate automated use of the specification in other packages (e.g., pybids and bids-validator), as well as to propagate small changes across the specification.
This is related to #423, #466, and #475. In #423, @dbkeator discusses work in extracting schema-like information from the specification, converting the terms to JSON-LD format, and linking BIDS terms to similar terms in other ontologies (see BIDS_Terms). Many elements of this work should also be incorporated into the actual specification, as it explicitly defines associations between terms within the specification. This should, in turn, make extracting relevant information from BIDS for other efforts, like BIDS_Terms, much easier. Initial work toward doing this conversion with the YAML format and specifically limited to the entity table has been done in #475.
Here are some initial goals for the conversion:
- Distinguish different object types, and organize them into different folders: entities, modalities, datatypes, suffixes, extensions, and metadata fields, at minimum.
- In cases where order is important (e.g., entities), organize or label the objects in a manner conducive to this.
- Define top level files and associated data folders (e.g., `sourcedata/`).
- The inheritance principle, somehow.
- Explicitly require jsons.
- Define each of the objects, possibly in its own file, with the following fields:
  - Name
  - Definition
  - Format/allowed values (for entities and metadata fields)
  - Mutually exclusive objects (at least for metadata fields)
  - Citation (at least for modalities)
- Link associated object types:
  - Required and optional metadata for each datatype.
  - Required and optional columns for tabular files.
  - Required and optional entities for each datatype, broken down by groups of suffixes.
  - Required and optional suffixes for each datatype, broken down by groups of suffixes.
- Code to automatically compile the Markdown and PDF versions of the schema.
  - Objects should either have their own pages or their definitions should be duplicated in any sections where they're applicable. For example, the `run` entity is currently defined once, under Anatomy imaging data under Magnetic Resonance Imaging, even though the entity applies to many other datatypes, and is generally referenced or briefly defined in those other datatypes' sections, without a link to the main definition.
  - The code-formatted templates should also be compilable.
- Code to minimally validate the specification, ensuring that all files for a given object type have the required fields. In cases where those fields are supposed to have specific values, the validation should check that those fields are correct in all files.
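As a rough illustration of the validation goal above, here is a minimal sketch in Python. Everything in it is an assumption made for the example, not an agreed design: the lowercase field names (`name`, `definition`, `format`, `citation`), the in-memory layout of `SCHEMA`, the placeholder definitions, and the helper `validate_schema` are all hypothetical. In practice the objects would be loaded from YAML or JSON-LD files rather than written inline.

```python
# Hypothetical required fields per object type (loosely based on the list above).
REQUIRED_FIELDS = {
    "entities": {"name", "definition", "format"},
    "modalities": {"name", "definition", "citation"},
}

# Hypothetical allowed values for fields that must take specific values.
ALLOWED_VALUES = {
    ("entities", "format"): {"label", "index"},
}

# Stand-in for objects loaded from one file each; definitions are placeholders.
SCHEMA = {
    "entities": {
        "sub": {"name": "Subject", "definition": "placeholder", "format": "label"},
        "run": {"name": "Run", "definition": "placeholder", "format": "index"},
    },
    "modalities": {
        # Deliberately missing "citation" to show a validation error.
        "mri": {"name": "Magnetic Resonance Imaging", "definition": "placeholder"},
    },
}


def validate_schema(schema, required_fields, allowed_values):
    """Return a list of human-readable problems found in the schema."""
    errors = []
    for object_type, objects in schema.items():
        required = required_fields.get(object_type, set())
        for key, obj in objects.items():
            # Check that every required field is present.
            for field in sorted(required - set(obj)):
                errors.append(
                    f"{object_type}/{key}: missing required field '{field}'"
                )
            # Check fields that are restricted to specific values.
            for field, value in obj.items():
                allowed = allowed_values.get((object_type, field))
                if allowed is not None and value not in allowed:
                    errors.append(
                        f"{object_type}/{key}: field '{field}' has value "
                        f"'{value}', expected one of {sorted(allowed)}"
                    )
    return errors


if __name__ == "__main__":
    for problem in validate_schema(SCHEMA, REQUIRED_FIELDS, ALLOWED_VALUES):
        print(problem)
```

A check like this could run in continuous integration, so that a change to one object file that drops a required field is caught before it is merged.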
I think the only sections of the specification that wouldn't be described in the schema would be the Appendix pages, introduction pages/sections, Common Principles, and specific examples.
Open questions:
- JSON-LD vs. YAML
  - I think JSON-LD is more standard for this type of thing, but I personally find it harder to work with than YAML.
- Duplication vs. modularization
  - In [INFRA] Convert entity table to yaml #475, we've leaned toward minimizing duplication and placing information in larger files, in order to make it easier to make changes across files. When there's automated validation, this may be less of an issue since developers will be aware when they introduce breaking changes. Generally in schemas, it seems like each object gets its own file.
- Conversion of past versions of the specification
  - @yarikoptic created bids-schema, where the schema can be made available across versions, so it's possible to convert old versions of the specification to the new schema format, although it would be a lot of work.
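On the JSON-LD vs. YAML question, it may help to see how small the structural difference can be. The sketch below is only an illustration under stated assumptions: the object fields, the helper `to_jsonld`, and the `https://example.org/bids-terms/` vocabulary URL are all hypothetical, and the plain dictionary stands in for what a YAML loader would return. It wraps the same key/value data with the JSON-LD keywords `@context` and `@id`.

```python
import json

# Plain key/value form, as it might look after loading a YAML file.
# Field names are illustrative, not an agreed schema.
run_entity = {"name": "Run", "entity": "run", "format": "index"}

# Hypothetical vocabulary URL; a real context would point at actual term IRIs.
CONTEXT = {"@vocab": "https://example.org/bids-terms/"}


def to_jsonld(obj, identifier, context=CONTEXT):
    """Wrap a plain object with JSON-LD keywords without altering its fields."""
    return {"@context": context, "@id": identifier, **obj}


if __name__ == "__main__":
    print(json.dumps(to_jsonld(run_entity, "run"), indent=2))
```

If something like this holds up, one option would be to author the objects in YAML and generate JSON-LD from them, rather than choosing one format exclusively. Whether that is worth the extra build step is part of the open question.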