Distilled in discussion with @dyf. They
- defined a clear schema for various aspects of the experiment as pydantic models: https://github.com/AllenNeuralDynamics/aind-data-schema, which is used across all experiments done at https://github.com/AllenNeuralDynamics/
- group data first at the level of the recording session, not at the level of a study
- for multimodal sessions, also separate data at that level across datatypes (or modalities)
- do so to facilitate QA and data analytics: it makes data easier to share without waiting for the entire dataset to be collected, and lets sessions be included in different studies that are interested in different subportions of the dataset
- rely heavily on, and benefit from, a hierarchical metadata structure within .json files
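
To make the session-first grouping concrete, here is a minimal sketch of one hierarchical record per session, split by modality. It uses stdlib dataclasses as a stand-in for pydantic models, and every class, field, label, and path in it is hypothetical; none of them are taken from aind-data-schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModalityData:
    modality: str  # illustrative label, e.g. "ecephys" or "behavior"
    path: str      # hypothetical location of this modality's files in the session


@dataclass
class Session:
    subject_id: str
    session_id: str
    modalities: list = field(default_factory=list)


def select_for_study(sessions, wanted_modality):
    """Return (session_id, path) for every session that has the wanted modality."""
    return [
        (session.session_id, data.path)
        for session in sessions
        for data in session.modalities
        if data.modality == wanted_modality
    ]


# Hypothetical sessions: each one is a self-contained unit, split by modality.
sessions = [
    Session("mouse01", "2023-01-01", [
        ModalityData("ecephys", "mouse01_2023-01-01/ecephys"),
        ModalityData("behavior", "mouse01_2023-01-01/behavior"),
    ]),
    Session("mouse02", "2023-01-02", [
        ModalityData("behavior", "mouse02_2023-01-02/behavior"),
    ]),
]

# One hierarchical .json-style record per session, available as soon as it is recorded.
session_json = json.dumps(asdict(sessions[0]), indent=2)

# A study interested only in behavior picks that subportion of each session.
behavior_paths = select_for_study(sessions, "behavior")
```

Because each session carries its own metadata, a study can be assembled later by selecting sessions and modalities rather than by reorganizing files.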
The ideas we touched upon:
- AIND might want to adopt the BIDS naming principle, `<entity>-<value>[_<entity>-<value>][_<datatype>]`, instead of "hidden semantics" for naming folders and files, as the initial step
- flexible hierarchy (Make it possible to specify folders layout to be other than sub-{label}/[ses-{label}/] #54) to allow, e.g., for a present/absent modality subfolder or session subfolder
- Allow composition of a BIDS dataset (dataset level) from smaller (subj or subj/ses) level #59
  - relates to the inheritance principle: common metadata could still be inherited from the top folder, or the top folder could provide a "unique summary" over metadata like we do in heudiconv (example: http://datasets.datalad.org/?dir=/dbic/QA)
- extended/alternative metadata (schema) support: (TODO: file/link to an issue) to allow in the .json (sidecar) file a more extended schema-driven record with a clear association to what that schema is (maybe we would all use linkml.io at some point). There it could be hierarchical
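
As an illustration of the entity-value naming idea above, here is a minimal sketch that builds and parses names of the form `<entity>-<value>[_<entity>-<value>][_<datatype>]`. The function names are hypothetical, the alphanumeric-only rule is a simplifying assumption, and the example labels are made up; this is not the BIDS validator's logic.

```python
import re

# Assumption for this sketch: entities and values are plain alphanumeric strings.
_PAIR = re.compile(r"^([A-Za-z0-9]+)-([A-Za-z0-9]+)$")


def build_name(entities, datatype=None):
    """Join entity-value pairs (in dict order) and an optional datatype suffix."""
    parts = [f"{entity}-{value}" for entity, value in entities.items()]
    for part in parts:
        if not _PAIR.match(part):
            raise ValueError(f"invalid entity-value pair: {part!r}")
    if datatype is not None:
        parts.append(datatype)
    return "_".join(parts)


def parse_name(name):
    """Split a name into (entities, datatype); datatype is None when absent."""
    entities = {}
    datatype = None
    chunks = name.split("_")
    for index, chunk in enumerate(chunks):
        match = _PAIR.match(chunk)
        if match:
            entities[match.group(1)] = match.group(2)
        elif index == len(chunks) - 1 and chunk.isalnum():
            datatype = chunk  # a trailing chunk without "-" is read as the datatype
        else:
            raise ValueError(f"cannot parse chunk: {chunk!r}")
    return entities, datatype


# Hypothetical labels, for illustration only.
name = build_name({"sub": "mouse01", "ses": "20230101"}, datatype="ecephys")
parsed = parse_name(name)
```

With explicit entities, the meaning of each part of a folder or file name can be recovered from the name itself, rather than from a position-based convention that readers have to know in advance.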
TODOs: