Decisions on BIDS derivatives structure #50

@Lestropie

While I have written a lot of text in various locations about the core decisions that need to be made on filesystem paths for DWI derivatives, that text may be too verbose or too DWI-specific to be appropriate for widespread community engagement.

It is my intention to first post what I believe to be the viable solutions to these issues. Others are free to comment and even make alternative suggestions. Once the set of viable solutions is established, I will then construct polls to evaluate the degree of community consensus.

The example

We have a hypothetical DWI model called ABC. This model is represented using parameters X and Y. X and Y are of fundamentally different data types, such that it is not possible to store both in a single NIfTI image, and they must be split across multiple images.

For metadata, there is information that is relevant to model ABC as a whole, and there is additionally information that is specific to parameter X and parameter Y separately.

Following fitting of the model to the empirical data, it is possible to derive from X and Y another parameter of interest Z. This may in and of itself require metadata to explain how it was calculated.

Decision 1: Directory structure

(For the sake of discussion of directory structure, I will assume the existence of a new entity with key "model", and two new suffixes: "model", and "mdp" (model-derived parameter). This corresponds to decision 2, option 1 "few suffixes", but is used for demonstrative purposes in the context of decision 1 only, and the two decisions should be considered independent)

Option 1: "Complex inheritance"

sub-01/
    dwi/
        sub-01_model-abc_param-x_model.nii.gz
        sub-01_model-abc_param-x_model.json
        sub-01_model-abc_param-y_model.nii.gz
        sub-01_model-abc_param-y_model.json
        sub-01_model-abc_param-z_mdp.nii.gz
        sub-01_model-abc_param-z_mdp.json
        sub-01_model-abc_model.json
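
As a rough illustration of how a tool might resolve metadata under this option, the sketch below merges every sidecar whose entities are a subset of the data file's entities, with more specific sidecars overriding less specific ones. The subset rule, the choice to ignore the suffix when matching, and the "Description" key are assumptions made for this example only; none of them is taken from the specification.

```python
import json
import tempfile
from pathlib import Path


def split_name(path: Path) -> tuple:
    """Split 'sub-01_model-abc_param-x_model.json' into (entities, suffix)."""
    stem = path.name.split(".", 1)[0]
    *entities, suffix = stem.split("_")
    return frozenset(entities), suffix


def resolve_metadata(data_file: Path) -> dict:
    """Merge all applicable sidecars, least specific first (assumed rule)."""
    wanted, _ = split_name(data_file)
    applicable = [
        p for p in data_file.parent.glob("*.json")
        if split_name(p)[0] <= wanted  # sidecar entities are a subset
    ]
    merged = {}
    for sidecar in sorted(applicable, key=lambda p: len(split_name(p)[0])):
        merged.update(json.loads(sidecar.read_text()))
    return merged


with tempfile.TemporaryDirectory() as tmp:
    dwi = Path(tmp) / "sub-01" / "dwi"
    dwi.mkdir(parents=True)
    # Whole-model sidecar: information relevant to ABC as a whole.
    (dwi / "sub-01_model-abc_model.json").write_text(
        json.dumps({"ModelURL": "...", "Description": "model ABC"}))
    # Parameter-specific sidecars (hypothetical contents).
    (dwi / "sub-01_model-abc_param-x_model.json").write_text(
        json.dumps({"Description": "parameter X"}))
    (dwi / "sub-01_model-abc_param-y_model.json").write_text(
        json.dumps({"Description": "parameter Y"}))
    meta_x = resolve_metadata(dwi / "sub-01_model-abc_param-x_model.nii.gz")
```

Here `meta_x` carries "ModelURL" from the whole-model sidecar and "Description" from the parameter X sidecar; the parameter Y sidecar is not applied because its entities are not a subset of those of the X image.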

Advantages:

  • No change to BIDS filesystem structure
  • Metadata relevant to ABC as a whole is centralised
  • Generalisation of inheritance principle has wider applicability
  • Supports even more advanced use cases. Eg. one could have, within a single model, multiple components, each of which has associated sidecar information; then, each of those components may themselves have multiple parameters necessitating their own individual sidecar information.

Disadvantages:

  • Number of files within any given directory could grow very large

Option 2: "No inheritance"

sub-01/
    dwi/
        sub-01_model-abc_param-x_model.nii.gz
        sub-01_model-abc_param-x_model.json
        sub-01_model-abc_param-y_model.nii.gz
        sub-01_model-abc_param-y_model.json
        sub-01_model-abc_param-z_mdp.nii.gz
        sub-01_model-abc_param-z_mdp.json

Advantages:

  • No changes to specification whatsoever

Disadvantages:

  • Information regarding fit of model ABC (eg. fitting parameters, software references, publication URL) must be duplicated across multiple JSONs; any downstream application that needs to know these contents would ideally need to explicitly compare these contents across JSONs to verify consistency.
  • As with option 1, the number of files within any given directory could grow very large.
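
The consistency check mentioned in the first disadvantage could look something like the sketch below, which reports any model-level key whose value differs between sidecars. The key "FitSoftware" and its values are hypothetical, and the list of keys treated as model-level is an assumption that a real tool would need to define.

```python
import json


def inconsistent_keys(sidecars: dict, shared_keys: list) -> dict:
    """Return {key: {file: value}} for each shared key whose values differ."""
    problems = {}
    for key in shared_keys:
        values = {name: meta.get(key) for name, meta in sidecars.items()}
        # Serialise so that nested (unhashable) values can be compared too.
        distinct = {json.dumps(v, sort_keys=True) for v in values.values()}
        if len(distinct) > 1:
            problems[key] = values
    return problems


# Sidecar contents as they might be loaded from disk (hypothetical values).
sidecars = {
    "sub-01_model-abc_param-x_model.json": {"ModelURL": "...", "FitSoftware": "tool 1.0"},
    "sub-01_model-abc_param-y_model.json": {"ModelURL": "...", "FitSoftware": "tool 1.1"},
    "sub-01_model-abc_param-z_mdp.json": {"ModelURL": "...", "FitSoftware": "tool 1.0"},
}
problems = inconsistent_keys(sidecars, ["ModelURL", "FitSoftware"])
```

In this example only "FitSoftware" is flagged, because the parameter Y sidecar disagrees with the other two.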

Option 3: Directory hierarchy

sub-01/
    dwi/
        sub-01_model-abc_model/
            sub-01_model-abc_param-x_model.nii.gz
            sub-01_model-abc_param-x_model.json
            sub-01_model-abc_param-y_model.nii.gz
            sub-01_model-abc_param-y_model.json
            sub-01_model-abc_param-z_mdp.nii.gz
            sub-01_model-abc_param-z_mdp.json
        sub-01_model-abc_model.json

Advantages:

  • Natural exploitation of hierarchical nature of filesystem to reflect hierarchical nature of model data

  • Sets precedent for expanding modality directories to include sub-directories, which is a core component of TRX for tractography data and will therefore be requisite in the future

Disadvantages:

  • Requires modification of specification to permit sub-directories within modality directories

  • Breaks current implicit convention whereby sub-directory names don't bother duplicating entities corresponding to parents (eg. "sub-01/ses-01/dwi/"), whereas file names do (eg. "sub-01_ses-01_dwi.nii.gz"). This is impossible to resolve as long as the JSON file and corresponding sub-directory must have the same name.
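
To make the naming constraint concrete, here is a minimal sketch in which the sub-directory is located purely by stripping ".json" from the whole-model sidecar name. That pairing rule is an assumption drawn from the layout shown above, not an existing BIDS rule.

```python
import tempfile
from pathlib import Path


def model_directory(sidecar: Path) -> Path:
    """Sub-directory paired with a whole-model sidecar (same name, no '.json')."""
    return sidecar.with_suffix("")


def model_files(sidecar: Path) -> list:
    """Names of all files stored in the paired sub-directory."""
    return sorted(p.name for p in model_directory(sidecar).iterdir())


with tempfile.TemporaryDirectory() as tmp:
    dwi = Path(tmp) / "sub-01" / "dwi"
    sidecar = dwi / "sub-01_model-abc_model.json"
    paired = model_directory(sidecar)
    paired.mkdir(parents=True)
    sidecar.write_text("{}")
    # Empty placeholder files stand in for real NIfTI images.
    for name in ("sub-01_model-abc_param-x_model.nii.gz",
                 "sub-01_model-abc_param-z_mdp.nii.gz"):
        (paired / name).write_bytes(b"")
    paired_name = paired.name
    contents = model_files(sidecar)
```

Because the directory name is derived from the sidecar name, it necessarily repeats "sub-01", which is the duplication described in the last disadvantage.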

Option 4: Tarballs

Option 4a: Tarball with separate JSON

sub-01/
    dwi/
        sub-01_model-abc_model.tar
        sub-01_model-abc_model.json

Contents of file sub-01_model-abc_model.tar:

sub-01_model-abc_param-x_model.nii.gz
sub-01_model-abc_param-x_model.json
sub-01_model-abc_param-y_model.nii.gz
sub-01_model-abc_param-y_model.json
sub-01_model-abc_param-z_mdp.nii.gz
sub-01_model-abc_param-z_mdp.json

Advantages:

  • Very compact storage; multi-resolution view of data
  • Tarballing is also an appealing solution for integration of non-conforming derivatives in a way that is trivially validator-compatible

Disadvantages:

  • BIDS Apps would need to have capability to work with tarballs (eg. unpacking and storing in scratch prior to feeding to underlying commands)

  • Model-derived parameters cannot be trivially added alongside the core model parameters.
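
The unpacking step that a BIDS App would need can be sketched with the Python standard library as follows. The flat scratch layout and the helper names are assumptions for this example; the demo tarball holds placeholder bytes rather than real images.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def unpack_to_scratch(tarball: Path, scratch: Path) -> list:
    """Copy every regular file in the tarball into a flat scratch directory."""
    with tarfile.open(tarball) as archive:
        for member in archive.getmembers():
            if member.isfile():
                target = scratch / Path(member.name).name  # drop any stored path
                target.write_bytes(archive.extractfile(member).read())
    return sorted(p.name for p in scratch.iterdir())


def add_bytes(archive: tarfile.TarFile, name: str, payload: bytes) -> None:
    """Demo helper: store an in-memory payload under the given member name."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))


with tempfile.TemporaryDirectory() as tmp:
    tarball = Path(tmp) / "sub-01_model-abc_model.tar"
    with tarfile.open(tarball, "w") as archive:
        add_bytes(archive, "sub-01_model-abc_param-x_model.nii.gz", b"placeholder")
        add_bytes(archive, "sub-01_model-abc_param-x_model.json", b"{}")
    scratch = Path(tmp) / "scratch"
    scratch.mkdir()
    unpacked = unpack_to_scratch(tarball, scratch)
```

Every app that consumes the model would have to repeat this step, or share a common helper, before passing file paths to its underlying commands.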

PS. Apparently there's been a prior discussion regarding tarballing of non-conforming derivatives in BIDS datasets; can anyone provide a link?

Option 4b: Tarball with embedded JSON

sub-01/
    dwi/
        sub-01_model-abc_model.tar

Contents of file sub-01_model-abc_model.tar:

sub-01_model-abc_param-x_model.nii.gz
sub-01_model-abc_param-x_model.json
sub-01_model-abc_param-y_model.nii.gz
sub-01_model-abc_param-y_model.json
sub-01_model-abc_param-z_mdp.nii.gz
sub-01_model-abc_param-z_mdp.json
sub-01_model-abc_model.json

Advantages (relative to 4a):

  • Prevents potentially risky separation of model data in tarball and model sidecar data in JSON (similar to the pre-NIfTI Analyze .img / .hdr file pairs)

Disadvantages (relative to 4a):

  • Primary model sidecar information is not accessible without going into the tarball

  • Still requires the more complex inheritance principle, although it then applies only to the contents of the tarball
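
For a sense of what "going into the tarball" involves, the sketch below reads the embedded whole-model sidecar without extracting anything to disk. The function name is hypothetical and the archive is built in place purely for demonstration.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path


def read_embedded_json(tarball: Path, member_name: str) -> dict:
    """Load one JSON member straight from the archive."""
    with tarfile.open(tarball) as archive:
        handle = archive.extractfile(member_name)  # raises KeyError if absent
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    tarball = Path(tmp) / "sub-01_model-abc_model.tar"
    payload = json.dumps({"ModelURL": "..."}).encode()
    with tarfile.open(tarball, "w") as archive:
        info = tarfile.TarInfo("sub-01_model-abc_model.json")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    model_meta = read_embedded_json(tarball, "sub-01_model-abc_model.json")
```

This is only a few lines of code, but it is still an extra step compared with opening a JSON file that sits next to the tarball as in option 4a.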

Option 5: Hierarchy restricted to JSON

sub-01/
    dwi/
        sub-01_model-abc_param-x_model.nii.gz
        sub-01_model-abc_param-y_model.nii.gz
        sub-01_model-abc_param-z_mdp.nii.gz
        sub-01_model-abc_model.json

Contents of file sub-01_model-abc_model.json:

{
    "param-x_model": {
        ...
    },
    "param-y_model": {
        ...
    },
    "param-z_mdp": {
        ...
    },
    "ModelURL": "...",
    ....
}

Advantages:

  • No complex inheritance necessary

  • All information relevant to a model is visible within a single file

Disadvantages:

  • Necessitates explicit cross-referencing between general model JSON and individual parameter files

  • If model-derived parameter is to be added, metadata relating to that parameter needs to be inserted into the whole-model JSON

  • Metadata specific to one parameter is not immediately visible via a paired JSON
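
The cross-referencing that this option requires might be implemented along the lines below. The sketch assumes that every nested object in the whole-model JSON is a per-parameter block keyed as in the example above, and that every non-object value is shared; a real schema would have to state this explicitly. The "Description" values are hypothetical.

```python
def parameter_metadata(model_json: dict, data_filename: str) -> dict:
    """Combine shared fields with the block matching the data file's name."""
    stem = data_filename.split(".", 1)[0]  # e.g. sub-01_model-abc_param-x_model
    # Assumption: non-dict values are model-wide, dict values are per-parameter.
    shared = {k: v for k, v in model_json.items() if not isinstance(v, dict)}
    for key, block in model_json.items():
        if isinstance(block, dict) and stem.endswith("_" + key):
            return {**shared, **block}
    raise KeyError(f"no entry for {data_filename}")


model_json = {
    "param-x_model": {"Description": "parameter X"},
    "param-y_model": {"Description": "parameter Y"},
    "param-z_mdp": {"Description": "parameter Z"},
    "ModelURL": "...",
}
meta_x = parameter_metadata(model_json, "sub-01_model-abc_param-x_model.nii.gz")
```

Adding a new model-derived parameter under this scheme means editing `model_json` itself, which is the second disadvantage listed above.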


Decision 2: File names

(Note that for the sake of these examples, decision 1 option 1 "complex inheritance" is utilised; this is however purely for the sake of generation of examples, and the two decisions should be considered independent)

Option 1: "Few suffixes"

sub-01/
    dwi/
        sub-01_model-abc_param-x_model.nii.gz
        sub-01_model-abc_param-x_model.json
        sub-01_model-abc_param-y_model.nii.gz
        sub-01_model-abc_param-y_model.json
        sub-01_model-abc_param-z_mdp.nii.gz
        sub-01_model-abc_param-z_mdp.json
        sub-01_model-abc_model.json

"MDP": "Model-derived parameter" (exact nomenclature can be up for debate)

Advantages:

  • Validator does not need to have a large number of novel suffixes added
  • Easy to store yet-unseen models with BIDS conformity, provided the appropriate data representations are in the specification

Disadvantages:

  • Not as human-readable

Option 2: "Many suffixes"

sub-01/
    dwi/
        sub-01_model-abc_x.nii.gz
        sub-01_model-abc_x.json
        sub-01_model-abc_y.nii.gz
        sub-01_model-abc_y.json
        sub-01_model-abc_z.nii.gz
        sub-01_model-abc_z.json
        sub-01_model-abc_model.json

Advantages:

  • Information content of individual files easily human-readable from suffix

Disadvantages:

  • Data from model ABC can only be stored with BIDS compatibility if model ABC is explicitly added to the specification, and the validator is updated accordingly
  • Appropriate filesystem path for parameter-agnostic metadata (ie. sub-01_model-abc_model.json above) is uncertain (and could depend on decision 1 RE: directory structure)
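
The trade-off between the two naming options can be sketched from a validator's point of view. The regular expression below is a simplified stand-in for real BIDS name validation, and the `FEW_SUFFIXES` set is an assumption matching option 1.

```python
import re

# Simplified pattern: one or more key-value entities, a suffix, an extension.
NAME_PATTERN = re.compile(
    r"^(?P<entities>(?:[a-z]+-[A-Za-z0-9]+_)+)"
    r"(?P<suffix>[A-Za-z0-9]+)"
    r"(?P<ext>\.[A-Za-z0-9.]+)$"
)

FEW_SUFFIXES = {"model", "mdp"}  # option 1: small, fixed set


def parse_name(filename: str) -> tuple:
    """Return (entities, suffix, extension) for a BIDS-style file name."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a recognised name: {filename}")
    pairs = match["entities"].rstrip("_").split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, match["suffix"], match["ext"]


def suffix_is_known(filename: str, known_suffixes: set) -> bool:
    """True if the file's suffix is in the validator's list."""
    return parse_name(filename)[1] in known_suffixes


few = parse_name("sub-01_model-abc_param-x_model.nii.gz")
many = parse_name("sub-01_model-abc_x.nii.gz")
```

With option 1 the parameter name travels in the "param" entity and the suffix stays within a fixed set, so a previously unseen model still passes. With option 2 the suffix "x" is rejected until the validator's list is extended for model ABC.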
