While I have written a lot of text in various locations regarding the core decisions that need to be made about the definitions of filesystem paths for DWI derivatives, those posts may be too verbose or too DWI-specific to be appropriate for widespread community engagement.
It is my intention to first post what I believe to be the viable solutions to these issues. Others are free to comment and even make alternative suggestions. Once the set of viable solutions is established, I will then construct polls to evaluate the degree of community consensus.
The example
We have a hypothetical DWI model called ABC. This model is represented using parameters X and Y. X and Y are of fundamentally different data types, such that it is not possible to store both in a single NIfTI image, and they must be split across multiple images.
For metadata, there is information that is relevant to model ABC as a whole, and there is additionally information that is specific to parameter X and parameter Y separately.
Following fitting of the model to the empirical data, it is possible to derive from X and Y another parameter of interest Z. This may in and of itself require metadata to explain how it was calculated.
Decision 1: Directory structure
(For the sake of discussion of directory structure, I will assume the existence of a new entity with key "model", and two new suffixes: "model", and "mdp" (model-derived parameter). This corresponds to decision 2, option 1 "few suffixes", but is used for demonstrative purposes in the context of decision 1 only, and the two decisions should be considered independent)
Option 1: "Complex inheritance"
sub-01/
    dwi/
        sub-01_model-abc_param-x_model.nii.gz
        sub-01_model-abc_param-x_model.json
        sub-01_model-abc_param-y_model.nii.gz
        sub-01_model-abc_param-y_model.json
        sub-01_model-abc_param-z_mdp.nii.gz
        sub-01_model-abc_param-z_mdp.json
        sub-01_model-abc_model.json
Advantages:
- No change to BIDS filesystem structure
- Metadata relevant to ABC as a whole is centralised
- Generalisation of inheritance principle has wider applicability
- Supports even more advanced use cases. Eg. one could have, within a single model, multiple components, each of which has associated sidecar information; then, each of those components may themselves have multiple parameters necessitating their own individual sidecar information.
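To make the "complex inheritance" idea concrete, here is a minimal sketch of how a tool might resolve the metadata for one parameter image under this layout. The resolution rule is an assumption made purely for illustration (drop the "param" entity, look for a sidecar with the "model" suffix, and let the parameter-specific sidecar override it); it is not the current BIDS inheritance principle, and the function names are hypothetical.

```python
import json
from pathlib import Path


def split_name(filename):
    """Split a BIDS-style file name into (entities dict, suffix)."""
    stem = filename.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    return dict(pair.split("-", 1) for pair in pairs), suffix


def resolve_metadata(image_path):
    """Merge the whole-model sidecar with the parameter-specific sidecar."""
    image_path = Path(image_path)
    entities, _ = split_name(image_path.name)
    # Assumed rule: the whole-model sidecar carries every entity except
    # "param", and always uses the "model" suffix (even for "mdp" images).
    shared = "_".join(f"{k}-{v}" for k, v in entities.items() if k != "param")
    candidates = [
        image_path.with_name(f"{shared}_model.json"),
        image_path.with_name(image_path.name.split(".", 1)[0] + ".json"),
    ]
    merged = {}
    for sidecar in candidates:  # later (more specific) files override earlier ones
        if sidecar.exists():
            merged.update(json.loads(sidecar.read_text()))
    return merged
```

Note that for the "mdp" image the lookup has to cross suffixes to reach the "model" sidecar, which is part of what makes this inheritance "complex".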
Disadvantages:
Option 2: "No inheritance"
sub-01/
    dwi/
        sub-01_model-abc_param-x_model.nii.gz
        sub-01_model-abc_param-x_model.json
        sub-01_model-abc_param-y_model.nii.gz
        sub-01_model-abc_param-y_model.json
        sub-01_model-abc_param-z_mdp.nii.gz
        sub-01_model-abc_param-z_mdp.json
Advantages:
- No changes to specification whatsoever
Disadvantages:
- Information regarding fit of model ABC (eg. fitting parameters, software references, publication URL) must be duplicated across multiple JSONs; any downstream application that needs to know these contents would ideally need to explicitly compare these contents across JSONs to verify consistency.
- Same as option 1 in that number of files within any given directory could grow very large.
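The consistency check mentioned in the first disadvantage could look something like the sketch below. The field names in MODEL_LEVEL_KEYS are hypothetical placeholders for whatever whole-model metadata ends up duplicated; nothing here is defined by the specification.

```python
import json
from pathlib import Path

# Hypothetical field names assumed to describe the model as a whole.
MODEL_LEVEL_KEYS = ("ModelURL", "FitParameters")


def find_inconsistencies(sidecar_paths, keys=MODEL_LEVEL_KEYS):
    """Return {key: set of distinct serialised values} for keys that disagree."""
    seen = {key: set() for key in keys}
    for path in sidecar_paths:
        content = json.loads(Path(path).read_text())
        for key in keys:
            # Serialise so that nested values are hashable and comparable;
            # a missing key is recorded as "null".
            seen[key].add(json.dumps(content.get(key), sort_keys=True))
    return {key: values for key, values in seen.items() if len(values) > 1}
```

An empty result means the duplicated fields agree across all of the sidecars that were passed in.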
Option 3: Directory hierarchy
sub-01/
    dwi/
        sub-01_model-abc_model/
            sub-01_model-abc_param-x_model.nii.gz
            sub-01_model-abc_param-x_model.json
            sub-01_model-abc_param-y_model.nii.gz
            sub-01_model-abc_param-y_model.json
            sub-01_model-abc_param-z_mdp.nii.gz
            sub-01_model-abc_param-z_mdp.json
        sub-01_model-abc_model.json
Advantages:
- Natural exploitation of hierarchical nature of filesystem to reflect hierarchical nature of model data
- Sets precedent for expanding modality directories to include sub-directories, which is a core component of TRX for tractography data and will therefore be requisite in the future
Disadvantages:
- Requires modification of specification to permit sub-directories within modality directories
- Breaks current implicit convention whereby sub-directory names don't bother duplicating entities corresponding to parents (eg. "sub-01/ses-01/dwi/"), whereas file names do (eg. "sub-01_ses-01_dwi.nii.gz"). This is impossible to resolve as long as the JSON file and corresponding sub-directory must have the same name.
Option 4: Tarballs
Option 4a: Tarball with separate JSON
sub-01/
    dwi/
        sub-01_model-abc_model.tar
        sub-01_model-abc_model.json
Contents of file sub-01_model-abc_model.tar:
sub-01_model-abc_param-x_model.nii.gz
sub-01_model-abc_param-x_model.json
sub-01_model-abc_param-y_model.nii.gz
sub-01_model-abc_param-y_model.json
sub-01_model-abc_param-z_mdp.nii.gz
sub-01_model-abc_param-z_mdp.json
Advantages:
- Very compact storage; multi-resolution view of data
- Tarballing is also an appealing solution for integration of non-conforming derivatives in a way that is trivially validator-compatible
Disadvantages:
- BIDS Apps would need to have capability to work with tarballs (eg. unpacking and storing in scratch prior to feeding to underlying commands)
- Model-derived parameters cannot be trivially added alongside the core model parameters.
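As a rough idea of what "unpacking and storing in scratch" would involve for a BIDS App, here is a minimal sketch using only the Python standard library. The function name and the choice to flatten member paths are assumptions for illustration, not a proposed interface.

```python
import tarfile
import tempfile
from pathlib import Path


def unpack_model(tar_path, scratch_root=None):
    """Copy the regular files in a model tarball into a fresh scratch directory."""
    scratch = Path(tempfile.mkdtemp(prefix="bids_model_", dir=scratch_root))
    extracted = []
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other special entries
            # Keep only the final path component so nothing lands outside scratch.
            target = scratch / Path(member.name).name
            target.write_bytes(archive.extractfile(member).read())
            extracted.append(target)
    return sorted(extracted)
```

The returned paths can then be handed to the underlying commands; the caller remains responsible for deleting the scratch directory afterwards.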
PS. Apparently there's been a prior discussion regarding tarballing of non-conforming derivatives in BIDS datasets; can anyone provide a link?
Option 4b: Tarball with embedded JSON
sub-01/
    dwi/
        sub-01_model-abc_model.tar
Contents of file sub-01_model-abc_model.tar:
sub-01_model-abc_param-x_model.nii.gz
sub-01_model-abc_param-x_model.json
sub-01_model-abc_param-y_model.nii.gz
sub-01_model-abc_param-y_model.json
sub-01_model-abc_param-z_mdp.nii.gz
sub-01_model-abc_param-z_mdp.json
sub-01_model-abc_model.json
Advantages (relative to 4a):
- Prevents potentially risky separation of model data in tarball and model sidecar data in JSON (similar to the pre-NIfTI Analyze .img / .hdr file pairs)
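Disadvantages (relative to 4a):
- Primary model sidecar information is not accessible without going into the tarball
- Still requires a more complex inheritance principle, in a way; it just applies only to the contents of the tarball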
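For reference, reading the whole-model sidecar in this layout means opening the archive, though not necessarily unpacking all of it. A minimal sketch with the standard library, where the function name is hypothetical:

```python
import json
import tarfile


def read_embedded_sidecar(tar_path, member_name):
    """Load one JSON sidecar from a tarball without extracting the other members."""
    with tarfile.open(tar_path) as archive:
        # Raises KeyError if the archive has no member with this name.
        handle = archive.extractfile(member_name)
        if handle is None:
            raise ValueError(f"{member_name} is not a regular file in the archive")
        return json.load(handle)
```

So the information is reachable programmatically, but it is no longer visible by simply browsing the dataset.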
Option 5: Hierarchy restricted to JSON
sub-01/
    dwi/
        sub-01_model-abc_param-x_model.nii.gz
        sub-01_model-abc_param-y_model.nii.gz
        sub-01_model-abc_param-z_mdp.nii.gz
        sub-01_model-abc_model.json
Contents of file sub-01_model-abc_model.json:
{
    "param-x_model": {
        ...
    },
    "param-y_model": {
        ...
    },
    "param-z_mdp": {
        ...
    },
    "ModelURL": "...",
    ...
}
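Advantages:
- No complex inheritance necessary
- All information relevant to a model is visible within a single file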
Disadvantages:
- Necessitates explicit cross-referencing between general model JSON and individual parameter files
- If model-derived parameter is to be added, metadata relating to that parameter needs to be inserted into the whole-model JSON
- Metadata specific to one parameter is not immediately visible via a paired JSON
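The cross-referencing in the first disadvantage could be sketched as follows. This assumes the per-parameter blocks are keyed exactly as in the example JSON above ("param-<label>_<suffix>") and that every other top-level key applies to the whole model; both are assumptions drawn from the example, and "Units" in the usage note is a made-up field.

```python
def metadata_for(image_name, model_json):
    """Combine whole-model fields with the block belonging to one parameter image."""
    stem = image_name.split(".", 1)[0]
    parts = stem.split("_")
    # Build the lookup key from the "param" entity and the suffix,
    # eg. "param-z_mdp" for "sub-01_model-abc_param-z_mdp.nii.gz".
    param = next(part for part in parts if part.startswith("param-"))
    key = f"{param}_{parts[-1]}"
    shared = {k: v for k, v in model_json.items() if not k.startswith("param-")}
    return {**shared, **model_json.get(key, {})}
```

For example, with `{"param-z_mdp": {"Units": "b"}, "ModelURL": "u"}` as the whole-model JSON, looking up "sub-01_model-abc_param-z_mdp.nii.gz" yields both "ModelURL" and "Units".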
Decision 2: File names
(Note that for the sake of these examples, decision 1 option 1 "complex inheritance" is utilised; this is however purely for the sake of generation of examples, and the two decisions should be considered independent)
Option 1: "Few suffixes"
sub-01/
    dwi/
        sub-01_model-abc_param-x_model.nii.gz
        sub-01_model-abc_param-x_model.json
        sub-01_model-abc_param-y_model.nii.gz
        sub-01_model-abc_param-y_model.json
        sub-01_model-abc_param-z_mdp.nii.gz
        sub-01_model-abc_param-z_mdp.json
        sub-01_model-abc_model.json
"MDP": "Model-derived parameter" (exact nomenclature can be up for debate)
Advantages:
- Validator does not need to have a large number of novel suffixes added
- Easy to store yet-unseen models with BIDS conformity, provided the appropriate data representations are in the specification
Disadvantages:
Option 2: "Many suffixes"
sub-01/
    dwi/
        sub-01_model-abc_x.nii.gz
        sub-01_model-abc_x.json
        sub-01_model-abc_y.nii.gz
        sub-01_model-abc_y.json
        sub-01_model-abc_z.nii.gz
        sub-01_model-abc_z.json
        sub-01_model-abc_model.json
Advantages:
- Information content of individual files easily human-readable from suffix
Disadvantages:
- Data from model ABC can only be stored with BIDS compatibility if model ABC is explicitly added to the specification, and the validator is updated accordingly
- Appropriate filesystem path for parameter-agnostic metadata (ie. sub-01_model-abc_model.json above) is uncertain (and could depend on decision 1 RE: directory structure)
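The practical difference between the two naming options for a validator can be sketched like this. Both suffix tables are hypothetical stand-ins for whatever would actually be written into the specification.

```python
# Option 1: a fixed, model-independent set of suffixes.
FEW_SUFFIXES = {"model", "mdp"}
# Option 2: hypothetical registry in which each known model lists its own suffixes.
MANY_SUFFIXES = {"abc": {"x", "y", "z", "model"}}


def parse(filename):
    """Split a BIDS-style file name into (entities dict, suffix)."""
    stem = filename.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    return dict(pair.split("-", 1) for pair in pairs), suffix


def valid_few(filename):
    """Option 1: any model label is accepted as long as the suffix is generic."""
    entities, suffix = parse(filename)
    return "model" in entities and suffix in FEW_SUFFIXES


def valid_many(filename):
    """Option 2: the model must be registered, along with its own suffixes."""
    entities, suffix = parse(filename)
    return suffix in MANY_SUFFIXES.get(entities.get("model"), set())
```

Under these assumptions a file from a yet-unseen model, eg. "sub-01_model-new_param-q_mdp.nii.gz", passes `valid_few`, whereas "sub-01_model-new_q.nii.gz" fails `valid_many` until the registry is extended.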