Commit 91e332d

Merge branch 'master' into enh/point_to_data
2 parents 4b0a18f + 9a79dff commit 91e332d

410 files changed

Lines changed: 4947 additions & 760 deletions

Some content is hidden

pdf_build_src/process_markdowns.py

Lines changed: 1 addition & 0 deletions
````diff
@@ -445,6 +445,7 @@ def process_macros(duplicated_src_dir_path):
 
 # Replace code snippets in the text with their outputs
 matches = re.findall("({{.*?}})", contents)
+matches = re.findall(re.compile("({{.*?}})", re.DOTALL), contents)
 for m in matches:
 # Remove macro delimiters to get *just* the function call
 function_string = m.strip("{} ")
````
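The added `re.DOTALL` pattern matters because the new `MACROS___make_metadata_table(...)` calls introduced in this commit span several lines, and without `re.DOTALL` the `.` in `.*?` does not match a newline. A minimal standalone sketch of the difference, using a made-up `contents` string rather than a real BIDS source file:

```python
import re

# Hypothetical Markdown snippet with a macro call spread over three lines,
# in the style of the MACROS___make_metadata_table calls added in this commit.
contents = 'Intro\n{{ MACROS___make_metadata_table(\n    {"Name": "REQUIRED"}\n) }}\nOutro'

# Without DOTALL, "." stops at newlines, so the multi-line macro is never matched.
single_line = re.findall("({{.*?}})", contents)

# With DOTALL, the lazy ".*?" may cross newlines and stops at the first "}}".
multi_line = re.findall(re.compile("({{.*?}})", re.DOTALL), contents)

for m in multi_line:
    # Same delimiter stripping as in process_macros
    function_string = m.strip("{} ")
    print(function_string)
```

Note that the lazy match still ends at the first `}}`, so a macro argument containing a literal `}}` would be cut short.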

src/02-common-principles.md

Lines changed: 31 additions & 16 deletions
````diff
@@ -32,6 +32,13 @@ misunderstanding we clarify them here.
 context, a session may also indicate a group of related scans,
 taken in one or more visits.
 
+1. **Sample** - a sample pertaining to a subject such as tissue, primary cell
+or cell-free sample.
+The `sample-<label>` key/value pair is used to distinguish between different
+samples from the same subject.
+The label MUST be unique per subject and is RECOMMENDED to be unique
+throughout the dataset.
+
 1. **Data acquisition** - a continuous uninterrupted block of time during which
 a brain scanning instrument was acquiring data according to particular
 scanning sequence/protocol.
````
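The new **Sample** definition sets two levels of uniqueness for `sample-<label>`: per subject (MUST) and across the dataset (RECOMMENDED). A sketch of how a checker might report them separately, assuming samples arrive as `(participant_id, sample_id)` pairs, for example the rows of a `samples.tsv` file; the function name and return shape are hypothetical:

```python
from collections import Counter, defaultdict


def check_sample_labels(pairs):
    """pairs: iterable of (participant_id, sample_id) tuples, one per sample.

    Returns (errors, warnings). Hypothetical helper, not part of any BIDS tool.
    """
    pairs = list(pairs)
    errors, warnings = [], []
    # MUST: a label may not be used twice for the same subject.
    for (subject, sample), count in sorted(Counter(pairs).items()):
        if count > 1:
            errors.append(f"{sample} used {count} times for {subject}")
    # RECOMMENDED: a label should belong to a single subject across the dataset.
    owners = defaultdict(set)
    for subject, sample in pairs:
        owners[sample].add(subject)
    for sample, subjects in sorted(owners.items()):
        if len(subjects) > 1:
            warnings.append(f"{sample} shared by {', '.join(sorted(subjects))}")
    return errors, warnings


print(check_sample_labels([("sub-01", "sample-01"), ("sub-02", "sample-01")]))
```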
````diff
@@ -156,6 +163,15 @@ correspond to a unique identifier of that subject, such as `01`.
 The same holds for the `session` entity with its `ses-` key and its `<label>`
 value.
 
+The extra session layer (at least one `/ses-<label>` subfolder) SHOULD
+be added for all subjects if at least one subject in the dataset has more than
+one session.
+If a `/ses-<label>` subfolder is included as part of the directory hierarchy,
+then the same [`ses-<label>`](./99-appendices/09-entities.md#ses)
+key/value pair MUST also be included as part of the file names themselves.
+Acquisition time of session can
+be defined in the [sessions file](03-modality-agnostic-files.md#sessions-file).
+
 A chain of entities, followed by a suffix, connected by underscores (`_`)
 produces a human readable file name, such as `sub-01_task-rest_eeg.edf`.
 It is evident from the file name alone that the file contains resting state
````
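The two session-layer rules added here combine a SHOULD (every subject gets a `ses-<label>` folder once any subject has several sessions) with a MUST (a `ses-<label>` folder is repeated in the file names it contains). A minimal sketch over a list of dataset-relative paths, assuming alphanumeric labels; the function name and the `(level, message)` output are hypothetical, not the BIDS validator's behaviour:

```python
import re

# "sub-<label>/ses-<label>/" at the start of a dataset-relative path.
SESSION_DIR = re.compile(r"^(sub-[A-Za-z0-9]+)/(ses-[A-Za-z0-9]+)/")


def check_session_layout(paths):
    """Return (level, message) pairs for the two session-layer rules.

    `paths` are dataset-relative POSIX paths. Hypothetical sketch only.
    """
    problems = []
    sessions = {}  # subject -> set of ses-<label> folders seen for it
    for path in paths:
        if not path.startswith("sub-"):
            continue  # top-level files such as dataset_description.json
        subject = path.split("/", 1)[0]
        sessions.setdefault(subject, set())
        match = SESSION_DIR.match(path)
        if match is None:
            continue
        session = match.group(2)
        sessions[subject].add(session)
        filename = path.rsplit("/", 1)[-1]
        # MUST: a ses-<label> folder is repeated as an entity in the file name.
        if f"_{session}_" not in filename:
            problems.append(("MUST", f"{path}: file name lacks {session}"))
    # SHOULD: once any subject has several sessions, every subject gets a session layer.
    if any(len(found) > 1 for found in sessions.values()):
        for subject in sorted(sessions):
            if not sessions[subject]:
                problems.append(("SHOULD", f"{subject}: no ses-<label> subfolder"))
    return problems
```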
````diff
@@ -352,7 +368,7 @@ then Case 1 will be assumed for clarity in templates and examples, but removing
 Case 2.
 In both cases, every derivatives dataset is considered a BIDS dataset and must
 include a `dataset_description.json` file at the root level (see
-[Dataset description][dataset-description].
+[Dataset description][dataset-description]).
 Consequently, files should be organized to comply with BIDS to the full extent
 possible (that is, unless explicitly contradicted for derivatives).
 Any subject-specific derivatives should be housed within each subject’s directory;
````
````diff
@@ -694,14 +710,19 @@ Note that if a field name included in the data dictionary matches a column name
 then that field MUST contain a description of the corresponding column,
 using an object containing the following fields:
 
-| **Key name** | **Requirement level** | **Data type** | **Description** |
-| ------------ | --------------------- | --------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
-| LongName | OPTIONAL | [string][] | Long (unabbreviated) name of the column. |
-| Description | RECOMMENDED | [string][] | Description of the column. |
-| Levels | RECOMMENDED | [object][] of [strings][] | For categorical variables: An object of possible values (keys) and their descriptions (values). |
-| Units | RECOMMENDED | [string][] | Measurement units. SI units in CMIXF formatting are RECOMMENDED (see [Units](./02-common-principles.md#units)). |
-| TermURL | RECOMMENDED | [string][] | URL pointing to a formal definition of this type of data in an ontology available on the web. |
-| HED | OPTIONAL | [object][] of [strings][] or [string][] | Hierarchical Event Descriptor (HED) information, see: [Appendix III](./99-appendices/03-hed.md) for details. |
+{{ MACROS___make_metadata_table(
+{
+"LongName": "OPTIONAL",
+"Description": (
+"RECOMMENDED",
+"The description of the column.",
+),
+"Levels": "RECOMMENDED",
+"Units": "RECOMMENDED",
+"TermURL": "RECOMMENDED",
+"HED": "OPTIONAL",
+}
+) }}
 
 Please note that while both `Units` and `Levels` are RECOMMENDED, typically only one
 of these two fields would be specified for describing a single TSV file column.
````
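The macro call replaces a hand-written table: each key maps either to a requirement level or to a `(level, description)` tuple that overrides the default description, as done for `Description` in the hunk above. The implementation of `MACROS___make_metadata_table` is not part of this diff, so the sketch below only illustrates that input shape; the function `make_metadata_table` and its hard-coded `FIELD_DEFAULTS` lookup (data types and descriptions copied from the removed table rows) are hypothetical:

```python
# Hypothetical lookup standing in for wherever the real macro gets field definitions.
FIELD_DEFAULTS = {
    "LongName": ("string", "Long (unabbreviated) name of the column."),
    "Description": ("string", "Description of the column."),
    "Units": ("string", "Measurement units."),
}

HEADER = [
    "| **Key name** | **Requirement level** | **Data type** | **Description** |",
    "| --- | --- | --- | --- |",
]


def make_metadata_table(field_info):
    """Render a Markdown table from {field: level} or {field: (level, description)}."""
    rows = list(HEADER)
    for name, spec in field_info.items():
        data_type, default_description = FIELD_DEFAULTS[name]
        if isinstance(spec, tuple):
            level, description = spec  # per-page override of the description
        else:
            level, description = spec, default_description
        rows.append(f"| {name} | {level} | {data_type} | {description} |")
    return "\n".join(rows)


print(make_metadata_table({
    "LongName": "OPTIONAL",
    "Description": ("RECOMMENDED", "The description of the column."),
}))
```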
````diff
@@ -890,6 +911,7 @@ individual files see descriptions in the next section:
 
 ```Text
 sub-control01/
+sub-control01_scans.tsv
 anat/
 sub-control01_T1w.nii.gz
 sub-control01_T1w.json
````
````diff
@@ -910,7 +932,6 @@ sub-control01/
 sub-control01_phasediff.nii.gz
 sub-control01_phasediff.json
 sub-control01_magnitude1.nii.gz
-sub-control01_scans.tsv
 code/
 deface.py
 derivatives/
````
````diff
@@ -944,12 +965,6 @@ to suppress warnings or provide interpretations of your file names.
 
 [derived-dataset-description]: 03-modality-agnostic-files.md#derived-dataset-and-pipeline-description
 
-[string]: https://www.w3schools.com/js/js_json_syntax.asp
-
-[strings]: https://www.w3schools.com/js/js_json_syntax.asp
-
-[object]: https://www.json.org/json-en.html
-
 [deprecated]: ./02-common-principles.md#definitions
 
 [uris]: ./02-common-principles.md#uniform-resource-indicator
````

src/03-modality-agnostic-files.md

Lines changed: 125 additions & 30 deletions
````diff
@@ -14,21 +14,23 @@ Templates:
 The file `dataset_description.json` is a JSON file describing the dataset.
 Every dataset MUST include this file with the following fields:
 
-| **Key name** | **Requirement level** | **Data type** | **Description** |
-|--------------------|------------------------------------|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Name | REQUIRED | [string][] | Name of the dataset. |
-| BIDSVersion | REQUIRED | [string][] | The version of the BIDS standard that was used. |
-| HEDVersion | RECOMMENDED | [string][] | If HED tags are used: The version of the HED schema used to validate HED tags for study. |
-| DatasetLinks | REQUIRED if [BIDS URIs][] are used | [object][] of [uris][] | Used to map a given `<dataset-name>` from a [BIDS URI][] of the form `bids:<dataset-name>:/absolute/path/within/dataset` to a local or remote location. The `<dataset-name>`: `local` is a reserved keyword that MUST NOT be a key in `DatasetLinks` |
-| DatasetType | RECOMMENDED | [string][] | The interpretation of the dataset. MUST be one of `"raw"` or `"derivative"`. For backwards compatibility, the default value is `"raw"`. |
-| License | RECOMMENDED | [string][] | The license for the dataset. The use of license name abbreviations is RECOMMENDED for specifying a license (see [Appendix II](./99-appendices/02-licenses.md)). The corresponding full license text MAY be specified in an additional `LICENSE` file. |
-| Authors | OPTIONAL | [array][] of [strings][] | List of individuals who contributed to the creation/curation of the dataset. |
-| Acknowledgements | OPTIONAL | [string][] | Text acknowledging contributions of individuals or institutions beyond those listed in Authors or Funding. |
-| HowToAcknowledge | OPTIONAL | [string][] | Text containing instructions on how researchers using this dataset should acknowledge the original authors. This field can also be used to define a publication that should be cited in publications that use the dataset. |
-| Funding | OPTIONAL | [array][] of [strings][] | List of sources of funding (grant numbers). |
-| EthicsApprovals | OPTIONAL | [array][] of [strings][] | List of ethics committee approvals of the research protocols and/or protocol identifiers. |
-| ReferencesAndLinks | OPTIONAL | [array][] of [strings][] | List of references to publications that contain information on the dataset. A reference may be textual or a [URI][uri]. |
-| DatasetDOI | OPTIONAL | [string][] | The Digital Object Identifier of the dataset (not the corresponding paper). DOIs SHOULD be expressed as a valid [URI][uri]; bare DOIs such as `10.0.2.3/dfjj.10` are [DEPRECATED][deprecated]. |
+{{ MACROS___make_metadata_table(
+{
+"Name": "REQUIRED",
+"BIDSVersion": "REQUIRED",
+"HEDVersion": "RECOMMENDED",
+"DatasetLinks": "REQUIRED if [BIDS URIs][] are used",
+"DatasetType": "RECOMMENDED",
+"License": "RECOMMENDED",
+"Authors": "OPTIONAL",
+"Acknowledgements": "OPTIONAL",
+"HowToAcknowledge": "OPTIONAL",
+"Funding": "OPTIONAL",
+"EthicsApprovals": "OPTIONAL",
+"ReferencesAndLinks": "OPTIONAL",
+"DatasetDOI": "OPTIONAL",
+}
+) }}
 
 Example:
 
````
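The requirement levels now live in the macro call, but the removed table rows still spell out constraints that are easy to check mechanically: `Name` and `BIDSVersion` are REQUIRED, `DatasetLinks` is REQUIRED once BIDS URIs are used and MUST NOT contain the reserved key `local`, and `DatasetType` MUST be `"raw"` or `"derivative"` (default `"raw"`). A sketch, where the function name and the `uses_bids_uris` flag are hypothetical inputs rather than part of any BIDS tool:

```python
import json


def check_dataset_description(text, uses_bids_uris=False):
    """Return a list of rule violations for a dataset_description.json string."""
    desc = json.loads(text)
    errors = []
    for key in ("Name", "BIDSVersion"):
        if key not in desc:
            errors.append(f"missing REQUIRED field {key}")
    if uses_bids_uris and "DatasetLinks" not in desc:
        errors.append("DatasetLinks is REQUIRED when BIDS URIs are used")
    if "local" in desc.get("DatasetLinks", {}):
        errors.append('"local" is reserved and MUST NOT be a DatasetLinks key')
    # An absent DatasetType means "raw" for backwards compatibility.
    if desc.get("DatasetType", "raw") not in ("raw", "derivative"):
        errors.append('DatasetType MUST be "raw" or "derivative"')
    return errors
```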
````diff
@@ -70,10 +72,12 @@ In addition to the keys for raw BIDS datasets,
 derived BIDS datasets include the following REQUIRED and RECOMMENDED
 `dataset_description.json` keys:
 
-| **Key name** | **Requirement level** | **Data type** | **Description** |
-|----------------|-----------------------|--------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| GeneratedBy | REQUIRED | [array][] of [objects][] | Used to specify provenance of the derived dataset. See table below for contents of each object. |
-| SourceDatasets | RECOMMENDED | [array][] of [objects][] | Used to specify the locations and relevant attributes of all source datasets. Valid keys in each object include `URL`, `DOI` (see [URI][uri]), and `Version` with [string][] values. |
+{{ MACROS___make_metadata_table(
+{
+"GeneratedBy": "REQUIRED",
+"SourceDatasets": "RECOMMENDED",
+}
+) }}
 
 Each object in the `GeneratedBy` list includes the following REQUIRED, RECOMMENDED
 and OPTIONAL keys:
````
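For derived datasets, the removed rows describe `GeneratedBy` as a REQUIRED array of objects and `SourceDatasets` as a RECOMMENDED array of objects with `URL`, `DOI` and `Version` keys holding string values. A sketch of those shape checks on an already-parsed `dataset_description.json`; the function name is hypothetical, and the contents of each `GeneratedBy` object are deliberately not checked:

```python
SOURCE_DATASET_KEYS = {"URL", "DOI", "Version"}


def check_derivative_description(desc):
    """Shape checks on GeneratedBy / SourceDatasets of a parsed dataset_description.json."""
    errors = []
    generated_by = desc.get("GeneratedBy")
    if generated_by is None:
        errors.append("GeneratedBy is REQUIRED for derived datasets")
    elif not (isinstance(generated_by, list)
              and all(isinstance(item, dict) for item in generated_by)):
        errors.append("GeneratedBy is not an array of objects")
    for i, source in enumerate(desc.get("SourceDatasets", [])):
        if not isinstance(source, dict):
            errors.append(f"SourceDatasets[{i}] is not an object")
            continue
        # Only the keys named in the removed table row are type-checked here.
        for key in sorted(source.keys() & SOURCE_DATASET_KEYS):
            if not isinstance(source[key], str):
                errors.append(f"SourceDatasets[{i}].{key} is not a string")
    return errors
```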
````diff
@@ -252,13 +256,80 @@ to date of birth.
 }
 ```
 
+## Samples file
+
+Template:
+
+```Text
+samples.tsv
+samples.json
+```
+
+The purpose of this file is to describe properties of samples, indicated by the `sample` entity.
+This file is REQUIRED if `sample-<label>` is present in any file name within the dataset.
+If this file exists, it MUST contain the three following columns:
+
+- `sample_id`: MUST consist of `sample-<label>` values identifying one row
+for each sample
+
+- `participant_id`: MUST consist of `sub-<label>`
+
+- `sample_type`: MUST consist of sample type values, either `cell line`, `in vitro differentiated cells`,
+`primary cell`, `cell-free sample`, `cloning host`, `tissue`, `whole organisms`, `organoid` or
+`technical sample` from [ENCODE Biosample Type](https://www.encodeproject.org/profiles/biosample_type)
+
+Other optional columns MAY be used to describe the samples.
+Each sample MUST be described by one and only one row.
+
+Commonly used *optional* columns in `samples.tsv` files are `pathology` and
+`derived_from`. We RECOMMEND to make use of these columns, and in case that
+you do use them, we RECOMMEND to use the following values for them:
+
+- `pathology`: string value describing the pathology of the sample or type of control.
+When different from `healthy`, pathology SHOULD be specified in `samples.tsv`.
+The pathology MAY instead be specified in [Sessions files](./03-modality-agnostic-files.md#sessions-file)
+in case it changes over time.
+
+- `derived_from`: `sample-<label>` key/value pair from which a sample is derived from,
+for example a slice of tissue (`sample-02`) derived from a block of tissue (`sample-01`),
+as illustrated in the example below.
+
+`samples.tsv` example:
+
+```Text
+sample_id participant_id sample_type derived_from
+sample-01 sub-01 tissue n/a
+sample-02 sub-01 tissue sample-01
+sample-03 sub-01 tissue sample-01
+sample-04 sub-02 tissue n/a
+sample-05 sub-02 tissue n/a
+```
+
+It is RECOMMENDED to accompany each `samples.tsv` file with a sidecar
+`samples.json` file to describe the TSV column names and properties of their values
+(see also the [section on tabular files](02-common-principles.md#tabular-files)).
+
+`samples.json` example:
+
+```JSON
+{
+"sample_type": {
+"Description": "type of sample from ENCODE Biosample Type (https://www.encodeproject.org/profiles/biosample_type)",
+},
+"derived_from": {
+"Description": "sample_id from which the sample is derived"
+}
+}
+```
+
 ## Phenotypic and assessment data
 
 Template:
 
 ```Text
-phenotype/<measurement_tool_name>.tsv
-phenotype/<measurement_tool_name>.json
+phenotype/
+<measurement_tool_name>.tsv
+<measurement_tool_name>.json
 ```
 
 Optional: Yes
````
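Most of the new `samples.tsv` rules are mechanical: three REQUIRED columns, one row per sample, `sample_type` drawn from the listed ENCODE Biosample Type values, and `derived_from` pointing at another `sample-<label>`. A minimal standard-library sketch; it is hypothetical (not the BIDS validator), identifies a row by its `(participant_id, sample_id)` pair because labels only have to be unique per subject, and only checks that `derived_from` names some `sample_id` listed in the file:

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "participant_id", "sample_type")
# ENCODE Biosample Type values listed for the sample_type column.
SAMPLE_TYPES = {
    "cell line", "in vitro differentiated cells", "primary cell",
    "cell-free sample", "cloning host", "tissue", "whole organisms",
    "organoid", "technical sample",
}


def check_samples_tsv(text):
    """Return a list of problems found in the contents of a samples.tsv file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column {c}" for c in missing]
    rows = list(reader)
    known = {row["sample_id"] for row in rows}
    problems, seen = [], set()
    for n, row in enumerate(rows, start=2):  # line 1 is the header
        key = (row["participant_id"], row["sample_id"])
        if key in seen:  # each sample is described by one and only one row
            problems.append(f"line {n}: {key[1]} of {key[0]} described twice")
        seen.add(key)
        if not row["sample_id"].startswith("sample-"):
            problems.append(f"line {n}: sample_id must be sample-<label>")
        if not row["participant_id"].startswith("sub-"):
            problems.append(f"line {n}: participant_id must be sub-<label>")
        if row["sample_type"] not in SAMPLE_TYPES:
            problems.append(f"line {n}: unknown sample_type {row['sample_type']!r}")
        # Optional derived_from column: "n/a" or another listed sample_id.
        parent = row.get("derived_from") or "n/a"
        if parent != "n/a" and parent not in known:
            problems.append(f"line {n}: derived_from {parent} is not a listed sample_id")
    return problems
```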
````diff
@@ -330,9 +401,10 @@ questionnaire).
 Template:
 
 ```Text
-sub-<label>/[ses-<label>/]
-sub-<label>[_ses-<label>]_scans.tsv
-sub-<label>[_ses-<label>]_scans.json
+sub-<label>/
+[ses-<label>/]
+sub-<label>[_ses-<label>]_scans.tsv
+sub-<label>[_ses-<label>]_scans.json
 ```
 
 Optional: Yes
````
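The scans template places the file in the subject folder, or in the session folder when there is one, and repeats the `sub-` and optional `ses-` entities in the file name. A small hypothetical helper that expands the template for a given subject and optional session label:

```python
from typing import Optional, Tuple


def scans_paths(subject: str, session: Optional[str] = None) -> Tuple[str, str]:
    """Expand sub-<label>/[ses-<label>/]sub-<label>[_ses-<label>]_scans.{tsv,json}."""
    folder = f"sub-{subject}"
    stem = f"sub-{subject}"
    if session is not None:
        folder += f"/ses-{session}"
        stem += f"_ses-{session}"
    return f"{folder}/{stem}_scans.tsv", f"{folder}/{stem}_scans.json"


print(scans_paths("control01"))
print(scans_paths("01", session="pre"))
```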
````diff
@@ -380,6 +452,33 @@ meg/sub-control01_task-rest_split-01_meg.nii.gz 1877-06-15T12:15:27
 meg/sub-control01_task-rest_split-02_meg.nii.gz 1877-06-15T12:15:27
 ```
 
+## Sessions file
+
+Template:
+
+```Text
+sub-<label>/
+sub-<label>_sessions.tsv
+```
+
+Optional: Yes
+
+In case of multiple sessions there is an option of adding additional
+`sessions.tsv` files describing variables changing between sessions.
+In such case one file per participant SHOULD be added.
+These files MUST include a `session_id` column and describe each session by one and only one row.
+Column names in `sessions.tsv` files MUST be different from group level participant key column names in the
+[`participants.tsv` file](./03-modality-agnostic-files.md#participants-file).
+
+`_sessions.tsv` example:
+
+```Text
+session_id acq_time systolic_blood_pressure
+ses-predrug 2009-06-15T13:45:30 120
+ses-postdrug 2009-06-16T13:45:30 100
+ses-followup 2009-06-17T13:45:30 110
+```
+
 ## Code
 
 Template: `code/*`
````
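The new `sessions.tsv` rules (a `session_id` column, one row per session, and no column name shared with `participants.tsv`) can be checked the same way. A hypothetical sketch taking the contents of one `sub-<label>_sessions.tsv` file and the column names of `participants.tsv`:

```python
import csv
import io


def check_sessions_tsv(text, participants_columns):
    """Return problems found in one sub-<label>_sessions.tsv file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = reader.fieldnames or []
    if "session_id" not in columns:
        return ["missing session_id column"]
    problems, seen = [], set()
    for row in reader:
        session = row["session_id"]
        if session in seen:  # each session is described by one and only one row
            problems.append(f"{session} is described by more than one row")
        seen.add(session)
    # Column names must differ from the participant-level columns.
    for column in columns:
        if column in participants_columns:
            problems.append(f"column {column} also appears in participants.tsv")
    return problems
```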
````diff
@@ -399,15 +498,11 @@ code organization of these scripts at the moment.
 
 [bids uris]: ./02-common-principles.md#bids-uri-pointing-to-files-within-and-outside-of-bids-datasets
 
-[objects]: https://www.json.org/json-en.html
-
 [object]: https://www.json.org/json-en.html
 
-[string]: https://www.w3schools.com/js/js_json_syntax.asp
-
-[strings]: https://www.w3schools.com/js/js_json_syntax.asp
+[objects]: https://www.json.org/json-en.html
 
-[array]: https://www.w3schools.com/js/js_json_arrays.asp
+[string]: https://www.w3schools.com/js/js_json_syntax.asp
 
 [uri]: ./02-common-principles.md#uniform-resource-indicator
 
````