Add sample entity and samples.tsv file #779
Context and motivation
Hi BIDS community!
As part of the development of the Microscopy BEP (BEP031), we want to add a new sample entity to BIDS. This sample entity was introduced in order to distinguish different tissue samples from the same subject.
The sample entity may also be used by the Animal Ephys BEP (BEP032 @SylvainTakerkart) and could benefit other modalities as well.
This issue aims to start a discussion about the details of the sample entity between the two BEP groups and with the BIDS community. It will also make it easier to break BEPs down into smaller modules by adding the sample entity as a separate PR.
Definition of the sample entity
To ensure compatibility with other BIDS modalities, the subject entity should correspond to the participant (e.g., a human or a mouse). To identify multiple tissue samples from the same subject, we define the sample entity in BEP031 as:
A tissue sample, volume or slice pertaining to a subject.
It is positioned after the optional session entity in the filename:
sub-<label>[_ses-<label>]_sample-<label>_<modality_suffix>.<ext>
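For tooling, the pattern above could be checked with a regular expression. The following is only a minimal sketch: the names `SAMPLE_FILENAME` and `parse_sample_filename` are hypothetical, and it assumes that labels and the suffix are alphanumeric, which the text above does not state explicitly.

```python
import re

# Assumption: labels and the suffix are alphanumeric; the extension may
# contain dots (e.g. a compound extension).
SAMPLE_FILENAME = re.compile(
    r"^sub-(?P<sub>[a-zA-Z0-9]+)"
    r"(?:_ses-(?P<ses>[a-zA-Z0-9]+))?"   # the session entity is optional
    r"_sample-(?P<sample>[a-zA-Z0-9]+)"  # the sample entity follows the session
    r"_(?P<suffix>[a-zA-Z0-9]+)"
    r"\.(?P<ext>[a-zA-Z0-9.]+)$"
)


def parse_sample_filename(name):
    """Return the entities of a filename as a dict, or None if it does not match."""
    match = SAMPLE_FILENAME.match(name)
    return match.groupdict() if match else None


entities = parse_sample_filename("sub-rat1_sample-data1_SEM.png")
print(entities["sub"], entities["ses"], entities["sample"])
```

A filename without the sample entity, such as `sub-rat1_SEM.png`, is rejected by this sketch because it treats `sample-<label>` as mandatory, as in the pattern above.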
samples.tsv file
In BEP031, a samples.tsv file was added at the root of the dataset along with participants.tsv.
The samples.tsv file would have 2 required columns:
sample_id: corresponding to sample-<label> of the filename
participant_id: corresponding to sub-<label> of the filename
Another column sample_type was also suggested as required:
sample_type: kind of sample from ENCODE BiosampleType
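To make the proposed columns concrete, here is a rough sketch of how a validator might check a samples.tsv body. The function name `check_samples_table` and the error messages are hypothetical. The sketch treats sample_type as required, which is still only a suggestion here, and it does not check sample_type values against ENCODE.

```python
import csv
import io

# sample_id and participant_id are the two required columns;
# sample_type is included on the assumption that it also becomes required.
REQUIRED_COLUMNS = ("sample_id", "participant_id", "sample_type")


def check_samples_table(samples_text, participant_ids):
    """Return a list of human-readable problems found in a samples.tsv body."""
    reader = csv.DictReader(io.StringIO(samples_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column: {col}" for col in missing]

    problems = []
    for row in reader:
        if not row["sample_id"].startswith("sample-"):
            problems.append(f"sample_id without 'sample-' prefix: {row['sample_id']}")
        if row["participant_id"] not in participant_ids:
            problems.append(f"unknown participant_id: {row['participant_id']}")
    return problems


samples_text = (
    "sample_id\tparticipant_id\tsample_type\n"
    "sample-data1\tsub-rat1\ttissue\n"
)
print(check_samples_table(samples_text, {"sub-rat1"}))
```

Here `participant_ids` would be the set of participant_id values read from participants.tsv.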
We should also discuss if (and how) we want to encode an additional identifier when a sample is derived from another sample (e.g., a slice is derived from a block of tissue).
participants.tsv file
As part of the subject vs. sample definitions, we would also like to add 2 columns to the participants.tsv file:
- species: string corresponding to the binomial species name from NCBI Taxonomy, required when different from “Homo sapiens”.
  We think species should be in participants.tsv and not samples.tsv, as it is an attribute of the subject and not the sample.
- pathology: required when different from “Healthy”.
  Unlike species, pathology could go in either participants.tsv or samples.tsv, as appropriate (e.g., healthy and non-healthy biopsy samples from the same subject).
Examples
File hierarchy and naming:
├── dataset_description.json
├── participants.json
├── participants.tsv
├── samples.json
├── samples.tsv
├── sub-rat1
│   └── microscopy
│       ├── sub-rat1_sample-data1_SEM.json
│       └── sub-rat1_sample-data1_SEM.png
├── sub-rat2
│   └── microscopy
│       ├── sub-rat2_sample-data5_SEM.json
│       └── sub-rat2_sample-data5_SEM.png
├── sub-rat3
│   └── microscopy
│       ├── sub-rat3_sample-data10_SEM.json
│       ├── sub-rat3_sample-data10_SEM.png
│       ├── sub-rat3_sample-data11_SEM.json
│       ├── sub-rat3_sample-data11_SEM.png
│       ├── sub-rat3_sample-data9_SEM.json
│       └── sub-rat3_sample-data9_SEM.png
participants.tsv:
| participant_id | species |
|---|---|
| sub-rat1 | Rattus norvegicus |
| sub-rat2 | Rattus norvegicus |
| sub-rat3 | Rattus norvegicus |
samples.tsv:
| sample_id | participant_id | sample_type |
|---|---|---|
| sample-data1 | sub-rat1 | tissue |
| sample-data5 | sub-rat2 | tissue |
| sample-data9 | sub-rat3 | tissue |
| sample-data10 | sub-rat3 | tissue |
| sample-data11 | sub-rat3 | tissue |
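Because species is stored once per subject, a tool that needs the species of a given sample can look it up through participant_id. The sketch below illustrates this with the example tables above written out as tab-separated text. The helper names `read_tsv` and `species_by_sample` are hypothetical and not part of any existing BIDS tool.

```python
import csv
import io

# The two example tables above, as tab-separated text.
PARTICIPANTS_TSV = (
    "participant_id\tspecies\n"
    "sub-rat1\tRattus norvegicus\n"
    "sub-rat2\tRattus norvegicus\n"
    "sub-rat3\tRattus norvegicus\n"
)
SAMPLES_TSV = (
    "sample_id\tparticipant_id\tsample_type\n"
    "sample-data1\tsub-rat1\ttissue\n"
    "sample-data5\tsub-rat2\ttissue\n"
    "sample-data9\tsub-rat3\ttissue\n"
    "sample-data10\tsub-rat3\ttissue\n"
    "sample-data11\tsub-rat3\ttissue\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def species_by_sample(participants, samples):
    """Map each (participant_id, sample_id) pair to the species of its subject."""
    species = {row["participant_id"]: row["species"] for row in participants}
    return {
        (row["participant_id"], row["sample_id"]): species[row["participant_id"]]
        for row in samples
    }


lookup = species_by_sample(read_tsv(PARTICIPANTS_TSV), read_tsv(SAMPLES_TSV))
for (participant_id, sample_id), name in lookup.items():
    # participant_id and sample_id together give the filename prefix used in the tree.
    print(f"{participant_id}_{sample_id}", name)
```

The lookup is keyed on the (participant_id, sample_id) pair because the proposal above does not say whether sample labels must be unique across subjects.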