Add sample entity and samples.tsv file #779

@mariehbourget

Context and motivation

Hi BIDS community!

As part of the development of the Microscopy BEP (BEP031), we want to add a new sample entity to BIDS. This sample entity was introduced in order to distinguish different tissue samples from the same subject.

The sample entity may also be used by the Animal Ephys BEP (BEP032 @SylvainTakerkart) and could benefit other modalities as well.

This issue aims to start a discussion about the details of the sample entity between the 2 BEP groups and with the BIDS community. It will also make it easier to break BEPs down into smaller modules, since the sample entity can be added as a separate PR.

Definition of the sample entity

To ensure compatibility with other BIDS modalities, the subject entity should correspond to the participant (e.g. a human, a mouse, etc.). To identify multiple tissue samples from the same subject, we define the sample entity in BEP031 as:

A tissue sample, volume or slice pertaining to a subject.

It is positioned after the optional session entity in the filename:

sub-<label>[_ses-<label>]_sample-<label>_<modality_suffix>.<ext>
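As a rough illustration of the proposed ordering, here is a minimal Python sketch that parses such a filename. The regular expression and the function name `parse_sample_filename` are hypothetical and not part of BEP031; restricting labels to alphanumeric characters is an assumption carried over from how other BIDS entities are usually written.

```python
import re

# Hypothetical pattern for the proposed naming scheme:
#   sub-<label>[_ses-<label>]_sample-<label>_<modality_suffix>.<ext>
# Assumption: labels and the suffix are alphanumeric only.
FILENAME_RE = re.compile(
    r"^sub-(?P<sub>[A-Za-z0-9]+)"
    r"(?:_ses-(?P<ses>[A-Za-z0-9]+))?"   # session entity is optional
    r"_sample-(?P<sample>[A-Za-z0-9]+)"  # sample comes after sub/ses
    r"_(?P<suffix>[A-Za-z0-9]+)"
    r"\.(?P<ext>[A-Za-z0-9.]+)$"
)


def parse_sample_filename(name):
    """Return the entities as a dict, or None if the name does not match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None
```

For example, `parse_sample_filename("sub-rat1_sample-data1_SEM.png")` yields the labels `rat1` and `data1` with `ses` set to `None`, while a name with the sample entity placed before the session entity is rejected.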

samples.tsv file

In BEP031, a samples.tsv file was added at the root of the dataset along with participants.tsv.

The samples.tsv file would have 2 required columns:

  • sample_id: corresponding to sample-<label> of the filename
  • participant_id: corresponding to sub-<label> of the filename

Another column sample_type was also suggested as required:

We should also discuss if (and how) we want to encode an additional identifier when a sample is derived from another sample (e.g., a slice is derived from a block of tissue).

participants.tsv file

As part of the subject vs. sample definitions, we would also like to add 2 columns to the participants.tsv file:

  1. species: string corresponding to the binomial species name from the NCBI Taxonomy, required when different from “Homo sapiens”
    We think species should be in participants.tsv and not samples.tsv as it is an attribute of the subject and not the sample.

  2. pathology: required when different from “Healthy”
    Unlike species, pathology could be in either participants.tsv or samples.tsv as appropriate (e.g. healthy and non-healthy biopsy samples from the same subject).

Examples

File hierarchy and naming:

├── dataset_description.json
├── participants.json
├── participants.tsv
├── samples.json
├── samples.tsv
├── sub-rat1
│   └── microscopy
│       ├── sub-rat1_sample-data1_SEM.json
│       └── sub-rat1_sample-data1_SEM.png
├── sub-rat2
│   └── microscopy
│       ├── sub-rat2_sample-data5_SEM.json
│       └── sub-rat2_sample-data5_SEM.png
└── sub-rat3
    └── microscopy
        ├── sub-rat3_sample-data10_SEM.json
        ├── sub-rat3_sample-data10_SEM.png
        ├── sub-rat3_sample-data11_SEM.json
        ├── sub-rat3_sample-data11_SEM.png
        ├── sub-rat3_sample-data9_SEM.json
        └── sub-rat3_sample-data9_SEM.png

participants.tsv:

participant_id    species
sub-rat1          Rattus norvegicus
sub-rat2          Rattus norvegicus
sub-rat3          Rattus norvegicus

samples.tsv:

sample_id        participant_id    sample_type
sample-data1     sub-rat1          tissue
sample-data5     sub-rat2          tissue
sample-data9     sub-rat3          tissue
sample-data10    sub-rat3          tissue
sample-data11    sub-rat3          tissue
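To make the relationship between the two files concrete, here is a minimal Python sketch of a consistency check. It is not part of the BIDS validator; `read_tsv` and `check_samples` are hypothetical helpers. It only checks what is proposed above: that the 2 required columns are filled in, and that every participant_id in samples.tsv also appears in participants.tsv.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "participant_id")


def read_tsv(text):
    """Parse tab-separated text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def check_samples(samples_rows, participants_rows):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    known = {row["participant_id"] for row in participants_rows}
    # Data rows start at line 2; line 1 is the header.
    for line_no, row in enumerate(samples_rows, start=2):
        for column in REQUIRED_COLUMNS:
            if not row.get(column):
                problems.append(f"line {line_no}: missing {column}")
        participant = row.get("participant_id")
        if participant and participant not in known:
            problems.append(
                f"line {line_no}: {participant} not in participants.tsv"
            )
    return problems


participants = read_tsv(
    "participant_id\tspecies\n"
    "sub-rat1\tRattus norvegicus\n"
    "sub-rat3\tRattus norvegicus\n"
)
samples = read_tsv(
    "sample_id\tparticipant_id\tsample_type\n"
    "sample-data1\tsub-rat1\ttissue\n"
    "sample-data5\tsub-rat2\ttissue\n"
    "sample-data9\tsub-rat3\ttissue\n"
)
problems = check_samples(samples, participants)
```

In this shortened example sub-rat2 was deliberately left out of participants.tsv, so `problems` contains one entry pointing at line 3 of samples.tsv.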
