TLDR
In the participants.tsv file, the age and sex columns are sometimes not well defined, and this leads to (unnecessary) issues on the side of tool developers (and thus eventually the users). We should improve either the spec or the validator, or both.
cc @jasmainak @agramfort @adam2392 @hoechenberger
came up in: mne-tools/mne-bids#396
Intro
The specification says the following about the participants.tsv file:
In case of single session studies this file has one compulsory column participant_id that consists of sub-, followed by a list of optional columns describing participants.
So, strictly speaking, all columns other than participant_id are OPTIONAL, and thus SHOULD be described in an accompanying participants.json.
For optional columns that are not described, the validator currently emits a warning such as this:
1: [WARN] Tabular file contains custom columns not described in a data dictionary (code: 82 - CUSTOM_COLUMN_WITHOUT_DESCRIPTION)
./participants.tsv
Evidence: Columns: group not defined, please define in: /participants.json
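To make the check concrete, here is a minimal Python sketch of the idea behind this warning. It is not the validator's actual implementation (the validator is written in JavaScript); the function name undescribed_columns and the example data are made up for illustration.

```python
import csv
import io
import json


def undescribed_columns(tsv_text, json_text, required=("participant_id",)):
    """Return the TSV columns that have no entry in the data dictionary."""
    # Only the header row is needed to learn the column names.
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    described = set(json.loads(json_text))
    return [col for col in header if col not in required and col not in described]


# Hypothetical example data: only "age" is described in the sidecar.
tsv_text = "participant_id\tage\tsex\tgroup\nsub-05\t25\tfem\tcontrol\n"
json_text = json.dumps({"age": {"Description": "age of the participant", "Units": "years"}})

print(undescribed_columns(tsv_text, json_text))  # ['sex', 'group']
```

Under a consistent rule like this one, sex would be reported alongside group whenever it lacks a description.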
Yet the validator treats some "optional" columns differently: these columns are accepted WITHOUT a warning. Examples of these are age and sex.
However, the specification does not say that these two variables are "expected optional columns". The expected behavior would be to raise a warning for age and sex as well.
I could not pin down the exact part of the validator that is responsible for this behavior, but it may be this line:
https://github.com/bids-standard/bids-validator/blob/dfabbfb058daca406ed1d0897c3a25be059a5ad6/bids-validator/utils/summary/collectSubjectMetadata.js#L31
perhaps @nellh or @rwblair can help
The problem
The issue that arises from this (apart from inconsistency) is that users define their own levels for the sex column, and are NOT reminded by the validator to describe those levels in a participants.json.
As a result, these values are hard (or impossible) to parse by software.
E.g., we may have the following participants.tsv:
participant_id age sex
sub-05 25 fem
sub-06 30 ma
sub-07 26 ma
What's fem? What's ma?
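A rough sketch of how a tool could resolve such values if the sidecar described them. It assumes the dataset author documents the sex column with a "Levels" mapping in participants.json; the helper describe_level and the sidecar contents are hypothetical.

```python
import json


def describe_level(value, sidecar, column="sex"):
    """Look up a raw column value in the sidecar's "Levels" mapping."""
    levels = sidecar.get(column, {}).get("Levels", {})
    if value not in levels:
        # Without a description, software can only guess what the value means.
        raise ValueError(f"{column} value {value!r} is not described in participants.json")
    return levels[value]


# Hypothetical sidecar that documents the custom levels used above.
sidecar = json.loads("""
{
  "sex": {
    "Description": "sex of the participant",
    "Levels": {"fem": "female", "ma": "male"}
  }
}
""")

print(describe_level("fem", sidecar))  # female
```

With an empty sidecar, the same call raises ValueError, which is roughly the situation tools are in today.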
How to fix?
I think we should do one of the following:
- Fix the validator so that it emits a warning if age and sex are columns in participants.tsv but have no description in an accompanying participants.json
OR
- Amend the participants.tsv part of the specification to say explicitly that age and sex are "to-be-expected" columns, and then also define the expected inputs:
  - age MUST be a float (years since birth)
    - if a user wants to specify age differently, they must make their own custom column, e.g. age_in_months
  - sex MUST be a string. Here we need to discuss which strings we accept. The most straightforward would perhaps be "male", "female", "undefined", "other", but I would like somebody with a bit more experience in inclusive language to make a suggestion here.
    - again: if a user wants to define sex differently, they can make their own custom column with a wide range of acceptable factor levels
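For illustration, the second option could be checked along these lines. This is only a sketch under assumptions: ALLOWED_SEX is a placeholder taken from the strings floated above (the final set is still to be discussed), check_row is a hypothetical helper, and missing values such as n/a are not handled.

```python
# Placeholder set from the proposal above; the accepted strings are still open.
ALLOWED_SEX = {"male", "female", "undefined", "other"}


def check_row(row):
    """Return a list of problems for one participants.tsv row (a dict)."""
    problems = []
    try:
        float(row["age"])  # age is expected to be a number of years
    except (KeyError, ValueError):
        problems.append("age must be a float (years since birth)")
    if row.get("sex") not in ALLOWED_SEX:
        problems.append(f"sex must be one of {sorted(ALLOWED_SEX)}")
    return problems


print(check_row({"participant_id": "sub-05", "age": "25", "sex": "fem"}))
print(check_row({"participant_id": "sub-06", "age": "30", "sex": "male"}))
```

The first row is flagged because fem is not in the placeholder set; the second passes.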