[ENH] BEP031 - New columns to participants.tsv file#816
[ENH] BEP031 - New columns to participants.tsv file#816effigies merged 13 commits intobids-standard:masterfrom
Conversation
src/03-modality-agnostic-files.md
src/03-modality-agnostic-files.md
Outdated
| and `handedness`. We RECOMMEND to make use of these columns, and | ||
| in case that you do use them, we RECOMMEND to use the following values | ||
| for them: | ||
| When different from `homo sapiens`, `participants.tsv` SHOULD include a `species` |
There was a problem hiding this comment.
shouldn't this be MUST, given that all of BIDS assumes humans at this point.
| When different from `homo sapiens`, `participants.tsv` SHOULD include a `species` | |
| When different from `homo sapiens`, `participants.tsv` MUST include a `species` |
There was a problem hiding this comment.
I don't know how we would validate a MUST, here. Also, there are rodent datasets that may not have this column in this form at this point, so we would be breaking backwards compatibility if we could validate. What about:
The RECOMMENDED `species` column MUST be a binomial species name from the
[NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi).
For backwards compatibility, if `species` is absent, the participant is assumed to be
`homo sapiens`.Also, REQUIRE-ing a species name from NCBI Taxonomy feels like it's going to be difficult to validate, as we will need to either query the database or maintain a list of accepted names, updating the validator as new use cases arise... Is there a validation plan?
There was a problem hiding this comment.
Thank you @effigies for the suggestion, I think assuming homo sapiens if the column is omitted is a strong incentive without breaking backward compatibility. I would be in favor of that.
However, I had not thought about the validation, querying the database seems like the best option to not have to maintain an up-to-date list in the validator but it may be difficult to implement. Are there similar requirements elsewhere in the spec? Would the alternative of “SHOULD” or “strongly RECOMMENDED” be advisable?
Also, thinking about it, I think I should add examples other than homo sapiens like mus musculus and rattus norvegicus in the description.
There was a problem hiding this comment.
@effigies - regarding validation, i will raise the same issue here as the other post. i'm not sure we actually validate values for example for sex or anything that could have levels or enumerations.
I don't know how we would validate a MUST
while one could at least detect presence, i agree that keeping with the current perspective of the participants.tsv being a recommended file, we can keep things recommended instead of required.
species does get a little complicated, especially for animals, as you start going into species + genotype notions. here is our generic participant at a timepoint model in dandi: https://github.com/dandi/dandischema/blob/master/dandischema/models.py#L642 (technically all of those properties could come into play, with some being more important for animal studies).
src/03-modality-agnostic-files.md
Outdated
| - `strain_rrid`: research resource identifier ([RRID](https://scicrunch.org/resources/Organisms/search)) | ||
| of the strain of the species | ||
|
|
||
| - `diagnosis`: string value describing the diagnosis of the participant. |
There was a problem hiding this comment.
i don't know if this has to be a string value. in many datasets on openneuro diagnosis/dx is present and can be an enumerated type. also, this is one place, where one can have multiple designations depending on the study. we should allow for some notion of that, or simply remove diagnosis from this file.
There was a problem hiding this comment.
The aim of this PR is to add new columns to describe animal properties. In that context, I agree that diagnosis may be out of scope.
For context, I added the columns following this discussion #779 (comment), #779 (comment) and #779 (comment) because we also introduced pathology in samples.tsv.
There was a problem hiding this comment.
Following your suggestion, I removed the diagnosis column in 7146144 as being out of scope for this PR (animal properties).
src/03-modality-agnostic-files.md
Outdated
| - for "ambidextrous", use one of these values: `ambidextrous`, `a`, `A`, | ||
| `AMBIDEXTROUS`, `Ambidextrous` | ||
|
|
||
| - `strain`: string value indicating the strain of the species |
There was a problem hiding this comment.
Examples for each of these would be useful.
There was a problem hiding this comment.
Just to clarify, do you mean example directly in the description, like above for handedness with ambidextrous, a, A, etc, or an example of particpants.tsv for an animal.
I did not change the example of participants.tsv below which is an example for human. I thought having complete examples for both human and animal would maybe be too much.
There was a problem hiding this comment.
perhaps a few examples from: https://www.jax.org/jax-mice-and-services/find-and-order-jax-mice/most-popular-jax-mice-strains
sappelhoff
left a comment
There was a problem hiding this comment.
I have a more general and perhaps controversial comment, that I'd like to hear opinions on.
Up until now BIDS has been mostly about humans and their brain+behavior data. I assume that >80% of users will continue to come to BIDS from this angle.
Adding coverage for animal ephys, and microscopy into BIDS is great, and I fully support it. However is it a good idea to introduce very specific concepts (such as "strain") that are not used in 80% of BIDS use-cases* into a very general section that discusses things like age, sex, and handedness (which are ubiquitous)? I feel like this would be better discussed in a dedicated, separate section of the spec (animal ephys / microscopy respectively), as it avoids clutter and potential confusion / annoyance with the spec.
*(this is my assumption/prediction, feel free to challenge).
src/03-modality-agnostic-files.md
Outdated
| [NCBI Taxonomy](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi). | ||
|
|
||
| Commonly used *optional* columns in `participants.tsv` files are `age`, `sex`, | ||
| `handedness`, `strain`, `strain_rrid` and `diagnosis`. We RECOMMEND to make use |
There was a problem hiding this comment.
Could group be added to this list as its used in the example below?
There was a problem hiding this comment.
Good question, I'm not sure we should because group is also used below to illustrate how to describe an additional column that is not part of the optional ones in the participants.json example. From my understanding, group may have different meanings depending on the study and it would be hard to define it with a general example
Thanks @sappelhoff for your feedback! The main goal of this PR is to promote We foresee that the BIDS-specification for microscopy will be adopted by many users, some have already started converting their datasets or working on compatible acquisition workflows, but I would not be able to quantify such adoption at this point. I'm not against moving those field into the dedicated microscopy and animal ephys sections if we want to keep the general section more general. The main drawback being duplication in the specification. |
|
Hi @effigies and @sappelhoff, I can move them to the microscopy part if it makes more sense to have them per modalities rather than in the “general” participants file. In that case, this PR would only add the species columns to participants. |
|
tagging @Remi-Gau to chime in instead of me :-) |
|
Tempted to keep those |
agreed! if we only put them in the modality-specific part, we will have to repeat it for several future modalities and it's going to get more complex to handle the specs... |
|
Thank you everyone for your feedback! |
|
Hi @bids-standard/maintainers, I also wanted to mention that it may have an impact on another opened PR (#827) as it deals with TSV columns. |
|
I am still +1 for merge. Any seconds? |
tsalo
left a comment
There was a problem hiding this comment.
It looks good to me. I can handle adding the new columns to the TSV columns PR.
|
Since the last meaningful commit was in August, I think we can merge. @sappelhoff are there any blocks I'm forgetting? |
|
Nope, I think this can go in 👍 |
* Add template. * Add first column. * Add columns. * More terms. * Fill name field in all files. * More work. Note that there are three very different uses of "name" columns, and two of them are equally common, so I chose not to specify any of them as the "canonical" definition. * Add remaining definitions. * Add macro to render column tables. * Fix YAML file. * Consolidate suffixes file. * Remove old individual files. * Move columns file. * Fix things up a bit. * Add columns I missed for modality-agnostic TSV files. * Support n/a for duration. * Apply suggestions from code review Co-authored-by: Chris Markiewicz <effigies@gmail.com> Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org> * Code formatting in stim_file definition. * Allow numbers and strings for value. * Update src/schema/objects/columns.yaml Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org> * Allow n/a for "z" column. Addresses https://github.com/bids-standard/bids-specification/pull/827/files#r723280787. * Describe meanings of x, y, and z columns. Addresses https://github.com/bids-standard/bids-specification/pull/827/files#r723283314. * Allow n/a for status column. Addresses https://github.com/bids-standard/bids-specification/pull/827/files#r723269382. * Add participant_id to participants.tsv table and append info for other IDs. * Split type definitions into channels and electrodes versions. * Update definitions for group based on file type. * Split reference column definition. * Clean up name_channels definition. * Draft new columns from #816 * Add new columns to table. * Remove list items. * Update src/04-modality-specific-files/04-intracranial-electroencephalography.md Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org> * Apply suggestions from code review Co-authored-by: Chris Markiewicz <effigies@gmail.com> * Use two underscores to delineate multiply-defined columns. * Remove text that is now in table. * Update src/schema/objects/columns.yaml Co-authored-by: Chris Markiewicz <effigies@gmail.com> * Add sections to README on columns file and on reused terms. * Add EDF info to acq_time definition. * Remove hardcoded tables. * Remove unused links. Co-authored-by: Chris Markiewicz <effigies@gmail.com> Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
Dear BIDS community,
Context
As part of the development of the Microscopy BEP (BEP031 @mariehbourget, @jcohenadad) and Animal Ephys BEP (BEP032 @SylvainTakerkart, @JuliaSprenger), the “sample” entity was introduced along with a
samples.tsvfile describing properties of the samples in PR #812.To better describe animal participants, new properties of the subjects also need to be included in
participants.tsv.Contribution
The purpose of this PR is to add new recommended column (
species) and optional columns (strain,strain_rridanddiagnosis) to the participants.tsv file.See issue #779 for related discussions on this topic.