layout.get_collections drops user-added columns (and 2 bonus questions) #273

@toddt

As discussed on neurostars, here:
https://neurostars.org/t/replicable-scripts-bids-and-curating-data/2623

I'm trying to document and automate the runs that I'm including in first-level analyses by adding custom columns to the scans.tsv file in the BIDS subject directory.

I've added a few columns to document excluded runs, as seen in the attached scans file (renamed with a .txt extension so that GitHub accepts the upload).
sub-SAXEIB06_scans.txt

When I try to use

bvcSessList = layout.get_collections(level='session', subject=sub)
df = bvcSessList[0].to_df()
print(bvcSessList[0].variables)

one of the columns ("OtherExclusion") has been dropped from both the df and the variables list.

I'm pretty sure that's happening because the column duplicates another exclusion column ("RepeatSubjectExclusion"), which holds the same value (False) for every run, and this line drops it:

_data = _data.T.drop_duplicates().T
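
To make the effect of that idiom concrete, here is a minimal pandas-only sketch. The two exclusion column names come from my scans file; the filenames and values are made up for illustration, and this is not the pybids code path itself, just the same transpose/drop_duplicates/transpose pattern applied to a toy table.

```python
import pandas as pd

# Hypothetical stand-in for a scans.tsv table. The filenames are invented;
# both exclusion columns are False for every run, as in the report above.
scans = pd.DataFrame({
    "filename": ["func/run-01_bold.nii.gz", "func/run-02_bold.nii.gz"],
    "RepeatSubjectExclusion": [False, False],
    "OtherExclusion": [False, False],
})

# Same idiom as the quoted line: transpose so columns become rows,
# drop rows with identical values (keeping the first), transpose back.
deduped = scans.T.drop_duplicates().T

# Columns are compared by value, not by name, so the second
# all-False column is removed.
print(list(deduped.columns))
```

Because the comparison is on values only, any two columns that happen to match across all rows collapse into one, regardless of their names.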

I can code around this problem in a few ways, but it seems like maybe not the ideal behavior for get_collections().
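
One such workaround, sketched under assumptions: read the scans file directly with pandas instead of going through get_collections(). The TSV text below is a made-up stand-in for my real file (only the two exclusion column names are real), and it is read from an in-memory string so the snippet runs on its own; in practice the path to the subject's scans.tsv would go to read_csv instead.

```python
import io
import pandas as pd

# Hypothetical scans.tsv contents; filenames and values are illustrative only.
tsv_text = (
    "filename\tRepeatSubjectExclusion\tOtherExclusion\n"
    "func/sub-01_task-x_run-01_bold.nii.gz\tFalse\tFalse\n"
    "func/sub-01_task-x_run-02_bold.nii.gz\tFalse\tFalse\n"
)

# Reading the TSV directly keeps every column, including ones with
# identical values.
scans = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Keep only runs that are not flagged by either exclusion column.
excluded = scans["RepeatSubjectExclusion"] | scans["OtherExclusion"]
included = scans[~excluded]

print(list(scans.columns))
print(len(included))
```

This sidesteps the dropped column but loses the parsed entities that get_collections() adds, so it is a stopgap rather than a fix.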

Bonus questions:

  1. The scans.tsv filename is parsed into modality/run/type/subject/task correctly, and those columns show up in the dataframe, but I can't find them (or the original filename field) in the variables or entities dictionaries. I'd think that they should be available, no?

  2. get_collections(level='session') only seems to return the func modality, and omits the anat session in the scans.tsv file. Is this intended behavior?

Thanks!
Todd
