Adding an optional token to the dataset fetcher code to allow optional fetching from private repositories #9736

@adam2392

Describe the new feature or enhancement

The dataset-fetching code in mne/datasets/utils.py and mne/utils/fetching.py is actually very general. I was hoping to leverage it without copy/pasting the code, so that I can benefit from possible upstream bug fixes and performance improvements (if they ever occur).

However, in some cases I would like to unit test against private data that I have stored on GitHub, which requires sending an API token with the HTTP request. Some of that data would eventually be made public, say after a publication, but in the meantime it would be nice to build it into CI for a private research project.

Would it be possible to add an optional "token" argument to the dataset fetcher? This would also enable MNE itself to leverage private repos. In addition, it would let anyone implementing a data fetcher reuse MNE's code instead of copying every single function from it.

Describe your proposed implementation

Add an optional token=None kwarg to the following functions:

  1. _download
  2. _fetch_file
  3. _get_http

Then one can easily pass an optional token in _data_path, depending on which dataset is being fetched. This would also enable any "mne" package, like mne-bids, mne-connectivity, etc., to leverage private GitHub repo data using a token passed in via GitHub Actions.
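
To make the idea concrete, here is a minimal standalone sketch of how an optional token could be threaded through a fetch helper. It does not use or reproduce the actual MNE functions; `build_request` and `fetch_file` are hypothetical names, and the `Bearer` authorization scheme is an assumption about what the hosting service expects.

```python
import urllib.request


def build_request(url, token=None):
    """Build a Request, adding an Authorization header only if a token is given.

    Hypothetical helper for illustration, not part of MNE-Python.
    """
    headers = {"User-Agent": "dataset-fetcher-sketch"}
    if token is not None:
        # Assumption: the server accepts "Authorization: Bearer <token>".
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def fetch_file(url, dest, token=None, chunk_size=8192):
    """Stream ``url`` to ``dest``, forwarding the optional token."""
    request = build_request(url, token=token)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as fid:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            fid.write(chunk)
    return dest
```

With `token=None` the request is identical to an unauthenticated one, so existing public datasets would be unaffected. In CI, the token could be read from an environment variable populated from a repository secret and passed down as `token=...`.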

Describe possible alternatives

If we refactor further, so that key, urls, archive_names, folder_origs, folder_names, and md5_hashes are passed into _data_path rather than set inside it, then creating an MNE-style fetcher only requires defining a data_path that passes these values to _data_path. A downstream package would then have a fully functional mne_downstream_package.testing.data_path() that fetches its own testing datasets without duplicating MNE-Python's data-fetching logic.
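
A rough sketch of what such a parametrized function might look like is below. It is only an illustration under stated assumptions: `generic_data_path` is a hypothetical stand-in (not the current `_data_path` signature), it handles a single archive rather than lists of URLs and folders, it skips archive extraction, and the `download` callable is injected so the token can be forwarded to whatever fetcher is in use.

```python
import hashlib
from pathlib import Path


def generic_data_path(*, key, url, archive_name, folder_name, md5_hash,
                      path, download, token=None):
    """Fetch one archive into ``path / folder_name`` and verify its MD5.

    Hypothetical, dataset-agnostic sketch; every dataset-specific value
    is passed in by the caller instead of being set inside the function.
    """
    root = Path(path) / folder_name
    archive = root / archive_name
    if not archive.exists():
        root.mkdir(parents=True, exist_ok=True)
        # ``download`` is any callable with signature (url, dest, token=None).
        download(url, archive, token=token)
    digest = hashlib.md5(archive.read_bytes()).hexdigest()
    if digest != md5_hash:
        raise RuntimeError(
            f"MD5 mismatch for dataset {key!r}: got {digest}, "
            f"expected {md5_hash}"
        )
    return root
```

A downstream `data_path()` would then be a thin wrapper that fills in its own URL, names, and hash and calls this function, optionally passing a token for private data.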

Additional Information

I think this also might be helpful in further cementing MNE-Python as a platform for developing neuroscience/clinical-neuroscience applications that sometimes might need data fetchers in their CI / testing pipeline for "private data".

Ref: https://chanzuckerberg.com/eoss/proposals/improving-usability-of-core-neuroscience-analysis-tools-with-mne-python/
