[MRG, MAINT] Fetching Datasets Public API "fetch_dataset" used by all internal MNE datasets now (#9763)
Conversation
adam2392
left a comment
@drammock in your opinion what is the best way to pass in processor for the existing MNE datasets?
Option 1: Stored in the config.py file. See https://github.com/mne-tools/mne-python/pull/9763/files#r712535501, and see mne/datasets/testing/_testing.py
Option 2: Stored directly in each dataset's data_path() function. See https://github.com/mne-tools/mne-python/pull/9763/files#r712536490
Option 1 seems better; I like it being in the same place as the hash. With Option 2 it's not clear how users would handle private datasets that also need unzipping/untarring. However, you'll need to figure out how to still make pooch an optional dependency, since with Option 1 it's used outside of a function.
I played around with it, and there is no easy way to make it all Option 1, because some of the datasets require customized pathing anyway :/, so I went with Option 2, e.g. we still need this code inside. Let me know what you think. I can also make a quick example to demonstrate.
I guess I can actually just get rid of _data_path.
Paired PR: #9764
The CI failure seems like it's the recurring whitener test issue. Here are the generated docs: https://34581-1301584-gh.circle-artifacts.com/0/dev/generated/mne.datasets.fetch.fetch_dataset.html#mne.datasets.fetch.fetch_dataset
If we already have a clear use case for the fetch_dataset function, then let's do it. Two things:
Thanks for the clarifications @adam2392.
No strong opinion about that, so leave config.py as-is if you think that's cleaner.
Sent from ProtonMail mobile
-------- Original Message --------
On Sep 22, 2021, 14:42, Adam Li wrote:
@adam2392 commented on this pull request.
---------------------------------------------------------------
In mne/datasets/fetch.py:
> + dataset_params : dict of dict
> +     The dataset name and corresponding parameters to download the dataset.
> +     The dataset parameters contain the following keys:
> +     ``archive_name``, ``url``, ``folder_name``, ``hash``,
> +     ``config_key`` (optional). See Notes.
Makes sense to me.
Should I change the format in config.py, or handle it internally by converting the MNE datasets from dict of dict -> list of dict where needed when calling fetch_dataset?
I think the format in config.py is nice because it consolidates the LOC per dataset.
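For illustration, here is what a dict-of-dict registry in the config.py style could look like, plus the dict-of-dict -> list-of-dict conversion mentioned above. All names, URLs, and hash values below are placeholders, not MNE's actual config:

```python
# Hypothetical registry in the config.py style: one entry per dataset,
# keyed by dataset name, using the keys from the docstring quoted above.
MNE_DATASETS = {
    'testing': dict(
        archive_name='mne-testing-data-0.123.tar.gz',
        url='https://example.com/mne-testing-data-0.123.tar.gz',  # placeholder
        folder_name='MNE-testing-data',
        hash='md5:0123456789abcdef',  # placeholder checksum
        config_key='MNE_DATASETS_TESTING_PATH',
    ),
}


def as_param_list(registry):
    # Flatten dict-of-dict into list-of-dict, folding each dataset's
    # name into its entry, for callers that want a flat list.
    return [dict(dataset_name=name, **params)
            for name, params in registry.items()]
```

The dict-of-dict form keeps each dataset's metadata in one place; the flattening helper shows the conversion is a one-liner if a list is ever needed.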
larsoner
left a comment
See bugs reported in #9774 (comment)
adam2392
left a comment
The main changes are in:
- mne/datasets/utils.py
- mne/datasets/fetch.py
larsoner
left a comment
LGTM, will merge once CIs are happy to get CircleCI green again!
@drammock @agramfort feel free to comment as well but from what I can see @adam2392 has addressed your comments. If I missed something then feel free to comment and I'm sure @adam2392 will be happy to make another PR :)
I still don't understand the Windows Sphinx failure, but this one on CircleCI seems legit (I can replicate locally):
+1 for MRG if the CI failure is unrelated.
Ah yeah, the file is a zip file, so we just needed to make it use unzip. The original code I copied over was accidentally nested_untar (Ref: https://github.com/mne-tools/mne-python/pull/9742/files).
With the exception of the 3 Windows Azure CI failures that I have no idea about, this is good to go by me. I also tested it out with my downstream repo, and fetch_dataset works nicely for a private GitHub repo. Let me know if you find other things I need to fix.
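As a sketch of the downstream use mentioned here, fetching from a private GitHub release might look like the following. The key names follow the docstring quoted earlier in this thread; the repo, URL, and processor value are made-up examples, not a verified API:

```python
# Hypothetical dataset_params for a private GitHub release asset; key
# names follow this PR's docstring, everything else is a placeholder.
dataset_params = dict(
    dataset_name='my_private_dataset',
    archive_name='my_private_dataset.zip',
    url='https://github.com/me/my-repo/releases/download/v0.1/'
        'my_private_dataset.zip',
    folder_name='my-private-dataset',
    hash=None,  # no checksum verification in this sketch
)

# The actual call would then look roughly like this (needs mne
# installed and network access, so it is left commented out):
# from mne.datasets import fetch_dataset
# path = fetch_dataset(dataset_params)
```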
I don't understand the Windows Sphinx error -- I'll probably merge ignoring that, assuming the empty
Next: @adam2392 can you try to fix this, and when you make your commit and push, have
So I think all the downloads work now, but this is a weird error: I see
```diff
- head_mri_t = mne.coreg.estimate_head_mri_t('sample_seeg', subjects_dir)
+ this_subject_dir = op.join(misc_path, 'seeg')
+ head_mri_t = mne.coreg.estimate_head_mri_t('sample_seeg', this_subject_dir)
```
This should fix it, though. Let me know if this is incorrect.
Reference: https://github.com/mne-tools/mne-python/pull/9763/files#r716075200
```diff
  # update the checksum in `mne/data/dataset_checksums.txt` and change version
  # here: ↓↓↓↓↓ ↓↓↓
- RELEASES = dict(testing='0.123', misc='0.18')
+ RELEASES = dict(testing='0.123', misc='0.22')
```
The misc dataset was never updated since before pooch went in (mne-python/mne/datasets/utils.py, line 257 at 034ff3f), so I updated it here. This results in a few examples and tests breaking, because it looks like sample_ecog.mat was replaced by sample_ecog_ieeg.fif. I point out where I updated those files.
```diff
  elif dataset == 'misc':
-     fname = op.join(misc.data_path(), 'ecog', 'sample_ecog.edf')
-     raw = read_raw_edf(fname)
+     fname = op.join(misc.data_path(), 'ecog', 'sample_ecog_ieeg.fif')
```
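Since the misc file changed from EDF to FIF, the reader has to change with it. A small dispatch sketch of that idea (a hypothetical helper that returns the reader name as a string so the sketch stays dependency-free; real code would call mne.io.read_raw_edf or mne.io.read_raw_fif directly):

```python
def reader_for(fname):
    # Map the file extension to the name of the matching MNE raw
    # reader; in real code you would return or call
    # mne.io.read_raw_edf / mne.io.read_raw_fif instead of a string.
    if fname.endswith('.edf'):
        return 'read_raw_edf'
    if fname.endswith('.fif'):
        return 'read_raw_fif'
    raise ValueError(f'no raw reader for {fname!r}')
```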
```diff
  subjects_dir = op.join(sample_path, 'subjects')
  misc_path = mne.datasets.misc.data_path()
- ecog_data_fname = op.join(misc_path, 'ecog', 'sample_ecog.mat')
+ ecog_data_fname = op.join(misc_path, 'ecog', 'sample_ecog_ieeg.fif')
```
So in summary, this PR now addresses the CircleCI failures above and updates the affected examples and tests.
Green, in it goes!
Great stuff @adam2392 🎉 I'm excited to add a few datasets to MNE-NIRS using this, thanks.
Revert "[MRG, MAINT] Fetching Datasets Public API "fetch_dataset" used by all internal MNE datasets now (mne-tools#9763)". This reverts commit 0b503d8.
Reference issue
Closes #9736
Closes #9774
What does this implement/fix?
This adds the public function mne.datasets.fetch_dataset and refactors the existing MNE datasets to use that function. This also refactors the _data_path function to be more generic.