Skip to content

BUG: Split file issue on dashes#8340

Merged
larsoner merged 7 commits intomne-tools:masterfrom
eort:split-files-fix
Oct 12, 2020
Merged

BUG: Split file issue on dashes#8340
larsoner merged 7 commits intomne-tools:masterfrom
eort:split-files-fix

Conversation

@eort
Copy link
Copy Markdown
Contributor

@eort eort commented Oct 4, 2020

Reference issue

Potentially fixes #8339

Additional information

Local tests with a few variations on filenames worked, but I did not extensive testing.

The most likely source for problems in this commit is that I did not understand what the function of former lines 90 and 91 were. For my proposed fix, these lines would not do anything, so I removed them.

Hope that's useful.

@hoechenberger
Copy link
Copy Markdown
Member

Thanks for your contribution, @eort!

@larsoner Can you have a look?

@larsoner
Copy link
Copy Markdown
Member

larsoner commented Oct 5, 2020

Thanks for looking into this!

Local tests with a few variations on filenames worked, but I did not extensive testing.

Can you modify this test to save and load a split file using a name that fails on master but passes on this PR:

https://github.com/mne-tools/mne-python/blob/master/mne/io/fiff/tests/test_raw_fiff.py#L360-L361

It looks like Azure CI style check is unhappy with:

./mne/io/open.py:86:1: W293 blank line contains whitespace
./mne/io/open.py:93:1: W293 blank line contains whitespace

@eort
Copy link
Copy Markdown
Contributor Author

eort commented Oct 6, 2020

Can you modify this test to save and load a split file using a name that fails on master but passes on this PR:

Well, sort of. I don't think I can properly test it, because the internals of the test fif file seem to be different from the fif file I experienced the error. Specifically, in the test file the next file name seems to be coded in FIFF.FIFF_REF_FILE_NAME, whereas in my file it is coded in FIFF.FIFF_REF_FILE_NUM. So, with the test file I can't reproduce the problems I have experienced.

As a very crude approach to bypass this issue, I can force open.py to ignore the FIFF.FIFF_REF_FILE_NAME field and instead always use the FIFF.FIFF_REF_FILE_NUM field. When doing so, I can create a case such that reading a split file on master fails, while it works with this PR. However, the even though the PR works on that particular part, it doesn't pass the entire test. I suspect this is because I had to disable the FIFF.FIFF_REF_FILE_NAME field, in order to test the PR.

A better way of testing would be to use a different test file, one that doesn't code the next file name as name but as number. As far as I understand this would be coded somewhere in the fif file itself, right? Can this be done when saving split files? With some pointers on how to do that, I could try to fix that.

It looks like Azure CI style check is unhappy with

Yes, I fixed that.

examplename and others added 2 commits October 6, 2020 18:12
Copy link
Copy Markdown
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, in the test file the next file name seems to be coded in FIFF.FIFF_REF_FILE_NAME, whereas in my file it is coded in FIFF.FIFF_REF_FILE_NUM

I pushed a test in 57588ee that uses monkeypatch to prevent writing the name. This fails on master but passes on this PR, so we should be good to go from that end.

Can you update latest.inc in the BUG section? Then we can merge and backport

@larsoner larsoner changed the title fix split files issue on dashes BUG: Split file issue on dashes Oct 6, 2020
@larsoner larsoner added the backport-candidate on-merge: backport to maint/1.12 label Oct 6, 2020
@eort
Copy link
Copy Markdown
Contributor Author

eort commented Oct 7, 2020

Can you update latest.inc in the BUG section? Then we can merge and backport

Done.

One more thing, that extra test that you pushed does not test BIDS-formatted filenames (files ending on ...-1_meg.fif). I don't think this PR would pass this test because of line 83 not being a number, but <number>_meg. In fact, I adapted your new test to check a bids-formatted file and indeed failed under this PR.

Should BIDS-formatted files be covered with this function? Or should users just use the mne_bids for reading bids files? If split bids files should also be properly read with this function, the PR needs some fixing. (edit: and the changes to latest.inc to be undone for now)

@larsoner
Copy link
Copy Markdown
Member

larsoner commented Oct 7, 2020

In fact, I adapted your new test to check a bids-formatted file and indeed failed under this PR.

Does it pass on master but fail on this PR? Then yes it needs to be fixed in this PR. If no, then we can fix it in a follow-up PR if need be.

I'm not sure how often people will run into this problem / how problematic it actually is, though. MNE-Python (and MNE-BIDS) write the FIFF_REF_FILE_NAME tag. How were the files produced that didn't have the FIFF_REF_FILE_NAME tag? I'm not sure the BIDS-formatted filename is even FIFF_REF_FILE_NUM-compatible -- I'd assume that if only that tag is available, the subsequent ones must be named -N.fif.

@eort
Copy link
Copy Markdown
Contributor Author

eort commented Oct 12, 2020

Does it pass on master but fail on this PR?

No, it also fails on master. So, overall it seems that splitting on numbers neither works if the filename contains dashes (fixed by this PR), nor if the filename is BIDS-formatted (would require a new PR, that would also affect the current PR).

How were the files produced that didn't have the FIFF_REF_FILE_NAME tag?

They were written by the MEG machine/acquisition software (Neuromag/Elekta scanner).

I'm not sure the BIDS-formatted filename is even FIFF_REF_FILE_NUM-compatible -- I'd assume that if only that tag is available, the subsequent ones must be named -N.fif

Possible. I agree that it probably doesn't affect many users. The only reason I bumped into this problem was that I was aware of BIDS (so I tried to have my files BIDS-formatted), but not of mne_bids. With mne_bids, it all works just fine. In that case, would it then make sense to show a warning, urging participants to use mne_bids when the filepath appears to be BIDS-formatted?

@larsoner
Copy link
Copy Markdown
Member

No, it also fails on master. So, overall it seems that splitting on numbers neither works if the filename contains dashes (fixed by this PR), nor if the filename is BIDS-formatted (would require a new PR, that would also affect the current PR).

We already have tests for BIDS-formatting splits. They work just fine if you save using MNE-Python (master and this PR) when using split_naming='bids':

https://github.com/mne-tools/mne-python/blob/master/mne/io/fiff/tests/test_raw_fiff.py#L370-L381

So the "problem" is really with the Neuromag software, or maybe more so with our expectations for it. If a given piece of software only writes out the FILE_NUM tag when splitting, I think this means that the files that get created must be named *.fif and *-1.fif. Is this the behavior you see when you try saving a file with the acquisiton software?

If that's the case then I think things with MNE-Python and mne-bids are fine, but people using Elekta acquisition software should not try to use BIDS filenames if they end up with split files because the software cannot write them properly. Or if they do it and end up with non-BIDS filenames, they need to use MNE-Python (with split_naming='bids') to do a save-load round trip to get the names to be BIDS compatible.

@eort
Copy link
Copy Markdown
Contributor Author

eort commented Oct 12, 2020

I think this means that the files that get created must be named *.fif and *-1.fif. Is this the behavior you see when you try saving a file with the acquisiton software?

Yes, split-files are written with -1.fif appended to the file name.

people using Elekta acquisition software should not try to use BIDS filenames....Or if they do it and end up with non-BIDS filenames, they need to use MNE-Python (with split_naming='bids') to do a save-load round trip to get the names to be BIDS compatible.

That makes sense. But ideally this would also be somehow communicated. Otherwise the situation might be a little confusing.
if you try to read a neuromag/bids-formatted file, on the current main, there is no sign that there is a problem (unless you have a clear expectation how long the recording should be). On the PR, you will get RuntimeWarning: Split raw file detected but next file /tmp/pytest-of-ede/pytest-11/test_split_numbers0/sub-1_ses-2_task-3_split-01_meg.fi-2.fif does not exist. or something like this.

@larsoner
Copy link
Copy Markdown
Member

neuromag/bids-formatted file, on the current main, there is no sign that there is a problem

Can you make an issue for this on mne-bids? Perhaps they should detect the missing file warning and probably turn it into an error.

@larsoner larsoner merged commit 629db04 into mne-tools:master Oct 12, 2020
@larsoner
Copy link
Copy Markdown
Member

Thanks @eort !

larsoner added a commit that referenced this pull request Oct 12, 2020
* fix split files issue on dashes

* style fix

* TST: Add test case

* update doc

* new author contributions

Co-authored-by: examplename <e.ort@vu.nl>
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
@larsoner larsoner added backported and removed backport-candidate on-merge: backport to maint/1.12 labels Oct 12, 2020
@eort
Copy link
Copy Markdown
Contributor Author

eort commented Oct 12, 2020

Can you make an issue for this on mne-bids? Perhaps they should detect the missing file warning and probably turn it into an error.

Sure, but why on mne-bids? The issue occurs when using io.read_raw_fif(filename). I haven't tested what would happen when mne-bids was used for such files.

@larsoner
Copy link
Copy Markdown
Member

Sure, but why on mne-bids? The issue occurs when using io.read_raw_fif(filename)

Let's talk about a concrete example. What use case are you trying to address where you will get this warning?

From what I understand, if you:

  1. If you write data with MNE-Python or MNE-BIDS, things are fine and you can use BIDS names.
  2. If you write data with Neuromag software, they don't name it according to BIDS, which is reasonable (they don't give this option, and have been writing files for years before BIDS existed)
  3. If you rename the second part of the split file, it won't read properly -- but I'm not sure people would expect this behavior.

The motivation behind making changes in mne-bids is that mne-bids is all about conforming to a spec and making sure everything is in the right place. MNE-Python is more flexible and allows you to (and has allowed you to for quite a while) have these split files missing, and you get a warning.

I guess we could add a on_split_missing='warn' argument to read_raw_fif or so, but I have a feeling that this wouldn't actually address the problem much unless we change the default to on_split_missing='raise', which we could consider doing. People who want to ignore a split file can always set this to warn or ignore.

@eort
Copy link
Copy Markdown
Contributor Author

eort commented Oct 12, 2020

One more case:
4. If you change both parts of a split file, produced with neuromag, to a BIDS name.

But yes, this is a very specific case, that hardly ever will occur. As far as I see, there is only a problem when users try to read files with mne-python that were recorded on neuromag systems and were manually renamed to adhere to BIDS. Now that the PR is merged, split file names won't produce correct file names and an error will be raised. That could already be sufficient, but
a warning a la "Don't manually rename files to BIDS and try to read them with mne-python, but just use mne-bids" might be helpful, though super-niche. Perhaps it would be better to address such behavior in the docs?

The motivation behind making changes in mne-bids is that mne-bids is all about conforming to a spec and making sure everything is in the right place.

No, I get that and I completely agree. I just haven't tested anything with mne-bids. So, I wasn't aware that the same behaviour would be observed when using mne-bids. If the same issue is present in mne-bids, then indeed, split files would just silently be ignored, if they were produced by the niche-case described above.

I guess we could add a on_split_missing='warn' argument to read_raw_fif or so, but I have a feeling that this wouldn't actually address the problem much unless we change the default to on_split_missing='raise', which we could consider doing. People who want to ignore a split file can always set this to warn or ignore.

I guess that would generally be a useful warning, but given that you normally have clear expectations regarding the length of your files perhaps not essential.

marsipu pushed a commit to marsipu/mne-python that referenced this pull request Oct 14, 2020
* fix split files issue on dashes

* style fix

* TST: Add test case

* update doc

* new author contributions

Co-authored-by: examplename <e.ort@vu.nl>
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
@eort eort deleted the split-files-fix branch December 2, 2020 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

automatically combining split files with dashes in the filename fails

3 participants