[WIP] [ENH] Export EDF files using pyedflib #12085
withmywoessner wants to merge 8 commits into mne-tools:main from
Conversation
for more information, see https://pre-commit.ci
wmvanvliet
left a comment
Happy to see you working on this @withmywoessner!
# DeprecationWarning: `sample_rate` is deprecated and
# will be removed in a future release.
# Please use `sample_frequency` instead
# This causes test to fail, so we catch it here
Couldn't we update the test instead?
I'll look into it!
    f"trying to write {desc} at {onset} "
    f"for {duration} seconds."
)
del hdl
I'm not sure if it is. I just included it because the documentation for the pyedflib high-level functions includes it:
with pyedflib.EdfWriter(edf_file, n_channels=n_channels, file_type=file_type) as f:
f.setDatarecordDuration(int(100000 * block_size))
f.setSignalHeaders(signal_headers)
f.setHeader(header)
f.writeSamples(signals, digital=digital)
for annotation in annotations:
f.writeAnnotation(*annotation)
del f
@wmvanvliet Thank you for the suggestions! The thing I really wanted help/opinions on is getting the tolerances lower and getting the setDatarecordDuration function to match the original implementation, so that the sampling rate comes out the same.
Regarding del f, I really don't think it's necessary, since we're at the end of the function already: when the function returns, f should be cleaned up automatically.
Could you explain a bit more about the tolerances and the need to set setDatarecordDuration? Sorry, I'm not completely up to date on the details of the EDF/BDF file format.
> Could you explain a bit more about the tolerances and the need to set setDatarecordDuration? Sorry, I'm not completely up to date on the details of the EDF/BDF file format.
Sure @wmvanvliet, basically when I tested my first code, the tolerances were way too high:

The current implementation expects tolerances of 1e-5. Also, the setDatarecordDuration function basically lets you have a non-integer sampling rate. There are some problems with the documentation (see this issue), but I think I figured that out. The function is supposed to give the same sampling rate as the original file, but instead it is slightly lower.
Original: [screenshot of the original sampling rate]
EDF export: [screenshot of the exported sampling rate]
I suspect this is one reason why the tolerances are different, but it can't entirely explain the problem as the tolerances are too high even with an integer sampling rate.
The current precision is 1.110142833033462e-08 V, or 0.011101428330334619 µV – so we should be good?
From my testing I thought the test expected 1e-5 µV, but I could have been wrong. Also, the test expects a tolerance of 1e-5 for the timepoints as well:
# Due to the data record duration limitations of EDF files, one
# cannot store arbitrary float sampling rate exactly. Usually this
# results in two sampling rates that are off by very low number of
# decimal points. This for practical purposes does not matter
# but will result in an error when say the number of time points
# is very very large.
assert_allclose(raw.times, raw_read.times[:orig_raw_len], rtol=0, atol=1e-5)
This is why the sampling rate is a problem: it changes the number of time points and therefore their values. Maybe we just need to change the tests?
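To make this concrete, here is a small stdlib-only sketch (not MNE code; the rate value is the one from the sample dataset discussed in this thread) of how forcing an integer number of samples into a 1-second data record shifts the effective rate, and how the time axis then drifts as the recording grows:

```python
# Sketch: EDF stores an integer number of samples per data record plus a
# record duration; the effective rate is samples_per_record / duration.
# With a 1-second record, a non-integer rate must be rounded.
true_sfreq = 600.614990234375           # non-integer rate of the sample data
samples_per_record = round(true_sfreq)  # 601 samples in a 1 s record
effective_sfreq = samples_per_record / 1.0

n = 1_000_000  # roughly 28 minutes of data at this rate
drift = abs(n / effective_sfreq - n / true_sfreq)
print(f"time-axis drift after {n} samples: {drift:.3f} s")
```

Even a tiny rate mismatch accumulates into a drift far above a 1e-5 s tolerance on the time axis.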
Ah, I see now the problem with tolerances. For reference, here is a test script that demonstrates the problem:

import mne

raw_fname = mne.datasets.sample.data_path() / "MEG" / "sample" / "sample_audvis_raw.fif"
raw = mne.io.read_raw_fif(raw_fname)
raw.pick("eeg")
raw.load_data()

# Save to EDF and load again
raw.export("/tmp/test.edf", overwrite=True)
raw2 = mne.io.read_raw_edf("/tmp/test.edf", preload=True)

# Print the first 5 samples and timepoints
print(raw[0, :5])
print(raw2[0, :5])

Current output: [output omitted]
I guess there's not much we can do. With 16-bit precision, the maximum precision we can obtain on the EEG part of the sample dataset is:

digital_min = -32767
digital_max = 32767
physical_min = raw.get_data().min()
physical_max = raw.get_data().max()
tolerance = (physical_max - physical_min) / (digital_max - digital_min)
print(tolerance * 1E6, "microvolts")

0.011101428330334619 microvolts
And it's a similar story for the sample rate. Given that EDF doesn't support non-integer sampling rates at all, I think we do rather well.
Okay, I see. Thanks @wmvanvliet! Why does this not occur in the current implementation? Could this be improved in the pyedflib code? Also, since this may be a problem for others, should we allow users to choose which library to use?
There are two (unrelated) problems:
Regarding 1, why did the issue not occur previously? Regarding 2, did you see Q9 in https://www.edfplus.info/specs/edffaq.html? I guess we have to live with some inaccuracies; I'm just not sure how to best define the record duration and the number of samples per record.
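One possible approach to the record-duration question (a sketch of the idea in Q9, not what either library actually does): treat the rate as a rational number and pick the data-record duration as the denominator, so that an integer number of samples per record reproduces the rate exactly. The 100-second cap below is a made-up bound; a real implementation would also have to respect the 8-character record-duration field in the EDF header, which this sketch ignores.

```python
from fractions import Fraction

sfreq = 512.5  # hypothetical non-integer but rational sampling rate, in Hz

# Find integers n, d with n / d == sfreq and d bounded (d is the record
# duration in seconds, n the number of samples per record).
frac = Fraction(sfreq).limit_denominator(100)
record_duration = frac.denominator    # 2 s
samples_per_record = frac.numerator   # 1025 samples per record

assert samples_per_record / record_duration == sfreq
print(samples_per_record, "samples per", record_duration, "s record")
```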
I am also wondering this. I looked in the pyedflib documentation, and the digital_min there is -32768 (vs. -32767 in the current implementation). Would this make a slight difference, @wmvanvliet @cbrnr?
for key, val in [
    ("PatientCode", subj_info.get("his_id", "")),
    ("PatientName", name),
    ("PatientGender", sex),
    ("AdditionalPatientInfo", additional_patient_info),
    ("Gender", sex),
    ("PatientAdditional", additional_patient_info),
According to https://www.edfplus.info/specs/edfplus.html#additionalspecs, should the order not be: Code, Gender, Birthdate, Name?
I don't think this matters for the function of the code, because the code just uses those keys to call the corresponding setter (e.g., setGender, setPatientAdditional). I can change it, though, to be more consistent. @cbrnr
You are right. Could you still change the order to be consistent with the specs?
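As an aside, the key-to-setter dispatch the diff above suggests can be illustrated with a hypothetical stub (FakeWriter and every value here are invented for illustration; this is not pyedflib's API):

```python
# Stub standing in for pyedflib.EdfWriter, to show the dispatch pattern.
class FakeWriter:
    def __init__(self):
        self.header = {}

    def setPatientCode(self, v):
        self.header["code"] = v

    def setGender(self, v):
        self.header["sex"] = v

    def setPatientName(self, v):
        self.header["name"] = v

    def setPatientAdditional(self, v):
        self.header["additional"] = v


hdl = FakeWriter()
# Order follows the EDF+ 'patient' subfields: code, sex, name, extras.
for key, val in [
    ("PatientCode", "X123"),
    ("Gender", 1),
    ("PatientName", "Doe_John"),
    ("PatientAdditional", ""),
]:
    # Each key resolves to the matching set<Key> method on the writer.
    getattr(hdl, f"set{key}")(val)
```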
No, -32768 is actually correct. You have 16 bits, so you go from -32768 to 32767. Regarding the difference (I've also replied in the other thread): the current precision is 1.110142833033462e-08 V, or 0.011101428330334619 µV, so we should be good?
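For illustration, here is a minimal stdlib sketch of that 16-bit quantization (the physical range is hypothetical; real code would take it from the data): values are mapped onto -32768..32767 and back, so the worst-case round-trip error is half a quantization step.

```python
digital_min, digital_max = -32768, 32767
physical_min, physical_max = -1e-4, 1e-4  # hypothetical EEG range in volts

# One quantization step in physical units (the "precision" discussed above).
step = (physical_max - physical_min) / (digital_max - digital_min)


def to_digital(x):
    # Map a physical value onto the signed 16-bit digital range.
    return round((x - physical_min) / step) + digital_min


def to_physical(d):
    # Map a digital value back into physical units.
    return (d - digital_min) * step + physical_min


x = 3.3e-5
err = abs(to_physical(to_digital(x)) - x)
assert err <= step / 2  # round-trip error bounded by half a step
```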
Looking at the difference between original and exported files for the current implementation:

>>> np.abs(raw.get_data() - raw_read.get_data()).max()
6.749146666805992e-09

In this PR (using pyedflib):

>>> np.abs(raw.get_data() - raw_read.get_data()).max()
0.00016842048660163812

That's a huge difference, and we should not loosen the tolerance beyond what we currently have. If it worked with …
How did the previous edflib implementation achieve such good tolerances? Did it save in 32 bit by any chance? |
@wmvanvliet @cbrnr I did some testing, and the tolerances can be much higher depending on the size of the data:

[screenshot of the tolerance output]

As you can see, the tolerance for both implementations is about 0.01 and the values are the same. This is very far from what the testing data expects (1e-5). This is for a file that is about an hour in length:

[screenshot of the tolerance output for the hour-long file]

So for integer sampling rates, I think we can definitively say both implementations work the same, so I do not think it is a problem with the sample size (16-bit vs. 32-bit). I think it is entirely due to the data records being different sizes for non-integer sampling rates.
So you are saying that the differences are almost entirely caused by the non-integer sampling frequency? Do we have round-trip tests that use data with an integer sampling frequency?
Yes
I am not sure what a round-trip test is, but all the EDF tests I see either generate a non-integer sampling rate file from scratch or use 'test_raw.fif', which has a non-integer sampling rate.
A round-trip test in this case means you take some data, write it out, read what you wrote back in, and check the read-in data against the original data (you have gone "full circle", i.e., a write-read round trip). We have a bunch of EDF test files in the testing dataset: some or all of these probably have integer sampling rates.
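As a stdlib-only illustration of the round-trip idea (not the actual MNE test, which would use raw.export and mne.io.read_raw_edf; struct stands in for the EDF writer/reader here):

```python
import io
import math
import struct

# A fake "recording": a 10 Hz sine sampled at 256 Hz for one second.
original = [math.sin(2 * math.pi * 10 * t / 256) for t in range(256)]

# "Write": quantize to signed 16-bit integers, as EDF does.
step = 2.0 / 65535  # physical range [-1, 1] over the 16-bit digital range
buf = io.BytesIO()
for x in original:
    buf.write(struct.pack("<h", round((x + 1.0) / step) - 32768))

# "Read": decode back to physical values.
buf.seek(0)
restored = [
    (struct.unpack("<h", buf.read(2))[0] + 32768) * step - 1.0
    for _ in original
]

# Round trip: restored data must match the original within quantization error.
assert max(abs(a - b) for a, b in zip(original, restored)) <= step
```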
Co-authored-by: Marijn van Vliet <w.m.vanvliet@gmail.com>
Can I suggest that we give https://github.com/the-siesta-group/edfio a try? I haven't tested it yet, but I have a strong feeling that this package should be exactly what we want. I'll be playing around with it in the next couple of days, so I can also try to integrate it for our export.
Okay, sounds good! Maybe a new pull request should be made for that, then?
Yes, I'll definitely do that in a new PR. However, since I don't know how well this package will work, it's best to keep this one alive for now.

Reference issue
Addresses #9883
What does this implement/fix?
I modify the _export_raw() function for EDF files to use pyedflib instead of EDFlib, as pyedflib is significantly faster.
Additional information
My basic implementation works with files that have an integer sampling rate. However, I run into problems with non-integer sampling rates. I think it has something to do with the setDatarecordDuration function being implemented differently in each library. I also noticed that in my implementation, tolerances for the data are around 0.01 to 0.1 compared to the original file, which may be too high.
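Based on the int(100000 * block_size) call in the pyedflib example quoted earlier in this thread, the record duration appears to be expressed in units of 10 µs; if so, rounding to that grid is one place where the exported rate can deviate slightly from the original. A stdlib-only sketch of that effect (an assumption about the units, not verified against pyedflib internals):

```python
true_sfreq = 600.614990234375  # non-integer rate of the sample data, in Hz
samples_per_record = 601

# Record duration that would reproduce true_sfreq exactly, in seconds.
exact_duration = samples_per_record / true_sfreq

# Rounded to the 10-microsecond grid the API appears to use.
stored_units = round(exact_duration * 100000)
effective_sfreq = samples_per_record * 100000 / stored_units

print(f"true: {true_sfreq} Hz, effective: {effective_sfreq} Hz")
```

The effective rate lands close to, but not exactly on, the original rate, which matches the small sampling-rate mismatch observed above.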