Description of the problem
The Epochs object is a very convenient tool for handling the large amounts of data that are typical in clinical applications. However, when iterating over the channels and events of the object, execution speed differs by multiple orders of magnitude depending on how many calls to get_data are made. This happens even though the data has already been preloaded into memory.
I tested the performance difference on the MNE standard ECoG dataset (7 events, 98 channels) and found an approximately 7000x slowdown. This can be problematic with clinical data, where many SEEG arrays contain 100 to 300 channels and 200 events.
Steps to reproduce
from timeit import timeit

import mne
from mne_bids import BIDSPath, read_raw_bids

bids_path = BIDSPath(
    root=mne.datasets.epilepsy_ecog.data_path(),
    subject="pt1",
    session="presurgery",
    task="ictal",
    datatype="ieeg",
    extension=".vhdr",
)
raw = read_raw_bids(bids_path=bids_path, verbose="error")
events, ids = mne.events_from_annotations(raw)
trials = mne.Epochs(raw, events, event_id=ids, tmin=-10, tmax=10, preload=True, verbose=False)
ids_rev = {v: k for k, v in ids.items()}
trials.drop_bad()


def iter_over_ch_ev():
    for ch in trials.ch_names:  # 98 channels
        for ev in events[:, 2]:  # 7 events
            yield trials.get_data([ch], item=ids_rev[ev])  # 98 x 7 = 686 calls


def iter_over_array():
    arr = trials.get_data()  # 1 call
    for i in range(arr.shape[1]):
        for j in range(arr.shape[0]):
            yield arr[j, i]


ch_ev = timeit(lambda: list(iter_over_ch_ev()), number=10)
arr = timeit(lambda: list(iter_over_array()), number=10)
print(f'iterating over mne object: {ch_ev:.4}s')
print(f'iterating over array: {arr:.4}s')
print(f'array is {ch_ev / arr:.2f}x faster')
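As a possible interim workaround, the per-channel, per-event lookup can be done on the array itself after a single get_data call. The sketch below uses synthetic NumPy data instead of the ECoG recording, so the names ch_names, event_codes, select and iter_over_lookup are illustrative assumptions rather than MNE API. With a real Epochs object the inputs would presumably come from trials.get_data(), trials.ch_names and trials.events[:, 2].

```python
import numpy as np

# Synthetic stand-ins (assumptions): with a real Epochs object these would
# come from trials.get_data(), trials.ch_names and trials.events[:, 2].
rng = np.random.default_rng(0)
ch_names = [f"ch{i}" for i in range(4)]   # hypothetical channel names
event_codes = np.array([1, 2, 1, 3, 2])   # one event code per epoch
arr = rng.standard_normal((len(event_codes), len(ch_names), 50))  # (epochs, channels, times)

# Build both lookups once, outside the loop.
ch_index = {name: i for i, name in enumerate(ch_names)}
epochs_by_code = {
    int(code): np.flatnonzero(event_codes == code)
    for code in np.unique(event_codes)
}


def select(ch, code):
    """Return data for one channel and one event code, shape (n_epochs, 1, n_times)."""
    rows = epochs_by_code[code]
    col = ch_index[ch]
    return arr[rows][:, col:col + 1, :]


def iter_over_lookup():
    # Same loop structure as iter_over_ch_ev, but with plain array indexing.
    for ch in ch_names:
        for code in event_codes:
            yield select(ch, int(code))


blocks = list(iter_over_lookup())
```

Each yielded block keeps the (epochs, channels, times) layout with a single channel, so downstream code written against the per-call version should need few changes. This has not been timed against the ECoG data.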
Link to data
No response
Expected results
Execution times should be roughly equal
Actual results
iterating over mne object: 34.25s
iterating over array: 0.004821s
array is 7103.70x faster
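To put these numbers in terms of the clinical case mentioned in the description, here is a rough back-of-the-envelope extrapolation. It assumes the cost of each get_data call stays constant as the channel and event counts grow, which is an assumption and not something I measured; the real cost may well be higher with larger recordings.

```python
# Figures reported in this issue.
total_s = 34.25            # timeit total for the per-call loop
repeats = 10               # timeit(..., number=10)
n_channels, n_events = 98, 7

calls_per_run = n_channels * n_events             # 686 get_data calls per pass
per_call_s = total_s / repeats / calls_per_run    # average cost of one call


def projected_seconds(channels, events):
    """Estimated time for one full pass, assuming constant per-call cost."""
    return channels * events * per_call_s


print(f"per call: {per_call_s * 1e3:.2f} ms")
for channels in (100, 300):
    print(f"{channels} channels x 200 events: {projected_seconds(channels, 200):.0f} s")
```

That works out to roughly 5 ms per call, so a single pass over 100 to 300 channels and 200 events would take on the order of minutes under this assumption.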
Additional information
Platform Windows-10-10.0.22621-SP0
Python 3.11.4 | packaged by Anaconda, Inc. | (main, Jul 5 2023, 13:38:37) [MSC v.1916 64 bit (AMD64)]
Executable C:\Users\Jakda\anaconda3\envs\ieeg\python.exe
CPU Intel64 Family 6 Model 140 Stepping 1, GenuineIntel (8 cores)
Memory 63.8 GB
Core
├☑ mne 1.4.2
├☑ numpy 1.25.2 (MKL 2023.1-Product with 4 threads)
├☑ scipy 1.11.1
├☑ matplotlib 3.7.1 (backend=QtAgg)
├☑ pooch 1.7.0
└☑ jinja2 3.1.2
Numerical (optional)
├☑ sklearn 1.2.2
├☑ nibabel 5.1.0
├☑ dipy 1.7.0
├☑ pandas 2.0.3
└☐ unavailable numba, nilearn, openmeeg, cupy
Visualization (optional)
├☑ pyvista 0.39.1 (OpenGL 4.5.0 - Build 31.0.101.4338 via Intel(R) Iris(R) Xe Graphics)
├☑ pyvistaqt 0.11.0
├☑ vtk 9.2.6
├☑ qtpy 2.3.1 (PyQt5=5.15.2)
├☑ pyqtgraph 0.13.3
├☑ mne-qt-browser 0.5.1
└☐ unavailable ipyvtklink, ipympl
Ecosystem (optional)
├☑ mne-bids 0.12
└☐ unavailable mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline