get_data() calls are needlessly expensive #11899

@Aaronearlerichardson

Description of the problem

The Epochs object is a very convenient tool for handling the large amounts of data that are typical in clinical applications. However, when iterating over the channels and events of the object, execution speed differs by multiple orders of magnitude depending on how many calls to get_data are made. This happens even though the data has already been preloaded into memory.

I tested the performance difference on the MNE epilepsy ECoG dataset, with 7 events and 98 channels, and found an approximately 7000x slowdown. This can be problematic with clinical data, where many SEEG recordings contain 100 to 300 channels and 200 events.

Steps to reproduce

from timeit import timeit
import mne
from mne_bids import BIDSPath, read_raw_bids


bids_path = BIDSPath(
    root=mne.datasets.epilepsy_ecog.data_path(),
    subject="pt1",
    session="presurgery",
    task="ictal",
    datatype="ieeg",
    extension=".vhdr",
)
raw = read_raw_bids(bids_path=bids_path, verbose="error")
events, ids = mne.events_from_annotations(raw)
trials = mne.Epochs(raw, events, event_id=ids, tmin=-10, tmax=10, preload=True, verbose=False)
ids_rev = {v: k for k, v in ids.items()}
trials.drop_bad()


def iter_over_ch_ev():
    for ch in trials.ch_names:  # 98 channels
        for ev in events[:, 2]:  # 7 events
            yield trials.get_data([ch], item=ids_rev[ev])  # 98 X 7 = 686 calls


def iter_over_array():
    arr = trials.get_data()  # 1 call
    for i in range(arr.shape[1]):
        for j in range(arr.shape[0]):
            yield arr[j, i]


ch_ev = timeit(lambda: list(iter_over_ch_ev()), number=10)
arr = timeit(lambda: list(iter_over_array()), number=10)

print(f'iterating over mne object: {ch_ev:.4}s')
print(f'iterating over array: {arr:.4}s')
print(f'array is {ch_ev / arr:.2f}x faster')
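For reference, the single-call pattern can be extended to look data up by channel name and event code, which is what the per-call loop is doing. The following is only a minimal sketch under stated assumptions: it uses NumPy stand-in data so it runs without MNE, and `iter_channel_event`, the channel names, and the event codes are hypothetical names made up for illustration.

```python
import numpy as np


def iter_channel_event(data, ch_names, event_codes):
    """Yield (channel name, event code, array) from one preloaded 3-D array.

    data:        shape (n_epochs, n_channels, n_times), e.g. the result
                 of a single get_data() call
    event_codes: shape (n_epochs,), one integer code per epoch
    """
    data = np.asarray(data)
    event_codes = np.asarray(event_codes)
    # Build one boolean mask per event code up front, outside the channel loop
    masks = {int(code): event_codes == code for code in np.unique(event_codes)}
    for ch_idx, ch in enumerate(ch_names):
        for code, mask in masks.items():
            # Selects every epoch with this event code for this channel
            yield ch, code, data[mask, ch_idx]


# Stand-in data (hypothetical): 7 epochs, 3 channels, 5 samples per epoch
rng = np.random.default_rng(0)
data = rng.standard_normal((7, 3, 5))
ch_names = ["ch0", "ch1", "ch2"]
event_codes = np.array([1, 2, 3, 1, 2, 3, 1])

results = list(iter_channel_event(data, ch_names, event_codes))
```

With the objects from the script above, the equivalent call would presumably be `iter_channel_event(trials.get_data(), trials.ch_names, trials.events[:, 2])`, so that get_data is called only once.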

Link to data

No response

Expected results

Execution times should be roughly equal, since the data is already preloaded into memory.

Actual results

iterating over mne object: 34.25s
iterating over array: 0.004821s
array is 7103.70x faster
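To put these numbers in terms of a single call, the short calculation below derives the average cost per get_data call from the timing above and projects it onto a larger recording. It is a rough sketch: the 300-channel, 200-event recording is a hypothetical example taken from the ranges mentioned in the description, and the projection assumes the per-call cost stays constant, which may well underestimate the cost on larger data.

```python
# Figures reported in this issue
total_time_s = 34.25        # time for the get_data loop, timeit with number=10
repeats = 10
calls_per_run = 98 * 7      # 98 channels x 7 events = 686 calls

# Average cost of a single get_data call
per_call_s = total_time_s / (repeats * calls_per_run)

# Hypothetical SEEG recording: 300 channels, 200 events
# ASSUMPTION: per-call cost does not grow with the size of the data
seeg_calls = 300 * 200
projected_s = per_call_s * seeg_calls

print(f"average cost per call: {per_call_s * 1e3:.2f} ms")
print(f"projected time for {seeg_calls} calls: {projected_s:.0f} s")
```

This works out to about 5 ms per call, or on the order of 300 s for a single pass over such a recording.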

Additional information

Platform Windows-10-10.0.22621-SP0
Python 3.11.4 | packaged by Anaconda, Inc. | (main, Jul 5 2023, 13:38:37) [MSC v.1916 64 bit (AMD64)]
Executable C:\Users\Jakda\anaconda3\envs\ieeg\python.exe
CPU Intel64 Family 6 Model 140 Stepping 1, GenuineIntel (8 cores)
Memory 63.8 GB
Core
├☑ mne 1.4.2
├☑ numpy 1.25.2 (MKL 2023.1-Product with 4 threads)
├☑ scipy 1.11.1
├☑ matplotlib 3.7.1 (backend=QtAgg)
├☑ pooch 1.7.0
└☑ jinja2 3.1.2
Numerical (optional)
├☑ sklearn 1.2.2
├☑ nibabel 5.1.0
├☑ dipy 1.7.0
├☑ pandas 2.0.3
└☐ unavailable numba, nilearn, openmeeg, cupy
Visualization (optional)
├☑ pyvista 0.39.1 (OpenGL 4.5.0 - Build 31.0.101.4338 via Intel(R) Iris(R) Xe Graphics)
├☑ pyvistaqt 0.11.0
├☑ vtk 9.2.6
├☑ qtpy 2.3.1 (PyQt5=5.15.2)
├☑ pyqtgraph 0.13.3
├☑ mne-qt-browser 0.5.1
└☐ unavailable ipyvtklink, ipympl
Ecosystem (optional)
├☑ mne-bids 0.12
└☐ unavailable mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline
