[ENH] Speed up read_raw_neuralynx() when Neuralynx recording has many gaps #12370

@KristijanArmeni

Description

Describe the new feature or enhancement

Using a larger non-public dataset (~1.7 GB, approx. 1 hour of recording at 32 kHz) with a large number of missing samples, I noticed that read_raw_neuralynx(..., preload=False) was taking quite long to return (tens of minutes). The bottleneck is the way the search for samples corresponding to gap onsets was implemented.

It turns out that, to map detected gap starts onto sample numbers in the recording, it uses gap_onsets = [np.where(samples == idx) for idx in gap_indices], which takes a long time (tens of minutes) when a) there is a huge number of samples to search over and b) there is a large number of gap_indices to search for.
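To illustrate the cost (a minimal sketch with made-up arrays, not the actual reader code): each list-comprehension iteration scans the full sample array, so with n samples and m gaps the search is O(n * m).

```python
import numpy as np

# Stand-ins for the recording's sample numbers and the detected gap starts.
samples = np.arange(0, 1_000, 2)          # n samples (hypothetical)
gap_indices = np.array([100, 500, 900])   # m gap starts (hypothetical)

# One full linear scan over `samples` per gap index -> O(n * m) overall.
gap_onsets = [np.where(samples == idx)[0] for idx in gap_indices]
```

With tens of millions of samples and thousands of gaps, those repeated full scans add up to the tens of minutes observed above.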

Describe your proposed implementation

PR: #12371

The proposal is to drop np.where and infer the gap sample onsets by cumulatively summing the segment sample sizes with np.cumsum(segment_sizes). See the PR for details.
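The idea can be sketched as follows (variable names are illustrative, not the actual PR code): since the length of every contiguous data segment is known, the sample offset at which each segment ends, and hence where each gap begins, is just the running total of the preceding segment sizes, computed in a single vectorized pass.

```python
import numpy as np

# Hypothetical per-segment sample counts for a recording with gaps.
segment_sizes = np.array([100, 250, 50, 400])

# Cumulative sum gives the end offset of every segment in one pass; the
# offsets before the final segment are the samples where gaps begin.
segment_ends = np.cumsum(segment_sizes)
gap_onsets = segment_ends[:-1]  # gaps sit between consecutive segments
```

This replaces the O(n * m) search with a single O(total number of segments) pass that never touches the per-sample array at all.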

Replacing np.where with np.cumsum removes the bottleneck:
(benchmark screenshot: runtime comparison)

Describe possible alternatives

I'm sure there are even more efficient methods out there, but this is already a substantial improvement and removes the bottleneck.

Additional context

No response
