Describe the new feature or enhancement
Using a larger non-public dataset (~1.7 GB, circa 1 hour of recording at 32 kHz) with a large number of missing samples, I noticed that `read_raw_neuralynx(..., preload=False)` was taking quite long to return (tens of minutes). The bottleneck is how the search for samples corresponding to gap onsets is implemented.
It turns out that, to map detected gap starts onto sample numbers in the recording, the reader uses `gap_onsets = [np.where(samples == idx) for idx in gap_indices]`, which takes a long time (tens of minutes) when a) there is a huge number of `samples` to search over and b) there is a large number of `gap_indices` to search for.
Describe your proposed implementation
PR: #12371
The proposal is to drop `np.where` and infer gap sample onsets by cumulatively summing the segment sample sizes with `np.cumsum(segment_sizes)`. See the PR for details.
Replacing `np.where` with `np.cumsum` removes the bottleneck:

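A minimal sketch of the idea, with hypothetical stand-in data (`segment_sizes` here is illustrative, not the reader's actual variable): the cumulative sum of the per-segment sample counts directly yields the sample offset at which each following segment begins, so no search over `samples` is needed at all.

```python
import numpy as np

# Hypothetical sizes (in samples) of the contiguous data segments between gaps.
segment_sizes = np.array([1000, 500, 2000, 750])
samples = np.arange(segment_sizes.sum())    # sample numbers of the recording
gap_indices = np.array([1000, 1500, 3500])  # sample numbers where gaps start

# Old approach: one full scan of `samples` per gap -> O(n_samples * n_gaps).
gap_onsets_slow = [np.where(samples == idx)[0][0] for idx in gap_indices]

# Proposed approach: cumulative segment sizes are exactly the offsets at
# which each subsequent segment starts -> O(n_segments), no searching.
gap_onsets_fast = np.cumsum(segment_sizes)[:-1]

assert np.array_equal(gap_onsets_slow, gap_onsets_fast)
```

The speedup comes from replacing a per-gap linear search with a single vectorized pass over the (much smaller) segment-size array.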
Describe possible alternatives
I'm sure there are even more efficient methods out there, but this is already a substantial improvement and removes the bottleneck.
Additional context
No response