Skip to content

perf: detect & fastpath no-op range slices in all layout types#3642

Merged
ianna merged 13 commits intoscikit-hep:mainfrom
pfackeldey:perf_recordarray_getitem_range
Sep 10, 2025
Merged

perf: detect & fastpath no-op range slices in all layout types#3642
ianna merged 13 commits intoscikit-hep:mainfrom
pfackeldey:perf_recordarray_getitem_range

Conversation

@pfackeldey
Copy link
Copy Markdown
Collaborator

@pfackeldey pfackeldey commented Sep 8, 2025

This PR checks for trivial no-op slices in all layout types that essentially do nothing. If that's the case we can directly return self instead of constructing new python classes (and loosing their cached_properties), doing plenty of unnecessary repetitive checks, etc.

The most common operation that benefits from this improvement are __setitem__ calls on RecordArrays, see the following:

from coffea.nanoevents import NanoEventsFactory

events = NanoEventsFactory.from_root({"nanoaod.root": "Events"}, mode="eager").events()

# before this PR
%timeit events["Jet"] = events["Jet"]
10.4 ms ± 14.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# after this PR -> 32x speedup
%timeit events["Jet"] = events["Jet"]
327 μs ± 1.79 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# or nested:
# before this PR
%timeit events["Jet", "pt"] = events["Jet", "pt"]
11.2 ms ± 250 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# after this PR -> 16x speedup
%timeit events["Jet", "pt"] = events["Jet", "pt"]
700 μs ± 5.31 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Similar speedups are seen for VirtualNDArrays.

I expect this to speedup multiple other operations that traverse layouts as well.

Adding this to TypeTracerArrays is maybe possible (?) but much more complicated as the slices can be TypeTracerArrays themselves and thus need special & careful handling. I tried this, but the full logic turned out to be a bit complicated, and needs more thoughts -> maybe something for a future PR.

This improvement should become pretty noticeable in coffea analysis as it is highly common to store new fields on events, and currently that's one of the most expensive operations - even more expensive than running awkward kernels. And that should not be the case given that this is a metadata only operation.


This PR is reducing the total number of python allocations (new instances, etc) for a single events["Jet", "pt"] = events["Jet", "pt"] setitem from 8049 to 1692.

@codecov
Copy link
Copy Markdown

codecov bot commented Sep 8, 2025

Codecov Report

❌ Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.66%. Comparing base (b749e49) to head (f408655).
⚠️ Report is 420 commits behind head on main.

Files with missing lines Patch % Lines
src/awkward/index.py 50.00% 1 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
src/awkward/contents/bytemaskedarray.py 87.69% <100.00%> (-1.96%) ⬇️
src/awkward/contents/indexedarray.py 85.06% <100.00%> (+4.02%) ⬆️
src/awkward/contents/indexedoptionarray.py 89.12% <100.00%> (+0.90%) ⬆️
src/awkward/contents/listarray.py 90.59% <100.00%> (+1.15%) ⬆️
src/awkward/contents/listoffsetarray.py 81.13% <100.00%> (-1.73%) ⬇️
src/awkward/contents/numpyarray.py 91.38% <100.00%> (-0.13%) ⬇️
src/awkward/contents/recordarray.py 85.10% <100.00%> (-0.09%) ⬇️
src/awkward/contents/unionarray.py 86.37% <100.00%> (+1.12%) ⬆️
src/awkward/contents/unmaskedarray.py 76.72% <100.00%> (+2.52%) ⬆️
src/awkward/index.py 90.00% <50.00%> (+0.28%) ⬆️

... and 187 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 8, 2025

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3642

@pfackeldey pfackeldey requested a review from ianna September 8, 2025 06:07
@pfackeldey
Copy link
Copy Markdown
Collaborator Author

pfackeldey commented Sep 8, 2025

@ianna I could add this fast path also to ListOffsetArray. This would bring the runtime down even further from 474 μs (904 μs) to 331 μs (708 μs) in the above (nested) setitem examples. The main improvement is however here in the RecordArray layout type as that one can become pretty huge (many fields).
What do you think about adding this to ListOffsetArray as well?

Copy link
Copy Markdown
Member

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pfackeldey - thanks! I wonder if moving the following check start == 0 and stop == self.length up to line 448 is all what is needed here.

@pfackeldey pfackeldey changed the title perf: detect & fastpath no-op range slices in RecordArrays perf: detect & fastpath no-op range slices in all layout types Sep 9, 2025
Copy link
Copy Markdown
Member

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pfackeldey - looks good to me! Thanks! I'll enable auto merge.

@ianna ianna enabled auto-merge (squash) September 10, 2025 20:42
@ianna ianna merged commit 8b5c487 into scikit-hep:main Sep 10, 2025
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants