perf: detect & fastpath no-op range slices in all layout types by pfackeldey · Pull Request #3642 · scikit-hep/awkward

pfackeldey · 2025-09-08T05:47:30Z

This PR detects trivial no-op range slices in all layout types, i.e. slices that select the whole array and therefore do nothing. In that case we can return self directly instead of constructing new Python objects (and losing their cached_properties) and repeating many unnecessary checks.
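
A minimal sketch of the idea, using a hypothetical ToyLayout class rather than Awkward's actual layout classes: a slice that normalizes to start 0, stop equal to the length, and step 1 returns the same object, so anything cached on it is kept. The class name, the total property, and the use of slice.indices are illustrative assumptions, not the code in this PR.

```python
from functools import cached_property


class ToyLayout:
    """Hypothetical stand-in for a layout node; not an Awkward class."""

    def __init__(self, data):
        self._data = list(data)

    @property
    def length(self):
        return len(self._data)

    @cached_property
    def total(self):
        # Stands in for any derived value that is costly to recompute.
        return sum(self._data)

    def __getitem__(self, where):
        if not isinstance(where, slice):
            return self._data[where]
        # Normalize None, negative, and out-of-range bounds against the length.
        start, stop, step = where.indices(self.length)
        if start == 0 and stop == self.length and step == 1:
            # No-op slice: reuse this instance and everything cached on it.
            return self
        return ToyLayout(self._data[where])


layout = ToyLayout(range(5))
layout.total  # populate the cache once

assert layout[:] is layout                # full range: same object
assert "total" in vars(layout[0:5])       # cached value is still there
assert layout[1:] is not layout           # real slice: new object
assert "total" not in vars(layout[1:])    # new object starts with an empty cache
```

In this toy version the saving is one object construction and one cached value. In a layout with many fields, the same shortcut avoids rebuilding every nested node.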

The most common operations that benefit from this improvement are __setitem__ calls on RecordArrays, see the following:

from coffea.nanoevents import NanoEventsFactory

events = NanoEventsFactory.from_root({"nanoaod.root": "Events"}, mode="eager").events()

# before this PR
%timeit events["Jet"] = events["Jet"]
10.4 ms ± 14.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# after this PR -> 32x speedup
%timeit events["Jet"] = events["Jet"]
327 μs ± 1.79 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# or nested:
# before this PR
%timeit events["Jet", "pt"] = events["Jet", "pt"]
11.2 ms ± 250 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# after this PR -> 16x speedup
%timeit events["Jet", "pt"] = events["Jet", "pt"]
700 μs ± 5.31 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Similar speedups are seen for VirtualNDArrays.

I expect this to speed up several other operations that traverse layouts as well.

Adding this to TypeTracerArrays may be possible, but it is much more complicated because the slices can be TypeTracerArrays themselves and thus need special, careful handling. I tried this, but the full logic turned out to be a bit complicated and needs more thought, so it is maybe something for a future PR.
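
To illustrate why unknown values make the check harder, here is a rough sketch with a hypothetical UNKNOWN sentinel and a hypothetical is_noop_range helper (neither is Awkward's API, and this is not the logic used in the PR). When any bound or the length is not known at tracing time, the no-op condition cannot be proven, so the conservative choice is to fall back to the regular slicing path.

```python
class _Unknown:
    """Hypothetical placeholder for a value that is not known while tracing."""

    def __repr__(self):
        return "UNKNOWN"


UNKNOWN = _Unknown()


def is_noop_range(start, stop, step, length):
    """Return True only if the slice provably selects the whole array."""
    # If anything is unknown, we cannot prove the slice is a no-op,
    # so report False and let the caller take the general path.
    if any(value is UNKNOWN for value in (start, stop, step, length)):
        return False
    return start == 0 and stop == length and step == 1


assert is_noop_range(0, 5, 1, 5)
assert not is_noop_range(0, 5, 1, UNKNOWN)   # unknown length
assert not is_noop_range(0, UNKNOWN, 1, 5)   # unknown stop
assert not is_noop_range(0, 5, 2, 5)         # step skips elements
```

A real implementation would also have to handle bounds that are themselves traced arrays, which is the part described above as needing more thought.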

This improvement should be quite noticeable in coffea analyses, where it is very common to store new fields on events. Currently that is one of the most expensive operations, even more expensive than running awkward kernels, which should not be the case given that this is a metadata-only operation.

This PR reduces the total number of Python allocations (new instances, etc.) for a single events["Jet", "pt"] = events["Jet", "pt"] setitem from 8049 to 1692.
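
The ratios quoted in this description can be checked from the reported numbers. The snippet below only redoes that arithmetic with the mean timings and allocation counts given above. The variable names are made up for illustration.

```python
# Mean timings quoted in the description, converted to seconds.
before_flat, after_flat = 10.4e-3, 327e-6      # events["Jet"] = events["Jet"]
before_nested, after_nested = 11.2e-3, 700e-6  # events["Jet", "pt"] = events["Jet", "pt"]

speedup_flat = before_flat / after_flat        # about 31.8, quoted as 32x
speedup_nested = before_nested / after_nested  # about 16, quoted as 16x

# Python allocation counts quoted for the nested setitem.
allocs_before, allocs_after = 8049, 1692
alloc_ratio = allocs_before / allocs_after       # about 4.76 times fewer
alloc_saved = 1 - allocs_after / allocs_before   # about 0.79, i.e. roughly 79% fewer

print(f"flat setitem speedup:   {speedup_flat:.1f}x")
print(f"nested setitem speedup: {speedup_nested:.1f}x")
print(f"allocations:            {alloc_ratio:.2f}x fewer ({alloc_saved:.0%} saved)")
```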

codecov · 2025-09-08T05:54:53Z

Codecov Report

❌ Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.66%. Comparing base (b749e49) to head (f408655).
⚠️ Report is 420 commits behind head on main.

Files with missing lines	Patch %	Lines
src/awkward/index.py	50.00%	1 Missing ⚠️

Additional details and impacted files

Files with missing lines	Coverage Δ
src/awkward/contents/bytemaskedarray.py	`87.69% <100.00%> (-1.96%)`	⬇️
src/awkward/contents/indexedarray.py	`85.06% <100.00%> (+4.02%)`	⬆️
src/awkward/contents/indexedoptionarray.py	`89.12% <100.00%> (+0.90%)`	⬆️
src/awkward/contents/listarray.py	`90.59% <100.00%> (+1.15%)`	⬆️
src/awkward/contents/listoffsetarray.py	`81.13% <100.00%> (-1.73%)`	⬇️
src/awkward/contents/numpyarray.py	`91.38% <100.00%> (-0.13%)`	⬇️
src/awkward/contents/recordarray.py	`85.10% <100.00%> (-0.09%)`	⬇️
src/awkward/contents/unionarray.py	`86.37% <100.00%> (+1.12%)`	⬆️
src/awkward/contents/unmaskedarray.py	`76.72% <100.00%> (+2.52%)`	⬆️
src/awkward/index.py	`90.00% <50.00%> (+0.28%)`	⬆️

... and 187 files with indirect coverage changes

github-actions · 2025-09-08T06:00:58Z

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3642

pfackeldey · 2025-09-08T07:31:32Z

@ianna I could also add this fast path to ListOffsetArray. This would bring the runtime down even further, from 474 μs (904 μs) to 331 μs (708 μs) in the above (nested) setitem examples. The main improvement, however, is in the RecordArray layout type, as that one can become pretty huge (many fields).
What do you think about adding this to ListOffsetArray as well?
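
As a rough illustration of what such a shortcut could look like for an offsets-based list layout, here is a toy model. ToyListOffsets and its methods are hypothetical and only mimic the general offsets-plus-content idea; this is not Awkward's ListOffsetArray code.

```python
class ToyListOffsets:
    """Hypothetical offsets-based list layout; not Awkward's ListOffsetArray."""

    def __init__(self, offsets, content):
        self.offsets = list(offsets)  # len(offsets) == number of lists + 1
        self.content = content        # shared with slices, never copied

    @property
    def length(self):
        return len(self.offsets) - 1

    def getitem_range(self, start, stop):
        # Assumes 0 <= start <= stop <= self.length (already normalized).
        if start == 0 and stop == self.length:
            return self  # shortcut: nothing to re-slice or re-wrap
        # General path: slice the offsets and wrap them in a new node.
        return ToyListOffsets(self.offsets[start : stop + 1], self.content)

    def to_list(self):
        pairs = zip(self.offsets, self.offsets[1:])
        return [self.content[a:b] for a, b in pairs]


lists = ToyListOffsets([0, 2, 2, 5], [1, 2, 3, 4, 5])
assert lists.getitem_range(0, 3) is lists       # whole range: same object
assert lists.getitem_range(1, 3) is not lists   # partial range: new wrapper
```

Even in the general path only the offsets are sliced, so the shortcut mainly saves the construction of the new wrapper object and its validation.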

ianna

@pfackeldey - thanks! I wonder if moving the check start == 0 and stop == self.length up to line 448 is all that is needed here.

src/awkward/contents/recordarray.py

src/awkward/contents/numpyarray.py

ianna

@pfackeldey - looks good to me! Thanks! I'll enable auto merge.

pfackeldey added 2 commits September 8, 2025 07:37

perf: detect & fastpath no-op range slices in RecordArrays

aa88082

remove unnecessary import

aa782d7

pfackeldey requested a review from ianna September 8, 2025 06:07

ianna reviewed Sep 8, 2025

src/awkward/contents/recordarray.py (outdated, resolved)

pfackeldey added 4 commits September 8, 2025 10:18

fix condition to check for step size == 1 and potential unknown_length

fc65ef3

add same shortcut to ListOffsetArray layouts

22d8a81

add shortcut to all layout types

af284bd

forgot union and numpy array layouts

d0e52bc

ianna reviewed Sep 9, 2025

src/awkward/contents/numpyarray.py (outdated, resolved)

pfackeldey changed the title ~~perf: detect & fastpath no-op range slices in RecordArrays~~ perf: detect & fastpath no-op range slices in all layout types Sep 9, 2025

pfackeldey and others added 7 commits September 9, 2025 14:37

remove unnecessary condition for known data

88ded56

style: pre-commit fixes

e5f5d8a

forgot removing the condition for RecordArrays

05d1ac9

style: pre-commit fixes

14d7bab

...and NumpyArray

ed5a9da

add this shortcut also to noop slices on Index instances

3048ca6

Merge branch 'main' into perf_recordarray_getitem_range

f408655

ianna approved these changes Sep 10, 2025

ianna enabled auto-merge (squash) September 10, 2025 20:42

ianna merged commit 8b5c487 into scikit-hep:main Sep 10, 2025
46 checks passed

ianna merged 13 commits into scikit-hep:main from pfackeldey:perf_recordarray_getitem_range

3 participants
