feat: make `ak.combinations` faster on GPU by using `cp.searchsorted` to compute output list indexes by ianna · Pull Request #3798 · scikit-hep/awkward

ianna · 2026-01-12T17:07:53Z

The GPU bottleneck in ak.combinations came from a Python loop computing output list indexes with repeated memset calls. Replacing it with a vectorized cp.searchsorted implementation removes the loop and dramatically improves performance.
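As a rough illustration of the idea (not the code in this PR), the sketch below computes, for every output element, the index of the list it belongs to, first with a per-list Python loop and then with a single `searchsorted` call. It uses NumPy so it runs without a GPU; `cp.searchsorted` takes the same arguments. The function names and the example `counts` are hypothetical.

```python
import numpy as np


def list_indexes_loop(offsets):
    # Loop version: one slice assignment per output list.
    out = np.empty(offsets[-1], dtype=np.int64)
    for i in range(len(offsets) - 1):
        out[offsets[i]:offsets[i + 1]] = i
    return out


def list_indexes_searchsorted(offsets):
    # Vectorized version: for each flat output position, find the last
    # offset that is <= position; that offset's index is the list index.
    positions = np.arange(offsets[-1], dtype=np.int64)
    return np.searchsorted(offsets, positions, side="right") - 1


# Hypothetical number of combinations produced by each input list.
counts = np.array([3, 0, 1, 6])
offsets = np.concatenate(([0], np.cumsum(counts)))

print(list_indexes_loop(offsets))
print(list_indexes_searchsorted(offsets))
```

Using `side="right"` makes empty lists (repeated offsets) drop out naturally: no output position is assigned to them.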

This PR extends the existing approach to the RegularArray layout and improves uniformity between ListOffsetArray and RegularArray handling.
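One way to picture the RegularArray case, as a sketch under assumptions rather than the kernel in this PR: every list has the same `size`, so each produces the same number of combinations and the output offsets are evenly spaced. The same `searchsorted` lookup then applies unchanged. The helper name is hypothetical, combinations are assumed to be without replacement, and NumPy stands in for CuPy.

```python
import math

import numpy as np


def regular_list_indexes(length, size, n):
    # Each fixed-size list yields C(size, n) combinations (no replacement).
    per_list = math.comb(size, n)
    # Evenly spaced offsets, analogous to a ListOffsetArray's offsets.
    offsets = np.arange(length + 1, dtype=np.int64) * per_list
    positions = np.arange(offsets[-1], dtype=np.int64)
    return np.searchsorted(offsets, positions, side="right") - 1


# Three lists of size 6 taken 2 at a time: 15 pairs per list, 45 in total.
idx = regular_list_indexes(length=3, size=6, n=2)
print(idx)
```

For evenly spaced offsets the result equals `positions // per_list`; the sketch keeps `searchsorted` only to mirror the ListOffsetArray path.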

This work is inspired by @shwina’s PR #3795. Thanks to @shwina for the original work and guidance.

Before:

import timeit

import awkward as ak
import cupy as cp

# `values` is defined elsewhere; its contents are not shown in this PR
regular_layout = ak.contents.RegularArray(ak.contents.NumpyArray(values), size=6)
reg_arr = ak.Array(regular_layout)
reg_gpu_arr = ak.to_backend(reg_arr, "cuda")
cp.cuda.get_current_stream().synchronize()
ak.combinations(reg_gpu_arr, n=2)  # warm-up call before timing
result = timeit.timeit(lambda: ak.combinations(reg_gpu_arr, n=2), number=10)
print(f"Time taken for ak.combinations: {result / 10:.4f} seconds")

Time taken for ak.combinations: 7.9204 seconds

After:

regular_layout = ak.contents.RegularArray(ak.contents.NumpyArray(values), size=6)
reg_arr = ak.Array(regular_layout)
reg_gpu_arr = ak.to_backend(reg_arr, "cuda")
cp.cuda.get_current_stream().synchronize()
ak.combinations(reg_gpu_arr, n=2)
result = timeit.timeit(lambda: ak.combinations(reg_gpu_arr, n=2), number=10)
print(f"Time taken for ak.combinations: {result / 10:.4f} seconds")

Time taken for ak.combinations: 0.0013 seconds

This change reduces the runtime of ak.combinations on CUDA-backed RegularArrays by roughly 6,000× in the benchmark above (from 7.9204 s to 0.0013 s per call), bringing performance in line with expectations for regular, fixed-size layouts.

codecov · 2026-01-12T17:16:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.55%. Comparing base (d47fbef) to head (99a80b4).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

see 2 files with indirect coverage changes

github-actions · 2026-01-12T17:27:03Z

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3798

maxymnaumchyk

awesome!

ianna added 2 commits January 12, 2026 11:45

remove python for loop bottleneck

e9e308d

pre-commit fixes

99a80b4

ianna requested a review from maxymnaumchyk January 12, 2026 17:17

maxymnaumchyk approved these changes Jan 13, 2026

ianna merged commit 1b7e3d6 into scikit-hep:main Jan 13, 2026
39 checks passed

ianna merged 2 commits into scikit-hep:main from ianna:ianna/searchsorted_in_regular_array_combinations_kernel