feat: make ak.combinations faster on GPU by using cp.searchsorted to compute output list indexes #3798

Merged: ianna merged 2 commits into scikit-hep:main from ianna:ianna/searchsorted_in_regular_array_combinations_kernel on Jan 13, 2026

ianna (Member) commented Jan 12, 2026

The GPU bottleneck in ak.combinations came from a Python loop computing output list indexes with repeated memset calls. Replacing it with a vectorized cp.searchsorted implementation removes the loop and dramatically improves performance.
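The idea can be illustrated with a minimal sketch. This is not the kernel code from this PR: it uses NumPy as a stand-in so it runs without a GPU (`np.searchsorted` and `cp.searchsorted` take the same arguments), and the function names and sample list lengths are hypothetical.

```python
import numpy as np


def list_index_loop(offsets):
    """Loop version: fill one slice of the output per list."""
    out = np.empty(offsets[-1], dtype=np.int64)
    for i in range(len(offsets) - 1):
        out[offsets[i]:offsets[i + 1]] = i
    return out


def list_index_searchsorted(offsets):
    """Vectorized version: for each output position p, find the list i
    such that offsets[i] <= p < offsets[i + 1]."""
    positions = np.arange(offsets[-1])
    return np.searchsorted(offsets, positions, side="right") - 1


# Hypothetical input: four lists with these lengths, combined with n=2.
lengths = np.array([3, 1, 2, 4])
counts = lengths * (lengths - 1) // 2             # pairs per list: 3, 0, 1, 6
offsets = np.concatenate(([0], np.cumsum(counts)))  # 0, 3, 3, 4, 10

print(list_index_searchsorted(offsets))  # [0 0 0 2 3 3 3 3 3 3]
```

Using `side="right"` matters when a list produces no combinations: the repeated offset is skipped, so no output position is assigned to the empty list.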

This PR extends the existing approach to the RegularArray layout and improves uniformity between ListOffsetArray and RegularArray handling.
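For a regular layout, every list yields the same number of combinations, so the output offsets form an arithmetic sequence. The sketch below shows, under that assumption, that a search over those offsets gives the same list indexes as integer division. It is an illustration only, not the PR's kernel; NumPy stands in for CuPy, and `length` is a hypothetical number of lists (`size=6` and `n=2` match the benchmark below).

```python
import math

import numpy as np

size, n = 6, 2   # list size and combination order used in the benchmark
length = 4       # hypothetical number of lists in the RegularArray

per_list = math.comb(size, n)               # 15 pairs per list of 6 items
offsets = np.arange(length + 1) * per_list  # 0, 15, 30, 45, 60
positions = np.arange(offsets[-1])

# Same search as for variable-length lists: offsets[i] <= p < offsets[i + 1].
via_search = np.searchsorted(offsets, positions, side="right") - 1
# With a constant count per list, integer division gives the same answer.
via_division = positions // per_list

assert np.array_equal(via_search, via_division)
```

Building explicit offsets for the regular case costs a little extra memory, but it lets both layouts share one code path.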

This work is inspired by @shwina’s PR #3795. Thanks to @shwina for the original work and guidance.

Before:

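import timeit

import awkward as ak
import cupy as cp

# `values` is a 1D array defined elsewhere; its size is not given in this PR description.
regular_layout = ak.contents.RegularArray(ak.contents.NumpyArray(values), size=6)
reg_arr = ak.Array(regular_layout)
reg_gpu_arr = ak.to_backend(reg_arr, "cuda")
cp.cuda.get_current_stream().synchronize()
ak.combinations(reg_gpu_arr, n=2)  # untimed first call, so one-time setup is excluded from the timing
result = timeit.timeit(lambda: ak.combinations(reg_gpu_arr, n=2), number=10)
print(f"Time taken for ak.combinations: {result / 10:.4f} seconds")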

Time taken for ak.combinations: 7.9204 seconds

After:

regular_layout = ak.contents.RegularArray(ak.contents.NumpyArray(values), size=6)
reg_arr = ak.Array(regular_layout)
reg_gpu_arr = ak.to_backend(reg_arr, "cuda")
cp.cuda.get_current_stream().synchronize()
ak.combinations(reg_gpu_arr, n=2)
result = timeit.timeit(lambda: ak.combinations(reg_gpu_arr, n=2), number=10)
print(f"Time taken for ak.combinations: {result / 10:.4f} seconds")

Time taken for ak.combinations: 0.0013 seconds

In this benchmark, the change reduces the runtime of ak.combinations on a CUDA-backed RegularArray from 7.9204 seconds to 0.0013 seconds per call, a speedup of roughly 6,000x, bringing performance in line with expectations for regular, fixed-size layouts.

codecov bot commented Jan 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.55%. Comparing base (d47fbef) to head (99a80b4).
⚠️ Report is 1 commits behind head on main.

see 2 files with indirect coverage changes

@ianna ianna requested a review from maxymnaumchyk January 12, 2026 17:17
@github-actions

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3798

maxymnaumchyk (Collaborator) left a comment

awesome!

@ianna ianna merged commit 1b7e3d6 into scikit-hep:main Jan 13, 2026
39 checks passed