
feat: Make ak.combinations faster on GPU by using cp.searchsorted to compute output list indexes#3795

Merged
ianna merged 1 commit into scikit-hep:main from shwina:fix-list-index-to-use-searchsorted
Jan 11, 2026

Conversation

@shwina (Contributor) commented Jan 10, 2026

By far the biggest bottleneck in ak.combinations on the GPU is the Python loop that computes the output list indexes. As written, that loop issues a separate memset-style fill per list, which is slow; the same computation can be done in a single vectorized cp.searchsorted call.
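The core idea can be sketched as follows. This is an illustration, not the PR's actual implementation: given the offsets of the output lists (the cumulative combination counts), the list index owning each flattened output slot falls out of one searchsorted call over an arange, instead of a per-list fill loop. np.searchsorted is used here as a CPU stand-in with the same semantics as cp.searchsorted; the counts array is made up for the example.

```python
import numpy as np

# Hypothetical per-list combination counts (C(n, 2) for each input list).
counts = np.array([3, 0, 1, 2], dtype=np.int64)
offsets = np.concatenate([[0], np.cumsum(counts)])  # [0, 3, 3, 4, 6]

total = int(offsets[-1])
# side="right" maps flat position i to the list k whose half-open range
# [offsets[k], offsets[k+1]) contains i; empty lists are skipped naturally.
list_index = np.searchsorted(offsets, np.arange(total), side="right") - 1
print(list_index.tolist())  # [0, 0, 0, 2, 3, 3]
```

On the GPU, replacing np.searchsorted with cp.searchsorted turns this into one kernel launch rather than a Python-driven loop of small device operations.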

The improvement is massive.

Before:

lists=100,000 total_values=1,000,668 mean_len≈10
estimated total_pairs=5,009,024
Time taken for ak.combinations: 2.16 seconds

After:

lists=100,000 total_values=1,000,668 mean_len≈10
estimated total_pairs=5,009,024
Time taken for ak.combinations: 0.0015 seconds

Benchmarking script used:

import numpy as np
import cupy as cp
import awkward as ak
import timeit


# Create random jagged data
np.random.seed(42)
num_lists = 100_000
mean_len = 10
max_len = 40

# Generate random lengths and build offsets
lengths = np.random.poisson(lam=mean_len, size=num_lists).astype(np.int64)
lengths = np.clip(lengths, 0, max_len)
offsets = np.empty(num_lists + 1, dtype=np.int64)
offsets[0] = 0
np.cumsum(lengths, out=offsets[1:])

total_values = int(offsets[-1])
values = np.arange(total_values, dtype=np.int64)

# Build awkward array
layout = ak.contents.ListOffsetArray(
    ak.index.Index64(offsets),
    ak.contents.NumpyArray(values)
)
arr = ak.Array(layout)

# Estimate total pairs
total_pairs = sum(n * (n - 1) // 2 for n in lengths)
print(f"lists={num_lists:,} total_values={total_values:,} mean_len≈{mean_len}")
print(f"estimated total_pairs={total_pairs:,}")

# Move to CUDA
gpu_arr = ak.to_backend(arr, "cuda")
cp.cuda.get_current_stream().synchronize()

# Benchmark using timeit
# warmup:
ak.combinations(gpu_arr, n=2)
result = timeit.timeit(lambda: ak.combinations(gpu_arr, n=2), number=10)
print(f"Time taken for ak.combinations: {result / 10:.4f} seconds")
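As an aside on the script above: the total_pairs estimate sums n*(n-1)//2 over the list lengths in a Python loop; the same count can also be computed in a vectorized way. A minimal sketch, with an illustrative lengths array (not the benchmark's data):

```python
import numpy as np

# Each list of length n yields C(n, 2) = n*(n-1)//2 unordered pairs.
lengths = np.array([4, 0, 2, 10], dtype=np.int64)
total_pairs = int((lengths * (lengths - 1) // 2).sum())
print(total_pairs)  # 6 + 0 + 1 + 45 = 52
```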

codecov bot commented Jan 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.65%. Comparing base (43e2403) to head (415db5b).
⚠️ Report is 1 commit behind head on main.


see 2 files with indirect coverage changes


@shwina shwina changed the title Make ak.combinations faster on GPU by using cp.searchsorted to compute output list indexes feat: Make ak.combinations faster on GPU by using cp.searchsorted to compute output list indexes Jan 10, 2026
@github-actions

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3795

@ianna (Member) left a comment


@shwina - Thanks for this contribution — the performance improvement is genuinely impressive. Moving the list‑index computation from a Python loop to a vectorized cp.searchsorted call is exactly the kind of backend‑aware optimization that pays off at scale. The benchmark numbers you included make the impact very clear.

What looks great

  • Massive speedup: reducing runtime from ~2.16s → ~0.0015s for the provided workload is a transformative improvement for GPU users.

Thank you! 🙏

@ianna ianna merged commit c6fc079 into scikit-hep:main Jan 11, 2026
39 of 40 checks passed

2 participants