Auto-port 5.0: IoUring: extend user data from short to long by netty-project-bot · Pull Request #16806 · netty/netty

netty-project-bot · 2026-05-12T21:49:08Z

Auto-port of #16682 to 5.0
Cherry-picked commit: 7f00b24

Motivation:

This PR extends io_uring userData handling from short to long without changing the existing fast path for short values.

We reuse Netty's IoUringIoHandler to drive some one-shot io_uring operations through a shared DefaultIoUringIoRegistration per EventLoop. In this model, a short is too small for real-world user data (it tops out at 32767): it cannot hold some tracking payloads, and it cannot reliably carry values such as an fd or other larger identifiers.

Modification:

- Keep the existing packed fast path when userData still fits in a short.
- Add a slow path for larger long userData values.
- Track slow-path SQEs with a lightweight per-SQE table (PendingOpSlots) and resolve completions through the live registration table.
- Keep the io_uring channel code and the IoUringIoOps path compatible with long userData.
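To make the fast/slow split concrete, here is a minimal sketch of how a 64-bit io_uring user_data word could carry either an inline short or a slot index. The bit layout, the `UserDataPacking` class, and `FLAG_SLOW_PATH` are hypothetical illustrations, not Netty's actual encoding. The point is only that values passing `fitsInShort` need no extra bookkeeping, while larger values leave just a slot reference in the SQE.

```java
// Hypothetical sketch: this 64-bit user_data layout is an assumption for
// illustration, not Netty's actual encoding.
//   bits  0..31  registration id
//   bits 32..39  opcode
//   bits 40..47  flags (FLAG_SLOW_PATH set when userData did not fit in a short)
//   bits 48..63  16-bit payload: the short userData itself (fast path) or an
//                unsigned slot index into a side table (slow path)
final class UserDataPacking {
    static final int FLAG_SLOW_PATH = 0x01;

    private UserDataPacking() { }

    /** True when the caller's userData can ride inline in the 16-bit payload. */
    static boolean fitsInShort(long userData) {
        return userData >= Short.MIN_VALUE && userData <= Short.MAX_VALUE;
    }

    static long pack(int id, byte op, int flags, short payload) {
        return ((payload & 0xFFFFL) << 48)
                | ((flags & 0xFFL) << 40)
                | ((op & 0xFFL) << 32)
                | (id & 0xFFFFFFFFL);
    }

    /** Fast path: userData is stored inline, no side-table bookkeeping. */
    static long packFast(int id, byte op, long userData) {
        if (!fitsInShort(userData)) {
            throw new IllegalArgumentException("userData does not fit in a short: " + userData);
        }
        return pack(id, op, 0, (short) userData);
    }

    /** Slow path: only a slot index (0..65535 in this sketch) travels with the SQE. */
    static long packSlow(int id, byte op, int slot) {
        if (slot < 0 || slot > 0xFFFF) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        return pack(id, op, FLAG_SLOW_PATH, (short) slot);
    }

    static int id(long udata) { return (int) udata; }
    static byte op(long udata) { return (byte) (udata >>> 32); }
    static boolean isSlowPath(long udata) { return ((udata >>> 40) & FLAG_SLOW_PATH) != 0; }
    static short payload(long udata) { return (short) (udata >>> 48); }
    static int slot(long udata) { return (int) ((udata >>> 48) & 0xFFFF); }
}
```

In this sketch a completion handler would check `isSlowPath` first and, only in that case, look up the full long value by `slot`; on the fast path it decodes `payload` directly.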

Result:

Adds long user data support for custom IoHandle implementations, preserves near-baseline performance on the short user data path, and confines the extra bookkeeping cost to the long user data slow path.

Design:
I also evaluated other tracking strategies, including open addressing and HashMap / LongObjectMap-style mappings.

In practice, they were not a better fit for this workload:

- Open addressing with tombstones still introduced extra probe / insert / remove bookkeeping, and its CPU cost became more visible once removals were frequent or the live set grew larger.
- HashMap / LongObjectMap-style solutions added extra lookup / indirection overhead and extra GC overhead on the slow path, and were not competitive enough for this use case.
- Some alternatives improved one side of the workload but paid for it with either higher steady-state CPU cost or a more expensive remove path.

The current approach is a better overall tradeoff for the target scenario:

- custom IoHandle usage is relatively uncommon
- collisions are expected to be rare
- resizes should therefore also be uncommon
- for non-network io_uring operations, SQEs usually have a short pending lifetime

That makes a simple array-backed per-SQE tracking scheme a good fit here: it keeps the common case straightforward and avoids introducing extra hot-path cost for more general but heavier data structures.
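The array-backed tracking idea can be sketched as a slot table with a free list threaded through an int array: acquiring and releasing a slot are O(1), there is no hashing or probing, and the arrays only grow when every slot is in flight. `PendingSlotTable` and its methods are hypothetical names chosen for this sketch; it is not the PR's `PendingOpSlots` implementation.

```java
import java.util.Arrays;

// Hypothetical sketch of an array-backed per-SQE slot table; names and behavior
// are assumptions, not Netty's PendingOpSlots. Each in-flight slow-path SQE
// borrows one slot holding its long userData; the slot index is what gets
// encoded into the SQE, and the slot is released when the CQE arrives.
final class PendingSlotTable {
    private static final int IN_USE = -2; // marker in nextFree for occupied slots
    private long[] values;
    private int[] nextFree;   // singly linked free list threaded through the array
    private int freeHead;     // first free slot, or -1 when every slot is in use
    private int size;

    PendingSlotTable(int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("capacity must be > 0");
        }
        values = new long[initialCapacity];
        nextFree = new int[initialCapacity];
        linkFree(0, initialCapacity);
        freeHead = 0;
    }

    // Chains slots [from, to) into the free list, terminating with -1.
    private void linkFree(int from, int to) {
        for (int i = from; i < to - 1; i++) {
            nextFree[i] = i + 1;
        }
        nextFree[to - 1] = -1;
    }

    /** Stores userData and returns the slot index to encode into the SQE. O(1) amortized. */
    int acquire(long userData) {
        if (freeHead == -1) {
            grow();
        }
        int slot = freeHead;
        freeHead = nextFree[slot];
        nextFree[slot] = IN_USE;
        values[slot] = userData;
        size++;
        return slot;
    }

    /** Returns the stored userData and frees the slot when the CQE arrives. O(1). */
    long release(int slot) {
        if (slot < 0 || slot >= values.length || nextFree[slot] != IN_USE) {
            throw new IllegalStateException("slot not in use: " + slot);
        }
        long value = values[slot];
        nextFree[slot] = freeHead; // push onto the free list (LIFO reuse)
        freeHead = slot;
        size--;
        return value;
    }

    int size() { return size; }

    int capacity() { return values.length; }

    /** Doubles capacity; only reached when every slot is in flight. */
    private void grow() {
        int oldCap = values.length;
        int newCap = oldCap << 1;
        values = Arrays.copyOf(values, newCap);
        nextFree = Arrays.copyOf(nextFree, newCap);
        linkFree(oldCap, newCap);
        freeHead = oldCap;
    }
}
```

Under the same assumptions, a caller would `acquire(userData)` when submitting a slow-path SQE, encode the returned index into the SQE's user_data, and `release(slot)` on the matching completion to get the original long back. Freed slots are reused last-in first-out, so with short pending lifetimes the same low indices keep getting recycled and growth stays rare.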

CustomIoHandleBenchmark results on the current branch and the 4.2 base:

Fast path vs baseline

| pendingOpsDepth | baseline fast | current fast | delta |
| --- | ---: | ---: | ---: |
| 4096 | 1,023,617 ops/s | 1,024,338 ops/s | +0.07% |
| 65536 | 974,757 ops/s | 970,427 ops/s | -0.44% |

Slow path vs current fast path

| pendingOpsDepth | fast path | slow path | delta |
| --- | ---: | ---: | ---: |
| 4096 | 1,024,338 ops/s | 944,886 ops/s | -7.76% |
| 65536 | 970,427 ops/s | 886,176 ops/s | -8.68% |
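Each delta in these benchmark tables is the relative throughput change, (candidate - reference) / reference × 100. The small hypothetical helper below (not part of CustomIoHandleBenchmark) recomputes it from the reported ops/s figures; all four deltas round to the values shown.

```java
import java.util.Locale;

// Hypothetical helper: recomputes the "delta" column from reported throughputs.
final class BenchmarkDeltas {
    private BenchmarkDeltas() { }

    /** Relative change of candidate versus reference, in percent. */
    static double deltaPercent(double reference, double candidate) {
        return (candidate - reference) / reference * 100.0;
    }

    /** Formats with an explicit sign and two decimals, e.g. "+0.07%". */
    static String format(double percent) {
        return String.format(Locale.ROOT, "%+.2f%%", percent);
    }
}
```

For example, `format(deltaPercent(1_024_338, 944_886))` yields `-7.76%`, the 4096-depth slow-path figure.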

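The fast path stays within 0.5% of the baseline, and the slow path costs roughly 8 to 9% of throughput relative to the fast path. These numbers are in the expected range for the added slow-path bookkeeping, while the existing short-value fast path remains intact.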

https://gist.github.com/dreamlike-ocean/05e7e272e0e6a9f45f40192229c938dc

fix #16634

Co-authored-by: Chris Vest <christianvest_hansen@apple.com>

netty-project-bot mentioned this pull request May 12, 2026: IoUring: extend user data from short to long #16682 (Merged)

chrisvest added this to the 5.0.0.Final milestone May 13, 2026

Fix compilation (3e8caa4)

chrisvest enabled auto-merge (squash) May 13, 2026 00:18

chrisvest merged commit 2c08424 into 5.0 May 13, 2026
31 of 33 checks passed

chrisvest deleted the auto-port-pr-16682-to-5.0 branch May 13, 2026 23:15

chrisvest merged 2 commits into 5.0 from auto-port-pr-16682-to-5.0

3 participants
