Improve adaptive allocator thread local performance (#15741) by chrisvest · Pull Request #16107 · netty/netty

chrisvest · 2026-01-06T18:12:58Z

Motivation:

The adaptive allocator performs costly atomic operations in the thread-local path, which reduces its performance.

Modification:

Reduce the number of atomic operations in the fast path of thread-local allocation.

Result:

Fixes #15571

These are the different variations I want to test:

- [x] Use unguarded `Recycler`s
- [x] Implement a "compressed" local free list (LIFO)
- [x] Use an MPSC queue for the chunk reuse queue in the thread-local case **NO VISIBLE IMPROVEMENTS**
- [x] Guard `nextInLine`'s `getAndSet` with a null check via a volatile `get` first, since size-classed chunks rarely end up in `nextInLine` (i.e. it is mostly `null`) **NO VISIBLE IMPROVEMENTS**
- [x] Implement a VarHandle-based `MpscIntQueue` (done at 1c4e1e4) **NO VISIBLE IMPROVEMENTS**
- [x] Remove the live/raw ref count as mentioned at Make AdaptiveByteBuf.setBytes faster #15736 (comment)
- [ ] Remove the ref count for size-classed chunks (see 8953bbe and 8cb1bf0)
- [ ] Use the "pinned" Recycler instead of the `FastThreadLocal`-based one

(cherry picked from commit accd981)
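For readers unfamiliar with the "null check before `getAndSet`" variation listed above, here is a minimal sketch of the general pattern. It is not the code from this PR: the class `GuardedSlot` and its `poll`/`offer` methods are hypothetical names chosen for illustration, and only the field name `nextInLine` is borrowed from the description. The idea is that a plain volatile read is cheaper than an atomic read-modify-write, so when the slot is empty most of the time the read-modify-write can usually be skipped.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not Netty's actual allocator code.
final class GuardedSlot<T> {
    // Single-element hand-off slot, expected to be null most of the time.
    private final AtomicReference<T> nextInLine = new AtomicReference<>();

    /** Takes the parked value, or returns null if the slot is empty. */
    T poll() {
        // Cheap volatile read first: if the slot looks empty, skip the
        // more expensive atomic read-modify-write entirely.
        if (nextInLine.get() == null) {
            return null;
        }
        // The slot looked non-empty, so claim it atomically. Another thread
        // may have emptied it in between, in which case this returns null.
        return nextInLine.getAndSet(null);
    }

    /** Parks a value if the slot is empty; returns false otherwise. */
    boolean offer(T value) {
        return nextInLine.compareAndSet(null, value);
    }
}
```

The guard only pays off when the slot really is empty on most calls. As the checklist notes, this variation showed no visible improvement in the measurements for this change.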

chrisvest added this to the 5.0.0.Final milestone Jan 6, 2026

chrisvest enabled auto-merge (squash) January 6, 2026 18:13

chrisvest merged commit c3f9ede into netty:5.0 Jan 7, 2026
31 of 33 checks passed

chrisvest deleted the 5.0-adaptive-tl-perf branch January 7, 2026 04:56

chrisvest merged 1 commit into netty:5.0 from chrisvest:5.0-adaptive-tl-perf
