
Improve adaptive allocator thread local performance (#15741) #16107

Merged
chrisvest merged 1 commit into netty:5.0 from chrisvest:5.0-adaptive-tl-perf on Jan 7, 2026

@chrisvest

Member

Motivation:

The adaptive allocator performs costly atomic operations in its thread-local
path, which reduces its performance.

Modification:

Reduce the number of atomic operations in the fast path of thread-local
allocation.

Result:

Fixes netty#15571

These are the different variations I want to test:

- [x] Use unguarded `Recycler`s
- [x] Implement a "compressed" local free list (LIFO)
- [x] Use an MPSC queue for the chunk reuse queue in the thread-local case.
**NO VISIBLE IMPROVEMENTS**
- [x] Guard `nextInLine`'s `getAndSet` with a null check via a volatile
`get` first, since size-classed chunks rarely end up in `nextInLine`
(i.e. it is mostly `null`).
**NO VISIBLE IMPROVEMENTS**
- [x] Implement a `VarHandle`-based `MpscIntQueue` (done at 1c4e1e4).
**NO VISIBLE IMPROVEMENTS**
- [x] Remove the live/raw ref count, as mentioned at
netty#15736 (comment), "Make AdaptiveByteBuf.setBytes faster"
- [ ] Remove the ref count for size-classed chunks (see
8953bbe and 8cb1bf0)
- [ ] Use the "pinned" `Recycler` instead of the `FastThreadLocal`-based
one
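
To illustrate the `nextInLine` item in the list above, here is a minimal sketch of guarding a `getAndSet` with a volatile read. It is not the allocator's actual code: the class `GuardedSlot` and its `offer`/`poll` methods are hypothetical, and backing the slot with an `AtomicReference` is an assumption. Only the field name `nextInLine` comes from this PR.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical single-element slot; not taken from the Netty sources.
final class GuardedSlot<T> {
    private final AtomicReference<T> nextInLine = new AtomicReference<>();

    /** Parks a value if the slot is empty; returns false if it is occupied. */
    boolean offer(T value) {
        return nextInLine.compareAndSet(null, value);
    }

    /** Takes the parked value, or returns null when the slot is empty. */
    T poll() {
        // Volatile read first: when the slot is usually null this avoids
        // the more expensive atomic read-modify-write below.
        if (nextInLine.get() == null) {
            return null;
        }
        // Another thread may have emptied the slot since the read above,
        // so this can still return null and callers must handle that.
        return nextInLine.getAndSet(null);
    }

    public static void main(String[] args) {
        GuardedSlot<String> slot = new GuardedSlot<>();
        System.out.println(slot.poll());         // null
        System.out.println(slot.offer("chunk")); // true
        System.out.println(slot.offer("other")); // false
        System.out.println(slot.poll());         // chunk
        System.out.println(slot.poll());         // null
    }
}
```

The guard only pays off when the slot is empty most of the time, which is the situation the list item describes. As noted there, this variation showed no visible improvement in the author's measurements.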

(cherry picked from commit accd981)
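
The PR does not spell out what "compressed" means for the local free list. One plausible reading, assumed here, is that free entries are kept as primitive `int` handles in an array owned by a single thread, so that push and pop need no atomic operations. The sketch below shows that idea only. The class `IntLifoFreeList`, its method names, and the `-1` empty marker (which assumes handles are non-negative) are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical single-threaded LIFO of int handles; not the Netty implementation.
final class IntLifoFreeList {
    private int[] slots;
    private int size;

    IntLifoFreeList(int initialCapacity) {
        slots = new int[Math.max(1, initialCapacity)];
    }

    /** Pushes a handle using plain writes; only the owning thread may call this. */
    void push(int handle) {
        if (size == slots.length) {
            slots = Arrays.copyOf(slots, size * 2); // grow when full
        }
        slots[size++] = handle;
    }

    /** Pops the most recently pushed handle, or returns -1 when empty. */
    int pop() {
        return size == 0 ? -1 : slots[--size];
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        IntLifoFreeList list = new IntLifoFreeList(2);
        list.push(10);
        list.push(20);
        list.push(30);                  // triggers growth from 2 to 4 slots
        System.out.println(list.pop()); // 30
        System.out.println(list.pop()); // 20
        System.out.println(list.pop()); // 10
        System.out.println(list.pop()); // -1
    }
}
```

LIFO order means the most recently freed entry is reused first, which tends to keep recently touched memory warm in cache.
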
@chrisvest chrisvest added this to the 5.0.0.Final milestone Jan 6, 2026
@chrisvest chrisvest enabled auto-merge (squash) January 6, 2026 18:13
@chrisvest chrisvest merged commit c3f9ede into netty:5.0 Jan 7, 2026
31 of 33 checks passed
@chrisvest chrisvest deleted the 5.0-adaptive-tl-perf branch January 7, 2026 04:56