Improve adaptive unshared size class allocation's fast-path #15530

@franz1981

Thanks to #15525 we've realized that there's still a lot of room for improvement in the adaptive allocation fast path, and we found many issues and unoptimized behaviours.

Let me summarize a few of them in this issue so we can address them (eventually).

The reference benchmark is franz1981@bc334c9, which includes a "fake" adaptive allocator that performs the same measured heavy (atomic) operations as the current adaptive allocator in the thread-local, size-class scenario.

Running this benchmark and profiling it against the adaptive allocator shows a few issues:

too many (uncontended) atomics:

  • hot: chunk retain/release (xadd, cas)
  • hot: segment mpsc int q's offer (cas)
  • cold: shared mpmc q's offer/poll
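
To make the retain/release bullet concrete, here is a minimal sketch contrasting an atomic reference count with a plain, owner-thread-only one. Both classes and their names are hypothetical and are not Netty's chunk implementation. The plain variant is only correct if every retain and release happens on the owning thread, which is an assumption this issue has not verified.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: a reference count that is safe to share between threads.
final class AtomicRefCount {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    void retain() {
        // An atomic read-modify-write (typically xadd on x86), paid even without contention.
        refCnt.getAndIncrement();
    }

    // Returns true when the last reference has been dropped.
    boolean release() {
        return refCnt.decrementAndGet() == 0;
    }
}

// Hypothetical: a reference count touched by a single owner thread only (assumption).
final class OwnerOnlyRefCount {
    private int refCnt = 1;

    void retain() {
        // Plain increment: no atomic instruction, but NOT safe across threads.
        refCnt++;
    }

    // Returns true when the last reference has been dropped.
    boolean release() {
        return --refCnt == 0;
    }
}
```

Both variants expose the same contract; the difference is only in the instructions executed on the hot path.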

reference count checks for chunks fall off the optimized path of buffers:

(see #15525 (comment))

weird recycling behaviour:

The Recycler used for the thread-local allocation of ByteBuf fails to inline its atomic int updater (see the image attached to the issue), and its cost is just too high compared to what Mimalloc does, which is just removing the top of a linked list and linking it back.
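
For comparison, the linked-list scheme can be sketched as follows. This is a minimal illustration, not Mimalloc's or Netty's code: `PooledObject` and `LocalFreeList` are hypothetical names, and the sketch assumes that a single thread owns the list.

```java
// Hypothetical pooled object carrying its own (intrusive) link.
final class PooledObject {
    PooledObject next; // only meaningful while the object sits in the free list
    final int id;

    PooledObject(int id) {
        this.id = id;
    }
}

// Hypothetical single-threaded LIFO free list: plain loads and stores, no atomics.
final class LocalFreeList {
    private PooledObject head;

    // Removes and returns the top of the list, or null if the list is empty.
    PooledObject acquire() {
        PooledObject top = head;
        if (top == null) {
            return null; // the caller would allocate a fresh object here
        }
        head = top.next;
        top.next = null;
        return top;
    }

    // Links the object back on top of the list.
    void recycle(PooledObject obj) {
        obj.next = head;
        head = obj;
    }
}
```

Acquiring and recycling are each a couple of field accesses, which is the baseline the Recycler's cost is being compared against.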

unspecialized logic:

The thread-local (unshared) size-class magazine allocation path is not optimized for its execution context:

  1. there's no need for an mpmc shared queue, since no other thread can allocate "without locks" (there's no lock at all): an mpsc one is enough
  2. there's no need to aggressively release the chunk from the magazine as soon as it looks like there isn't enough capacity: no one else can reuse it, due to the previous point
  3. there's no need to know the exact size before attempting to "read into" the chunk, because for size-class chunks we only need to know whether a segment is available (this applies to shared ones as well)
  4. [TO BE VERIFIED] a chunk is marked for deallocation only if it fails to be placed in the shared queue. This decision can be taken only by the owner thread. After that, the chunk can only observe segment releases and no new allocations, because it is not visible to anyone: this information could be used to simplify the reference scheme for chunks, saving the chunk retain/release operations (mentioned at the beginning of this issue)
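
Point 3 can be illustrated with a small sketch of a size-class chunk whose allocation path only asks whether a free segment exists. The class name `SizeClassChunkSketch` is hypothetical, and `ConcurrentLinkedQueue<Integer>` is only a stand-in for a specialized mpsc int queue; the real implementation is not reproduced here.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical chunk split into equally sized segments.
final class SizeClassChunkSketch {
    // Stand-in for an mpsc int queue of free segment offsets:
    // any thread may release (offer), only the owner thread allocates (poll).
    private final Queue<Integer> freeSegments = new ConcurrentLinkedQueue<>();

    SizeClassChunkSketch(int segmentSize, int segmentCount) {
        for (int i = 0; i < segmentCount; i++) {
            freeSegments.offer(i * segmentSize);
        }
    }

    // Owner thread only. Returns a segment offset, or -1 if no segment is available.
    // No remaining-capacity computation is needed: every segment has the same size.
    int allocateSegment() {
        Integer offset = freeSegments.poll();
        return offset == null ? -1 : offset;
    }

    // May be called by any thread that releases a buffer backed by this chunk.
    void releaseSegment(int offset) {
        freeSegments.offer(offset);
    }
}
```

In this sketch a failed `allocateSegment()` does not have to evict the chunk from the magazine, in line with point 2: a later release can make a segment available again to the same owner.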
