
perf(allocator): store pointers directly in Arena #21483

Merged
graphite-app[bot] merged 1 commit into main from om/04-15-perf_allocator_store_pointers_directly_in_arena_ on Apr 16, 2026

Conversation

Member

@overlookmotel overlookmotel commented Apr 15, 2026

`Arena` stores the pointers to the current bump cursor and to the start of the current chunk in the chunk's `ChunkFooter`.

This means that every allocation involves pointer chasing: read the pointer to the `ChunkFooter`, then read the `ChunkFooter`'s `cursor_ptr` and `start_ptr` fields. Because the pointer to the `ChunkFooter` is wrapped in a `Cell`, the compiler likely cannot assume the value is unchanged since it last read the field, so it reads it again on every allocation. This adds ~4 cycles of latency to every allocation.

Instead, store these pointers as fields of the `Arena` itself, to avoid this indirection.
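A minimal sketch of the two layouts described above, for illustration only. The field names `start_ptr` and `cursor_ptr` come from this PR; the type names `ArenaIndirect` and `ArenaDirect`, the helper `bump_down`, the downward bump direction, and the byte-only (no alignment) allocation are assumptions made for the sketch, not the actual `oxc_allocator` code.

```rust
use std::cell::Cell;
use std::ptr::NonNull;

/// Hypothetical per-chunk metadata (field names follow the PR text).
struct ChunkFooter {
    start_ptr: NonNull<u8>,
    cursor_ptr: Cell<NonNull<u8>>,
}

/// "Before": the arena holds only a pointer to the current footer.
struct ArenaIndirect {
    current_chunk: Cell<NonNull<ChunkFooter>>,
}

/// "After": the hot pointers are fields of the arena itself.
struct ArenaDirect {
    start_ptr: Cell<NonNull<u8>>,
    cursor_ptr: Cell<NonNull<u8>>,
}

/// Shared bump step: move `cursor` down by `size` if it stays at or above `start`.
fn bump_down(start: NonNull<u8>, cursor: NonNull<u8>, size: usize) -> Option<NonNull<u8>> {
    let available = cursor.as_ptr() as usize - start.as_ptr() as usize;
    if available < size {
        return None;
    }
    // SAFETY: `size <= available`, so the result stays inside the chunk.
    Some(unsafe { NonNull::new_unchecked(cursor.as_ptr().sub(size)) })
}

impl ArenaIndirect {
    fn try_alloc(&self, size: usize) -> Option<NonNull<u8>> {
        // Load 1: the footer pointer. The two field loads depend on it.
        // SAFETY: in this sketch the footer outlives the arena.
        let footer = unsafe { self.current_chunk.get().as_ref() };
        let new_cursor = bump_down(footer.start_ptr, footer.cursor_ptr.get(), size)?;
        footer.cursor_ptr.set(new_cursor);
        Some(new_cursor)
    }
}

impl ArenaDirect {
    fn try_alloc(&self, size: usize) -> Option<NonNull<u8>> {
        // Both loads are at fixed offsets from `self`; there is no dependent load.
        let new_cursor = bump_down(self.start_ptr.get(), self.cursor_ptr.get(), size)?;
        self.cursor_ptr.set(new_cursor);
        Some(new_cursor)
    }
}

/// Runs the same requests through both layouts and returns the bytes each handed out.
fn demo_layouts() -> (usize, usize) {
    let mut buf_a = vec![0u8; 64];
    let mut buf_b = vec![0u8; 64];
    let range_a = buf_a.as_mut_ptr_range();
    let range_b = buf_b.as_mut_ptr_range();

    let footer = ChunkFooter {
        start_ptr: NonNull::new(range_a.start).unwrap(),
        cursor_ptr: Cell::new(NonNull::new(range_a.end).unwrap()),
    };
    let indirect = ArenaIndirect { current_chunk: Cell::new(NonNull::from(&footer)) };
    let direct = ArenaDirect {
        start_ptr: Cell::new(NonNull::new(range_b.start).unwrap()),
        cursor_ptr: Cell::new(NonNull::new(range_b.end).unwrap()),
    };

    // The third request (40 bytes) does not fit in the 32 bytes that remain.
    let sizes = [16usize, 16, 40];
    let a = sizes.iter().filter(|&&s| indirect.try_alloc(s).is_some()).sum::<usize>();
    let b = sizes.iter().filter(|&&s| direct.try_alloc(s).is_some()).sum::<usize>();
    (a, b)
}
```

Both layouts hand out the same memory; the difference is only in how many dependent loads sit on the fast path.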

`start_ptr` must still be stored in `ChunkFooter` as well, for use when deallocating chunks. `cursor_ptr` is also kept in `ChunkFooter` to support the `iter_allocated_chunks` and `iter_allocated_chunks_raw` methods.

Parser benchmarks improve by 0.3% - 0.5%. Allocation is already so fast that the overall impact is small; allocation is not the bottleneck. But a micro-benchmark testing allocation in isolation shows that allocation itself gets a 2x speed-up.

See the comment on `Arena` for details of a field layout oddity which has a huge impact on aarch64 (Apple Silicon).


github-actions bot added the A-allocator (Area - Allocator) and C-performance (Category - Solution not expected to change functional behavior, only performance) labels on Apr 15, 2026

codspeed-hq Bot commented Apr 15, 2026

Merging this PR will not alter performance

✅ 48 untouched benchmarks
⏩ 3 skipped benchmarks [1]


Comparing om/04-15-perf_allocator_store_pointers_directly_in_arena_ (421a549) with main (9c9b6a2) [2]


Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (6c8684b) during the generation of this report, so 9c9b6a2 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@overlookmotel overlookmotel self-assigned this Apr 15, 2026
@overlookmotel overlookmotel marked this pull request as ready for review April 15, 2026 21:51
Copilot AI review requested due to automatic review settings April 15, 2026 21:51
Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes oxc_allocator::arena::Arena’s fast-path allocations by storing the current chunk’s bump cursor and start pointer directly on Arena, avoiding repeated indirection through the ChunkFooter (and repeated loads through a Cell).

Changes:

  • Added Arena::cursor_ptr and Arena::start_ptr fields (both Cell<NonNull<u8>>) and initialized/maintained them across arena lifecycle operations.
  • Repurposed ChunkFooter::cursor_ptr to store the final cursor only for retired chunks (to support chunk iteration APIs).
  • Updated chunk iteration to source the current chunk’s cursor from Arena while continuing to read retired chunk cursors from each footer.
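The cursor handling in the changes above can be sketched with a simplified, safe model. This is an illustration under stated assumptions, not the crate's implementation: chunks are plain `Vec<u8>` buffers, cursors are offsets rather than pointers, the arena takes `&mut self` instead of using `Cell`, and `footer_cursor`, `retired`, and `demo_iteration` are hypothetical names. Only the method name `iter_allocated_chunks` is borrowed from the PR.

```rust
/// One chunk of the simplified arena.
struct Chunk {
    data: Vec<u8>,
    /// Final cursor offset. Only written when the chunk is retired.
    footer_cursor: usize,
}

impl Chunk {
    fn new(size: usize) -> Self {
        Chunk { data: vec![0; size], footer_cursor: size }
    }
}

struct Arena {
    /// Retired chunks, oldest first.
    retired: Vec<Chunk>,
    current: Chunk,
    /// Live bump cursor for `current`, held on the arena itself.
    cursor: usize,
}

impl Arena {
    fn new(chunk_size: usize) -> Self {
        Arena { retired: Vec::new(), current: Chunk::new(chunk_size), cursor: chunk_size }
    }

    /// Copies `bytes` into the arena, bumping the cursor downwards.
    fn alloc(&mut self, bytes: &[u8]) {
        if self.cursor < bytes.len() {
            // Retire the current chunk: sync the live cursor back into its footer.
            self.current.footer_cursor = self.cursor;
            let size = self.current.data.len().max(bytes.len());
            let old = std::mem::replace(&mut self.current, Chunk::new(size));
            self.retired.push(old);
            self.cursor = size;
        }
        self.cursor -= bytes.len();
        self.current.data[self.cursor..self.cursor + bytes.len()].copy_from_slice(bytes);
    }

    /// Yields the allocated region of each chunk, newest chunk first.
    fn iter_allocated_chunks(&self) -> impl Iterator<Item = &[u8]> + '_ {
        // Current chunk: use the arena-level cursor.
        let first = std::iter::once(&self.current.data[self.cursor..]);
        // Retired chunks: use the cursor saved in each footer.
        let rest = self.retired.iter().rev().map(|c| &c.data[c.footer_cursor..]);
        first.chain(rest)
    }
}

/// Fills one 8-byte chunk, overflows into a second, and lists the allocated regions.
fn demo_iteration() -> Vec<Vec<u8>> {
    let mut arena = Arena::new(8);
    arena.alloc(b"abcd");
    arena.alloc(b"ef");
    arena.alloc(b"ghijk"); // does not fit in the 2 bytes left, so a new chunk is created
    arena.iter_allocated_chunks().map(|s| s.to_vec()).collect()
}
```

In this model `demo_iteration()` returns the regions `ghijk` (current chunk, from the arena cursor) and `efabcd` (retired chunk, from its footer cursor), in that order.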

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Summary per file:

| File | Description |
| --- | --- |
| crates/oxc_allocator/src/arena/mod.rs | Adds cursor_ptr/start_ptr to Arena, adjusts ChunkFooter fields, removes ChunkFooter::as_raw_parts. |
| crates/oxc_allocator/src/arena/from_raw_parts.rs | Updates raw-transfer construction and cursor mutation to use Arena::cursor_ptr. |
| crates/oxc_allocator/src/arena/drop.rs | Updates reset() to restore the arena-level cursor pointer. |
| crates/oxc_allocator/src/arena/create.rs | Initializes arena-level start_ptr/cursor_ptr in new_impl, updates footer construction. |
| crates/oxc_allocator/src/arena/chunks.rs | Uses arena-level cursor for current chunk iteration; stores it in the iterator for the first step. |
| crates/oxc_allocator/src/arena/alloc_impl.rs | Moves fast-path allocation reads/writes to arena-level pointers; syncs retiring chunk cursor back into its footer. |

@overlookmotel overlookmotel force-pushed the om/04-15-perf_allocator_store_pointers_directly_in_arena_ branch from 8dae04d to 2df9f19 Compare April 15, 2026 22:11
@overlookmotel overlookmotel marked this pull request as draft April 15, 2026 22:44
@overlookmotel overlookmotel force-pushed the om/04-15-perf_allocator_store_pointers_directly_in_arena_ branch from 2df9f19 to bc3f364 Compare April 16, 2026 01:15
@overlookmotel overlookmotel marked this pull request as ready for review April 16, 2026 01:18
@overlookmotel overlookmotel requested a review from Copilot April 16, 2026 01:18
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Comment thread crates/oxc_allocator/src/arena/alloc_impl.rs Outdated
Comment thread crates/oxc_allocator/src/arena/from_raw_parts.rs Outdated
Comment thread crates/oxc_allocator/src/arena/mod.rs
Comment thread crates/oxc_allocator/src/arena/mod.rs
@overlookmotel overlookmotel marked this pull request as draft April 16, 2026 02:16
@overlookmotel overlookmotel force-pushed the om/04-15-perf_allocator_store_pointers_directly_in_arena_ branch from bc3f364 to 421a549 Compare April 16, 2026 13:51
@overlookmotel overlookmotel requested a review from Copilot April 16, 2026 13:53
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@overlookmotel overlookmotel marked this pull request as ready for review April 16, 2026 14:03
@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label Apr 16, 2026
Member Author

overlookmotel commented Apr 16, 2026

Merge activity

@graphite-app graphite-app Bot force-pushed the om/04-15-perf_allocator_store_pointers_directly_in_arena_ branch from 421a549 to be2b392 Compare April 16, 2026 14:10
@graphite-app graphite-app Bot merged commit be2b392 into main Apr 16, 2026
27 checks passed
@graphite-app graphite-app Bot deleted the om/04-15-perf_allocator_store_pointers_directly_in_arena_ branch April 16, 2026 14:15
@graphite-app graphite-app Bot removed the 0-merge Merge with Graphite Merge Queue label Apr 16, 2026
camc314 added a commit that referenced this pull request Apr 20, 2026
### 🐛 Bug Fixes

- 48967e8 isolated_declarations: Drop required type check for private
parameter properties on private constructors (#21515) (Dunqing)
- 91e5bde transformer/typescript: Preserve computed-key static block
when class has an empty constructor (#21562) (Dunqing)
- 50e9d26 mangler: Assign correct slot to shadowed function-expression
names (#21535) (Dunqing)
- 065ce47 isolated_declarations: Collect types from private accessors
for paired inference (#21516) (Dunqing)
- 00fc136 codegen: Preserve coverage comments before object properties
(#21312) (bab)
- d676e0c minifier: Mark LHS of `??=` as read when converting from `==
null &&` (#21546) (Gunnlaugur Thor Briem)

### ⚡ Performance

- e45efc5 parser: Reduce `try_parse` usage in favour of `lookahead`
(#21532) (Boshen)
- ddb1bf8 parser: Avoid redundant `IdentifierReference` clone in
shorthand property (#21511) (Boshen)
- be2b392 allocator: Store pointers directly in `Arena` (#21483)
(overlookmotel)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
Co-authored-by: Cameron <cameron.clark@hey.com>
graphite-app Bot pushed a commit that referenced this pull request Apr 26, 2026
The chunks in an `Arena` form a linked list. Previously the list was terminated with a canonical empty chunk, defined as a `static`.

Now that we store `start_ptr` and `cursor_ptr` in `Arena` itself (#21483), an empty `Arena` doesn't need to have a pointer to a chunk.

Remove the empty chunk, and use `None` to signify the end of the linked list instead.

This makes checking "is this chunk the last?" a little cheaper (a comparison to 0 rather than to a 64-bit static value), and feels less hacky and more explicit. It also removes the potential hazard of accidentally mutating the immutable `static` empty chunk.
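A small sketch of a `None`-terminated chunk list, assuming a hypothetical `ChunkFooter` with `previous` and `len` fields (these names and the helper `total_len` are illustrative, not taken from the crate). It relies on the standard library's guarantee that `Option<NonNull<T>>` has the same size as a raw pointer, with `None` represented as null, which is why the end-of-list check can be a comparison to 0.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical footer for this sketch.
struct ChunkFooter {
    /// Previous (older) chunk, or `None` at the end of the list.
    previous: Option<NonNull<ChunkFooter>>,
    len: usize,
}

/// Sums `len` over the list, following `previous` links until `None`.
///
/// # Safety
/// Every footer reachable from `current` must be valid for the duration of the call.
unsafe fn total_len(mut current: Option<NonNull<ChunkFooter>>) -> usize {
    let mut total = 0;
    while let Some(footer) = current {
        // SAFETY: guaranteed by this function's contract.
        let footer = unsafe { footer.as_ref() };
        total += footer.len;
        current = footer.previous;
    }
    total
}

/// Returns (total for an empty arena, total for two chunks, whether the `Option` is pointer-sized).
fn demo_terminator() -> (usize, usize, bool) {
    let oldest = ChunkFooter { previous: None, len: 16 };
    let newest = ChunkFooter { previous: Some(NonNull::from(&oldest)), len: 32 };

    // An empty arena needs no chunk at all: the head is simply `None`.
    // SAFETY: there are no footers to dereference.
    let empty = unsafe { total_len(None) };
    // SAFETY: both footers are live locals for the whole call.
    let two = unsafe { total_len(Some(NonNull::from(&newest))) };

    let pointer_sized =
        size_of::<Option<NonNull<ChunkFooter>>>() == size_of::<*const ChunkFooter>();
    (empty, two, pointer_sized)
}
```

With this representation there is no shared sentinel object that could be written to by mistake; the empty case has nothing to point at.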