perf(allocator): reduce instructions on allocation hot path by overlookmotel · Pull Request #21510 · oxc-project/oxc

overlookmotel · 2026-04-16T20:40:17Z

Perf optimization to Arena. Trim off instructions from allocation hot path (try_alloc_layout_fast).

The original implementation, taken from bumpalo, performed quite convoluted arithmetic to determine whether an allocation can be serviced within the current allocator chunk, or whether a new chunk needs to be added to the allocator.

Reduce these checks to 2 simple instructions: a comparison and a conditional branch on the sign flag.

The optimization rests on two invariants:

1. A chunk's size can never be greater than isize::MAX bytes (because no allocation can be).
2. A Layout's size can also never exceed isize::MAX bytes.

Using these invariants, the bounds checks can be collapsed into a single subtraction and a branch on the sign flag. More details are in comments in the code.
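Here is a minimal sketch of that check. It models addresses as plain `usize` integers rather than real pointers, and `try_bump` is a hypothetical helper, not Oxc's API. It assumes the chunk invariant `cursor - start <= isize::MAX`. It also assumes that the total amount subtracted, including any alignment rounding, is at most `isize::MAX + 1`.

```rust
/// Sketch only: addresses are modelled as `usize`, not real pointers.
/// Assumes `start <= cursor`, `cursor - start <= isize::MAX`, and
/// `subtract <= isize::MAX as usize + 1`.
/// Returns the new cursor if `subtract` bytes fit below `cursor`.
fn try_bump(start: usize, cursor: usize, subtract: usize) -> Option<usize> {
    let new_ptr = cursor.wrapping_sub(subtract);
    // One subtraction (`new_ptr - start`), then branch on its sign bit.
    if (new_ptr.wrapping_sub(start) as isize) < 0 {
        None // would land below the chunk start: take the slow path
    } else {
        Some(new_ptr)
    }
}

fn main() {
    let start = 0x1000;
    let cursor = 0x1040; // 64 bytes free
    assert_eq!(try_bump(start, cursor, 32), Some(0x1020));
    assert_eq!(try_bump(start, cursor, 64), Some(0x1000)); // exactly fills the chunk
    assert_eq!(try_bump(start, cursor, 72), None); // 8 bytes short
    // The largest possible subtraction from an empty chunk is still rejected.
    assert_eq!(try_bump(start, start, isize::MAX as usize + 1), None);
}
```

Under those bounds, `(cursor - start) - subtract` always lies in `[-(isize::MAX + 1), isize::MAX]`. That is exactly the range of `isize`, so reinterpreting the wrapping difference as signed recovers its true sign.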

When the allocation's requested alignment is larger than the arena's minimum alignment (which it is for all AST nodes, as they're all aligned on 8), this additionally collapses 2 branches into a single check.
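The branch reduction can be illustrated with a hedged sketch of the two shapes over integer addresses. Both `old_greater` and `new_greater` are hypothetical simplifications:

- `old_greater` mirrors the bumpalo-style approach seen in the assembly below. It rounds the cursor down, then checks underflow and capacity separately.
- `new_greater` subtracts first, rounds down, and relies on a single sign-bit check.

The sketch assumes `size` is a multiple of `align`, which holds for Rust types such as AST nodes. In that case both functions return the same result.

```rust
/// Hypothetical "old" shape: round the cursor down, then two separate checks.
fn old_greater(start: usize, cursor: usize, size: usize, align: usize) -> Option<usize> {
    let aligned_ptr = cursor & !(align - 1);
    if aligned_ptr < start {
        return None; // branch 1: rounding alone fell below the chunk start
    }
    if size > aligned_ptr - start {
        return None; // branch 2: not enough room
    }
    Some(aligned_ptr - size)
}

/// Hypothetical "new" shape: subtract, round down, one sign-bit check.
fn new_greater(start: usize, cursor: usize, size: usize, align: usize) -> Option<usize> {
    let new_ptr = cursor.wrapping_sub(size) & !(align - 1);
    if (new_ptr.wrapping_sub(start) as isize) < 0 {
        None
    } else {
        Some(new_ptr)
    }
}

fn main() {
    // Chunk start deliberately not 8-aligned, so branch 1 of the old shape can trigger.
    let start = 0x1003;
    for cursor in start..start + 64 {
        for size in (0..=80).step_by(8) {
            assert_eq!(
                old_greater(start, cursor, size, 8),
                new_greater(start, cursor, size, 8)
            );
        }
    }
}
```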

On x86-64, the new version uses one fewer register (two fewer on the dynamic Less and Equal paths). This matters because this code gets inlined into call sites, so lower register pressure can reduce stack spills.

| Path | Arch | Old | New | Saved |
| --- | --- | --- | --- | --- |
| Less | x86-64 | 7 instructions, 1 branch, 2 registers | 5 instructions, 1 branch, 1 register | -2 instructions, -1 register |
| Less | aarch64 | 8 instructions, 1 branch, 2 registers | 7 instructions, 1 branch, 2 registers | -1 instruction |
| Less dynamic | x86-64 | 10 instructions, 1 branch, 3 registers | 6 instructions, 1 branch, 1 register | -4 instructions, -2 registers |
| Less dynamic | aarch64 | 10 instructions, 1 branch, 3 registers | 8 instructions, 1 branch, 2 registers | -2 instructions, -1 register |
| Equal | x86-64 | 7 instructions, 1 branch, 2 registers | 5 instructions, 1 branch, 1 register | -2 instructions, -1 register |
| Equal | aarch64 | 8 instructions, 1 branch, 2 registers | 7 instructions, 1 branch, 2 registers | -1 instruction |
| Equal dynamic | x86-64 | 8 instructions, 1 branch, 3 registers | 5 instructions, 1 branch, 1 register | -3 instructions, -2 registers |
| Equal dynamic | aarch64 | 9 instructions, 1 branch, 3 registers | 7 instructions, 1 branch, 2 registers | -2 instructions, -1 register |
| Greater | x86-64 | 9 instructions, 2 branches, 2 registers | 6 instructions, 1 branch, 1 register | -3 instructions, -1 branch, -1 register |
| Greater | aarch64 | 9 instructions, 2 branches, 2 registers | 8 instructions, 1 branch, 2 registers | -1 instruction, -1 branch |
| Greater dynamic | x86-64 | 9 instructions, 2 branches, 2 registers | 6 instructions, 1 branch, 1 register | -3 instructions, -1 branch, -1 register |
| Greater dynamic | aarch64 | 9 instructions, 1 branch, 2 registers | 8 instructions, 1 branch, 2 registers | -1 instruction |

1% perf improvement on all parser benchmarks.

codspeed-hq · 2026-04-16T20:47:06Z

Merging this PR will not alter performance

✅ 48 untouched benchmarks
⏩ 3 skipped benchmarks¹

Comparing om/04-16-perf_allocator_reduce_instructions_on_allocation_hot_path (3cf7cef) with main (ca81199)²

¹ 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.
² No successful run was found on main (63eb98b) during the generation of this report, so ca81199 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

overlookmotel · 2026-04-16T20:57:57Z

For the record, here's Claude's annotated assembly before/after this PR.

The paths which are hit repeatedly in Oxc are:

"Greater (static)" - AST node allocation
"Equal (dynamic)" - Arena::alloc_str

Less path (static size, `align < MIN_ALIGN`)

Arena<8>::try_alloc_layout(Layout(29, 1))

Old (x86-64)

mov  rax, [rdi]              ; load cursor_ptr
mov  rcx, rax                ; copy cursor (need original later for new_ptr)
sub  rcx, [rdi + 16]         ; capacity = cursor - start_ptr
cmp  rcx, 32                 ; capacity >= 32? (29 rounded up to MIN_ALIGN=8 = 32)
jb   .slow                   ; if not, branch to slow path
add  rax, -32                ; new_ptr = cursor - 32
mov  [rdi], rax              ; store new cursor
ret

New (x86-64)

mov  rax, [rdi]              ; load cursor_ptr
add  rax, -32                ; new_ptr = cursor - 32
cmp  rax, [rdi + 16]         ; new_ptr vs start_ptr
js   .slow                   ; if signed, tail-call slow path
mov  [rdi], rax              ; store new cursor
ret

Old (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
sub  x9, x8, x9              ; capacity = cursor - start_ptr
cmp  x9, #32                 ; capacity >= 32?
b.lo .slow                   ; if not, branch to slow path
sub  x8, x8, #32             ; new_ptr = cursor - 32
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; move result to return register
ret

New (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
sub  x8, x8, #32             ; new_ptr = cursor - 32 (AND elided)
cmp  x8, x9                  ; new_ptr vs start_ptr
b.mi .slow                   ; branch on negative flag
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret

Less path dynamic (dynamic size, `align < MIN_ALIGN`)

Arena<8>::try_alloc_layout(Layout(size, 1))

Old (x86-64)

mov  rdx, rsi                ; save size for slow-path call
mov  rax, [rdi]              ; load cursor_ptr
lea  rcx, [rsi + 7]          ; size + 7 (round-up step 1)
and  rcx, -8                 ; aligned_size = (size + 7) & ~7 (round-up step 2)
mov  rsi, rax                ; copy cursor
sub  rsi, [rdi + 16]         ; capacity = cursor - start_ptr
cmp  rcx, rsi                ; aligned_size > capacity?
ja   .slow                   ; if so, branch to slow path
sub  rax, rcx                ; new_ptr = cursor - aligned_size
mov  [rdi], rax              ; store new cursor
ret

New (x86-64)

mov  rax, [rdi]              ; load cursor_ptr
sub  rax, rsi                ; cursor - size
and  rax, -8                 ; round down to MIN_ALIGN = new_ptr
cmp  rax, [rdi + 16]         ; new_ptr vs start_ptr
js   .slow                   ; tail-call slow path on out of bounds
mov  [rdi], rax              ; store new cursor
ret

Old (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x10, [x0, #16]          ; load start_ptr
add  x9, x1, #7              ; size + 7
and  x9, x9, #~7             ; aligned_size = (size + 7) & ~7
sub  x10, x8, x10            ; capacity = cursor - start_ptr
cmp  x9, x10                 ; aligned_size > capacity?
b.hi .slow                   ; if so, branch to slow path
sub  x8, x8, x9              ; new_ptr = cursor - aligned_size
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret

New (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
sub  x8, x8, x1              ; cursor - size
and  x8, x8, #~7             ; round down to MIN_ALIGN
cmp  x8, x9                  ; new_ptr vs start_ptr
b.mi .slow                   ; branch on negative
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret
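The new Less-path code subtracts first and rounds down afterwards, instead of rounding the size up first. The small sketch below checks that the two give the same pointer whenever the cursor is already `MIN_ALIGN`-aligned, which is the invariant the arena maintains. It uses plain integer addresses and assumes `MIN_ALIGN = 8` as in the `Arena<8>` examples. The function names are illustrative only.

```rust
const MIN_ALIGN: usize = 8;

/// Old shape: round the size up to MIN_ALIGN, then subtract it from the cursor.
fn round_size_then_sub(cursor: usize, size: usize) -> usize {
    let aligned_size = (size + MIN_ALIGN - 1) & !(MIN_ALIGN - 1);
    cursor - aligned_size
}

/// New shape: subtract the raw size, then round the result down to MIN_ALIGN.
fn sub_then_round_ptr(cursor: usize, size: usize) -> usize {
    cursor.wrapping_sub(size) & !(MIN_ALIGN - 1)
}

fn main() {
    // The cursor is a multiple of MIN_ALIGN, as the arena guarantees.
    let cursor = 0x2000;
    for size in 0..=256 {
        assert_eq!(round_size_then_sub(cursor, size), sub_then_round_ptr(cursor, size));
    }
    // The `Layout(29, 1)` example above: both shapes reserve 32 bytes.
    assert_eq!(sub_then_round_ptr(cursor, 29), cursor - 32);
}
```

With `cursor = 8k`, rounding `8k - size` down to a multiple of 8 gives `8k - 8 * ceil(size / 8)`, which is exactly `cursor - round_up(size, 8)`. The new form avoids computing the rounded size up front.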

Equal path (static size, `align == MIN_ALIGN`)

Arena<8>::try_alloc_layout(Layout(32, 8))

Old (x86-64)

mov  rax, [rdi]              ; load cursor_ptr
mov  rcx, rax                ; copy cursor
sub  rcx, [rdi + 16]         ; capacity = cursor - start_ptr
cmp  rcx, 32                 ; capacity >= 32?
jb   .slow                   ; if not, branch to slow path
add  rax, -32                ; new_ptr = cursor - 32
mov  [rdi], rax              ; store new cursor
ret

New (x86-64)

mov  rax, [rdi]              ; load cursor_ptr
add  rax, -32                ; new_ptr = cursor - 32 (AND elided)
cmp  rax, [rdi + 16]         ; new_ptr vs start_ptr
js   .slow                   ; tail-call slow path
mov  [rdi], rax              ; store new cursor
ret

Old (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
sub  x9, x8, x9              ; capacity = cursor - start_ptr
cmp  x9, #32                 ; capacity >= 32?
b.lo .slow                   ; if not, branch to slow path
sub  x8, x8, #32             ; new_ptr = cursor - 32
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret

New (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
sub  x8, x8, #32             ; new_ptr = cursor - 32 (AND elided)
cmp  x8, x9                  ; new_ptr vs start_ptr
b.mi .slow                   ; branch on negative
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret

Equal path dynamic (dynamic size known to be a multiple of 8, `align == MIN_ALIGN`)

Arena<8>::try_alloc_layout(Layout(size, 8))

Old (x86-64)

mov  rdx, rsi                ; save size for slow path
mov  rax, [rdi]              ; load cursor_ptr
mov  rcx, rax                ; copy cursor
sub  rcx, [rdi + 16]         ; capacity = cursor - start_ptr
cmp  rsi, rcx                ; size > capacity?
ja   .slow                   ; if so, branch to slow path
sub  rax, rdx                ; new_ptr = cursor - size
mov  [rdi], rax              ; store new cursor
ret

New (x86-64)

mov  rax, [rdi]              ; load cursor_ptr
sub  rax, rsi                ; new_ptr = cursor - size (AND elided via is_multiple_of(8) hint)
cmp  rax, [rdi + 16]         ; new_ptr vs start_ptr
js   .slow                   ; tail-call slow path
mov  [rdi], rax              ; store new cursor
ret

Old (aarch64)

mov  x2, x1                  ; save size for slow path
ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
sub  x9, x8, x9              ; capacity = cursor - start_ptr
cmp  x1, x9                  ; size > capacity?
b.hi .slow                   ; if so, branch to slow path
sub  x8, x8, x2              ; new_ptr = cursor - size
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret

New (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
sub  x8, x8, x1              ; new_ptr = cursor - size (AND elided)
cmp  x8, x9                  ; new_ptr vs start_ptr
b.mi .slow                   ; branch on negative
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret
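The "AND elided" comments refer to an optimizer hint. The hedged sketch below illustrates the idea with `std::hint::assert_unchecked`. If the compiler is told that both `cursor` and `size` are multiples of 8 (written here as `% 8 == 0`, equivalent to an `is_multiple_of(8)` check), the low 3 bits of `cursor - size` are known to be zero. The `& !7` rounding then becomes redundant and may be removed.

`equal_path` is a hypothetical helper. Whether a given compiler version actually drops the mask should be confirmed by inspecting the generated assembly.

```rust
use std::hint::assert_unchecked;

const MIN_ALIGN: usize = 8;

/// Hypothetical Equal-path helper over integer addresses.
///
/// # Safety
/// Caller must guarantee `cursor % MIN_ALIGN == 0` and `size % MIN_ALIGN == 0`.
unsafe fn equal_path(start: usize, cursor: usize, size: usize) -> Option<usize> {
    // Hints: with both values 8-aligned, the optimizer can prove the low 3 bits
    // of `cursor - size` are zero, so the `& !7` below is a candidate for removal.
    unsafe {
        assert_unchecked(cursor % MIN_ALIGN == 0);
        assert_unchecked(size % MIN_ALIGN == 0);
    }
    let new_ptr = cursor.wrapping_sub(size) & !(MIN_ALIGN - 1);
    if (new_ptr.wrapping_sub(start) as isize) < 0 {
        None
    } else {
        Some(new_ptr)
    }
}

fn main() {
    // SAFETY: 0x1040, 24 and 72 are all multiples of 8.
    assert_eq!(unsafe { equal_path(0x1000, 0x1040, 24) }, Some(0x1028));
    assert_eq!(unsafe { equal_path(0x1000, 0x1040, 72) }, None);
}
```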

Greater path (static size, `align > MIN_ALIGN`)

Arena<1>::try_alloc_layout(Layout(32, 8))

Old (x86-64)

mov  rax, [rdi]              ; load cursor_ptr
and  rax, -8                 ; aligned_ptr = cursor & ~7 (round down to align 8)
mov  rcx, rax                ; copy aligned_ptr
sub  rcx, [rdi + 16]         ; capacity = aligned_ptr - start_ptr (sets carry on underflow)
jb   .slow                   ; branch 1: aligned_ptr < start_ptr (underflow)
cmp  rcx, 32                 ; capacity >= 32?
jb   .slow                   ; branch 2: not enough room
add  rax, -32                ; new_ptr = aligned_ptr - 32
mov  [rdi], rax              ; store new cursor
ret

New (x86-64)

mov  rax, [rdi]              ; load cursor_ptr
add  rax, -32                ; cursor - 32
and  rax, -8                 ; round down to align 8 = new_ptr
cmp  rax, [rdi + 16]         ; new_ptr vs start_ptr
js   .slow                   ; tail-call slow path
mov  [rdi], rax              ; store new cursor
ret

Old (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
and  x8, x8, #~7             ; aligned_ptr = cursor & ~7
subs x9, x8, x9              ; capacity = aligned_ptr - start (sets flags)
b.lo .slow                   ; branch 1: aligned_ptr < start (underflow)
cmp  x9, #32                 ; capacity >= 32?
b.lo .slow                   ; branch 2: not enough room
sub  x8, x8, #32             ; new_ptr = aligned_ptr - 32
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret

New (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
sub  x8, x8, #32             ; cursor - 32
and  x8, x8, #~7             ; round down to align 8 = new_ptr
cmp  x8, x9                  ; new_ptr vs start_ptr
b.mi .slow                   ; branch on negative
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret

Greater path dynamic (dynamic size, `align > MIN_ALIGN`)

Arena<1>::try_alloc_layout(Layout(size, 8))

Old (x86-64)

mov  rax, [rdi]              ; load cursor_ptr
and  rax, -8                 ; aligned_ptr = cursor & ~7
mov  rcx, rax                ; copy aligned_ptr
sub  rcx, [rdi + 16]         ; capacity = aligned_ptr - start_ptr (sets carry on underflow)
jb   .slow                   ; branch 1: aligned_ptr < start (underflow) — tail call
cmp  rsi, rcx                ; size > capacity?
ja   .slow                   ; branch 2: not enough room — tail call
sub  rax, rsi                ; new_ptr = aligned_ptr - size
mov  [rdi], rax              ; store new cursor
ret

New (x86-64)

mov  rax, [rdi]              ; load cursor_ptr
sub  rax, rsi                ; cursor - size
and  rax, -8                 ; round down to align 8 = new_ptr
cmp  rax, [rdi + 16]         ; new_ptr vs start_ptr
js   .slow                   ; tail-call slow path
mov  [rdi], rax              ; store new cursor
ret

Old (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
and  x8, x8, #~7             ; aligned_ptr = cursor & ~7
subs x9, x8, x9              ; capacity = aligned_ptr - start (sets flags)
ccmp x1, x9, #2, hs          ; if no underflow: compare size with capacity, else set "out of bounds"
b.hi .slow                   ; branch on either failure
sub  x8, x8, x1              ; new_ptr = aligned_ptr - size
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret

New (aarch64)

ldr  x8, [x0]                ; load cursor_ptr
ldr  x9, [x0, #16]           ; load start_ptr
sub  x8, x8, x1              ; cursor - size
and  x8, x8, #~7             ; round down to align 8 = new_ptr
cmp  x8, x9                  ; new_ptr vs start_ptr
b.mi .slow                   ; branch on negative
str  x8, [x0]                ; store new cursor
mov  x0, x8                  ; return register
ret

Copilot

Pull request overview

Optimizes oxc_allocator’s Arena allocation fast path by simplifying alignment/space checks into fewer instructions and branches, aiming to improve hot-path performance during allocation-heavy workloads (e.g., parsing/AST building).

Changes:

Reworks try_alloc_layout_fast to compute the allocation pointer via a unified alignment/size calculation and a single bounds check based on the sign/top-bit property.
Removes the previous Ordering-based match and replaces it with an if layout.align() <= MIN_ALIGN split.
Updates debug assertions and expands in-code rationale/commentary for the new approach.
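The restructured fast path described above might look roughly like the sketch below. It is hypothetical:

- Integer addresses stand in for pointers.
- `MIN_ALIGN` is a const generic mirroring `Arena<MIN_ALIGN>`.
- `try_alloc_layout_fast` here is a free function, not Oxc's method.

It uses `std::alloc::Layout` and computes `new_ptr = round_down(cursor - size, max(align, MIN_ALIGN))`, followed by the single sign-bit bounds check.

```rust
use std::alloc::Layout;

/// Hypothetical free-function version of the fast path over integer addresses.
/// Assumes `*cursor` is a multiple of `MIN_ALIGN` and `*cursor - start <= isize::MAX`.
fn try_alloc_layout_fast<const MIN_ALIGN: usize>(
    start: usize,
    cursor: &mut usize,
    layout: Layout,
) -> Option<usize> {
    // A two-way `if` split instead of a three-way `match` on `Ordering`.
    let align = if layout.align() <= MIN_ALIGN { MIN_ALIGN } else { layout.align() };
    // Subtract first, then round down to the effective alignment.
    let new_ptr = (*cursor).wrapping_sub(layout.size()) & !(align - 1);
    // Single bounds check: sign bit of the wrapping difference.
    if (new_ptr.wrapping_sub(start) as isize) < 0 {
        return None; // a real arena would fall back to its slow path here
    }
    *cursor = new_ptr;
    Some(new_ptr)
}

fn main() {
    let start = 0x1000;
    let mut cursor = 0x1040;
    // Less: Layout(29, 1) in Arena<8> reserves 32 bytes.
    let a = try_alloc_layout_fast::<8>(start, &mut cursor, Layout::from_size_align(29, 1).unwrap());
    assert_eq!(a, Some(0x1020));
    // Greater: Layout(8, 16) in Arena<8> rounds down to a multiple of 16.
    let b = try_alloc_layout_fast::<8>(start, &mut cursor, Layout::from_size_align(8, 16).unwrap());
    assert_eq!(b, Some(0x1010));
    // Not enough room left: fails and leaves the cursor unchanged.
    let c = try_alloc_layout_fast::<8>(start, &mut cursor, Layout::from_size_align(32, 8).unwrap());
    assert_eq!(c, None);
    assert_eq!(cursor, 0x1010);
}
```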

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

overlookmotel · 2026-04-21T23:16:59Z

@copilot The implementation has changed considerably since the first version you reviewed. Please review again carefully, starting from scratch.

Copilot · 2026-04-21T23:29:53Z

Reviewed the implementation from scratch. Here's the full analysis:

Algorithm

new_ptr = round_down(cursor - size, max(align, MIN_ALIGN))

This unified formula handles all three alignment cases. The round_mut_ptr_down_to implementation using wrapping_sub (rather than AND-masking) preserves pointer provenance and critically enables LLVM's known-bits analysis to fold (cursor - constant_size) - constant_low_bits into a single cursor - constant for statically-known sizes. The assert_unchecked! on cursor alignment is the essential hint that makes this fold possible.
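The rounding helper described above can be sketched as an illustrative reimplementation; Oxc's actual `round_mut_ptr_down_to` may differ in signature and details. It rounds a pointer down by subtracting its low address bits with `wrapping_sub`. The result is therefore still derived from the original pointer rather than rebuilt from an integer.

```rust
/// Illustrative sketch: round `ptr` down to a multiple of `divisor`
/// (a power of two) by subtracting its low address bits.
fn round_mut_ptr_down_to(ptr: *mut u8, divisor: usize) -> *mut u8 {
    debug_assert!(divisor.is_power_of_two());
    let low_bits = (ptr as usize) & (divisor - 1);
    // `wrapping_sub` derives the result from `ptr` itself; it is never dereferenced here.
    ptr.wrapping_sub(low_bits)
}

fn main() {
    let mut buf = [0u8; 64];
    // Point 13 bytes into the buffer, then round down to 8-byte alignment.
    let p = buf.as_mut_ptr().wrapping_add(13);
    let r = round_mut_ptr_down_to(p, 8);
    assert_eq!(r as usize % 8, 0);
    assert!(r <= p && (p as usize - r as usize) < 8);
    println!("rounded {:p} down to {:p}", p, r);
}
```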

Bounds check correctness

The claim: new_ptr.wrapping_sub(start) > isize::MAX iff new_ptr < start.

In-bounds (new_ptr >= start): new_ptr - start doesn't wrap, and is <= cursor - start <= isize::MAX. ✓

Out-of-bounds (new_ptr < start): The proof in the comments is sound.

Max total subtraction is isize::MAX + 1 (reached on the align <= MIN_ALIGN path, where round_up(isize::MAX, MIN_ALIGN) = isize::MAX + 1 for any MIN_ALIGN > 1, since MIN_ALIGN is a power of 2).
The minimum out-of-bounds wrapping result is isize::MAX + 1 (the extremal case of empty chunk + max total subtraction).
All other out-of-bounds cases give larger wrapping results, correctly detected by > isize::MAX.
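The bounds argument can be checked mechanically in a scaled-down model. The sketch below is an illustrative toy, not Oxc's test suite. Addresses are `u8` values, so `i8::MAX` plays the role of `isize::MAX`. The model works as follows:

- It enumerates every chunk with `start <= cursor` and `cursor - start <= i8::MAX` anywhere in the address space, including the top half.
- For each chunk, it tries every total subtraction (size plus rounding) from 0 up to `i8::MAX + 1`.
- It asserts that the sign of `new_ptr.wrapping_sub(start)` always matches the true bounds check.

```rust
/// Toy 8-bit address space: `i8::MAX` stands in for `isize::MAX`.
/// Returns how many (start, cursor, total_sub) cases were checked.
fn check_8bit_model() -> usize {
    const MAX: u16 = i8::MAX as u16; // 127
    let mut cases = 0;
    for start in 0u16..=255 {
        // Chunk invariant: cursor - start <= MAX, and the chunk cannot wrap past 255.
        for cursor in start..=(start + MAX).min(255) {
            // Layout invariant plus rounding: total subtraction <= MAX + 1.
            for total_sub in 0..=MAX + 1 {
                let new_ptr = (cursor as u8).wrapping_sub(total_sub as u8);
                let sign_bit_set = (new_ptr.wrapping_sub(start as u8) as i8) < 0;
                // Ground truth computed without any wrapping.
                let out_of_bounds = total_sub > cursor - start;
                assert_eq!(sign_bit_set, out_of_bounds, "start={start} cursor={cursor} sub={total_sub}");
                cases += 1;
            }
        }
    }
    cases
}

fn main() {
    let n = check_8bit_model();
    println!("sign-bit check matched the true bounds check in {n} cases");
}
```

Rounding is not modelled separately. The analysis above bounds the total subtraction, rounding included, by `isize::MAX + 1`, so folding it into `total_sub` covers it.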

Top-half addresses: The comment (lines 206-207) claims correctness without relying on addresses being in the bottom half. I verified this independently. When cursor is strictly above TOP (2^63), the Layout invariant (round_up(size, align) <= isize::MAX) implies that for any power-of-2 alignment, isize::MAX ≡ align-1 (mod align), so the max valid size is isize::MAX - (align-1). This means cursor - max_valid_size >= align, so new_ptr >= align after rounding. Since start - new_ptr <= (TOP + delta) - (delta + 1) = isize::MAX, the wrapping difference is >= 2^63 + 1 > isize::MAX. The check is always correct. ✓

Null pointer safety (lines 295-297): The argument holds on all real platforms where user-space allocations are in the bottom half. On any platform where an allocation could start exactly at TOP, 0.wrapping_sub(TOP) = isize::MAX + 1 > isize::MAX still catches it. For starts strictly above TOP, the analysis above shows new_ptr can't be null with valid Layout sizes. ✓

Comment accuracy

Line 125: "maximum subtraction before rounding is isize::MAX - align + 1" — correct: for align > 1, since round_up(size, align) <= isize::MAX and isize::MAX ≡ align-1 (mod align), the max valid size is exactly isize::MAX - (align-1).
Line 191: "So new_ptr.wrapping_sub(start_ptr) > isize::MAX + 1" — correct for "all other" out-of-bounds cases (i.e., not the minimum extremal case). Together with the minimum case at isize::MAX + 1, all out-of-bounds results are >= isize::MAX + 1.
Lines 184-187: The arithmetic for the extremal case (0.wrapping_sub(isize::MAX + 1) = isize::MAX + 1) is correct.

`unsafe { assert_unchecked! }` soundness

Sound: Arena maintains cursor alignment to MIN_ALIGN at all mutation points — initial creation (footer is CHUNK_ALIGN-aligned and CHUNK_ALIGN >= MIN_ALIGN), try_alloc_layout_fast (result of round_mut_ptr_down_to with divisor >= MIN_ALIGN), alloc_layout_slow (sets cursor to new footer), and dealloc (rounds up to MIN_ALIGN). ✓

Test coverage

The test suite is comprehensive. It covers: bottom-half, top-half, spanning-midpoint, near-usize::MAX, all three alignment paths (less/equal/greater), ZSTs, consecutive allocations, off-by-ones, max valid Layout sizes for each path, boundary total-subtraction exactly at isize::MAX + 1, and various MIN_ALIGN values (1, 8, 16). All 60 tests pass.

No issues found. The implementation is correct, the invariant proof is sound, and the test suite provides thorough coverage of the edge cases.

graphite-app · 2026-04-21T23:54:35Z

Merge activity

Apr 21, 11:54 PM UTC: overlookmotel added this pull request to the Graphite merge queue.
Apr 22, 12:03 AM UTC: Merged by the Graphite merge queue.

Fix a test for `Bump` which was failing on 32-bit in CI. #21291 temporarily disabled the test. It turns out there's no easy way to make this test work on 32-bit platforms, and #21510 added ample unit tests which effectively test the same thing. So keep it disabled on 32-bit, but fix the test to make it work properly on 64-bit. The rationale for the fix is explained in fitzgen/bumpalo#320.

### 💥 BREAKING CHANGES

- 502e804 ast: [**BREAKING**] Reduce size of `TSTypePredicateName` (#21711) (overlookmotel)
- 5651539 ast: [**BREAKING**] Reduce size of `JSXExpression` (#21710) (overlookmotel)
- c44e280 ast: [**BREAKING**] Reduce size of `ArrayExpressionElement` (#21709) (overlookmotel)
- c5b3deb syntax: [**BREAKING**] Remove `CommentNodeId` (#21679) (overlookmotel)

### 🚀 Features

- b738a39 allocator: Add `Allocator::cursor_ptr` method (#21773) (overlookmotel)
- 678767e ast: Generate node_id accessors for AST enum wrappers (#21653) (camc314)
- f091d77 minifier: Inline constant spread elements into arrays (#21095) (Armano)

### 🐛 Bug Fixes

- 0d608c2 minifier: Preserve raw CR in template literals (#21645) (Dunqing)
- a889ea9 minifier: Track pure functions in DCE mode (#21722) (Dunqing)
- 674dfac allocator: `Arena` retry allocation when chunk size approaches maximum (#21777) (overlookmotel)
- f130cc0 allocator: Fix arithmetic overflow in `Arena::new_chunk_memory_details` (#21745) (overlookmotel)
- b9bf239 allocator: Fix UB in `Arena::grow_zeroed` (#21739) (overlookmotel)
- d2b9389 allocator: Clippy warning when building without `testing` feature (#21681) (camc314)
- 503dc86 codegen: Map sourcemaps from visible output starts (#21662) (Dunqing)
- c92bd3b transformer: Use SPAN for synthesized helper calls to prevent comment misattribution (#21578) (Dunqing)
- 0d80441 codegen: Add mapping before printing `#` for private ident (#21619) (camc314)

### ⚡ Performance

- 9fa362e napi/parser: Do not generate tokens except in tests (#21811) (overlookmotel)
- 0044392 allocator: Reduce branches when allocating new chunk (#21776) (overlookmotel)
- 7896bd0 allocator: `Allocator::used_bytes` do not use chunk iterator (#21771) (overlookmotel)
- a5c562f allocator: Remove check in `Arena::new_chunk_memory_details` (#21750) (overlookmotel)
- 35bbe1f allocator: `Arena` use unchecked size round up where guaranteed no overflow (#21743) (overlookmotel)
- ffe229b allocator: Remove unnecessary check from `Arena::try_alloc_layout_slow_impl` (#21732) (overlookmotel)
- 72fece5 allocator: Use `NonNull::offset_from_unsigned` in `Arena::chunk_capacity` (#21731) (overlookmotel)
- cab32ae ast: Add `#[inline(always)]` to `node_id` methods on enums with all variants unboxed (#21707) (overlookmotel)
- b179688 parser: Allocate `TriviaBuilder` comments in the arena (#21512) (Boshen)
- 2290f31 lexer: Fix perf of `Token::set_*` methods on Rust 1.95.0 (#21659) (overlookmotel)
- 1b58029 allocator: Move code into cold path in `Arena::alloc_layout` (#21622) (overlookmotel)
- 3cf7cef allocator: Reduce instructions on allocation hot path (#21510) (overlookmotel)

### 📚 Documentation

- ce65070 data_structures: Document why `as_ref` and `as_mut` on `NonNullConst` and `NonNullMut` take `self` (#21800) (overlookmotel)
- 93b7dbd allocator: Improve doc comments for `ChunkFooter` (#21733) (overlookmotel)
- 295db8d transformer: Fix comment (#21717) (overlookmotel)
- 5c93af8 ast: Add comments explaining `#[inline(always)]` to `node_id` methods on enums (#21706) (overlookmotel)
- e4cea25 transform: Use the `node:` namespace in the example (#19998) (루밀LuMir)

### 🛡️ Security

- d8076c9 deps: Update rolldown (#21639) (renovate)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>

github-actions Bot added A-allocator Area - Allocator C-performance Category - Solution not expected to change functional behavior, only performance labels Apr 16, 2026

overlookmotel self-assigned this Apr 16, 2026

overlookmotel marked this pull request as ready for review April 16, 2026 20:55

Copilot AI review requested due to automatic review settings April 16, 2026 20:55

Copilot started reviewing on behalf of overlookmotel April 16, 2026 20:55

Copilot AI reviewed Apr 16, 2026

Comment thread crates/oxc_allocator/src/arena/alloc_impl.rs Outdated

Comment thread crates/oxc_allocator/src/arena/alloc_impl.rs Outdated

overlookmotel force-pushed the om/04-16-perf_allocator_reduce_instructions_on_allocation_hot_path branch 2 times, most recently from e46cbb7 to 4a40a6b April 21, 2026 22:53

overlookmotel requested a review from Copilot April 21, 2026 23:08

overlookmotel mentioned this pull request Apr 21, 2026

perf(allocator): move code into cold path in Arena::alloc_layout #21622

Merged

Copilot started reviewing on behalf of overlookmotel April 21, 2026 23:09

Copilot AI reviewed Apr 21, 2026

Comment thread crates/oxc_allocator/src/arena/alloc_impl.rs

Copilot started work on behalf of overlookmotel April 21, 2026 23:17

Copilot finished work on behalf of overlookmotel April 21, 2026 23:30

overlookmotel force-pushed the om/04-16-perf_allocator_reduce_instructions_on_allocation_hot_path branch from 24db091 to 17997d3 April 21, 2026 23:38

graphite-app Bot force-pushed the om/04-16-perf_allocator_reduce_instructions_on_allocation_hot_path branch from 17997d3 to 3cf7cef April 21, 2026 23:54

graphite-app Bot merged commit 3cf7cef into main Apr 22, 2026
37 checks passed

graphite-app Bot deleted the om/04-16-perf_allocator_reduce_instructions_on_allocation_hot_path branch April 22, 2026 00:03

This was referenced Apr 23, 2026

Revamp allocator #20513

Open

test(allocator): fix Bump test on 32-bit platforms #21309

Merged

oxc-guard Bot mentioned this pull request Apr 27, 2026

release(crates): oxc v0.128.0 #21837

Merged

graphite-app[bot] merged 1 commit into main from om/04-16-perf_allocator_reduce_instructions_on_allocation_hot_path

overlookmotel commented Apr 16, 2026 •

edited

Loading

Uh oh!

overlookmotel commented Apr 16, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented Apr 16, 2026 •

edited

Loading

Uh oh!

overlookmotel commented Apr 16, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

overlookmotel commented Apr 21, 2026

Uh oh!

Copilot AI commented Apr 21, 2026

Uh oh!

graphite-app Bot commented Apr 21, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

overlookmotel commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

overlookmotel commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

codspeed-hq Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

overlookmotel commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Less path (static size, align < MIN_ALIGN)

Old (x86-64)

New (x86-64)

Old (aarch64)

New (aarch64)

Less path dynamic (dynamic size, align < MIN_ALIGN)

Old (x86-64)

New (x86-64)

Old (aarch64)

New (aarch64)

Equal path (static size, align == MIN_ALIGN)

Old (x86-64)

New (x86-64)

Old (aarch64)

New (aarch64)

Equal path dynamic (dynamic size known multiple of 8, align == MIN_ALIGN)

Old (x86-64)

New (x86-64)

Old (aarch64)

New (aarch64)

Greater path (static size, align > MIN_ALIGN)

Old (x86-64)

New (x86-64)

Old (aarch64)

New (aarch64)

Greater path dynamic (dynamic size, align > MIN_ALIGN)

Old (x86-64)

New (x86-64)

Old (aarch64)

New (aarch64)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

overlookmotel commented Apr 21, 2026

Uh oh!

Copilot AI commented Apr 21, 2026

Algorithm

Bounds check correctness

Comment accuracy

unsafe { assert_unchecked! } soundness

Test coverage

Uh oh!

graphite-app Bot commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge activity

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

overlookmotel commented Apr 16, 2026 •

edited

Loading

overlookmotel commented Apr 16, 2026 •

edited

Loading

codspeed-hq Bot commented Apr 16, 2026 •

edited

Loading

overlookmotel commented Apr 16, 2026 •

edited

Loading

Less path (static size, `align < MIN_ALIGN`)

Less path dynamic (dynamic size, `align < MIN_ALIGN`)

Equal path (static size, `align == MIN_ALIGN`)

Equal path dynamic (dynamic size known multiple of 8, `align == MIN_ALIGN`)

Greater path (static size, `align > MIN_ALIGN`)

Greater path dynamic (dynamic size, `align > MIN_ALIGN`)

`unsafe { assert_unchecked! }` soundness

graphite-app Bot commented Apr 21, 2026 •

edited

Loading