Skip to content

perf(codegen): accept one-shot wrap closures#23265

Merged
graphite-app[bot] merged 1 commit into
mainfrom
codex/perf-codegen-wrap-fnonce
Jun 11, 2026
Merged

perf(codegen): accept one-shot wrap closures#23265
graphite-app[bot] merged 1 commit into
mainfrom
codex/perf-codegen-wrap-fnonce

Conversation

@camc314

@camc314 camc314 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Change Codegen::wrap from FnMut to FnOnce, matching how the helper is used: every closure passed to wrap is invoked exactly once.
  • Add an assembly comparison note showing the optimized codegen difference before and after the change.

Fn, FnMut, and FnOnce ?

Fn, FnMut, and FnOnce all describe how a closure may be called:

  • Fn is the most restrictive for the closure body: it is callable through &self, can be called repeatedly, and cannot require mutable or consuming access to captured state.
  • FnMut is callable through &mut self, can be called repeatedly, and may mutate captured state.
  • FnOnce is callable by value, may consume captured state, and is only guaranteed to be callable once.

The trait relationship goes from most specific to most general call capability:

Fn: FnMut
FnMut: FnOnce

So accepting FnOnce is the least restrictive bound for a callback that is only invoked once. It still accepts Fn and FnMut closures, but it also tells the optimizer that wrap does not need a reusable mutable closure object.

Assembly Impact

Codegen::wrap is an inline generic helper, so there is no stable standalone wrap assembly symbol in release output. The impact shows up in monomorphized call sites such as Class::gen, Function::gen, and expression gen_expr closures.

Before, several call sites materialized a closure environment on the stack before calling the closure:

strb    w9, [sp, #15]
add     x9, sp, #15
stp     x0, x9, [sp, #16]
add     x0, sp, #16
bl      <...>::{{closure}}

After the FnOnce bound, the same shape can pass the one-shot closure state directly through registers:

and     w1, w2, #0xfffffffd
mov     x2, x19
bl      <...>::{{closure}}

Some wrapper frames also shrink. For example, representative Class::gen / Function::gen paths go from a 64 byte frame to a 48 byte frame:

- sub     sp, sp, #64
- stp     x20, x19, [sp, #32]
- stp     x29, x30, [sp, #48]
+ sub     sp, sp, #48
+ stp     x20, x19, [sp, #16]
+ stp     x29, x30, [sp, #32]

Several restored-frame return paths also become tail calls:

ldp     x29, x30, [sp, #32]
ldp     x20, x19, [sp, #16]
add     sp, sp, #48
b       <closure or push_slow target>

The assembly diff also contains expected local label renumbering noise, such as switch-table suffixes changing from .318 to .330; those are not behavior changes.

@github-actions github-actions Bot added the A-codegen Area - Code Generation label Jun 11, 2026
@camc314 camc314 force-pushed the codex/perf-codegen-wrap-fnonce branch from 2203325 to 4cdc212 Compare June 11, 2026 07:40
@camc314 camc314 marked this pull request as ready for review June 11, 2026 07:40
Copilot AI review requested due to automatic review settings June 11, 2026 07:40

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@camc314

camc314 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@codex[agent] review

@chatgpt-codex-connector

Copy link
Copy Markdown

To use Codex here, create a Codex account and connect to github.

@camc314

camc314 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@codex[agent] review this PR

@codspeed-hq

codspeed-hq Bot commented Jun 11, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 62 untouched benchmarks
⏩ 9 skipped benchmarks1


Comparing codex/perf-codegen-wrap-fnonce (4cdc212) with main (5e532ee)

Open in CodSpeed

Footnotes

  1. 9 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@camc314

camc314 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Hmm not sure why there's a regression - I would've expected an improvement based on the assembly.

Screenshot 2026-06-11 at 10 02 13

@Boshen

Boshen commented Jun 11, 2026

Copy link
Copy Markdown
Member

Probably noise.

@camc314 camc314 added the 0-merge Merge with Graphite Merge Queue label Jun 11, 2026
@camc314 camc314 self-assigned this Jun 11, 2026

camc314 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Merge activity

@camc314

camc314 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Will merge as I think this is a net positive - the benchmarks seem like noise.

We can always revert at a later point if needed.

## Summary

- Change `Codegen::wrap` from `FnMut` to `FnOnce`, matching how the helper is used: every closure passed to `wrap` is invoked exactly once.
- Add an assembly comparison note showing the optimized codegen difference before and after the change.

## `Fn`, `FnMut`, and `FnOnce` ?

`Fn`, `FnMut`, and `FnOnce` all describe how a closure may be called:

- `Fn` is the most restrictive for the closure body: it is callable through `&self`, can be called repeatedly, and cannot require mutable or consuming access to captured state.
- `FnMut` is callable through `&mut self`, can be called repeatedly, and may mutate captured state.
- `FnOnce` is callable by value, may consume captured state, and is only guaranteed to be callable once.

The trait relationship goes from most specific to most general call capability:

```rust
Fn: FnMut
FnMut: FnOnce
```

So accepting `FnOnce` is the least restrictive bound for a callback that is only invoked once. It still accepts `Fn` and `FnMut` closures, but it also tells the optimizer that `wrap` does not need a reusable mutable closure object.

## Assembly Impact

`Codegen::wrap` is an inline generic helper, so there is no stable standalone `wrap` assembly symbol in release output. The impact shows up in monomorphized call sites such as `Class::gen`, `Function::gen`, and expression `gen_expr` closures.

Before, several call sites materialized a closure environment on the stack before calling the closure:

```asm
strb    w9, [sp, #15]
add     x9, sp, #15
stp     x0, x9, [sp, #16]
add     x0, sp, #16
bl      <...>::{{closure}}
```

After the `FnOnce` bound, the same shape can pass the one-shot closure state directly through registers:

```asm
and     w1, w2, #0xfffffffd
mov     x2, x19
bl      <...>::{{closure}}
```

Some wrapper frames also shrink. For example, representative `Class::gen` / `Function::gen` paths go from a `64` byte frame to a `48` byte frame:

```asm
- sub     sp, sp, #64
- stp     x20, x19, [sp, #32]
- stp     x29, x30, [sp, #48]
+ sub     sp, sp, #48
+ stp     x20, x19, [sp, #16]
+ stp     x29, x30, [sp, #32]
```

Several restored-frame return paths also become tail calls:

```asm
ldp     x29, x30, [sp, #32]
ldp     x20, x19, [sp, #16]
add     sp, sp, #48
b       <closure or push_slow target>
```

The assembly diff also contains expected local label renumbering noise, such as switch-table suffixes changing from `.318` to `.330`; those are not behavior changes.
@graphite-app graphite-app Bot force-pushed the codex/perf-codegen-wrap-fnonce branch from 4cdc212 to e89729b Compare June 11, 2026 12:01
@graphite-app graphite-app Bot merged commit e89729b into main Jun 11, 2026
29 checks passed
@graphite-app graphite-app Bot removed the 0-merge Merge with Graphite Merge Queue label Jun 11, 2026
@graphite-app graphite-app Bot deleted the codex/perf-codegen-wrap-fnonce branch June 11, 2026 12:07
Boshen added a commit that referenced this pull request Jun 15, 2026
### 💥 BREAKING CHANGES

- 7a24911 codegen: [**BREAKING**] Borrow sourcemaps from codegen
(#23422) (Boshen)
- bb0ed44 transformer: [**BREAKING**] Disable styled-components
transpileTemplateLiterals by default (#23171) (Boshen)

### 🚀 Features

- 1490a0a linter/react: Implement react-compiler rule (#23202) (Boshen)
- 6c0bdf0 transformer/react-refresh: Support `module.property.useHook()`
(#23190) (Dunqing)
- 47991bd semantic: Report TS1228 for invalid type predicates (#23174)
(camc314)
- 1d3af58 parser: Add TS2398 parameter property diagnostic (#23216)
(camc314)
- 44313da semantic: Add `scope_is_descendant_of` api (#22313) (camc314)
- e5050c0 parser: Improve diagnostic for rest initializer (#23205)
(camc314)
- ec266bb transformer: Run React Compiler as a feature-gated transform
pass (#23201) (Boshen)
- e7374fe parser: Report error for `const` modifier on interface type
parameter (#23173) (camc314)
- a7c1c9b parser: Report ambient definite variable assertions (#23165)
(camc314)
- d169fcd parser: Report invalid class definite assertions (#23164)
(camc314)
- 00244d8 parser: Report definite property initializer errors (#23160)
(camc314)

### 🐛 Bug Fixes

- 52d0c31 transformer: Replace ambient dot defines (#23231) (camc314)
- 2c28748 transformer/class: Parent generated constructors to class
scope (#23222) (camc314)
- 8edd234 parser: Report accessor definite assertion on token (#23203)
(camc314)
- de38a3f react_compiler: Keep imports referenced only by a local
re-export (#23176) (Boshen)
- f5721c2 codegen: Preserve parentheses around `intrinsic` type
reference (#23156) (Boshen)
- e89f81d parser: Don't emit TS1477 for parenthesized instantiation
expression (#23147) (Boshen)
- 8a04149 parser: Reject module-referencing imports/exports in a
namespace body (#22829) (Boshen)

### ⚡ Performance

- 2783295 parser: Table-driven operator precedence lookup (#23346)
(Boshen)
- 231d5de parser: Single-match member expression dispatch (#23347)
(Boshen)
- e89729b codegen: Accept one-shot wrap closures (#23265) (camc314)
- a6c11fa parser: Force-inline read_non_decimal to fold per-digit number
dispatch (#23157) (Boshen)
- d74964c parser: Store class definite assertion offset (#23170)
(camc314)
- f0fda4d parser: Shrink-wrap cold diagnostic tails out of hot frames
(#23159) (Boshen)
- a082180 parser: Store definite assertion offset (#23167) (camc314)
- 534f9c6 oxc: Conditionally rebuild semantic in compiler pipeline
(#23153) (Boshen)
- b435c6a parser: Skip checkpoint for `infer T extends U` constraint in
disallow context (#23128) (Boshen)
- 7464dce parser: Peek instead of checkpoint/rewind for `export default`
modifier (#23124) (Boshen)
- 80a9a32 parser: Fast-path single-keyword TS declarations (#23083)
(Boshen)
- da1a6c6 diagnostics: Migrate to allocation-optimized oxc-miette
(#23094) (Boshen)
- b7b08ce parser: Peek once for the static modifier disambiguation
(#23079) (Boshen)
- e7e07a3 parser: Fold unary dispatch into a single match (#23076)
(Boshen)

### 📚 Documentation

- d241add semantic: Add `AGENTS.md` test guidance for agents (#23441)
(camc314)
- 026f1ae parser: Add `AGENTS.md` test guidance for agents (#23440)
(camc314)
- 09755ac transformer: Add `AGENTS.md` test guidance for agents (#23439)
(camc314)
- e6bdfd4 lexer: Correct reference link for `byte_handlers!` (#23313)
(Dunqing)
- 65b6d7a allocator: Fix memory leaks in `Arena` examples (#23257)
(overlookmotel)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
camc314 added a commit that referenced this pull request Jul 3, 2026
## Summary

- Change `Codegen::wrap` from `FnMut` to `FnOnce`, matching how the helper is used: every closure passed to `wrap` is invoked exactly once.
- Add an assembly comparison note showing the optimized codegen difference before and after the change.

## `Fn`, `FnMut`, and `FnOnce` ?

`Fn`, `FnMut`, and `FnOnce` all describe how a closure may be called:

- `Fn` is the most restrictive for the closure body: it is callable through `&self`, can be called repeatedly, and cannot require mutable or consuming access to captured state.
- `FnMut` is callable through `&mut self`, can be called repeatedly, and may mutate captured state.
- `FnOnce` is callable by value, may consume captured state, and is only guaranteed to be callable once.

The trait relationship goes from most specific to most general call capability:

```rust
Fn: FnMut
FnMut: FnOnce
```

So accepting `FnOnce` is the least restrictive bound for a callback that is only invoked once. It still accepts `Fn` and `FnMut` closures, but it also tells the optimizer that `wrap` does not need a reusable mutable closure object.

## Assembly Impact

`Codegen::wrap` is an inline generic helper, so there is no stable standalone `wrap` assembly symbol in release output. The impact shows up in monomorphized call sites such as `Class::gen`, `Function::gen`, and expression `gen_expr` closures.

Before, several call sites materialized a closure environment on the stack before calling the closure:

```asm
strb    w9, [sp, #15]
add     x9, sp, #15
stp     x0, x9, [sp, #16]
add     x0, sp, #16
bl      <...>::{{closure}}
```

After the `FnOnce` bound, the same shape can pass the one-shot closure state directly through registers:

```asm
and     w1, w2, #0xfffffffd
mov     x2, x19
bl      <...>::{{closure}}
```

Some wrapper frames also shrink. For example, representative `Class::gen` / `Function::gen` paths go from a `64` byte frame to a `48` byte frame:

```asm
- sub     sp, sp, #64
- stp     x20, x19, [sp, #32]
- stp     x29, x30, [sp, #48]
+ sub     sp, sp, #48
+ stp     x20, x19, [sp, #16]
+ stp     x29, x30, [sp, #32]
```

Several restored-frame return paths also become tail calls:

```asm
ldp     x29, x30, [sp, #32]
ldp     x20, x19, [sp, #16]
add     sp, sp, #48
b       <closure or push_slow target>
```

The assembly diff also contains expected local label renumbering noise, such as switch-table suffixes changing from `.318` to `.330`; those are not behavior changes.
camc314 pushed a commit that referenced this pull request Jul 3, 2026
### 💥 BREAKING CHANGES

- 7a24911 codegen: [**BREAKING**] Borrow sourcemaps from codegen
(#23422) (Boshen)
- bb0ed44 transformer: [**BREAKING**] Disable styled-components
transpileTemplateLiterals by default (#23171) (Boshen)

### 🚀 Features

- 1490a0a linter/react: Implement react-compiler rule (#23202) (Boshen)
- 6c0bdf0 transformer/react-refresh: Support `module.property.useHook()`
(#23190) (Dunqing)
- 47991bd semantic: Report TS1228 for invalid type predicates (#23174)
(camc314)
- 1d3af58 parser: Add TS2398 parameter property diagnostic (#23216)
(camc314)
- 44313da semantic: Add `scope_is_descendant_of` api (#22313) (camc314)
- e5050c0 parser: Improve diagnostic for rest initializer (#23205)
(camc314)
- ec266bb transformer: Run React Compiler as a feature-gated transform
pass (#23201) (Boshen)
- e7374fe parser: Report error for `const` modifier on interface type
parameter (#23173) (camc314)
- a7c1c9b parser: Report ambient definite variable assertions (#23165)
(camc314)
- d169fcd parser: Report invalid class definite assertions (#23164)
(camc314)
- 00244d8 parser: Report definite property initializer errors (#23160)
(camc314)

### 🐛 Bug Fixes

- 52d0c31 transformer: Replace ambient dot defines (#23231) (camc314)
- 2c28748 transformer/class: Parent generated constructors to class
scope (#23222) (camc314)
- 8edd234 parser: Report accessor definite assertion on token (#23203)
(camc314)
- de38a3f react_compiler: Keep imports referenced only by a local
re-export (#23176) (Boshen)
- f5721c2 codegen: Preserve parentheses around `intrinsic` type
reference (#23156) (Boshen)
- e89f81d parser: Don't emit TS1477 for parenthesized instantiation
expression (#23147) (Boshen)
- 8a04149 parser: Reject module-referencing imports/exports in a
namespace body (#22829) (Boshen)

### ⚡ Performance

- 2783295 parser: Table-driven operator precedence lookup (#23346)
(Boshen)
- 231d5de parser: Single-match member expression dispatch (#23347)
(Boshen)
- e89729b codegen: Accept one-shot wrap closures (#23265) (camc314)
- a6c11fa parser: Force-inline read_non_decimal to fold per-digit number
dispatch (#23157) (Boshen)
- d74964c parser: Store class definite assertion offset (#23170)
(camc314)
- f0fda4d parser: Shrink-wrap cold diagnostic tails out of hot frames
(#23159) (Boshen)
- a082180 parser: Store definite assertion offset (#23167) (camc314)
- 534f9c6 oxc: Conditionally rebuild semantic in compiler pipeline
(#23153) (Boshen)
- b435c6a parser: Skip checkpoint for `infer T extends U` constraint in
disallow context (#23128) (Boshen)
- 7464dce parser: Peek instead of checkpoint/rewind for `export default`
modifier (#23124) (Boshen)
- 80a9a32 parser: Fast-path single-keyword TS declarations (#23083)
(Boshen)
- da1a6c6 diagnostics: Migrate to allocation-optimized oxc-miette
(#23094) (Boshen)
- b7b08ce parser: Peek once for the static modifier disambiguation
(#23079) (Boshen)
- e7e07a3 parser: Fold unary dispatch into a single match (#23076)
(Boshen)

### 📚 Documentation

- d241add semantic: Add `AGENTS.md` test guidance for agents (#23441)
(camc314)
- 026f1ae parser: Add `AGENTS.md` test guidance for agents (#23440)
(camc314)
- 09755ac transformer: Add `AGENTS.md` test guidance for agents (#23439)
(camc314)
- e6bdfd4 lexer: Correct reference link for `byte_handlers!` (#23313)
(Dunqing)
- 65b6d7a allocator: Fix memory leaks in `Arena` examples (#23257)
(overlookmotel)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-codegen Area - Code Generation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants