refactor(semantic): identify illegal function declarations without using AST nodes#22972
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
Merging this PR will not alter performance
Comparing Footnotes
|
1fabb2f to
b3a87da
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors the JavaScript Annex B.3.3 function-redeclaration check to avoid reading a previous declaration’s AST node via NodeId, enabling future work where AST node storage can be optional. It does this by recording, at bind time, which function declarations were async/generator in sloppy-mode block scopes (JS-only), and consulting that build-only scratch data during the redeclaration check; it also improves diagnostics by pointing at the earlier async/generator declaration when that’s what makes the redeclaration illegal.
Changes:
- Replace previous-declaration AST-node inspection in
check_function_redeclarationwith a scan over build-time recordedasync/generator declarationNodeIds. - Add a build-only
Vec<NodeId>scratch field onSemanticBuilderto trackasync/generator function declarations in sloppy-mode JS block scopes. - Populate that scratch field during function binding under the narrow conditions where Annex B.3.3 applies.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| crates/oxc_semantic/src/checker/javascript.rs | Updates Annex B.3.3 redeclaration logic to avoid AST node reads and to report the earlier offending async/generator declaration span. |
| crates/oxc_semantic/src/builder.rs | Adds build-only scratch storage (async_or_generator_function_node_ids) on SemanticBuilder. |
| crates/oxc_semantic/src/binder.rs | Records qualifying async/generator function declaration node IDs during binding for later use by the checker. |
Merge activity
|
…ing AST nodes (#22972) Closes: #22934 Alternative to #22934. ### The problem `check_function_redeclaration` (the Annex B.3.3 legacy duplicate-function rule) distinguished a plain `FunctionDeclaration` from an `async`/generator one by reading the _previous_ declaration's AST node through its stored `NodeId`: ```rust let prev_function = ctx.nodes.kind(prev.declaration).as_function(); if prev_function.is_some_and(|f| !(f.r#async || f.generator)) { return; } ``` That node is a popped sibling (not an ancestor of the current node), so it's the last build-time read that would prevent making AST node storage optional. ### Starting point In real-world code, no-one uses function redeclarations. They are completely pointless. We only need to handle this case to be conformant with the spec, just to be able to say we're conformant with the spec. Further, the specific circumstances in which this Annex B rule applies are even more rare. All the following must be satisfied: - Sloppy-mode code. - Function redeclarations in a nested block (not top-level or nested in another function). - JS not TS. Therefore, we can assume the code which handles this case is unreachable in practice. The priority is to make recording the data required for the check as cheap as possible - in terms of perf, and memory usage. ### How Record the one bit of information we need at bind time, while the node is still in hand. When binding an `async`/generator function declaration in a sloppy-mode block, push its `NodeId` into a `Vec` on `SemanticBuilder`. The check then answers "was the previous declaration async/generator?" by searching this `Vec`. The check only applies in the specific circumstances listed above. So don't push values to the `Vec` in modules, strict mode, top-level code, or TypeScript. In practice, the `Vec` will always remain empty. The checking code in `check_redeclared_function` does a linear scan of the `Vec`, which is expensive, but as function redeclarations in nested blocks in sloppy mode JS are vanishingly rare in real-world code, this doesn't matter. Additionally, improve the errors when a violation _is_ found - the error points to the offending `async`/generator function, rather than stopping at the immediately preceding declaration. That only adds diagnostics on code that was already invalid - conformance is unchanged.
b3a87da to
594ce58
Compare
…d]` function (#22973) #22972 made the code which handles function redeclarations heavier. The assumption of that PR is that function redeclarations are extremely rare. So move all the expensive code into a separate `#[cold]` function, so that the fast checks in `check_function_redeclaration` (the hot path) can be inlined into the caller.

Closes: #22934
Alternative to #22934.
The problem
check_function_redeclaration(the Annex B.3.3 legacy duplicate-function rule) distinguished a plainFunctionDeclarationfrom anasync/generator one by reading the previous declaration's AST node through its storedNodeId:That node is a popped sibling (not an ancestor of the current node), so it's the last build-time read that would prevent making AST node storage optional.
Starting point
In real-world code, no-one uses function redeclarations. They are completely pointless. We only need to handle this case to be conformant with the spec, just to be able to say we're conformant with the spec.
Further, the specific circumstances in which this Annex B rule applies are even more rare. All the following must be satisfied:
Therefore, we can assume the code which handles this case is unreachable in practice. The priority is to make recording the data required for the check as cheap as possible - in terms of perf, and memory usage.
How
Record the one bit of information we need at bind time, while the node is still in hand. When binding an
async/generator function declaration in a sloppy-mode block, push itsNodeIdinto aVeconSemanticBuilder. The check then answers "was the previous declaration async/generator?" by searching thisVec.The check only applies in the specific circumstances listed above. So don't push values to the
Vecin modules, strict mode, top-level code, or TypeScript. In practice, theVecwill always remain empty.The checking code in
check_redeclared_functiondoes a linear scan of theVec, which is expensive, but as function redeclarations in nested blocks in sloppy mode JS are vanishingly rare in real-world code, this doesn't matter.Additionally, improve the errors when a violation is found - the error points to the offending
async/generator function, rather than stopping at the immediately preceding declaration. That only adds diagnostics on code that was already invalid - conformance is unchanged.