perf(linter/plugins): lazy deserialize tokens and comments #20474

Merged
graphite-app[bot] merged 1 commit into main from
om/03-14-perf_linter_plugins_lazy_deserialize_tokens_and_comments
Mar 21, 2026
Conversation

@overlookmotel
Member

@overlookmotel overlookmotel commented Mar 17, 2026

Performance improvement to tokens and comments APIs.

## The problem

Previously, all tokens and comments methods would deserialize *all* tokens/comments into an array of `Token` / `Comment` / `Token | Comment` objects, and then binary search through those arrays to find the token(s) / comment(s) they're looking for.

This has 2 major disadvantages:

1. Files typically contain *a lot* of tokens (even more than the number of AST nodes). Deserializing them all is very costly (up to 30% of total Oxlint runtime when run with only a JS rule which just calls a tokens-related method).

2. The binary searches these methods do are quite expensive. Even in TurboFan-optimized code, accessing `token.start` involves getting a pointer to the `Token` object from the `tokens` array, an "is this object a `Token`?" safety check, then reading the `start` field from the `Token` - all just to access a single `u32`, and that happens over and over.

## This PR's solution

Solve both these problems by making tokens and comments methods read `start` / `end` offsets directly from the buffers which contain the tokens/comments data.

This data is tightly packed in memory, and strongly typed (read from `Uint32Array`s), so getting `start` / `end` of a token requires no indirection and no type checks.

More importantly, it removes the need to deserialize all tokens / comments upfront. The desired token(s) are located, touching only the buffer, and then *only* the ones which need to be returned to rule code are deserialized into JS objects.
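A minimal sketch of this kind of buffer-level search, assuming each entry occupies 16 bytes (4 × `u32`) with `start` as the first field. The function name and layout here are illustrative, not the PR's actual code:

```typescript
// Binary search for the index of the entry whose `start` equals `target`,
// reading directly from a packed Uint32Array (4 u32s = 16 bytes per entry).
// Returns -1 if no entry starts at `target`.
function findEntryIndexByStart(buffer: Uint32Array, target: number): number {
  const STRIDE = 4; // u32s per entry; `start` is the first field
  let lo = 0;
  let hi = buffer.length / STRIDE - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const start = buffer[mid * STRIDE]; // no object allocation, no type check
    if (start === target) return mid;
    if (start < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

// Example: 3 entries with starts 0, 10, 25
const buf = new Uint32Array([0, 5, 0, 0, 10, 20, 0, 0, 25, 30, 0, 0]);
console.log(findEntryIndexByStart(buf, 10)); // 1
console.log(findEntryIndexByStart(buf, 11)); // -1
```

Each probe touches a single `u32` in the typed array, which is the indirection-free access pattern described above.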

If a rule accesses `ast.tokens`, `ast.comments`, or `sourceCode.tokensAndComments` then all tokens / comments need to be deserialized, as they're all returned to the rule as an array - but that's unavoidable. This PR doesn't make that any cheaper, but it doesn't make it measurably more costly either.

But where no rule requires the full array of tokens / comments, and they only use token/comment search methods (e.g. `getFirstToken`, `getCommentsBefore`), a great deal of work will be saved. This covers the vast majority of rules.

## Implementation details

The main complication is the `includeComments` option to tokens methods. When `true`, the search needs to be over a combined set of both tokens and comments.

When the `includeComments: true` option is passed to a tokens method, a buffer is created containing data about all tokens and comments, interleaved in source code order. This buffer can then be used for binary search in tokens methods.
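The interleaving can be sketched as a standard merge of two sorted packed buffers. The layout (4 × `u32` per entry, `start` first) and the function name are assumptions for illustration:

```typescript
// Merge two packed buffers (tokens and comments, each sorted by `start`,
// 4 u32s per 16-byte entry) into one buffer interleaved in source order.
function mergeByStart(tokens: Uint32Array, comments: Uint32Array): Uint32Array {
  const STRIDE = 4; // u32s per entry; `start` is the first field
  const out = new Uint32Array(tokens.length + comments.length);
  let t = 0, c = 0, o = 0;
  while (t < tokens.length && c < comments.length) {
    // Compare `start` fields (first u32 of each entry)
    if (tokens[t] <= comments[c]) {
      out.set(tokens.subarray(t, t + STRIDE), o);
      t += STRIDE;
    } else {
      out.set(comments.subarray(c, c + STRIDE), o);
      c += STRIDE;
    }
    o += STRIDE;
  }
  if (t < tokens.length) out.set(tokens.subarray(t), o);
  if (c < comments.length) out.set(comments.subarray(c), o);
  return out;
}

const merged = mergeByStart(
  new Uint32Array([0, 1, 0, 0, 20, 21, 0, 0]), // token starts: 0, 20
  new Uint32Array([10, 15, 0, 0]),             // comment start: 10
);
console.log(merged[0], merged[4], merged[8]); // 0 10 20
```

Building this buffer lazily, only on the first `includeComments: true` call, avoids the merge cost entirely for rules that never request comments in token searches.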

Whether each token / comment has been deserialized already or not is tracked by a "deserialized" flag in the tokens/comments buffers. Each token / comment in the buffer is 16 bytes. This flag lives in byte 15. For tokens, this byte is always already 0 in the buffer when it arrives from Rust side. For comments, we manually set `comment.content = CommentContent::None;` for every comment on Rust side. `comment.content` is positioned at byte 15 in the `Comment` struct, and `CommentContent::None` is stored as 0.
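A sketch of how such a flag byte can drive lazy materialization, using two typed-array views over the same `ArrayBuffer`. The field layout (`start` at u32 0, `end` at u32 1) and all names are assumptions for illustration, not the PR's actual code:

```typescript
const ENTRY_BYTES = 16;
const FLAG_OFFSET = 15; // last byte of each 16-byte entry

interface LazyEntry { start: number; end: number; }

// Return the JS object for entry `index`, creating it on first access only.
function getEntry(
  bytes: Uint8Array,
  u32s: Uint32Array,
  cache: LazyEntry[],
  index: number,
): LazyEntry {
  const flagPos = index * ENTRY_BYTES + FLAG_OFFSET;
  if (bytes[flagPos] === 0) {
    // First access: materialize the JS object and mark it deserialized
    cache[index] = { start: u32s[index * 4], end: u32s[index * 4 + 1] };
    bytes[flagPos] = 1;
  }
  return cache[index];
}

// One ArrayBuffer, two views: bytes for the flag, u32s for the fields
const ab = new ArrayBuffer(32); // room for 2 entries
const u32s = new Uint32Array(ab);
const bytes = new Uint8Array(ab);
u32s.set([5, 9], 0);   // entry 0: start=5, end=9
u32s.set([12, 20], 4); // entry 1: start=12, end=20
const cache: LazyEntry[] = [];
const a = getEntry(bytes, u32s, cache, 0);
const b = getEntry(bytes, u32s, cache, 0);
console.log(a === b, a.start, a.end); // true 5 9
```

Returning the same object on repeat access matters because rules may compare tokens by identity, which is what the fixture tests below exercise.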

## Possible future improvements

### SoA storage

Binary search operates only on the `start` field of tokens / comments, and those fields are 16 bytes apart in the buffer. It would be more efficient if tokens were stored in struct-of-arrays (SoA) style so all `start` values were tightly packed together. This would reduce CPU cache misses in the hot loops of binary searches.
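For illustration, the difference between the current array-of-structs layout and an SoA layout of `start` values (field layout is assumed, as above):

```typescript
// AoS: each entry is [start, end, kind, flags]; consecutive starts are
// 16 bytes apart, so binary-search probes are scattered across cache lines.
const aos = new Uint32Array([0, 3, 1, 0, 10, 14, 2, 0, 25, 30, 1, 0]);

// SoA: pull all starts into one densely packed array, so every probe
// reads from contiguous memory (fewer cache misses in the search loop).
const numEntries = aos.length / 4;
const starts = new Uint32Array(numEntries);
for (let i = 0; i < numEntries; i++) starts[i] = aos[i * 4];

console.log(Array.from(starts)); // [0, 10, 25]
```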

### Pre-compute tokens-and-comments buffer on Rust side

The buffer containing tokens and comments, required to support `includeComments: true`, is currently generated on JS side (but lazily). We could move that to Rust side, which would be faster. However, it might be redundant work in many cases because the buffer is only required if a rule uses `includeComments: true`.

We could alternatively keep the laziness optimization, by calling back into Rust to build the buffer on demand - but JS-Rust calls have a cost too. Maybe communicating via `Atomics` would be faster than an actual function call?

If we had a way to share buffers with WASM, the optimal solution might be to generate the buffer lazily (as now) but in WASM, which would be faster for this kind of pure number-crunching, but without the overhead of calling into Rust.

Member Author

overlookmotel commented Mar 17, 2026


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent changes, fast-track this PR to the front of the merge queue

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes Oxlint’s JS plugin token/comment APIs by avoiding eager deserialization and performing searches directly over packed Uint32Array buffers, only materializing JS objects for entries that are actually returned to rules.

Changes:

  • Add a per-entry “deserialized” flag (byte offset exported as DESERIALIZED_FLAG_OFFSET) to support lazy token/comment object creation.
  • Rewrite tokens_methods.ts and comments_methods.ts to binary-search packed buffers (and build a merged tokens+comments buffer lazily for includeComments: true).
  • Add a new tokens_and_comments.ts module to manage the merged buffer and cached tokensAndComments array, plus fixture coverage to validate initialization/order/identity invariants.

Reviewed changes

Copilot reviewed 135 out of 137 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `tasks/ast_tools/src/generators/raw_transfer.rs` | Exports `DESERIALIZED_FLAG_OFFSET` constant derived from Rust layout for JS-side lazy deserialization. |
| `napi/parser/src-js/generated/constants.js` | Regenerates `DESERIALIZED_FLAG_OFFSET` in JS generated constants. |
| `crates/oxc_linter/src/lib.rs` | Sets `comment.content = None` during external-linter raw-transfer prep to provide a zeroed flag byte. |
| `apps/oxlint/src/js_plugins/parse.rs` | Mirrors the same `comment.content = None` behavior for JS plugin parsing/raw transfer. |
| `apps/oxlint/tsdown_plugins/inline_search.ts` | Updates docs/examples to match the new inlined binary search signature over `Uint32Array`. |
| `apps/oxlint/src-js/plugins/tokens.ts` | Adds lazy token buffer views + deserialization flag tracking and on-demand token materialization. |
| `apps/oxlint/src-js/plugins/comments.ts` | Adds lazy comment buffer views + deserialization flag tracking and on-demand comment materialization. |
| `apps/oxlint/src-js/plugins/tokens_and_comments.ts` | New module: builds/queries merged tokens+comments buffer and cached `tokensAndComments` array. |
| `apps/oxlint/src-js/plugins/tokens_methods.ts` | Reworks token search APIs to use packed buffers and lazily deserialize returned entries. |
| `apps/oxlint/src-js/plugins/comments_methods.ts` | Reworks comment adjacency APIs to use packed buffers and lazily deserialize returned entries. |
| `apps/oxlint/src-js/plugins/source_code.ts` | Switches `tokensAndComments` getter to the new merged-buffer implementation and resets it per file. |
| `apps/oxlint/src-js/generated/constants.ts` | Regenerates `DESERIALIZED_FLAG_OFFSET` in TS generated constants. |
| `apps/oxlint/test/tokens.test.ts` | Updates type import to the new `tokens_and_comments.ts` type source. |
| `apps/oxlint/test/tokens.test-d.ts` | Updates type imports to split `Token` vs `TokenOrComment` sources. |
| `apps/oxlint/test/fixtures/tokens_and_comments_order/plugin.ts` | Adds fixture plugin validating correctness + object identity across API access permutations. |
| `apps/oxlint/test/fixtures/tokens_and_comments_order/.oxlintrc.json` | Fixture config enabling the plugin rule for the permutation test corpus. |
| `apps/oxlint/test/fixtures/tokens_and_comments_order/output.snap.md` | Snapshot output for the new permutation fixture run. |
| `apps/oxlint/test/fixtures/tokens_and_comments_order/files/*.js` | Adds 120 permutation fixture files (001–120) with identical content to vary access order by filename. |

@overlookmotel overlookmotel requested a review from Copilot March 18, 2026 19:05
Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes Oxlint’s JS plugin token/comment APIs by avoiding eager deserialization and by performing searches directly over tightly packed raw-transfer buffers (Uint32Array-backed), only materializing JS Token/Comment objects when they’re actually returned to rule code.

Changes:

  • Add a shared “deserialized” flag byte (exported as DESERIALIZED_FLAG_OFFSET) to support lazy token/comment deserialization.
  • Refactor tokens_methods.ts and comments_methods.ts to binary-search raw buffers and lazily deserialize results.
  • Introduce tokens_and_comments.ts to lazily build and reuse an interleaved tokens+comments buffer/array for includeComments: true and sourceCode.tokensAndComments.

Reviewed changes

Copilot reviewed 135 out of 137 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `tasks/ast_tools/src/generators/raw_transfer.rs` | Generates `DESERIALIZED_FLAG_OFFSET` constant from Rust `Comment.content` field offset for raw-transfer interop. |
| `napi/parser/src-js/generated/constants.js` | Emits `DESERIALIZED_FLAG_OFFSET` into the NAPI parser JS constants output. |
| `apps/oxlint/src-js/generated/constants.ts` | Emits `DESERIALIZED_FLAG_OFFSET` into the Oxlint JS runtime constants output. |
| `crates/oxc_linter/src/lib.rs` | Sets `comment.content = None` during UTF-16 span conversion so byte 15 is 0 for the JS-side flag. |
| `apps/oxlint/src/js_plugins/parse.rs` | Mirrors the `comment.content = None` initialization for the JS plugins parse path. |
| `apps/oxlint/tsdown_plugins/inline_search.ts` | Updates inline-search plugin docs/examples to reflect the new `Uint32Array`-based search signature. |
| `apps/oxlint/src-js/plugins/tokens.ts` | Adds token buffer views + lazy per-token deserialization using the flag byte. |
| `apps/oxlint/src-js/plugins/comments.ts` | Adds comment buffer views + lazy per-comment deserialization using the shared flag byte. |
| `apps/oxlint/src-js/plugins/tokens_and_comments.ts` | New module: lazily builds/reuses merged tokens+comments buffer and cached merged array. |
| `apps/oxlint/src-js/plugins/tokens_methods.ts` | Refactors token search APIs to use raw buffers (`Uint32Array`) and lazy deserialization. |
| `apps/oxlint/src-js/plugins/comments_methods.ts` | Refactors comment adjacency/range APIs to use raw buffers and lazy deserialization. |
| `apps/oxlint/src-js/plugins/source_code.ts` | Switches `tokensAndComments` getter to new `getTokensAndComments()` and resets merged state per file. |
| `apps/oxlint/test/tokens.test.ts` | Updates type import for `TokenOrComment` to new `tokens_and_comments.ts` location. |
| `apps/oxlint/test/tokens.test-d.ts` | Updates type imports to split `Token` and `TokenOrComment` across modules. |
| `apps/oxlint/test/fixtures/tokens_and_comments_order/**` | Adds a fixture/plugin + snapshots validating correct behavior across all access-order permutations (test data). |

@graphite-app graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Mar 20, 2026
@overlookmotel overlookmotel removed the 0-merge Merge with Graphite Merge Queue label Mar 20, 2026
@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label Mar 20, 2026 — with Graphite App
@overlookmotel overlookmotel marked this pull request as ready for review March 21, 2026 12:19
@overlookmotel overlookmotel force-pushed the om/03-15-test_linter_plugins_fix_stack_traces_in_conformance_snapshots branch from e35f7ab to e487ead Compare March 21, 2026 12:20
@overlookmotel overlookmotel force-pushed the om/03-14-perf_linter_plugins_lazy_deserialize_tokens_and_comments branch from 77b3b94 to 839f548 Compare March 21, 2026 12:20
@graphite-app
Contributor

graphite-app bot commented Mar 21, 2026

Merge activity

@graphite-app graphite-app bot force-pushed the om/03-15-test_linter_plugins_fix_stack_traces_in_conformance_snapshots branch from e487ead to 608cd3c Compare March 21, 2026 12:24
@graphite-app graphite-app bot force-pushed the om/03-14-perf_linter_plugins_lazy_deserialize_tokens_and_comments branch from 839f548 to 9a622c7 Compare March 21, 2026 12:24
Base automatically changed from om/03-15-test_linter_plugins_fix_stack_traces_in_conformance_snapshots to main March 21, 2026 12:27
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Mar 21, 2026
@graphite-app graphite-app bot merged commit 9a622c7 into main Mar 21, 2026
26 checks passed
@graphite-app graphite-app bot deleted the om/03-14-perf_linter_plugins_lazy_deserialize_tokens_and_comments branch March 21, 2026 12:28
@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label Mar 21, 2026

Labels

  • `0-merge` - Merge with Graphite Merge Queue
  • `A-ast-tools` - Area - AST tools
  • `A-cli` - Area - CLI
  • `A-linter` - Area - Linter
  • `A-linter-plugins` - Area - Linter JS plugins
  • `A-parser` - Area - Parser
  • `C-performance` - Category - Solution not expected to change functional behavior, only performance
