fix(linter/plugins): fix memory leak in tokens and comments#20477

Merged
graphite-app[bot] merged 1 commit into main from om/03-17-fix_linter_plugins_fix_memory_leak_in_tokens_and_comments
Mar 21, 2026

Conversation

@overlookmotel (Member) commented Mar 17, 2026

Fix a JS-side memory leak from tokens and comments.

The problem

V8 uses a "SlicedString" optimization to avoid copying string data. `sourceText.slice(10, 20)` does not copy bytes 10..20 to a new string, but references the original string data of `sourceText` (much like `&str` in Rust).
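As an illustration of the hazard this creates (a minimal sketch with hypothetical names; SlicedString is a V8 internal representation, so the sharing itself is not observable from JS, only its memory effect):

```typescript
// `sourceText` stands in for a large linted file's source.
const sourceText = "x".repeat(1_000_000) + "identifier";

// V8 may represent this 10-char slice as a SlicedString: a (parent, offset,
// length) view over `sourceText`'s character data, with no copy made.
const tokenValue = sourceText.slice(sourceText.length - 10);

// A long-lived pooled object holding the slice keeps the full ~1MB parent
// string's data reachable, even after `sourceText` itself goes out of scope.
const pooledToken: { value: string | null } = { value: tokenValue };

// Nulling the reference lets GC reclaim the parent string's data.
pooledToken.value = null;
```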

#19978 and #20362 introduced caching and recycling of Token and Comment objects. This was a large performance gain.

However, it contained a flaw: `Token` objects now live forever, and they contain little slices of the source text in their `value` property. This means they prevent the source text which they reference from being garbage-collected, so it lives forever too (or at least until another file with an equal or greater number of tokens overwrites the `value` property of all the `Token` objects).

Solution in this PR

Fix this by setting the `value` property of all `Token`s and `Comment`s to `null` after linting a file, in `resetTokens` and `resetComments`.

Implementation details

This requires some additional book-keeping to track which tokens have been deserialized and which haven't, so that `resetTokens` / `resetComments` do the minimum amount of work necessary: only resetting tokens / comments which actually had their `value` property set in the first place.

The indexes of tokens are stored in two `Uint32Array`s, `deserializedTokenIndexes` and `deserializedCommentIndexes`. A simpler solution would have been to store the `Token` / `Comment` objects themselves in a plain `Array`, but `Uint32Array` has three advantages:

  • 4 bytes to store an index (`u32`) per token/comment, instead of 8 bytes to store a pointer to the `Token` / `Comment` itself.
  • Faster to write to a `Uint32Array`: no write barrier checks.
  • On GC runs, a `Uint32Array` is known not to contain references to any other heap objects, so GC tracing doesn't need to walk it, which it would with an `Array`.

`Uint32Array` does have one disadvantage over `Array`:

  • 1 level of indirection in `resetTokens` / `resetComments`: you have to read the index, then index into the `cachedTokens` array to get the `Token` object, in order to reset it.

The latter will only have a significant effect when there's a large number of tokens/comments to reset. We avoid this in the worst case, where all tokens have been deserialized (e.g. by accessing `ast.tokens`). When `allTokensDeserialized === true`, `resetTokens` just loops through `cachedTokens` itself and ignores the `Uint32Array` entirely.

Future potential optimizations

It's probably not necessary to reset all tokens quite so aggressively when all tokens have been deserialized. If a rule has done that on one file, chances are it'll do it again on the next file, at which point most `Token`s get their `value` property overwritten with slices of that source text anyway.

It's acceptable for source texts to hang around in memory for a bit, just not forever. We could choose to do token resetting only once every 10 files, say, to reduce the cost.
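A minimal sketch of that every-N-files idea (`maybeResetTokens`, `RESET_INTERVAL`, and `filesSinceReset` are hypothetical names, not from the PR; stale slices would survive a few files, but remain bounded rather than living forever):

```typescript
const RESET_INTERVAL = 10; // hypothetical: reset once every 10 files
let filesSinceReset = 0;

// Called after linting each file; only performs the (relatively expensive)
// token reset on every RESET_INTERVAL-th call.
function maybeResetTokens(reset: () => void): void {
  if (++filesSinceReset >= RESET_INTERVAL) {
    reset();
    filesSinceReset = 0;
  }
}
```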

Clean-up operations could possibly also run on the next tick, so they happen in parallel while the Rust side is preparing the next file to send over to JS, rather than blocking.

@overlookmotel (Member, Author) commented Mar 17, 2026


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent changes, fast-track this PR to the front of the merge queue

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

Copilot AI (Contributor) left a comment

Pull request overview

Reduces peak memory usage during linting by ensuring token/comment value slices (and regex pattern/flags) don’t keep entire source-text strings alive across files when pooled objects are reused.

Changes:

  • Track indices of lazily deserialized tokens/comments and clear their `value` fields during `reset*` to release source-text slices.
  • Clear regex descriptor `pattern`/`flags` (and `token.regex`) on reset to avoid retaining sliced strings from regex literals.
  • Make `deserialize*IfNeeded` internal and route call sites through `getToken`/`getComment` so tracking is consistent.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Files changed:

  • apps/oxlint/src-js/plugins/tokens.ts: Track deserialized token indices; clear token values + regex fields on reset; make `deserializeTokenIfNeeded` private.
  • apps/oxlint/src-js/plugins/tokens_methods.ts: Stop importing/using `deserializeTokenIfNeeded`; use `getToken` for range deserialization.
  • apps/oxlint/src-js/plugins/comments.ts: Track deserialized comment indices; clear comment values on reset; make `deserializeCommentIfNeeded` private.
  • apps/oxlint/src-js/plugins/comments_methods.ts: Stop importing/using `deserializeCommentIfNeeded`; use `getComment` for range deserialization.

@graphite-app graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Mar 20, 2026
@overlookmotel overlookmotel removed the 0-merge Merge with Graphite Merge Queue label Mar 20, 2026
@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label Mar 20, 2026 — with Graphite App
@overlookmotel overlookmotel force-pushed the om/03-17-refactor_linter_plugins_simplify_token_types branch from e097ce7 to 4224e9e on March 21, 2026 12:20
@overlookmotel overlookmotel force-pushed the om/03-17-fix_linter_plugins_fix_memory_leak_in_tokens_and_comments branch from f7bf3dc to 1d06d2c on March 21, 2026 12:20
@graphite-app graphite-app bot (Contributor) commented Mar 21, 2026

Merge activity

@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Mar 21, 2026
@overlookmotel overlookmotel force-pushed the om/03-17-fix_linter_plugins_fix_memory_leak_in_tokens_and_comments branch from 1d06d2c to c00f3e6 on March 21, 2026 12:32
@overlookmotel overlookmotel force-pushed the om/03-17-refactor_linter_plugins_simplify_token_types branch from 4224e9e to 8b4b4c0 on March 21, 2026 12:32
@overlookmotel overlookmotel added the 0-merge Merge with Graphite Merge Queue label Mar 21, 2026
@graphite-app graphite-app bot changed the base branch from om/03-17-refactor_linter_plugins_simplify_token_types to graphite-base/20477 March 21, 2026 12:42
@graphite-app graphite-app bot changed the base branch from graphite-base/20477 to main March 21, 2026 12:45
@graphite-app graphite-app bot force-pushed the om/03-17-fix_linter_plugins_fix_memory_leak_in_tokens_and_comments branch from c00f3e6 to c3d9e91 on March 21, 2026 12:46
graphite-app bot pushed a commit that referenced this pull request Mar 21, 2026
Optimize storing regex tokens so they can have their `regex` property set back to `undefined` at the end of linting a file. Previously we pushed the `Token`s into an `Array` and then reset the array with `tokensWithRegex.length = 0;` at the end.

This had a problem: When you set an array's length to 0, V8 releases the whole backing allocation, which means that on the next file, it has to allocate again. Worse, that new allocation is made in "new space", and if there's a lot of object creation in rules while linting the file, it will be copied during minor GC, and may even get graduated to "old space". All of that is expensive.

Instead, use a `Uint32Array` to store indexes, for the same reasons as in #20477.
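That pattern could be sketched roughly as follows (hypothetical names and pool size; the actual implementation in `tokens.ts` differs in detail):

```typescript
interface RegexToken { regex: { pattern: string; flags: string } | undefined }

// Pool of recycled token objects, reused across files.
const cachedTokens: RegexToken[] = Array.from({ length: 256 }, () => ({ regex: undefined }));

// A preallocated Uint32Array of indexes, instead of `tokensWithRegex: Token[]`.
// Setting an Array's length to 0 makes V8 drop the backing store, forcing a
// fresh "new space" allocation (and possible GC copying) on the next file;
// this buffer is written in place and never reallocated.
const regexTokenIndexes = new Uint32Array(256);
let regexTokenCount = 0;

function recordRegexToken(index: number, pattern: string, flags: string): void {
  cachedTokens[index].regex = { pattern, flags };
  regexTokenIndexes[regexTokenCount++] = index;
}

// At the end of linting a file, set `regex` back to `undefined` on only
// the tokens that were recorded, then reuse the same index buffer.
function resetRegexTokens(): void {
  for (let i = 0; i < regexTokenCount; i++) {
    cachedTokens[regexTokenIndexes[i]].regex = undefined;
  }
  regexTokenCount = 0;
}
```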
@graphite-app graphite-app bot merged commit c3d9e91 into main Mar 21, 2026
23 checks passed
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Mar 21, 2026
@graphite-app graphite-app bot deleted the om/03-17-fix_linter_plugins_fix_memory_leak_in_tokens_and_comments branch March 21, 2026 12:50

Labels

A-cli Area - CLI A-linter Area - Linter A-linter-plugins Area - Linter JS plugins C-bug Category - Bug
