perf(linter/plugins): recycle comment objects#20362
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Pull request overview
This PR extends the existing token object-reuse optimization to comments in the oxlint JS plugin runtime, shifting comment deserialization to a pooled Comment class with lazy loc computation and adding raw-transfer constants needed to read comments directly from the buffer.
Changes:
- Add pooled comment deserialization (
initComments/resetComments) with aCommentclass that lazily computes and cachesloc. - Update the raw-transfer generator and generated artifacts to provide comment buffer offsets/sizes/kind discriminants.
- Move the public
Commenttype toplugins/comments.tsand update imports/exports accordingly.
Reviewed changes
Copilot reviewed 9 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tasks/ast_tools/src/generators/typescript.rs | Adjust generated type exports/imports to reference plugins/comments.ts and use type-only exports. |
| tasks/ast_tools/src/generators/raw_transfer.rs | Generate comment-related constants and import comments/initComments into the linter deserializer. |
| napi/parser/src-js/generated/constants.js | Add generated constants for comment offsets/size/kind discriminant. |
| crates/oxc_ast/src/serialize/mod.rs | Update Program’s comments getter to use pooled initComments() path. |
| apps/oxlint/src-js/plugins/types.ts | Source plugin Comment type from plugins/comments.ts (removing inline interface). |
| apps/oxlint/src-js/plugins/tokens.ts | Ensure comment initialization goes through initComments(); remove local comment caching. |
| apps/oxlint/src-js/plugins/source_code.ts | Reset pooled comments between files via resetComments(). |
| apps/oxlint/src-js/plugins/location.ts | Narrow getNodeLoc to nodes only and update comments on loc-caching behavior. |
| apps/oxlint/src-js/plugins/comments.ts | Implement pooled Comment class, comment deserialization from buffer, and reset logic. |
| apps/oxlint/src-js/plugins.ts | Re-export Comment type from plugins/comments.ts. |
| apps/oxlint/src-js/generated/types.d.ts | Update generated type imports/exports to reference plugins/comments.ts. |
| apps/oxlint/src-js/generated/deserialize.js | Remove per-comment deserialization helpers; delegate to initComments(). |
| apps/oxlint/src-js/generated/deserialize.d.ts | Update GetLoc type to no longer accept Comment. |
| apps/oxlint/src-js/generated/constants.ts | Add generated constants for comment offsets/size/kind discriminant. |
Merge activity
|
f2c54e7 to
4240164
Compare
Apply the same optimization as #19978 to comments - hold a pool of `Comment` objects, and re-use those objects rather than creating new objects each time. Same as with `Token`s, `loc` property is a getter which calculates `loc` lazily, and caches it in a private property.
9a256b8 to
53acd73
Compare
Apply the same optimization as #19978 to comments - hold a pool of `Comment` objects, and re-use those objects rather than creating new objects each time. Same as with `Token`s, `loc` property is a getter which calculates `loc` lazily, and caches it in a private property.
4240164 to
b9f09fc
Compare
53acd73 to
9cd612f
Compare
### 🐛 Bug Fixes - edb8677 ecmascript: Treat collection constructor with variable arg as side-effectful (#20383) (Dunqing) - 1f65c3f transformer: Emit design:paramtypes when class has static anonymous class expression (#20382) (bab) - fa70d5c transformer: Use implementation signature for design:paramtypes when constructor is overloaded (#20394) (bab) - ed5a7fb parser: Report syntax error for `new super()` (#20384) (Boshen) - e62524d minifier: Treat object spread of getters as having side effects (#20380) (Boshen) - f8fbd6e linter/plugins: Remove `hashbang` property from AST (#20365) (overlookmotel) ### ⚡ Performance - 30a2b0f minifier: Use atom_from_strs_array for template literal concat (#20386) (Boshen) - 690ce17 minifier: Use Vec::with_capacity for inline template expressions (#20389) (Boshen) - 9cd612f linter/plugins: Recycle comment objects (#20362) (overlookmotel) Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
# Oxlint ### 🚀 Features - c95951f linter/plugins: Implement `sourceCode.markVariableAsUsed` (#20357) (overlookmotel) - 7a2a7d0 linter: Implement `n/handle-callback-err` rule (#19616) (Mikhail Baev) ### 🐛 Bug Fixes - f8fbd6e linter/plugins: Remove `hashbang` property from AST (#20365) (overlookmotel) - 6eb5b01 linter/prefer-await-to-then: Ignore Promise static methods (#20347) (camc314) - a4b61f7 linter: Remove `defineConfig` check (#20308) (camc314) - 3ad7f53 linter/explicit-module-boundary-types: False positive with satisfies expr (#20309) (camc314) - f547401 linter/no-unused-private-class-members: Treat switch discriminants as read (#20307) (camc314) - 1c07b3b diagnostics: Handle `WouldBlock` in stdout writes to prevent panic (#20295) (Boshen) ### ⚡ Performance - e4f7248 linter: Remove unnecessary clone of owned String in drain loop (#20388) (Boshen) - 4a67f1d linter: Eliminate Vec allocation in disable directive matching (#20387) (Boshen) - 618a598 linter/plugins: Add fast path for files with no comments (#20366) (overlookmotel) - b0125c5 linter/plugins: Deserialize comments without AST (#20364) (overlookmotel) - 9cd612f linter/plugins: Recycle comment objects (#20362) (overlookmotel) - bf442f8 linter/plugins: Cheaper `Token` creation (#20360) (overlookmotel) - 5474d0a semantic: V8-style walk-up reference resolution (#20292) (Boshen) - 7946eba linter/plugins: Avoid arguments spread and temp array when merging (#20318) (overlookmotel) - fc7cf8a linter/plugins: Pre-define less CFG merger functions (#20317) (overlookmotel) - 3b9eb28 linter/plugins: Streamline getting/creating visit fn mergers (#20319) (overlookmotel) - f04e850 linter/plugins: Inline binary search functions into call sites (#20312) (overlookmotel) - fe24afe linter/plugins: Apply replace globals TSDown plugin to JS files (#20305) (overlookmotel) - 77cdacc linter/plugins: Use array buffer views for tokens (#20301) (overlookmotel) - 910c941 linter/plugins: Reorder branches in `getTokenByRangeStart` (#20296) (overlookmotel) - af7674c linter/tokens: Avoid extra token value allocation (#20013) (camc314) ### 📚 Documentation - 24490b5 linter: Improve formatting for 80ish rules' docs. (#20411) (connorshea) - 3383523 linter: Improve `--tsconfig` flag docs (#20342) (camc314) # Oxfmt ### 🚀 Features - d22c443 oxfmt: Export `OxfmtConfig` type (#20275) (leaysgur) - a11ecff oxfmt/lsp: Respect `angular` language id as `.component.html` file (#20242) (Sysix) ### 🐛 Bug Fixes - ce65099 formatter: Preserve parentheses around as expression before private field access (#20419) (bab) - f908742 oxfmt: Revert #20326 partially (#20413) (leaysgur) - 4ef93ea formatter: Honor trailing ignore comments after list separators (#19925) (Andreas Lubbe) - 68fb0d0 oxfmt: Skip vite.config.ts which fails to import (#20326) (leaysgur) - 88ee826 oxfmt: Handle literalline for script-in-vue (#20130) (leaysgur) - 1c07b3b diagnostics: Handle `WouldBlock` in stdout writes to prevent panic (#20295) (Boshen) Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
Fix a JS-side memory leak from tokens and comments. ## The problem V8 uses "SlicedString" optimization to avoid copying string data. `sourceText.slice(10, 20)` does not copy bytes 10..20 to a new string, but references the original string data of `sourceText` (much like `&str` in Rust). #19978 and #20362 introduced caching and recycling of `Token` and `Comment` objects. This was a large performance gain. However, it contained a flaw. The problem is that `Token` objects now live forever, and they contain little slices of the source text in their `value` property. This means they prevent the source text which they reference from being garbage-collected, and therefore it lives forever too (or at least until another file with an equal or greater number of tokens overwrites the `value` property of all the `Token` objects). ## Solution in this PR Fix this by setting the `value` property of all `Token`s and `Comment`s to `null` after linting a file in `resetTokens` and `resetComments`. ## Implementation details This requires some additional book-keeping to track which tokens have been deserialized and which haven't, so that `resetTokens` / `resetComments` do the minimum amount of work necessary - only resetting tokens / comments which actually had their `value` property set in the first place. The indexes of tokens are stored in 2 x `Uint32Array`s `deserializedTokenIndexes` and `deserializedCommentIndexes`. A simpler solution would have been to store the `Token` / `Comment` objects themselves in a plain `Array`, but `Uint32Array` has 3 advantages: * 4 bytes to store index (`u32`) per token/comment, instead of 8 bytes to store a pointer to the `Token` / `Comment` itself. * Faster to write to `Uint32Array` - no write barrier checks. * On GC runs, a `Uint32Array` is known not to contain references to any other heap objects, so GC tracing doesn't need to walk it, which it would with an `Array`. `Uint32Array` does have one disadvantage over `Array`: * 1 level of indirection in `resetTokens` / `resetComments`. You have to read the index, then index in to `cachedTokens` array to get the `Token` object, in order to reset it. The latter will only have a significant effect when there's a large number of tokens/comments to reset. We avoid this in the worst case where *all* tokens have been deserialized (e.g. by accessing `ast.tokens`). When `allTokensDeserialized === true`, `resetTokens` just loops through `cachedTokens` itself, and ignores the `Uint32Array` entirely. ## Future potential optimizations It's probably not necessary to reset all tokens quite so aggressively when all tokens have been deserialized. If a rule has done that on 1 file, chances are it'll do it again on the next file, at which point most `Token`s get their `value` property overwritten with slices of *that* source text anyway. It's acceptable for source texts to hang around in memory for a bit, just not forever. We could choose to do token resetting only e.g. once every 10 files, to reduce the cost. Possibly clean-up operations could also run in next tick, so they happen in parallel while Rust side is prepping the next file to send over to JS, rather than blocking.

Apply the same optimization as #19978 to comments - hold a pool of
Commentobjects, and re-use those objects rather than creating new objects each time.Same as with
Tokens,locproperty is a getter which calculatesloclazily, and caches it in a private property.