perf(formatter_core): add printable-ASCII fast path to TextWidth #23913
`TextWidth::from_text` and `from_non_whitespace_str` call `UnicodeWidthStr::width`, which is a per-character multi-level table walk with no string-level ASCII shortcut. The text it measures (identifiers, private names, JSX names, numbers, and most string-literal content) is overwhelmingly printable ASCII. Add a fast path: if every byte is in `0x20..=0x7E` the display width equals the byte length and the text is single-line, so we can return it directly without consulting the Unicode width table. The range excludes `\t`, `\n`, all control bytes and every multi-byte UTF-8 sequence, which fall through to the unchanged scan. Behaviour is byte-identical: a differential unit test checks the fast path against the original computation over ASCII, tab/newline, control and multi-byte inputs, and the full Prettier conformance suite is unchanged.
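The byte-range check described above can be sketched in isolation. This is a minimal illustration, not the code in the PR: `printable_ascii_width` is a hypothetical helper name, and returning `None` stands in for "fall through to the unchanged Unicode-aware scan".

```rust
/// Hypothetical helper (not the PR's actual API): returns the display width
/// when `text` consists only of printable ASCII bytes, or `None` when the
/// caller would have to fall back to the full Unicode-aware scan.
fn printable_ascii_width(text: &str) -> Option<u32> {
    if text.as_bytes().iter().all(|&b| matches!(b, 0x20..=0x7E)) {
        // Every byte is one column wide, and the range contains no line
        // breaks, so the width is the byte length. The `as u32` cast is an
        // assumption of this sketch; it would truncate lengths above u32::MAX.
        Some(text.len() as u32)
    } else {
        None
    }
}

fn main() {
    // Printable ASCII takes the fast path.
    assert_eq!(printable_ascii_width("fooBar_42"), Some(9));
    assert_eq!(printable_ascii_width("a b"), Some(3));

    // Tabs, newlines, control bytes and multi-byte UTF-8 are rejected,
    // so they would be measured by the slow path instead.
    assert_eq!(printable_ascii_width("a\tb"), None);
    assert_eq!(printable_ascii_width("a\nb"), None);
    assert_eq!(printable_ascii_width("\u{7f}"), None);
    assert_eq!(printable_ascii_width("é"), None);
}
```

Because the predicate is a plain range comparison over a byte slice, the compiler is free to vectorize the loop; whether it does so depends on the target and optimization level.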
Merging this PR will improve performance by 4.54%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | formatter[errors.ts] | 613.7 µs | 570.7 µs | +7.53% |
| ⚡ | Simulation | formatter[handle-comments.js] | 2.9 ms | 2.8 ms | +4.16% |
| ⚡ | Simulation | formatter[index.tsx] | 3.9 ms | 3.8 ms | +3.36% |
| ⚡ | Simulation | formatter[core.js] | 1.6 ms | 1.6 ms | +3.17% |
Comparing linyiru:perf/formatter-ascii-text-width (931afdd) with main (b4f5b4b) [2]

Footnotes

1. 19 benchmarks were skipped, so the baseline results were used instead.
2. No successful run was found on main (0b07c4c) during the generation of this report, so b4f5b4b was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.
# Oxlint

### 💥 BREAKING CHANGES

- 88f4455 str: [**BREAKING**] `Str` and `Ident` methods take `&GetAllocator` (#23781) (overlookmotel)

### 🚀 Features

- f2091b3 ast: Unify old and new `AstBuilder`s (#23875) (overlookmotel)
- 1c8f50c linter: Add schema for `eslint/no-restricted-import` (#23642) (Sysix)

### 🐛 Bug Fixes

- 7cb85c4 linter/eslint/no-negated-condition: Add autofix for negated conditions (#23825) (Yagiz Nizipli)
- f7d1f50 oxlint, oxfmt: Enable `disable_old_builder` Cargo feature for `oxc_ast` crate (#23886) (overlookmotel)
- d891990 linter/jsx-a11y/role-supports-aria-props: Ignore nullish prop values (#23865) (Mikhail Baev)
- 94b6599 linter: Deduplicate missing plugin errors (#23853) (camc314)
- eff3eff linter/oxc/branches-sharing-code: Avoid else-if false positives (#23843) (camc314)
- 2a2d3b9 linter/eslint/prefer-destructuring: Skip `AssignmentExpression` autofixes (#23818) (camc314)
- ddc24ae linter/eslint/id-length: Respect checkGeneric for mapped type keys (#23802) (bab)
- cd89202 linter/react/exhaustive-deps: Skip wrapper expression when analyzing hook initializers (#23793) (camc314)
- 20e8285 linter/unicorn/prefer-native-coercion-function: Allow ts type predicates (#23774) (camc314)
- d86f60b lsp: Normalize user config path to watch pattern (#23723) (Sysix)
- 52032cf linter: Newline-terminate tsgolint errors (#23762) (Mikhail Baev)
- 368fda7 linter/eslint/no-warning-comments: Avoid dropping generated regex patterns (#23741) (camc314)
- ce44fbd linter/valid-title: Escape disallowed words regex (#23742) (camc314)
- 3100d11 linter/prefer-called-exactly-once-with: Avoid out-of-bounds slice panic at end of file (#23625) (Jerry Zhao)
- 742be36 refactor/node/handle-callback-err: Reject invalid regex config (#23740) (camc314)
- d7be179 linter/eslint/no-restricted-globals: Handle shadowed locals (#23736) (camc314)
- b3b1ff8 linter/vitest/expect-expect: Handle global vitest detection correctly (#23734) (camc314)

### ⚡ Performance

- 68f9472 linter/jsx-a11y: Skip lowercasing non-aria attribute names (#23906) (Lawrence Lin)
- b9312b4 linter/unicorn/prefer-export-from: Use keyed binding lookup (#23893) (Marius Schulz)
- cd5204e linter/typescript/no-unsafe-declaration-merging: Use keyed binding lookup (#23894) (Marius Schulz)
- e948498 linter/eslint/prefer-named-capture-group: Only dispatch for relevant node types (#23868) (Connor Shea)
- 4ac7a8e linter/eslint/max-depth: Derive node types (#23896) (Connor Shea)
- daeed09 linter/eslint/no-restricted-globals: Only scan unresolved references (#23890) (camc314)
- e808514 linter/jest-vitest: Speed up no-standalone-expect (#23883) (camc314)
- 8b165e5 linter/react/exhaustive-deps: Skip non-reactive calls early (#23882) (camc314)
- 54005e7 linter/eslint/no-unused-vars: Precompute exported bindings (#23881) (camc314)
- 9bc2f8c linter/unicorn/prefer-number-properties: Speed up global checks (#23880) (camc314)
- 4ff104f linter: Optimize `require-hook` and `prefer-mock-*` rules to run on specific node types (#23871) (Connor Shea)
- cc2213b linter: Run `no-underscore-dangle` only when relevant node types are present (#23867) (Connor Shea)
- 3e55c21 linter/promise/always-return: Narrow to function node types (#23878) (Connor Shea)
- 7136182 linter/jest-vitest: Speed up no-commented-out-tests (#23864) (camc314)
- f138264 linter/eslint/no-script-url: Match javascript: prefix without allocating (#23861) (Lawrence Lin)
- 7ef6895 linter/react/no-array-index-key: Delay index symbol lookup (#23857) (camc314)
- 26bc171 linter/react/no-array-index-key: Match callback methods directly (#23856) (camc314)
- 44fbbda linter/jsx-a11y/interactive-supports-focus: Check cheap conditions first (#23854) (camc314)
- 84a5aa3 linter/eslint/no-extend-native: Skip lowercase references early (#23851) (camc314)
- 88a74b2 linter/eslint/no-nonoctal-decimal-escape: Scan decimal escapes as bytes (#23850) (camc314)
- fca69a8 linter: Skip traversal without this expressions (#23845) (camc314)
- 838fd63 linter: Reduce preallocation for per-file diagnostics `Vec` (#23705) (Marius Schulz)
- 417b506 linter/typescript/array-type: Remove full source text clone (#23751) (Marius Schulz)

### 📚 Documentation

- 57e4469 linter/unicorn: Update prefer-dom-node-text-content rationale (#23933) (Mikhail Baev)
- 3d61dea all: Correct capitalization in comments (#23887) (overlookmotel)

### 🛡️ Security

- 3cdd18f deps: Update npm packages (#23690) (renovate[bot])

# Oxfmt

### 💥 BREAKING CHANGES

- 259e0cd oxfmt,formatter_graphql: [**BREAKING**] Support draft syntax with removing prettier fallback (#23326) (leaysgur)
- accbc49 oxfmt: [**BREAKING**] Format `parser:css,less,scss` files + css-in-js by `oxc_formatter_css` (#23321) (leaysgur)

### 🚀 Features

- dffa4b3 formatter_css: Implement `oxc_formatter_css` (#23320) (leaysgur)
- 01de9ec oxfmt: Format `parser:graphql` files by `oxc_formatter_graphql` (#23318) (leaysgur)
- 4e66212 formatter_graphql: Implement oxc_formatter_graphql (#23317) (leaysgur)

### 🐛 Bug Fixes

- 67325ae formatter_css: Handle frontmatter language (#23819) (leaysgur)
- 3f355e5 formatter_graphql: Improve major prettier diffs (#23419) (leaysgur)
- 48e2d78 formatter_css: Improve major prettier diffs (#23327) (leaysgur)
- 8c07cad all: Enable `disable_old_builder` Cargo feature for `oxc_ast` crate in tests (#23888) (overlookmotel)
- f7d1f50 oxlint, oxfmt: Enable `disable_old_builder` Cargo feature for `oxc_ast` crate (#23886) (overlookmotel)
- d86f60b lsp: Normalize user config path to watch pattern (#23723) (Sysix)

### ⚡ Performance

- 4ddcba0 formatter_core: Add printable-ASCII fast path to TextWidth (#23913) (Lawrence Lin)

### 📚 Documentation

- b4d0dc9 oxfmt,formatter,formatter_css,formatter_core: Update AGENTS.md (#23814) (leaysgur)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
Co-authored-by: Cameron <cameron.clark@hey.com>
## What

`TextWidth::from_text` and `TextWidth::from_non_whitespace_str` both call `UnicodeWidthStr::width`, which is a per-character multi-level table walk with no string-level ASCII shortcut. The text they measure (identifiers, private names, JSX names, numbers, and most string-literal content) is overwhelmingly printable ASCII.

This adds a fast path: if every byte is in `0x20..=0x7E`, the display width equals the byte length and the text is single-line, so the result is returned directly without consulting the Unicode width table:

```rust
if text.as_bytes().iter().all(|&b| matches!(b, 0x20..=0x7e)) {
    return Self::single(text.len() as u32);
}
```

The range excludes `\t` (0x09), `\n` (0x0A), every control byte and all multi-byte UTF-8, which fall through to the unchanged scan. This replaces a per-char table lookup with an autovectorizable byte-range scan; it is *not* an attempt to beat existing SIMD.

## Correctness (byte-identical)

- A new differential unit test asserts the fast path equals the original computation over ASCII, leading/trailing spaces, tabs, `\n`, `\r\n`, control bytes (VT/FF/DEL/US) and multi-byte/emoji-VS16 inputs (`cargo test -p oxc_formatter_core`).
- The full **Prettier conformance suite is unchanged**: `js 746/753`, `ts 591/601`, all JSON variants and jsdoc identical, with **zero snapshot diffs**.

## Performance

CodSpeed confirms a measurable improvement (the per-text-element table walk is removed on the hottest node types):

| Benchmark | `BASE` | `HEAD` | Efficiency |
| --- | --- | --- | --- |
| `formatter[errors.ts]` | 613.1 µs | 570.3 µs | **+7.5%** |
| `formatter[handle-comments.js]` | 2.9 ms | 2.8 ms | +4.17% |
| `formatter[index.tsx]` | 3.9 ms | 3.8 ms | +3.19% |
| `formatter[core.js]` | 1.6 ms | 1.6 ms | +3.15% |

4 improved benchmarks, 0 regressed.

---

*Disclosure: this change was prepared with AI assistance (Claude). I reviewed and tested it myself: byte-identity is verified by the differential unit test and the unchanged Prettier conformance snapshots, and the speedup above is from CodSpeed on this PR.*

---------

Co-authored-by: Yuji Sugiura <y.sugiura.0316@gmail.com>