Improve Unicode handling in code-frame tokenizer by JLHwung · Pull Request #17589 · babel/babel

JLHwung · 2025-11-07T02:00:59Z

Q	A
Fixed Issues?	`@babel/code-frame` does not correctly tokenize non-BMP capitalized identifiers.
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass?	Yes
Documentation PR Link
Any Dependency Changes?
License	MIT

In this PR we enable the prefer-string-starts-ends-with typescript-eslint rule and fix most of the lint errors it reports.
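
For readers unfamiliar with the rule: `@typescript-eslint/prefer-string-starts-ends-with` reports hand-rolled prefix and suffix checks and suggests `String#startsWith` / `String#endsWith` instead. The snippet below is illustrative only and is not taken from the Babel diff. Each entry in `flagged` is a pattern of the kind the rule reports, and the matching entry in `preferred` is the equivalent it suggests. The `path` value is made up.

```typescript
// Illustrative only: not code from the Babel repository.
const path = "packages/babel-code-frame/src/index.ts";

// Hand-rolled prefix/suffix checks of the kind the rule reports.
const flagged: boolean[] = [
  path[0] === "p",
  path.charAt(0) === "p",
  path.indexOf("packages/") === 0,
  path.slice(0, 9) === "packages/",
  path.slice(-3) === ".ts",
  /^packages\//.test(path),
];

// The startsWith/endsWith equivalents, in the same order.
const preferred: boolean[] = [
  path.startsWith("p"),
  path.startsWith("p"),
  path.startsWith("packages/"),
  path.startsWith("packages/"),
  path.endsWith(".ts"),
  path.startsWith("packages/"),
];

// Each pair agrees for this input.
console.log(flagged.every((v, i) => v === preferred[i])); // true
```

Note that `path[0] === "p"` compares a single UTF-16 code unit, which is the same pitfall the tokenizer fix below addresses.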

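Then we improve Unicode handling in the code-frame tokenizer. Previously we only tested whether `string[0]` equals `string[0].toLowerCase()`. This approach does not respect non-BMP characters, because `string[0]` is a single UTF-16 code unit, which for a non-BMP character is only its high surrogate.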
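
The sketch below shows why the old check fails. It compares a code-unit check with a code-point check on an identifier that starts with U+10400 DESERET CAPITAL LETTER LONG I, a non-BMP letter that does have a lowercase mapping (U+10428). Both helper names are hypothetical, and this is not the code merged in the PR.

```typescript
// Hypothetical helpers for illustration; not the code merged in the PR.

// Old-style check: compares only the first UTF-16 code unit.
function isCapitalizedByCodeUnit(s: string): boolean {
  if (s.length === 0) return false;
  const first = s[0];
  return first !== first.toLowerCase();
}

// Code-point-aware check: reads the whole first code point, so a
// surrogate pair is lowercased as one character.
function isCapitalizedByCodePoint(s: string): boolean {
  if (s.length === 0) return false;
  const first = String.fromCodePoint(s.codePointAt(0)!);
  return first !== first.toLowerCase();
}

// U+10400 written as its surrogate pair, followed by "bc".
const deseret = "\uD801\uDC00bc";

console.log(deseret.length);                    // 4: two surrogates + "bc"
console.log(isCapitalizedByCodeUnit(deseret));  // false: lone "\uD801" has no case
console.log(isCapitalizedByCodePoint(deseret)); // true
```

A `/\p{Lu}/u` regular expression is another code-point-aware option. It gives a different answer from the lowercase comparison for characters such as U+1D400 MATHEMATICAL BOLD CAPITAL A, which is in category Lu but has no lowercase mapping.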

babel-bot · 2025-11-07T02:04:14Z

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/60164

pkg-pr-new · 2025-11-07T02:10:42Z

commit: a98ee72

ehoogeveen-medweb · 2025-11-08T16:19:02Z

Happened upon this and started wondering if you ever need to look at it on a grapheme cluster level to decide the case. Thankfully it seems that in practice, looking at the first code point is always enough.

Another O(1) way to get the first code point in a string is tokenValue[Symbol.iterator]().next().value, but aside from being very ugly I'm not sure it's faster either.
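
For comparison, here is a rough sketch of three ways to read the first code point. It is not a benchmark, and the variable names are illustrative. The first two options touch only the start of the string. `Array.from` is included as a contrast because it materializes every code point.

```typescript
// Illustrative comparison; assumes a non-empty string.
// U+10400 (as a surrogate pair) followed by "bc".
const token = "\uD801\uDC00bc";

// 1. codePointAt(0) reads at most two code units; fromCodePoint turns
//    the number back into a one- or two-unit string.
const viaCodePointAt = String.fromCodePoint(token.codePointAt(0)!);

// 2. The string iterator yields whole code points; one next() is enough.
const viaIterator = token[Symbol.iterator]().next().value as string;

// 3. Array.from builds an array of all code points: O(n), for contrast.
const viaArrayFrom = Array.from(token)[0];

console.log(viaCodePointAt === viaIterator && viaIterator === viaArrayFrom); // true
console.log(viaCodePointAt.length); // 2
```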

JLHwung · 2025-11-09T14:00:48Z

Thank you. I think implementing UAX #29 might be overkill. Its cluster boundary rules mostly matter for Hangul, Arabic and other scripts that use ZWJ, and for emoji, and none of those has a concept of case.

The performance concern is a good point, since identifier handling is a hot path. I will add a fast path for ASCII characters.
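
The ASCII fast path mentioned above might look roughly like the sketch below. It assumes the capitalization test is "the first character differs from its lowercase form", as described in the PR body. The function name `isCapitalized` is hypothetical, and the merged code may be structured differently. Within ASCII, only `A`..`Z` change under `toLowerCase()`, so a range check on the first code unit is equivalent there.

```typescript
// Hypothetical sketch of an ASCII fast path; the merged code may differ.
function isCapitalized(token: string): boolean {
  if (token.length === 0) return false;
  const unit = token.charCodeAt(0);
  // Fast path: any ASCII code unit is a whole character, and within
  // ASCII only 'A'..'Z' (65..90) differ from their lowercase form.
  if (unit < 0x80) {
    return unit >= 65 && unit <= 90;
  }
  // Slow path: take the whole first code point (one or two code units)
  // and compare it with its lowercase form.
  const first = String.fromCodePoint(token.codePointAt(0)!);
  return first !== first.toLowerCase();
}

console.log(isCapitalized("Foo"));            // true  (fast path)
console.log(isCapitalized("_foo"));           // false (fast path)
console.log(isCapitalized("\u00C9t\u00E9"));  // true  (slow path, "É")
console.log(isCapitalized("\uD801\uDC00bc")); // true  (slow path, U+10400)
```

The range check skips the string allocation in `String.fromCodePoint` and the call to `toLowerCase` for the common case. Non-ASCII identifiers still take the Unicode-aware path.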

JLHwung added 2 commits November 6, 2025 20:38

chore: enable prefer-string-starts-ends-with

ff3bb19

fix: respect Unicode uppercase letter

3a1adee

JLHwung added the PR: Bug Fix 🐛 label Nov 7, 2025

skip test for Node.js 6

80e652b

perf: add fast path for ASCII identifiers

a98ee72

JLHwung requested a review from liuxingbaoyu November 13, 2025 15:04

liuxingbaoyu approved these changes Nov 14, 2025

JLHwung merged commit c92c491 into babel:main Nov 14, 2025
74 checks passed

JLHwung deleted the enable-prefer-string-starts-ends-with branch November 14, 2025 13:03

chrisbbreuer mentioned this pull request Jan 15, 2026

chore(deps): update all non-major dependencies stacksjs/bun-plugin-markdown#12

Closed

1 task

chrisbbreuer mentioned this pull request Jan 22, 2026

chore(deps): update all non-major dependencies stacksjs/bun-plugin-markdown#31

Closed

1 task

github-actions bot added the outdated label Feb 14, 2026

github-actions bot locked as resolved and limited conversation to collaborators Feb 14, 2026

JLHwung merged 4 commits into babel:main from JLHwung:enable-prefer-string-starts-ends-with

4 participants