Summary
The engine produces different line breaks than the browser when CJK text is followed by opening brackets. The bracket sticks to the preceding CJK text instead of the following content, causing visually wrong line wrapping.
Input: "서울(Seoul)과 부산(Busan)" at 180px

```
Browser:  서울       ← breaks here; ( moves to the next line with Seoul
          (Seoul)과

Pretext:  서울(      ← ( stuck to 서울
          Seoul)과
```
This affects all CJK languages: 東京(Tokyo), 北京(Beijing), 인공지능(AI), etc.
Why this matters
Parenthesized annotations after CJK text are everyday patterns in Korean, Japanese, and Chinese — brand names, technical terms, romanizations, abbreviations. In narrow containers (mobile chat bubbles, card layouts), the bracket hanging at the wrong line end is clearly visible.
Reproduction
Minimal code to see the bug (before fix):
```typescript
import { prepareWithSegments } from '@chenglou/pretext'

const prepared = prepareWithSegments('서울(Seoul)', '20px serif')
console.log(prepared.segments)
// Before fix: ['서', '울(', 'Seoul)']  ← ( merged into CJK segment
// After fix:  ['서', '울', '(Seoul)']  ✓
```
Or run the oracle checker against Chrome:
```sh
bun run scripts/cjk-bracket-check.ts --browser=chrome
```
Why this only breaks CJK, not English
English and CJK text go through different segmentation paths:
- English: the word segmenter's result stays as-is. `AB(CD)` → `[AB(] [CD)]`. Even though `(` is merged into `AB(`, the line breaker can still break within `AB(` at character boundaries, so it works out.
- CJK: the word segmenter's result goes through `buildBaseCjkUnits()`, which splits it further by character. `서울(` → `[서] [울] [(]`, so `(` becomes an isolated unit, disconnected from `Seoul)`, and the `(Seoul)` group is broken apart.
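The difference between the two paths can be modeled in a few lines. This is a toy sketch, not Pretext's code: `splitCjkByChar` is a hypothetical stand-in for `buildBaseCjkUnits()` that splits any segment containing CJK into one unit per code point, and `containsCJK` uses a rough Unicode-range regex (Hangul, kana, CJK ideographs) as an assumed approximation of the engine's own CJK check.

```typescript
// Toy model of the two segmentation paths. Names are hypothetical;
// this is not Pretext's implementation.

// Rough CJK check (assumption): Hangul Jamo, Hiragana/Katakana,
// CJK Unified Ideographs (incl. Extension A), Hangul syllables.
const CJK_RE = /[\u1100-\u11FF\u3040-\u30FF\u3400-\u4DBF\u4E00-\u9FFF\uAC00-\uD7AF]/u;

function containsCJK(text: string): boolean {
  return CJK_RE.test(text);
}

// Stand-in for buildBaseCjkUnits(): segments containing CJK are split
// into one unit per code point; all other segments are kept whole.
function splitCjkByChar(segments: string[]): string[] {
  const units: string[] = [];
  for (const seg of segments) {
    if (containsCJK(seg)) {
      units.push(...Array.from(seg));
    } else {
      units.push(seg);
    }
  }
  return units;
}

// English path: the merged word-segmenter output is used as-is.
const english = ["AB(", "CD)"];

// CJK path: "서울(" is split per character, so "(" becomes its own unit,
// no longer grouped with "Seoul)".
const cjk = splitCjkByChar(["서울(", "Seoul)", "과"]);
// cjk is ["서", "울", "(", "Seoul)", "과"]
```

The point of the model is only that per-character splitting leaves a backward-merged `(` stranded, while in the English path the merged segment is never split again.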
Root cause
The analysis pipeline in `src/analysis.ts` has a first-pass merge step (~line 1006) with the rule "punctuation that isn't word-like should stick to the previous text segment." This rule correctly handles closing punctuation such as `)`, `.` and `,`, but it also catches the opening brackets `(`, `[` and `{`, because `isEscapedQuoteClusterSegment()` matches all `kinsokuEnd` characters.
This merge runs before the forward-sticky pass, so `(` is consumed backward before it can be attached forward to `Seoul)`.
```
Word segmenter:    서울 | ( | Seoul | ) | 과
leftSticky merge:  서울 | ( | Seoul) | 과    ← ) sticks to Seoul (correct)
first-pass merge:  서울( | Seoul) | 과      ← ( sticks backward (BUG)
forward-sticky:    (nothing to do; ( is already gone)
```
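The ordering problem can be reproduced with a toy three-pass pipeline. The pass names follow the trace above, but the functions are hypothetical simplifications (they only model brackets, not `.`, `,` or quote clusters) and are not the code in `src/analysis.ts`.

```typescript
// Toy model of the merge-pass ordering (hypothetical, simplified).

const CLOSING = new Set([")", "]", "}"]);
const OPENING = new Set(["(", "[", "{"]);

// leftSticky merge: closing brackets stick to the previous segment.
function leftStickyMerge(segs: string[]): string[] {
  const out: string[] = [];
  for (const s of segs) {
    if (CLOSING.has(s) && out.length > 0) out[out.length - 1] += s;
    else out.push(s);
  }
  return out;
}

// First-pass merge (buggy): any non-word-like punctuation, opening
// brackets included, sticks to the previous segment.
function firstPassMerge(segs: string[]): string[] {
  const out: string[] = [];
  for (const s of segs) {
    const isPunct = CLOSING.has(s) || OPENING.has(s);
    if (isPunct && out.length > 0) out[out.length - 1] += s;
    else out.push(s);
  }
  return out;
}

// Forward-sticky pass: a standalone opening bracket joins the next segment.
function forwardStickyMerge(segs: string[]): string[] {
  const out: string[] = [];
  let pending = "";
  for (const s of segs) {
    if (OPENING.has(s)) {
      pending += s;
    } else {
      out.push(pending + s);
      pending = "";
    }
  }
  if (pending !== "") out.push(pending);
  return out;
}

const words = ["서울", "(", "Seoul", ")", "과"];
const result = forwardStickyMerge(firstPassMerge(leftStickyMerge(words)));
// result is ["서울(", "Seoul)", "과"]: "(" was consumed backward by the
// first pass, so the forward-sticky pass never sees it.
```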
Fix
Added a `!mergedContainsCJK[prevIndex]` guard at two merge points. When the previous segment contains CJK, the backward merge is skipped, so the bracket reaches the forward-sticky pass, which correctly attaches it to the next segment:

After fix:

```
first-pass merge:  SKIPPED (prev is CJK)
forward-sticky:    서울 | (Seoul) | 과   ✓
```

Non-CJK behavior is unchanged. PR: #148
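The guard's effect can be modeled with a simplified two-pass sketch. This is a hypothetical illustration, not the diff from the PR: `containsCJK` is an assumed regex approximation of `mergedContainsCJK[prevIndex]`, only opening brackets are modeled in the first pass, and the inputs are taken to be the segments after the leftSticky merge.

```typescript
// Toy model of the guarded first pass plus the forward-sticky pass.
// Hypothetical names; not the actual code from the PR.

// Rough CJK check (assumption), standing in for mergedContainsCJK[prevIndex].
const CJK_RE = /[\u1100-\u11FF\u3040-\u30FF\u3400-\u4DBF\u4E00-\u9FFF\uAC00-\uD7AF]/u;
const containsCJK = (s: string): boolean => CJK_RE.test(s);

const OPENING = new Set(["(", "[", "{"]);

// First-pass merge with the new guard (only opening brackets modeled).
function firstPassMergeGuarded(segs: string[]): string[] {
  const out: string[] = [];
  for (const s of segs) {
    const prev = out.length > 0 ? out[out.length - 1] : undefined;
    if (OPENING.has(s) && prev !== undefined && !containsCJK(prev)) {
      // Previous segment is not CJK: keep the old backward merge.
      out[out.length - 1] = prev + s;
    } else {
      // Otherwise (not a bracket, no previous segment, or CJK before it),
      // leave s alone for the forward-sticky pass.
      out.push(s);
    }
  }
  return out;
}

// Forward-sticky pass: standalone opening brackets join the next segment.
function forwardStickyMerge(segs: string[]): string[] {
  const out: string[] = [];
  let pending = "";
  for (const s of segs) {
    if (OPENING.has(s)) {
      pending += s;
    } else {
      out.push(pending + s);
      pending = "";
    }
  }
  if (pending !== "") out.push(pending);
  return out;
}

const run = (segs: string[]): string[] =>
  forwardStickyMerge(firstPassMergeGuarded(segs));

const korean = run(["서울", "(", "Seoul)", "과"]); // ["서울", "(Seoul)", "과"]
const latin = run(["AB", "(", "CD)"]);            // ["AB(", "CD)"]
```

In this model the Latin case still produces `[AB(] [CD)]`, matching the unchanged non-CJK behavior, while the Korean case keeps `(Seoul)` together.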
Test results after fix
CJK Bracket Check — Chrome
────────────────────────────────────────────────────────────
✓ PASS A1: Korean parenthesized English [4 lines]
✓ PASS A2: Japanese parenthesized English [4 lines]
✓ PASS A3: Chinese parenthesized English [3 lines]
✓ PASS A4: Korean abbreviation bracket [3 lines]
✓ PASS A5: Japanese abbreviation bracket [3 lines]
✓ PASS A6: Chinese abbreviation bracket [3 lines]
✓ PASS A7: Korean square brackets [3 lines]
✓ PASS A8: Japanese square brackets [3 lines]
✓ PASS A9: Korean curly braces [3 lines]
✓ PASS A10: Mixed CJK + nested brackets [3 lines]
Summary: chrome 10/10 pass