Pass the inferred writing script to HarfBuzz, making `locl` effective by YDX-2147483647 · Pull Request #7415 · typst/typst

YDX-2147483647 · 2025-11-19T10:44:33Z

This PR lets the inline shaping engine infer the writing script (e.g., hani/latn/…) from the context surrounding the text being shaped, and pass it to HarfBuzz when appropriate.

This also makes the OpenType `locl` (Localized Forms) feature effective in edge cases.
For instance, the second full stop (。) in the example below uses the wrong glyph (corner-justified form) in v0.14.0, but uses the correct glyph (centered form) with this PR.

#set text(lang: "zh", region: "TW", font: "Noto Serif CJK SC")
#set heading(numbering: "1")
= Heading <a>

句號。@a。@a 何故？

Before:

After: (the screenshot is trimmed)

Fixes #7396

Updating Noto CJK in dev assets

Note

No test is added because typst-dev-assets has no font supporting `locl`.
The latest Noto CJK supports `locl`, but the version currently in typst-dev-assets is too old. See #7396 (comment) for details.

Discussions were moved to typst/typst-dev-assets#18.


laurmaedje · 2026-02-10T16:44:01Z

crates/typst-layout/src/inline/shaping.rs

+        // If all characters in the range have generic scripts, search beyond
+        // the range to determine a specific script. If it still fails, use the
+        // original (though generic) script.
+        let prior_script = std::iter::once(prior_script)
+            .chain(
+                // First backward then forward
+                (text[..range.start].chars().rev())
+                    .chain(text[range.end..].chars())
+                    .map(|c| c.script()),
+            )
+            .find(|&sc| !is_generic_script(sc))
+            .unwrap_or(prior_script);


Is there some precedent for this particular fallback approach in other software? Sometimes, we have to go with a custom approach, but generally I would always try to look for prior art.

Generally speaking, since we already do script segmentation below, we probably don't need to rely on HarfBuzz's segment property guessing at all and could always provide the appropriate script here. Though I'm not sure whether that has any unintended consequences if applied naively.

YDX-2147483647 · 2026-03-11

> Is there some precedent for this particular fallback approach

Well, I am not sure…

Regarding the overall direction, I discussed the issue briefly with a member of W3C Chinese Layout Task Force via email. I didn’t show him the code or any detailed algorithm, but he inferred that it should be fixed by the high-level engine, not the font or HarfBuzz.

As for the specific fallback approach, to be honest, I (kind of) came up with it on my own.

The root cause of the issue is that markup can interrupt the segmentation algorithm, so shape_range(text, range, ...) needs information from beyond range. If there were no markup, those ranges would be passed to shape_range as a whole and everything would be fine.

To get around that, this PR incorporates information from beyond range into shape_range and shape_segment, as if the ranges had never been split by markup. In this sense, this PR does not propose any new approach.
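To make the fallback concrete, here is a minimal, self-contained sketch of the same backward-then-forward search as in the diff above. The `Script` enum, `script_of`, and `infer_script` are hypothetical stand-ins invented for this illustration (the actual diff relies on `c.script()` and the existing `is_generic_script`), and the toy classifier only knows a few code point ranges.

```rust
use std::ops::Range;

/// Toy stand-in for a Unicode script value (hypothetical, for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Script {
    Common,
    Inherited,
    Latin,
    Han,
}

/// Very rough classifier: only good enough for this example.
fn script_of(c: char) -> Script {
    match c {
        'a'..='z' | 'A'..='Z' => Script::Latin,
        '\u{4E00}'..='\u{9FFF}' => Script::Han,
        '\u{0300}'..='\u{036F}' => Script::Inherited,
        _ => Script::Common,
    }
}

/// Generic scripts carry no information about the writing system.
fn is_generic_script(sc: Script) -> bool {
    matches!(sc, Script::Common | Script::Inherited)
}

/// If `prior_script` is generic, search outside `range` (first backward,
/// then forward) for a specific script. Falls back to `prior_script`.
fn infer_script(text: &str, range: Range<usize>, prior_script: Script) -> Script {
    std::iter::once(prior_script)
        .chain(
            text[..range.start]
                .chars()
                .rev()
                .chain(text[range.end..].chars())
                .map(script_of),
        )
        .find(|&sc| !is_generic_script(sc))
        .unwrap_or(prior_script)
}

fn main() {
    // "。" follows Han text: the backward search finds Han.
    let text = "示例。";
    let start = text.find('。').unwrap();
    assert_eq!(infer_script(text, start..text.len(), Script::Common), Script::Han);

    // "（" has nothing before it: the forward search finds Han.
    let text = "（示例）";
    let end = '（'.len_utf8();
    assert_eq!(infer_script(text, 0..end, Script::Common), Script::Han);

    // No specific script anywhere: the generic script is kept.
    assert_eq!(infer_script("。", 0..3, Script::Common), Script::Common);
}
```

Because the search starts with `prior_script` itself, a range that already has a specific script is never overridden by its neighbours.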

Nonetheless, AI agents (Claude in English and Kimi in Chinese) claim that the current approach matches GTK/Pango, ICU, Skia, Qt QTextEngine, etc. I am not able to verify their claims. Here’s what I know:

UAX #24: Unicode Script Property:

> A value of Inherited means that the character is treated as if it had the Script property value of a preceding base character.

This PR matches the above description, except that it also looks at subsequent characters. That’s necessary because typst-layout/src/inline/shaping.rs does not distinguish between Inherited and Common, and （ in “（示例……）” is Common and hence requires looking forward.

ICU4C uscript_nextRun does not consider information beyond the range, and it has a subtle treatment for paired characters.

Pango’s codebase is too complicated for me to understand, but it looks like it also has special treatment for paired characters.

How the script of 。 is determined in <strong>示例</strong>。? - DeepWiki | Search (in Chinese)

I didn’t check Skia or ICU4J in detail.

> we probably don't need to rely on Harfbuzz's segment property guessing

I agree. WebKit developers ran into a bug about this in 2013: “Leaving direction to HarfBuzz to guess is really bad, but will do for now.”

However, considering that there are currently not many related issues, I think it's not worth refactoring at this time. It would be better to keep it as it is.

khaledhosny · 2026-03-11

Letting HarfBuzz guess any segment properties is really bad; production code should never do that (in my opinion, it is even a mistake that HarfBuzz exposes this API).

Script segmentation should be done on the full paragraph text at once, just like bidi, so inline markup should not interrupt it.

Another implementation that might be easier to follow is Raqm’s.
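As a rough illustration of segmenting the whole paragraph at once, the sketch below resolves generic characters against their neighbours and returns script runs that markup boundaries can later be mapped onto. It is a simplified assumption for discussion, not Raqm's or Typst's actual algorithm: `Script`, `script_of`, and `script_runs` are hypothetical names, the classifier is a toy, and paired punctuation gets no special treatment.

```rust
use std::ops::Range;

/// Toy stand-in for a Unicode script value (hypothetical, for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Script {
    Common,
    Inherited,
    Latin,
    Han,
}

/// Very rough classifier: only good enough for this example.
fn script_of(c: char) -> Script {
    match c {
        'a'..='z' | 'A'..='Z' => Script::Latin,
        '\u{4E00}'..='\u{9FFF}' => Script::Han,
        '\u{0300}'..='\u{036F}' => Script::Inherited,
        _ => Script::Common,
    }
}

fn is_generic_script(sc: Script) -> bool {
    matches!(sc, Script::Common | Script::Inherited)
}

/// Split the full paragraph text into byte ranges with one script each.
/// Generic characters join the current run; a run that so far contains only
/// generic characters adopts the first specific script that follows.
fn script_runs(text: &str) -> Vec<(Range<usize>, Script)> {
    let mut runs: Vec<(Range<usize>, Script)> = Vec::new();
    for (i, c) in text.char_indices() {
        let sc = script_of(c);
        let end = i + c.len_utf8();
        if let Some((range, run_sc)) = runs.last_mut() {
            if is_generic_script(sc) || *run_sc == sc {
                // Same script, or a generic character: extend the run.
                range.end = end;
                continue;
            }
            if is_generic_script(*run_sc) {
                // Leading generic characters take the following script.
                range.end = end;
                *run_sc = sc;
                continue;
            }
        }
        // First character, or a change between two specific scripts.
        runs.push((i..end, sc));
    }
    runs
}

fn main() {
    // Suppose markup ends right before "。" (byte offset 6). The runs are
    // computed on the whole text, so the boundary does not matter.
    let text = "示例。abc";
    let runs = script_runs(text);
    let (_, script) = runs.iter().find(|(r, _)| r.contains(&6)).unwrap();
    assert_eq!(*script, Script::Han);
    println!("{:?}", runs);
}
```

With runs computed up front, each markup-delimited range would only need to look up the run it falls into, rather than searching its surroundings on every call.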


Pass the inferred writing script to HarfBuzz, making locl effective

4bbf7cc

Fixes typst#7396

YDX-2147483647 force-pushed the locl-script branch from ac766e0 to 4bbf7cc Compare November 19, 2025 11:09

YDX-2147483647 marked this pull request as ready for review November 19, 2025 11:13

laurmaedje added the waiting-on-review This PR is waiting to be reviewed. label Nov 19, 2025

YDX-2147483647 mentioned this pull request Nov 26, 2025

Update Noto CJK typst/typst-dev-assets#18

Open

laurmaedje added the text Related to the text category, which is all about text handling, shaping, etc. label Dec 4, 2025

YDX-2147483647 added 4 commits December 12, 2025 00:22

Merge branch 'main' into locl-script

ef3c506

Merge branch 'main' into locl-script

1f8d1f3

Merge branch 'main' into locl-script

e71b2a0

Merge branch 'main' into locl-script

c902594

laurmaedje reviewed Feb 10, 2026


laurmaedje added waiting-on-author Pull request waits on author and removed waiting-on-review This PR is waiting to be reviewed. labels Feb 10, 2026

Merge branch 'main' into locl-script

8415ea7

Copilot AI mentioned this pull request Mar 11, 2026

Pass inferred writing script to HarfBuzz for correct OpenType locl support YDX-2147483647/typst#1

Closed

Copilot AI added a commit to YDX-2147483647/typst that referenced this pull request Mar 11, 2026

Implement always-provide-script approach for locl support (addresses PR typst#7415 review questions) Co-authored-by: YDX-2147483647 <73375426+YDX-2147483647@users.noreply.github.com>

17ea38f

laurmaedje added waiting-on-review This PR is waiting to be reviewed. and removed waiting-on-author Pull request waits on author labels Mar 11, 2026

Merge branch 'main' into locl-script

65da864

YDX-2147483647 mentioned this pull request Mar 14, 2026

tracking is not processed correctly for Japanese #7976

Open

1 task

Merge branch 'main' into locl-script

edd8dcf

YDX-2147483647 wants to merge 8 commits into typst:main from YDX-2147483647:locl-script

YDX-2147483647 Mar 11, 2026

khaledhosny Mar 11, 2026

3 participants
