Update ICU dependencies by mohtakver · Pull Request #7412 · typst/typst

mohtakver · 2025-11-18T22:03:50Z

This PR updates the various ICU4X dependencies from version 1.4 to 2.1.1. This change will fix issue #1920:

#lorem(14) A test » \  
#lorem(14) A test !

I would not be surprised if it fixes other things as well given such a big version jump.

In this PR I opted to use the built-in compiled ICU data instead of the runtime-loaded data from typst-assets. Correct me if I'm wrong, but the icu4x-datagen commands used to generate the data in that crate don't generate anything different from the built-in data, e.g. both cover all locales. Using the compiled data results in cleaner code, a faster runtime, and, importantly, more up-to-date CLDR language support.

eltos · 2026-01-05T10:09:22Z

Here is a test specifically for #1920 for you to cherry-pick @mohtakver, in case you find it useful: 3539ef3

laurmaedje · 2026-01-07T20:22:45Z

The current blob data in typst-assets uses a patched ICU data generator. See #1009 and https://github.com/typst/typst-assets/blob/0bbb3f7ebe775b980a3ca4ae9c49d9ecdb7e021a/src/lib.rs#L14-L45 for background. I'm honestly not sure whether there was an attempt to upstream the patch in peng1999/icu4x@b9beb6c.

I fear that, as-is, this may regress Typst's CJK handling. Two CJK test references have been changed in this PR, which is probably related to that.

laurmaedje · 2026-01-07T20:24:21Z

It would also be worth investigating how this affects Typst's binary size. I'm not sure how exactly the blob data provider avoids bloat (if at all). The data generation commands inspect the built binary to determine exactly what it references. It's possible that the blob data provider simply relies on dead code elimination, but I wouldn't bet on that working perfectly.

laurmaedje · 2026-01-08T14:08:19Z

cc @peng1999 if you have the time, your input would be appreciated on the CJK topic. :)

peng1999 · 2026-01-08T15:04:30Z

> I fear that, as-is, this may regress Typst's CJK handling. Two CJK test references have been changed in this PR, which is probably related to that.

Yes, that is a regression. The two tests shouldn’t be modified.

> I'm honestly not sure whether there was an attempt to upstream the patch in peng1999/icu4x@b9beb6c.

See #1355 (comment) (you should expand the resolved review to show the comment). I think icu4x 2.0 supports locale in segmenter, so patched datagen is no longer needed.

YDX-2147483647 · 2026-01-24T07:39:38Z

> I think icu4x 2.0 supports locale in segmenter, so patched datagen is no longer needed.

I created a minimal example of using icu_segmenter 2.1.2 in #6774 (comment).
Even if LineBreakOptions's content_locale is set to zh, the segmenter without the patch is still problematic. Specifically, ：“ should be allowed to break into ：\n“, but it is not.

So, what should we do now?
Request changes in icu4x, or update the patch to icu4x 2.1?
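
The check described above can be sketched without depending on any particular icu_segmenter version. This is a minimal illustration only: `boundary_between` and `allows_break` are hypothetical helpers, and the break offsets are hard-coded stand-ins for whatever a line segmenter would report, not real ICU4X output.

```rust
/// Returns the byte offset of the boundary between `before` and `after`
/// at their first adjacent occurrence in `text`, if there is one.
fn boundary_between(text: &str, before: char, after: char) -> Option<usize> {
    let mut chars = text.char_indices().peekable();
    while let Some((_, c)) = chars.next() {
        if c == before {
            if let Some(&(next_idx, next)) = chars.peek() {
                if next == after {
                    return Some(next_idx);
                }
            }
        }
    }
    None
}

/// Checks whether a list of break offsets (as reported by some line
/// segmenter) permits a line break between `before` and `after`.
fn allows_break(text: &str, breaks: &[usize], before: char, after: char) -> bool {
    boundary_between(text, before, after).is_some_and(|offset| breaks.contains(&offset))
}

fn main() {
    let text = "他说：“好”";
    let offset = boundary_between(text, '：', '“').expect("pair is present");

    // Hypothetical segmenter results, hard-coded for illustration:
    // one that offers the break after the colon and one that does not.
    let desired = [0, offset, text.len()];
    let problematic = [0, text.len()];

    println!("desired allows break: {}", allows_break(text, &desired, '：', '“'));
    println!("problematic allows break: {}", allows_break(text, &problematic, '：', '“'));
}
```

In a real test, the offsets would come from collecting the segmenter's break iterator over the same string, so the same helper could be run against both the patched and unpatched data.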

peng1999 · 2026-01-24T11:42:39Z

related issue: unicode-org/icu4x#5595

laurmaedje · 2026-02-09T11:12:06Z

> So, what should we do now?
> Request changes in icu4x, or update the patch to icu4x 2.1?

I think ideally both. The first for the long-term and the second to unblock us.

I'll close this PR since OP does not seem to be involved anymore.

@YDX-2147483647 If you'd be interested in picking this up, then feel free to! Otherwise, I will try to take a look sometime soon-ish.

YDX-2147483647 · 2026-02-09T11:28:32Z

> If you'd be interested in picking this up

The icu4x repo and its relationship with the Unicode line breaking algorithm are more complicated than I thought.
I haven't taken any substantive action so far, and I likely won't be able to in the near future.

laurmaedje · 2026-02-09T11:51:17Z

Sure, no problem. Thanks for the quick response!

mohtakver added 3 commits November 19, 2025 10:17

Update ICU dependencies

2c3c476

Remove redundant closures

c90f58c

Remove missed closure

b2c250d

laurmaedje added the waiting-on-review label Nov 19, 2025

laurmaedje added the text and dependencies labels Dec 4, 2025

MDLC01 mentioned this pull request Jan 4, 2026

Add test for line breaks for french guillemets #7652

Draft

laurmaedje added the waiting-on-author label and removed the waiting-on-review label Jan 7, 2026

laurmaedje mentioned this pull request Jan 8, 2026

Remove spaces from newlines between CJK characters #7350

Open

YDX-2147483647 mentioned this pull request Jan 24, 2026

In Chinese typsetting, Interpunct (U+00B7) should not appear at line start and should have consistent width #6774

Open

1 task

laurmaedje closed this Feb 9, 2026

laurmaedje mentioned this pull request Feb 9, 2026

Update to icu4x 2.x #7834

Open

mohtakver wants to merge 3 commits into typst:main from mohtakver:update-icu
