Update ICU dependencies #7412

Closed
mohtakver wants to merge 3 commits into typst:main from mohtakver:update-icu

Conversation

@mohtakver

This PR updates the various ICU4X dependencies from version 1.4 to 2.1.1. This change will fix issue #1920:

#lorem(14) A test » \  
#lorem(14) A test !

I would not be surprised if it fixes other things as well given such a big version jump.

In this PR I opted to use the built-in compiled ICU data instead of the runtime-loaded data from typst-assets. Correct me if I'm wrong, but the icu4x-datagen commands used to generate the data in that crate don't generate anything different from the built-in data; for example, both cover all locales. Using the compiled data results in cleaner code, a faster runtime, and, importantly, more up-to-date CLDR language support.

@laurmaedje laurmaedje added the waiting-on-review This PR is waiting to be reviewed. label Nov 19, 2025
@laurmaedje laurmaedje added text Related to the text category, which is all about text handling, shaping, etc. dependencies Pull requests that update a dependency file labels Dec 4, 2025
@eltos
Contributor

eltos commented Jan 5, 2026

Here is a test specifically for #1920 for you to cherry-pick @mohtakver, in case you find it useful: 3539ef3

@laurmaedje
Member

The current blob data in typst-assets uses a patched ICU data generator. See #1009 and https://github.com/typst/typst-assets/blob/0bbb3f7ebe775b980a3ca4ae9c49d9ecdb7e021a/src/lib.rs#L14-L45 for background. I'm honestly not sure whether there was an attempt to upstream the patch in peng1999/icu4x@b9beb6c.

I fear that, as-is, this may regress Typst's CJK handling. Two CJK test references have been changed in this PR, which is probably related to that.

@laurmaedje laurmaedje added waiting-on-author Pull request waits on author and removed waiting-on-review This PR is waiting to be reviewed. labels Jan 7, 2026
@laurmaedje
Member

It would also be worth investigating how this affects Typst's binary size. I'm not sure how exactly the blob data provider avoids bloat (if at all). The data generation commands inspect the built binary to determine exactly what it references. It's possible that the blob data provider simply relies on dead code elimination, but I wouldn't bet on that working perfectly.

@laurmaedje
Member

cc @peng1999 if you have the time, your input would be appreciated on the CJK topic. :)

@peng1999
Contributor

peng1999 commented Jan 8, 2026

> I fear that, as-is, this may regress Typst's CJK handling. Two CJK test references have been changed in this PR, which is probably related to that.

Yes, that is a regression. The two tests shouldn’t be modified.

> I'm honestly not sure whether there was an attempt to upstream the patch in peng1999/icu4x@b9beb6c.

See #1355 (comment) (you should expand the resolved review to show the comment). I think icu4x 2.0 supports locale in segmenter, so patched datagen is no longer needed.

@YDX-2147483647
Contributor

YDX-2147483647 commented Jan 24, 2026

> I think icu4x 2.0 supports locale in segmenter, so patched datagen is no longer needed.

I created a minimal example of using icu_segmenter 2.1.2 in #6774 (comment).
Even if LineBreakOptions's content_locale is set to zh, the segmenter without the patch is still problematic. Specifically, :“ should be allowed to break into :\n“, but it is not.

So, what should we do now?
Request changes in icu4x, or update the patch to icu4x 2.1?

@peng1999
Contributor

related issue: unicode-org/icu4x#5595

@laurmaedje
Member

> So, what should we do now?
> Request changes in icu4x, or update the patch to icu4x 2.1?

I think ideally both. The first for the long-term and the second to unblock us.

I'll close this PR since OP does not seem to be involved anymore.

@YDX-2147483647 If you'd be interested in picking this up, then feel free to! Otherwise, I will try to take a look sometime soon-ish.

@laurmaedje laurmaedje closed this Feb 9, 2026
@YDX-2147483647
Contributor

> If you'd be interested in picking this up

The icu4x repo and its relationship with the Unicode line breaking algorithm are more complicated than I thought.
I haven't taken any substantive action so far, and I'm unlikely to be able to do so in the near future.

@laurmaedje
Member

Sure, no problem. Thanks for the quick response!
