Here is a test specifically for #1920 for you to cherry-pick @mohtakver, in case you find it useful: 3539ef3 |
|
The current blob data in I fear that, as-is, this may regress Typst's CJK handling. Two CJK test references have been changed in this PR, which is probably related to that. |
|
It would also be worth investigating how this affects Typst's binary size. I'm not sure how exactly the blob data provider avoids bloat (if at all). The data generation commands inspect the built binary to determine exactly what it references. It's possible that the blob data provider simply relies on dead code elimination, but I wouldn't bet on that working perfectly. |
|
cc @peng1999 if you have the time, your input would be appreciated on the CJK topic. :) |
Yes, that is a regression. The two tests shouldn’t be modified.
See #1355 (comment) (you should expand the resolved review to show the comment). I think icu4x 2.0 supports locale in segmenter, so patched datagen is no longer needed. |
I created a minimal example of using So, what should we do now? |
|
related issue: unicode-org/icu4x#5595 |
I think ideally both. The first for the long-term and the second to unblock us. I'll close this PR since OP does not seem to be involved anymore. @YDX-2147483647 If you'd be interested in picking this up, then feel free to! Otherwise, I will try to take a look sometime soon-ish. |
The icu4x repo and its relationship with the Unicode line breaking algorithm are more complicated than I thought. |
|
Sure, no problem. Thanks for the quick response! |
This PR updates the various ICU4X dependencies from version 1.4 to 2.1.1. This change will fix issue #1920.
I would not be surprised if it fixes other things as well given such a big version jump.
In this PR I opted to use the built-in compiled ICU data instead of the runtime-loaded data from
typst-assets. Correct me if I'm wrong, but the icu4x-datagen commands used to generate the data in that crate don't generate anything different from the built-in stuff, e.g. both cover all locales. Using the compiled data results in cleaner code, faster runtime, and importantly, more up-to-date CLDR language support.