Implement sophisticated CJK punctuation adjustment #954
laurmaedje merged 20 commits into typst:main
|
Thanks for your work on CJK support! The font is so huge though :/ Because of git history, we would then have 20MB + 40MB instead of the 20MB we already had before. Not sure what's best to do about it. |
|
Use Git LFS, or download it automatically in the CI script. |
|
We used Git LFS before and it was horrible. Downloading before would work, but it would probably break packaging workflows (at least if we also do the same for fonts that are embedded into the binary) and complicate local development. |
Why the size of |
|
Maybe we can replace Noto Sans CJK with another font. Noto Sans CJK is one of the largest CJK fonts you can find; it includes almost every character, and 80% of them are not used in real life. |
|
Here is a subsetted Noto Sans with the ~3000 most commonly used Chinese characters (plus English letters and ASCII symbols); it is now 1.7 MB:
This is great! Do you also have a Traditional Chinese version? |
|
I replaced the Noto CJK font with subset fonts from https://github.com/CodePlayer/webfont-noto. Now there are three CJK fonts, for Simplified Chinese, Traditional Chinese and Japanese, and they are small! @szdytom Thank you for your idea! @laurmaedje Does this seem right to you? |
|
That's great! I'll review the code changes soon. |
|
EDIT: After looking at the changed files, it seems the font is only included for testing. Would it be feasible to implement a font download script so that users can manually opt in to a full version of Noto Sans CJK to render uncommon CJK characters? This might be useful when the user needs symbols that are not included in the stripped font. Or, if the user already has the full Noto Sans CJK font installed on their machine, would it be possible for Typst to fall back to the system-provided Noto Sans CJK font to render missing glyphs? Thanks! |
|
Typst will automatically search for system fonts, so don't worry about that :) |
Main changes:
Apply context- and language-aware adjustability calculation.
Adjustability is now stored in ShapedGlyph and is calculated after the track_and_space calculation. Currently, a glyph has non-zero adjustability only if glyph.is_space() || glyph.is_cjk_adjustable() is true.
For CJK punctuation, three adjustments are implemented:
1. ShapedGlyph::base_adjustability. This function behaves differently in zh-HK and zh-TW because they use an alternative punctuation style.
2. calculate_adjustability. Example figure (from w3c/clreq#221)
3. par::line.
Additionally, the linebreak cost function is further improved. See this diff.
The Noto CJK font in assets is replaced by the Language Specific OTC version to support different CJK regions. (See comments)
Related: #276
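For readers who want a concrete picture of the rule "a glyph has non-zero adjustability only if glyph.is_space() || glyph.is_cjk_adjustable()", here is a minimal, self-contained sketch. It is not the code from this PR: the Glyph and Adjustability types, the punctuation lists, and the one-half and one-third factors are all assumptions chosen for illustration, and the zh-HK / zh-TW handling mentioned above is left out entirely.

```rust
/// How far a glyph may stretch or shrink on its (left, right) side, in em.
/// Hypothetical type; the field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct Adjustability {
    pub stretchability: (f64, f64),
    pub shrinkability: (f64, f64),
}

/// Hypothetical, simplified stand-in for a shaped glyph.
#[derive(Debug, Clone, Copy)]
pub struct Glyph {
    pub c: char,
    /// Advance width in em.
    pub x_advance: f64,
}

impl Glyph {
    pub fn is_space(&self) -> bool {
        matches!(self.c, ' ' | '\u{00A0}' | '\u{3000}')
    }

    /// Punctuation whose ink sits on the left of its full-width box,
    /// so the blank right half can be squeezed (illustrative subset).
    fn is_left_aligned_punctuation(&self) -> bool {
        matches!(self.c, '，' | '。' | '、' | '：' | '；' | '）' | '》' | '」')
    }

    /// Opening punctuation whose ink sits on the right of its box,
    /// so the blank left half can be squeezed (illustrative subset).
    fn is_right_aligned_punctuation(&self) -> bool {
        matches!(self.c, '（' | '《' | '「')
    }

    pub fn is_cjk_adjustable(&self) -> bool {
        self.is_left_aligned_punctuation() || self.is_right_aligned_punctuation()
    }

    /// Base adjustability before any context-dependent step.
    /// The factors below are assumptions, not values taken from the PR.
    pub fn base_adjustability(&self) -> Adjustability {
        let width = self.x_advance;
        if self.is_space() {
            // Spaces may both stretch and shrink, on the right side only.
            Adjustability {
                stretchability: (0.0, width / 2.0),
                shrinkability: (0.0, width / 3.0),
            }
        } else if self.is_left_aligned_punctuation() {
            Adjustability {
                stretchability: (0.0, 0.0),
                shrinkability: (0.0, width / 2.0),
            }
        } else if self.is_right_aligned_punctuation() {
            Adjustability {
                stretchability: (0.0, 0.0),
                shrinkability: (width / 2.0, 0.0),
            }
        } else {
            // Everything else gets zero adjustability.
            Adjustability::default()
        }
    }
}
```

In this sketch a full-width comma with an advance of 1em can give up half an em on its right side, an opening parenthesis can give up half an em on its left side, and an ordinary Han character cannot be adjusted at all.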