
Implement sophisticated CJK punctuation adjustment #954

Merged
laurmaedje merged 20 commits into typst:main from peng1999:cjk-punctuation
May 11, 2023

@peng1999
Contributor

@peng1999 peng1999 commented Apr 24, 2023

Main changes:

  • Apply context- and language-aware adjustability calculation.
    The adjustability is now stored in ShapedGlyph and is calculated after the track_and_space calculation. Currently, a glyph has non-zero adjustability only if glyph.is_space() || glyph.is_cjk_adjustable() is true.

    For CJK punctuation, three adjustments are implemented:

    1. Basic adjustability for more punctuation, defined in ShapedGlyph::base_adjustability. This function behaves differently in zh-HK and zh-TW because they use an alternative punctuation style.
    2. Consecutive punctuation adjustment. This is calculated in calculate_adjustability.
      Example figure (from w3c/clreq#221)
    3. Punctuation adjustment at line start and line end. This is implemented in par::line.
  • Additionally, the line break cost function is further improved. See this diff.

  • The Noto CJK font in assets is replaced by the Language Specific OTC version to support different CJK regions. (See comments)
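
To make the three adjustments above more concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the Glyph struct, the punctuation sets, and the helper names (shape, adjust_consecutive, adjust_line_edges) are all hypothetical, every glyph is assumed to be 1 em wide, and the zh-HK / zh-TW centered punctuation style and spaces are not modeled.

```rust
/// Hypothetical, simplified glyph record (not Typst's actual `ShapedGlyph`).
#[derive(Debug, Clone, PartialEq)]
struct Glyph {
    c: char,
    /// Advance width in em.
    x_advance: f64,
    /// Amount the glyph may still shrink, in em.
    shrinkability: f64,
}

/// Assumed set of marks whose ink sits in the left half of the em box.
fn is_closing_punct(c: char) -> bool {
    matches!(c, '，' | '。' | '、' | '：' | '；' | '」' | '』' | '）' | '》')
}

/// Assumed set of marks whose ink sits in the right half of the em box.
fn is_opening_punct(c: char) -> bool {
    matches!(c, '「' | '『' | '（' | '《')
}

fn is_cjk_adjustable(c: char) -> bool {
    is_closing_punct(c) || is_opening_punct(c)
}

/// Adjustment 1 (assumption): each glyph is 1 em wide, and adjustable
/// punctuation may give up its blank half (0.5 em).
fn shape(text: &str) -> Vec<Glyph> {
    text.chars()
        .map(|c| Glyph {
            c,
            x_advance: 1.0,
            shrinkability: if is_cjk_adjustable(c) { 0.5 } else { 0.0 },
        })
        .collect()
}

/// Uses up whatever shrinkability is left, so a glyph is never shrunk twice.
fn shrink_fully(g: &mut Glyph) {
    g.x_advance -= g.shrinkability;
    g.shrinkability = 0.0;
}

/// Adjustment 2: squeeze the blank between adjacent punctuation marks.
fn adjust_consecutive(glyphs: &mut [Glyph]) {
    for i in 1..glyphs.len() {
        let (prev, next) = (glyphs[i - 1].c, glyphs[i].c);
        if is_closing_punct(prev) && is_cjk_adjustable(next) {
            // Drop the trailing blank of the first mark.
            shrink_fully(&mut glyphs[i - 1]);
        } else if is_opening_punct(prev) && is_opening_punct(next) {
            // Drop the leading blank of the second mark.
            shrink_fully(&mut glyphs[i]);
        }
    }
}

/// Adjustment 3: drop the outward-facing blank at the line start and end.
fn adjust_line_edges(line: &mut [Glyph]) {
    if let Some(first) = line.first_mut() {
        if is_opening_punct(first.c) {
            shrink_fully(first);
        }
    }
    if let Some(last) = line.last_mut() {
        if is_closing_punct(last.c) {
            shrink_fully(last);
        }
    }
}

fn total_width(glyphs: &[Glyph]) -> f64 {
    glyphs.iter().map(|g| g.x_advance).sum()
}
```

Under these assumptions, shaping 你好。」「再见 and calling adjust_consecutive reduces the run from 7 em to 6 em, because 。 and 」 each give up their trailing half em. An opening mark directly followed by a closing mark, such as 「」, is left alone because the ink of the two marks is already adjacent.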

Related: #276

@laurmaedje
Member

Thanks for your work on CJK support! The font is so huge, though :/ Because of Git history, we would then have 20 MB + 40 MB instead of the 20 MB we already have. Not sure what best to do about it.

@szdytom
Contributor

szdytom commented Apr 24, 2023

@laurmaedje
Member

We used Git LFS before and it was horrible. Downloading before would work, but it would probably break packaging workflows (at least if we also do the same for fonts that are embedded into the binary) and complicate local development.

@peng1999
Contributor Author

peng1999 commented Apr 24, 2023

Because of Git history, we would then have 20 MB + 40 MB instead of the 20 MB we already have.

Why does the size of the .git directory matter here? Is it the clone time that matters? In that case, you can always use git clone --filter=tree:0 for a fast clone.

@szdytom
Contributor

szdytom commented Apr 24, 2023

Maybe we can replace Noto Sans CJK with another font. Noto Sans CJK is one of the largest CJK fonts you can find. It includes almost every character, and 80% of them are not used in real life.

@szdytom
Contributor

szdytom commented Apr 24, 2023

Here is a subsetted Noto Sans with the ~3000 most commonly used Chinese characters (plus English letters and ASCII symbols). It is now 1.7 MB:

NotoSerifCJKsc-Regular.subset.otf.zip

@peng1999
Contributor Author

Here is a subsetted Noto Sans with the ~3000 most commonly used Chinese characters (plus English letters and ASCII symbols). It is now 1.7 MB:

This is great! Do you also have a Traditional Chinese version?

@peng1999
Contributor Author

I replaced the Noto CJK font with subset fonts from https://github.com/CodePlayer/webfont-noto. Now there are three CJK fonts, for Simplified Chinese, Traditional Chinese, and Japanese, and they are small! @szdytom Thank you for your idea!

@laurmaedje Does this seem right to you?

@laurmaedje
Member

That's great! I'll review the code changes soon.

@XieJiSS

XieJiSS commented Apr 25, 2023

EDIT: after looking at the changed files, it seems like the font is only included for testing.

Would it be feasible to implement a font download script so that users can manually opt in to a full version of Noto Sans CJK to render some uncommon CJK characters? This might be useful when the user has to use symbols that are not included in the stripped font.

Or, if the user already has a full Noto Sans CJK font installed on their machine, would it be possible for typst to fall back to the system-provided Noto Sans CJK font to render missing glyphs? Thanks!

@szdytom
Contributor

szdytom commented Apr 25, 2023

Typst will automatically search for system fonts, so don't worry about that :)
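
As a rough illustration of what per-character fallback from a subset font to a fuller system font can look like, here is a small sketch. It is not Typst's font loading code: the Font struct, the font helper, and select_font are hypothetical names, and real fonts expose coverage through their cmap tables rather than a HashSet.

```rust
use std::collections::HashSet;

/// Hypothetical font record: a name plus the characters it can render.
struct Font {
    name: String,
    coverage: HashSet<char>,
}

/// Convenience constructor for the sketch.
fn font(name: &str, chars: &str) -> Font {
    Font {
        name: name.to_string(),
        coverage: chars.chars().collect(),
    }
}

/// Returns the first font in priority order that covers `c`, if any.
fn select_font(fonts: &[Font], c: char) -> Option<&Font> {
    fonts.iter().find(|f| f.coverage.contains(&c))
}
```

With a list such as [subset font, system font], common characters resolve to the subset font, and a character missing from the subset falls through to the system font.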

@peng1999 peng1999 force-pushed the cjk-punctuation branch from 9f3653f to e2c9390 Compare May 3, 2023 11:45
@peng1999 peng1999 requested a review from laurmaedje May 3, 2023 12:45
@peng1999 peng1999 requested a review from laurmaedje May 4, 2023 10:18