More robust detection of subs / sups font support#7258
More robust detection of subs / sups font support#7258MDLC01 wants to merge 3 commits intotypst:mainfrom
subs / sups font support#7258Conversation
| // Now that we know which glyphs we use, test whether the font provides | ||
| // `subs`/`sups` version if needed. | ||
| let mut script_shift = Em::zero(); | ||
| let mut script_compensation = Em::zero(); | ||
| let mut scale = Em::one(); | ||
| if let Some(settings) = ctx.shift_settings { | ||
| if settings.typographic | ||
| && are_glyphs_covered(&buffer, &font, settings.kind.feature()) | ||
| { | ||
| // We can use the font feature :) | ||
| // Shape again, with the feature enabled. | ||
| ctx.features.push(Feature::new(settings.kind.feature(), 1, ..)); | ||
| buffer = shape_segment_with_font( | ||
| buffer.clear(), | ||
| text, | ||
| language(ctx.styles), | ||
| script, | ||
| dir, | ||
| &font, | ||
| &ctx.features, | ||
| ); | ||
| ctx.features.pop(); | ||
| } else { | ||
| // We have to synthesize the glyphs :( | ||
| let script_metrics = settings.kind.read_metrics(font.metrics()); | ||
| script_shift = settings.shift.unwrap_or(script_metrics.vertical_offset); | ||
| script_compensation = script_metrics.horizontal_offset; | ||
| scale = settings.size.unwrap_or(script_metrics.height); | ||
| } | ||
| }; |
There was a problem hiding this comment.
I felt like this was now small enough to be inlined here, but I can extract it back into a function if wanted.
| .flat_map(|lookup_index| { | ||
| gsub.lookups | ||
| .get(lookup_index) | ||
| .expect("font provided lookup index should be valid") |
There was a problem hiding this comment.
I'm unsure whether expect can fail if the font is malformed, or if that would have been caught earlier (maybe when parsing the font).
I can look into it later.
There was a problem hiding this comment.
It can definitely panic. Font parsing is very lazy with ttf-parser.
| // The cast should always work as per `glyph_id`'s doc comment. | ||
| // I'm surprised `hb_glyph_info_t::as_glyph`, which performs this exact | ||
| // cast, isn't public... | ||
| .all(|g| is_covered(GlyphId(g.glyph_id as u16))) |
There was a problem hiding this comment.
The fact that I have to do this cast manually makes me not-so-confident that this is right.
| } | ||
|
|
||
| /// Tests whether a feature covers all the glyphs of a buffer for a font. | ||
| fn are_glyphs_covered(glyphs: &GlyphBuffer, font: &Font, feature: Tag) -> bool { |
There was a problem hiding this comment.
This still does not take script and language into account.
| let mut scale = Em::one(); | ||
| if let Some(settings) = ctx.shift_settings { | ||
| if settings.typographic | ||
| && are_glyphs_covered(&buffer, &font, settings.kind.feature()) |
There was a problem hiding this comment.
Thinking about it more. This approach might make other things worse. For example if “fi” is used as superscript, or if smcp or onum features are enabled. With the new code, the glyphs will be reported uncovered (and rightly so) by sups feature, however in practice the sups feature actually works in these cases because the sups lookups come first in the font before liga, smcp and onum lookups, so when the feature is activated its substitution is applied first. But this depends on the font, and nothing prevents the font from having sups lookups later.
A possibly more robust solution is to actually shape the text with the sups/subs feature once enabled and once disabled, and if all the glyph IDs are different then use the feature, otherwise do the fallback. This has the advantage of checking what the font is actually doing (with all the complexities of OpenType script, language, and lookup order interactions) instead of second-guessing it, but it might be significantly slower (needs to be measured).
Alternatively, you can go back to checking coverage of unshaped glyphs as it seems to work better in this case, until someone hits a font where it doesn’t.
There was a problem hiding this comment.
I think the solution you proposed in your second paragraph can still cause an issue in the following case. If a font provides:
- A glyph A for Latin Small Letter I without
sups. - A glyph B for Superscript Latin Small Letter I without
sups, and the same glyph B for Latin Small Letter I withsups. - A glyph C for Superscript Latin Small Letter I with
sups.
When we want to shape "iⁱ" in superscript (Latin Small Letter I followed by Superscript Latin Small Letter I), the following happens:
- We shape it without
supsand get the glyphs A B. - We shape it with
supsand get the glyphs B C.
There is a glyph in common, so we reject, but actually the font fully supported that.
In this case, it might look like we can compare only the glyphs with the same indices in their respective sequences, but in general, shaping with and without sups may yield a different number of glyphs, so we have to compare all pair of glyphs.
Maybe as you mentioned it would be better to just check coverage of unshaped glyphs.
An alternate solution when we want to shape superscript text with a set of features F might be the following:
- Find which features
F_beforehave their lookups beforesupsin the font. - Shape with
F_before. - Test for coverage for
sups; ifsupsis fully covered at this point:- Shape with
F ∪ {"sups"}.
- Shape with
- Otherwise:
- Shape with
Fand synthesize the superscripts after.
- Shape with
Does this seem like a reasonable approach to you?
There was a problem hiding this comment.
I think the solution you proposed in your second paragraph can still cause an issue in the following case. If a font provides:
A glyph A for Latin Small Letter I without
sups.A glyph B for Superscript Latin Small Letter I without
sups, and the same glyph B for Latin Small Letter I withsups.A glyph C for Superscript Latin Small Letter I with
sups.When we want to shape
"iⁱ"in superscript (Latin Small Letter I followed by Superscript Latin Small Letter I), the following happens:
We shape it without
supsand get the glyphs A B.We shape it with
supsand get the glyphs B C.There is a glyph in common, so we reject, but actually the font fully supported that.
In this case, it might look like we can compare only the glyphs with the same indices in their respective sequences, but in general, shaping with and without
supsmay yield a different number of glyphs, so we have to compare all pair of glyphs.
My suggestion was not to compare the set of glyphs in each case (that obviously wouldn’t work), but to compare the output of each code point. This can be tricky, as you say, but not impossible. One would need to take clustering into account when doing the comparison (e.g. segment each buffer based on HarfBuzz clusters, and compare these).
An alternate solution when we want to shape superscript text with a set of features
Fmight be the following:
Find which features
F_beforehave their lookups beforesupsin the font.Shape with
F_before.Test for coverage for
sups; ifsupsis fully covered at this point:
- Shape with
F ∪ {"sups"}.Otherwise:
- Shape with
Fand synthesize the superscripts after.Does this seem like a reasonable approach to you?
The order in which lookups are applied is not only determined by the font. The shaper have some features that gets their lookups applied in specific order (depends on the script). It would be a non trivial logic to replicate.
You would need to get the |
subs / sups font support
|
I renamed this and opened #7462 because this PR is (1) not complete yet and (2) anyways a bit too large and dangerous for inclusion in a patch release, so I'd wanted to have a more minimal fix that can be backported either way. We can then carefully do the comprehensive fix here. |
|
I'll close this for now due to inactivity. If you end up getting back to this, I'm happy to reopen or have a new PR! |
Fixes #7249. Requires typst/typst-dev-assets#17 for a regression test.
As explained in #5777 (comment) and #5777 (comment), the issue was that we only considered the first subtable from the first lookup when testing for coverage.
As suggested in #5777 (comment), I now check for coverage given a set of glyphs instead of testing for each individual codepoint in the string. I reorganized the code a bit to implement this change. The drawback is that we now have to shape before testing for coverage, in which case we shape a second time. In the future, if we decide to implement synthesized smallcaps similarly, we will shape the same text three times if the font defines both
supsandsmcpand the user writes#super[smallcaps[...]].#5777 (comment) also suggested that we should take the script and language into account. I'm not sure how to do that when testing for coverage. However, we do it when reshaping because we just shape from scratch.
I would appreciate a review from @khaledhosny because I'm still learning about all this so I can't say for sure I didn't miss something.
Before/after comparison