More robust detection of `subs` / `sups` font support#7258

Closed

MDLC01 wants to merge 3 commits intotypst:mainfrom

MDLC01:sups-subs-coverage

Collaborator

MDLC01 commented Oct 29, 2025 •

edited

Loading

Fixes #7249. Requires typst/typst-dev-assets#17 for a regression test.

As explained in #5777 (comment) and #5777 (comment), the issue was that we only considered the first subtable from the first lookup when testing for coverage.

As suggested in #5777 (comment), I now check for coverage given a set of glyphs instead of testing for each individual codepoint in the string. I reorganized the code a bit to implement this change. The drawback is that we now have to shape before testing for coverage, in which case we shape a second time. In the future, if we decide to implement synthesized smallcaps similarly, we will shape the same text three times if the font defines both sups and smcp and the user writes #super[smallcaps[...]].

#5777 (comment) also suggested that we should take the script and language into account. I'm not sure how to do that when testing for coverage. However, we do it when reshaping because we just shape from scratch.

I would appreciate a review from @khaledhosny because I'm still learning about all this so I can't say for sure I didn't miss something.

Before/after comparison

#set text(font: "Source Serif 4")
#set super(typographic: true)
Foo#super[test] \
Foo#super[test1] \
Foo#super[(test)] \
// Source Serif 4 does not support `sups` for backticks, so this should be synthesized.
Foo#super[test\`]

Before	After

MDLC01 added 2 commits

October 29, 2025 21:20


          Consider all lookups and subtables when testing for feature coverage

ea1a5c2


          Add test

da3be43

MDLC01 commented

View reviewed changes

crates/typst-layout/src/inline/shaping.rs

Comment on lines +920 to +949

+                  // Now that we know which glyphs we use, test whether the font provides
+                  // `subs`/`sups` version if needed.
+                  let mut script_shift = Em::zero();
+                  let mut script_compensation = Em::zero();
+                  let mut scale = Em::one();
+                  if let Some(settings) = ctx.shift_settings {
+                      if settings.typographic
+                          && are_glyphs_covered(&buffer, &font, settings.kind.feature())
+                      {
+                          // We can use the font feature :)
+                          // Shape again, with the feature enabled.
+                          ctx.features.push(Feature::new(settings.kind.feature(), 1, ..));
+                          buffer = shape_segment_with_font(
+                              buffer.clear(),
+                              text,
+                              language(ctx.styles),
+                              script,
+                              dir,
+                              &font,
+                              &ctx.features,
+                          );
+                          ctx.features.pop();
+                      } else {
+                          // We have to synthesize the glyphs :(
+                          let script_metrics = settings.kind.read_metrics(font.metrics());
+                          script_shift = settings.shift.unwrap_or(script_metrics.vertical_offset);
+                          script_compensation = script_metrics.horizontal_offset;
+                          scale = settings.size.unwrap_or(script_metrics.height);
+                      }
+                  };

Collaborator Author

MDLC01 Oct 29, 2025

I felt like this was now small enough to be inlined here, but I can extract it back into a function if wanted.

crates/typst-layout/src/inline/shaping.rs

+                          .flat_map(|lookup_index| {
+                              gsub.lookups
+                                  .get(lookup_index)
+                                  .expect("font provided lookup index should be valid")

Collaborator Author

MDLC01 Oct 29, 2025

I'm unsure whether expect can fail if the font is malformed, or if that would have been caught earlier (maybe when parsing the font).

I can look into it later.

Member

laurmaedje Nov 25, 2025

It can definitely panic. Font parsing is very lazy with ttf-parser.

crates/typst-layout/src/inline/shaping.rs

Comment on lines +1151 to +1154

+                      // The cast should always work as per `glyph_id`'s doc comment.
+                      // I'm surprised `hb_glyph_info_t::as_glyph`, which performs this exact
+                      // cast, isn't public...
+                      .all(|g| is_covered(GlyphId(g.glyph_id as u16)))

Collaborator Author

MDLC01 Oct 29, 2025

The fact that I have to do this cast manually makes me not-so-confident that this is right.


          Fix typo

ed53b19

MDLC01 mentioned this pull request

Add Source Serif 4 typst/typst-dev-assets#17

Merged

khaledhosny reviewed

View reviewed changes

crates/typst-layout/src/inline/shaping.rs

+              }
+              /// Tests whether a feature covers all the glyphs of a buffer for a font.
+              fn are_glyphs_covered(glyphs: &GlyphBuffer, font: &Font, feature: Tag) -> bool {

khaledhosny Oct 30, 2025

This still does not take script and language into account.

crates/typst-layout/src/inline/shaping.rs

+                  let mut scale = Em::one();
+                  if let Some(settings) = ctx.shift_settings {
+                      if settings.typographic
+                          && are_glyphs_covered(&buffer, &font, settings.kind.feature())

khaledhosny Oct 30, 2025

Thinking about it more. This approach might make other things worse. For example if “fi” is used as superscript, or if smcp or onum features are enabled. With the new code, the glyphs will be reported uncovered (and rightly so) by sups feature, however in practice the sups feature actually works in these cases because the sups lookups come first in the font before liga, smcp and onum lookups, so when the feature is activated its substitution is applied first. But this depends on the font, and nothing prevents the font from having sups lookups later.

A possibly more robust solution is to actually shape the text with the sups/subs feature once enabled and once disabled, and if all the glyph IDs are different then use the feature, otherwise do the fallback. This has the advantage of checking what the font is actually doing (with all the complexities of OpenType script, language, and lookup order interactions) instead of second-guessing it, but it might be significantly slower (needs to be measured).

Alternatively, you can go back to checking coverage of unshaped glyphs as it seems to work better in this case, until someone hits a font where it doesn’t.

Collaborator Author

MDLC01 Oct 30, 2025

I think the solution you proposed in your second paragraph can still cause an issue in the following case. If a font provides:

A glyph A for Latin Small Letter I without sups.
A glyph B for Superscript Latin Small Letter I without sups, and the same glyph B for Latin Small Letter I with sups.
A glyph C for Superscript Latin Small Letter I with sups.

When we want to shape "iⁱ" in superscript (Latin Small Letter I followed by Superscript Latin Small Letter I), the following happens:

We shape it without sups and get the glyphs A B.
We shape it with sups and get the glyphs B C.

There is a glyph in common, so we reject, but actually the font fully supported that.

In this case, it might look like we can compare only the glyphs with the same indices in their respective sequences, but in general, shaping with and without sups may yield a different number of glyphs, so we have to compare all pair of glyphs.

Maybe as you mentioned it would be better to just check coverage of unshaped glyphs.

An alternate solution when we want to shape superscript text with a set of features F might be the following:

Find which features F_before have their lookups before sups in the font.
Shape with F_before.
Test for coverage for sups; if sups is fully covered at this point:
- Shape with F ∪ {"sups"}.
Otherwise:
- Shape with F and synthesize the superscripts after.

Does this seem like a reasonable approach to you?

khaledhosny Nov 2, 2025

I think the solution you proposed in your second paragraph can still cause an issue in the following case. If a font provides:

A glyph A for Latin Small Letter I without sups.

A glyph B for Superscript Latin Small Letter I without sups, and the same glyph B for Latin Small Letter I with sups.

A glyph C for Superscript Latin Small Letter I with sups.

When we want to shape "iⁱ" in superscript (Latin Small Letter I followed by Superscript Latin Small Letter I), the following happens:

We shape it without sups and get the glyphs A B.

We shape it with sups and get the glyphs B C.

There is a glyph in common, so we reject, but actually the font fully supported that.

In this case, it might look like we can compare only the glyphs with the same indices in their respective sequences, but in general, shaping with and without sups may yield a different number of glyphs, so we have to compare all pair of glyphs.

My suggestion was not to compare the set of glyphs in each case (that obviously wouldn’t work), but to compare the output of each code point. This can be tricky, as you say, but not impossible. One would need to take clustering into account when doing the comparison (e.g. segment each buffer based on HarfBuzz clusters, and compare these).

An alternate solution when we want to shape superscript text with a set of features F might be the following:

Find which features F_before have their lookups before sups in the font.

Shape with F_before.

Test for coverage for sups; if sups is fully covered at this point:

Shape with F ∪ {"sups"}.

Otherwise:

Shape with F and synthesize the superscripts after.

Does this seem like a reasonable approach to you?

The order in which lookups are applied is not only determined by the font. The shaper have some features that gets their lookups applied in specific order (depends on the script). It would be a non trivial logic to replicate.

khaledhosny commented Nov 2, 2025

#5777 (comment) also suggested that we should take the script and language into account. I'm not sure how to do that when testing for coverage. However, we do it when reshaping because we just shape from scratch.

You would need to get the Script from gsub.scripts (falling back to DFLT script), then get the Language from Script.languages (falling back to Script.default_language), then get the feature indices from Language.feature_indices and locate the feature with the requested feature tag.

laurmaedje added text fix labels

laurmaedje added the waiting-on-author label

laurmaedje changed the title ~~Consider all lookups and subtables when testing for feature coverage~~ More robust detection of subs / sups font support

laurmaedje mentioned this pull request

Consider all lookups and subtables in subs / sups coverage checks #7462

Merged

Member

laurmaedje commented Nov 25, 2025

I renamed this and opened #7462 because this PR is (1) not complete yet and (2) anyways a bit too large and dangerous for inclusion in a patch release, so I'd wanted to have a more minimal fix that can be backported either way. We can then carefully do the comprehensive fix here.

Member

laurmaedje commented Feb 9, 2026

I'll close this for now due to inactivity. If you end up getting back to this, I'm happy to reopen or have a new PR!

laurmaedje closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix text waiting-on-author