Skip to content

More robust detection of subs / sups font support#7258

Closed
MDLC01 wants to merge 3 commits intotypst:mainfrom
MDLC01:sups-subs-coverage
Closed

More robust detection of subs / sups font support#7258
MDLC01 wants to merge 3 commits intotypst:mainfrom
MDLC01:sups-subs-coverage

Conversation

@MDLC01
Copy link
Collaborator

@MDLC01 MDLC01 commented Oct 29, 2025

Fixes #7249. Requires typst/typst-dev-assets#17 for a regression test.

As explained in #5777 (comment) and #5777 (comment), the issue was that we only considered the first subtable from the first lookup when testing for coverage.

As suggested in #5777 (comment), I now check for coverage given a set of glyphs instead of testing for each individual codepoint in the string. I reorganized the code a bit to implement this change. The drawback is that we now have to shape before testing for coverage, in which case we shape a second time. In the future, if we decide to implement synthesized smallcaps similarly, we will shape the same text three times if the font defines both sups and smcp and the user writes #super[smallcaps[...]].

#5777 (comment) also suggested that we should take the script and language into account. I'm not sure how to do that when testing for coverage. However, we do it when reshaping because we just shape from scratch.

I would appreciate a review from @khaledhosny because I'm still learning about all this so I can't say for sure I didn't miss something.

Before/after comparison

#set text(font: "Source Serif 4")
#set super(typographic: true)
Foo#super[test] \
Foo#super[test1] \
Foo#super[(test)] \
// Source Serif 4 does not support `sups` for backticks, so this should be synthesized.
Foo#super[test\`]
Before After
image image

Comment on lines +920 to +949
// Now that we know which glyphs we use, test whether the font provides
// `subs`/`sups` version if needed.
let mut script_shift = Em::zero();
let mut script_compensation = Em::zero();
let mut scale = Em::one();
if let Some(settings) = ctx.shift_settings {
if settings.typographic
&& are_glyphs_covered(&buffer, &font, settings.kind.feature())
{
// We can use the font feature :)
// Shape again, with the feature enabled.
ctx.features.push(Feature::new(settings.kind.feature(), 1, ..));
buffer = shape_segment_with_font(
buffer.clear(),
text,
language(ctx.styles),
script,
dir,
&font,
&ctx.features,
);
ctx.features.pop();
} else {
// We have to synthesize the glyphs :(
let script_metrics = settings.kind.read_metrics(font.metrics());
script_shift = settings.shift.unwrap_or(script_metrics.vertical_offset);
script_compensation = script_metrics.horizontal_offset;
scale = settings.size.unwrap_or(script_metrics.height);
}
};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt like this was now small enough to be inlined here, but I can extract it back into a function if wanted.

.flat_map(|lookup_index| {
gsub.lookups
.get(lookup_index)
.expect("font provided lookup index should be valid")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure whether expect can fail if the font is malformed, or if that would have been caught earlier (maybe when parsing the font).

I can look into it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can definitely panic. Font parsing is very lazy with ttf-parser.

Comment on lines +1151 to +1154
// The cast should always work as per `glyph_id`'s doc comment.
// I'm surprised `hb_glyph_info_t::as_glyph`, which performs this exact
// cast, isn't public...
.all(|g| is_covered(GlyphId(g.glyph_id as u16)))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that I have to do this cast manually makes me not-so-confident that this is right.

}

/// Tests whether a feature covers all the glyphs of a buffer for a font.
fn are_glyphs_covered(glyphs: &GlyphBuffer, font: &Font, feature: Tag) -> bool {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still does not take script and language into account.

let mut scale = Em::one();
if let Some(settings) = ctx.shift_settings {
if settings.typographic
&& are_glyphs_covered(&buffer, &font, settings.kind.feature())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it more. This approach might make other things worse. For example if “fi” is used as superscript, or if smcp or onum features are enabled. With the new code, the glyphs will be reported uncovered (and rightly so) by sups feature, however in practice the sups feature actually works in these cases because the sups lookups come first in the font before liga, smcp and onum lookups, so when the feature is activated its substitution is applied first. But this depends on the font, and nothing prevents the font from having sups lookups later.

A possibly more robust solution is to actually shape the text with the sups/subs feature once enabled and once disabled, and if all the glyph IDs are different then use the feature, otherwise do the fallback. This has the advantage of checking what the font is actually doing (with all the complexities of OpenType script, language, and lookup order interactions) instead of second-guessing it, but it might be significantly slower (needs to be measured).

Alternatively, you can go back to checking coverage of unshaped glyphs as it seems to work better in this case, until someone hits a font where it doesn’t.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the solution you proposed in your second paragraph can still cause an issue in the following case. If a font provides:

  • A glyph A for Latin Small Letter I without sups.
  • A glyph B for Superscript Latin Small Letter I without sups, and the same glyph B for Latin Small Letter I with sups.
  • A glyph C for Superscript Latin Small Letter I with sups.

When we want to shape "iⁱ" in superscript (Latin Small Letter I followed by Superscript Latin Small Letter I), the following happens:

  • We shape it without sups and get the glyphs A B.
  • We shape it with sups and get the glyphs B C.

There is a glyph in common, so we reject, but actually the font fully supported that.

In this case, it might look like we can compare only the glyphs with the same indices in their respective sequences, but in general, shaping with and without sups may yield a different number of glyphs, so we have to compare all pair of glyphs.

Maybe as you mentioned it would be better to just check coverage of unshaped glyphs.

An alternate solution when we want to shape superscript text with a set of features F might be the following:

  • Find which features F_before have their lookups before sups in the font.
  • Shape with F_before.
  • Test for coverage for sups; if sups is fully covered at this point:
    • Shape with F ∪ {"sups"}.
  • Otherwise:
    • Shape with F and synthesize the superscripts after.

Does this seem like a reasonable approach to you?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the solution you proposed in your second paragraph can still cause an issue in the following case. If a font provides:

  • A glyph A for Latin Small Letter I without sups.

  • A glyph B for Superscript Latin Small Letter I without sups, and the same glyph B for Latin Small Letter I with sups.

  • A glyph C for Superscript Latin Small Letter I with sups.

When we want to shape "iⁱ" in superscript (Latin Small Letter I followed by Superscript Latin Small Letter I), the following happens:

  • We shape it without sups and get the glyphs A B.

  • We shape it with sups and get the glyphs B C.

There is a glyph in common, so we reject, but actually the font fully supported that.

In this case, it might look like we can compare only the glyphs with the same indices in their respective sequences, but in general, shaping with and without sups may yield a different number of glyphs, so we have to compare all pair of glyphs.

My suggestion was not to compare the set of glyphs in each case (that obviously wouldn’t work), but to compare the output of each code point. This can be tricky, as you say, but not impossible. One would need to take clustering into account when doing the comparison (e.g. segment each buffer based on HarfBuzz clusters, and compare these).

An alternate solution when we want to shape superscript text with a set of features F might be the following:

  • Find which features F_before have their lookups before sups in the font.

  • Shape with F_before.

  • Test for coverage for sups; if sups is fully covered at this point:

    • Shape with F ∪ {"sups"}.
  • Otherwise:

    • Shape with F and synthesize the superscripts after.

Does this seem like a reasonable approach to you?

The order in which lookups are applied is not only determined by the font. The shaper have some features that gets their lookups applied in specific order (depends on the script). It would be a non trivial logic to replicate.

@khaledhosny
Copy link

#5777 (comment) also suggested that we should take the script and language into account. I'm not sure how to do that when testing for coverage. However, we do it when reshaping because we just shape from scratch.

You would need to get the Script from gsub.scripts (falling back to DFLT script), then get the Language from Script.languages (falling back to Script.default_language), then get the feature indices from Language.feature_indices and locate the feature with the requested feature tag.

@laurmaedje laurmaedje added text Related to the text category, which is all about text handling, shaping, etc. fix A bug fix. labels Nov 7, 2025
@laurmaedje laurmaedje added the waiting-on-author Pull request waits on author label Nov 17, 2025
@laurmaedje laurmaedje changed the title Consider all lookups and subtables when testing for feature coverage More robust detection of subs / sups font support Nov 25, 2025
@laurmaedje
Copy link
Member

I renamed this and opened #7462 because this PR is (1) not complete yet and (2) anyways a bit too large and dangerous for inclusion in a patch release, so I'd wanted to have a more minimal fix that can be backported either way. We can then carefully do the comprehensive fix here.

@laurmaedje
Copy link
Member

I'll close this for now due to inactivity. If you end up getting back to this, I'm happy to reopen or have a new PR!

@laurmaedje laurmaedje closed this Feb 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix A bug fix. text Related to the text category, which is all about text handling, shaping, etc. waiting-on-author Pull request waits on author

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Typst uses fake superscripts for some characters in Source Serif 4 despite available typographic glyphs

3 participants