Skip to content

Add SymbolElem to improve math character styling and fix $"a"$ vs. $a$#5738

Merged
laurmaedje merged 5 commits intotypst:mainfrom
wrzian:symbolelem
Jan 23, 2025
Merged

Add SymbolElem to improve math character styling and fix $"a"$ vs. $a$#5738
laurmaedje merged 5 commits intotypst:mainfrom
wrzian:symbolelem

Conversation

@wrzian
Copy link
Contributor

@wrzian wrzian commented Jan 23, 2025

This PR is a compromise to close #274 without doing a full overhaul of how Typst handles math text. That overhaul is still needed, but will happen after a design is ready, which will take some time. This path was agreed on in the call that closed #4638 so that we can get some pragmatic fixes in even if they're temporary.

However this still required many changes, as SymbolElem splits all textual content into the two options of itself or TextElem. Getting all the tests to pass was rough, but they now do!

I've taken care to separate the changes into discrete chunks, and the test suite should pass on each of the commits.

(random fact: this PR also puts us over 2500 tests in the test suite 🎉)

Commits

1. MathText

Adds the new syntaxkind for math text. I've chosen to use an enum to represent the difference between numbers and characters in the AST.

2. SymbolElem

I added the initial SymbolElem with small hacks so it converts into a TextElem before any other realization (which are reverted in the two layout commits).

I kept SymbolElem separate from the Symbol value to make this PR easier, this required putting it in the outer symbols.rs file to avoid duplicate names (Repr). I don't think this is really the right location, but it's the easiest one for now.

I'm not sure what we want for the documentation of SymbolElem, I feel like it should be more of a hidden element that isn't exposed to the user for now, and the value type Symbol should be the main visible API.

3. Using SymbolElem

Here we make use of symbol elem in all the places it should be and I also update the casting rules for char so that symbols can be used in expected locations, such as matrix delimiters.

Notably, the evaluation rules for shorthands and escapes now generate content instead of values.

4. Realization and Markup layout

Here we undo the realization hack and start to layout symbol elems in markup mode.

This also includes a change to textual show rules: if you show a SymbolElem, it will act like a regex rule, but the synthesized text element that we pass into regex rule recipes has to be updated. If we just pass a TextElem to recipes for all regex rules, then show rules on characters in math will lose their italic styling.

Example:

#show regex("x"): set text(fill: red)
This $"x"$ should be upright, but this $x$ should be italic.

We now pass either a SymbolElem or a TextElem to textual show rule recipes based on these criteria:

  1. If the selector was a symbol, pass a SymbolElem
  2. If a text or regex selector matched a lone SymbolElem, pass a SymbolElem
  3. Otherwise, pass a TextElem

I think this is the most intuitive solution, but it does mean that show rules on text can no longer rely on comparing against the text element function. The content-fields-complex test exposes this difference and needed to be updated for this.

5. Math layout

Here we no longer apply auto-italics to TextElems, but still do for SymbolElem. That should be the only real change, but I also refactored most of the functions in text.rs in an attempt to improve readability. I'm not sure that I really succeeded, let me know if you have suggestions.

One particular edge case was that the stretch_fragment layout function in stretch.rs no longer works on character strings, because they now generate FrameFragments instead of GlyphFragments. I spent a while trying to find a solution. But I couldn't make it work without changing some aspect of the layout.

Luckily the workaround is easy: create a SymbolElem by calling symbol("X"), writing a character escape, or writing the character raw in math. This is the reason for the changes in the math-lr-color and math-stretch-shorthand tests.

Review

I've done a first pass review myself, but still left a few questions in the code as TODO comments that I would appreciate feedback on. I'll try to be responsive the next few days so we can get this merged.

Let's put #274 to rest!

@laurmaedje
Copy link
Member

Thank you for implementing this! 🎉

Notes

I chose the review style of refactoring your PR myself to see what I think should be kept/changed. You can view the end result of it in this branch. (It's all squashed since I'm no jj voodoo master 😅.) Here are the things I think should be changed:

  • In non-math realizations, the SymbolElem should be eagerly transformed into TextElem in the else branch of visit_math_rules. This way, we

    • can skip much of the regex matching complexity
    • don't need any special handling of symbols in inline layout
    • can remove SymbolElem from the grouping rules
  • I'm not really comfortable with adding a symbol selector. But I think we don't actually need it! If we transform SymbolElem to TextElem eagerly in non-math as mentioned above, the only way to regex match a SymbolElem is from find_regex_match_in_text directly on a SymbolElem. For that, your lone symbol check should suffice. And we can also remove the TextualKind stuff. On top, I suggest that we call the field of SymbolElem text so that we have minimal breakage in existing text show rules that will now get a symbol. Since SymbolElem is temporary, having the cleanest name doesn't matter much.

  • I'm not comfortable with a generic Content -> char cast, since it is a fundamentally lossy operation. The existing thing in AccentElem is a hack and I would prefer for it to stay isolated to it. If we just let escapes and shorthands stay Symbol instead of SymbolElem (keeping the return type of the Evals at Value rather than Content), from my testing, we don't need this change to fix Single letter strings get parsed as italicized in math mode #274.

  • You had to update issue-math-realize-scripting because of missing spaces. The reason for that is that joining content and symbol still turned the symbol into a TextElem. Can be fixed by updating fn join in foundations/ops.rs.

  • The SymbolElem can go into foundations/symbol.rs and we just spell out crate::foundations::Repr in the impl.

  • In two places &c.to_string() can be replaced by c.encode_utf8(&mut [0; 4]) to skip an allocation.

  • The refactor in layout/math/text.rs looks fine!

  • I think the intent of the content-fields-complex isn't really kept since it tries to evaluate a normal-looking equation, but now there are various [..] thingies. If we switch the field of SymbolElem to text, we can replace func == text with elem.has("text") to fix it completely within let compute (alongside two of your changes)

As said, the branch linked above contains the result I ended with, containing all these changes. If you disagree with some of my proposed changes, then I'm curious to hear your thoughts. If not, I don't mind whether one of us pushes the squashed changes to this PR or whether you incorporate them into your cleaner commit history. I assume the individual commits were mostly for review convenience, so maybe it's unnecessary work.

Responses

I'm not sure what we want for the documentation of SymbolElem, I feel like it should be more of a hidden element that isn't exposed to the user for now, and the value type Symbol should be the main visible API.

No documentation is necessary. Even though SymbolElem is observable in a few cases, let's consider it internal for now. (SequenceElem is way more important and also not yet documented.)

TODO: Can we get away with not checking the RTL/LTR direction?

Not relevant if we transform to TextElem eagerly.

TODO: Update traversal with std::ops::ControlFlow to exit early.

Seems like a nice change in general, but unrelated to the PR. :)

TODO: Should we try to improve these error messages? [In cast! for char]

I'm not sure what change you are envisioning, but also probably not tied to this PR.

TODO: Not sure if this should change. [In Repr impl]

Having SymbolElem's repr be just like TextElem's is good for now.

@wrzian
Copy link
Contributor Author

wrzian commented Jan 23, 2025

Thanks! I've incorporated all of your changes 👍

Can be fixed by updating fn join in foundations/ops.rs.

Aha!

&c.to_string() can be replaced by c.encode_utf8(&mut [0; 4]) to skip an allocation.

I thought there should be a way to do this! Couldn't come up with the right incantation 😁


I rebased onto main and used jj squash <path> --to <commit> to put your changes into the right commits (after making a new commit matching your files; jj new, jj restore --from <commit>).

I did make some changes, but only to comments! You can see them on this commit in my repository (which matches the rebased update exactly). I mostly added more context or fixed typos.

If those look good, I'm ready to merge.

@laurmaedje laurmaedje added this pull request to the merge queue Jan 23, 2025
@laurmaedje laurmaedje removed this pull request from the merge queue due to a manual request Jan 23, 2025
@laurmaedje laurmaedje merged commit 6fe1e20 into typst:main Jan 23, 2025
6 checks passed
@laurmaedje
Copy link
Member

Fast-forward merged locally to keep the good commits. :)

Thank you!!

@wrzian wrzian deleted the symbolelem branch January 24, 2025 00:40
@gasche
Copy link

gasche commented Jan 24, 2025

As a user affected by #274: very nice, thanks!

janekfleper added a commit to janekfleper/typst-fancy-units that referenced this pull request Feb 23, 2025
Typst v0.13 slightly reduced the widths of single alphabetic characters in attachments and fractions. In older versions of Typst the width would change between a single character and multiple characters. The current behavior is therefore more consistent!
The other cases have a slightly reduced padding between the base unit and the exponent. This is however barely noticeable compared to the first change.
I think these changes are related to typst/typst#274 and typst/typst#5738. While the output will be slightly different in older versions of Typst, this is overall not a huge issue.
@qguv
Copy link

qguv commented May 6, 2025

Hi @laurmaedje @wrzian! The description of this PR hints at an upcoming overhaul of how Typst handles math text. I couldn't find another obvious place where ideas are being collected for that overhaul, so I'll include a small note here. If there's a more appropriate place to put this, could you direct me to it? Thanks :)

The current math mode behavior of removing spaces between consecutive variables makes typesetting expressions difficult in contexts where juxtaposition indicates some common operation, e.g. function application in lambda calculus or logic, or multiplication in arithmetic. The current workarounds are quite cumbersome, so it would be really nice if the math text overhaul allowed this space-removal between consecutive variables to be turned on or off with a #set rule. Alternatively, if the new math mode is able to provide a type for variables in math mode (e.g. the type of $a$ and $b$ in the expression $a b$) which could be targeted by a #show rule, then the style could be adjusted, solving the problem. I think it would be wise to think about this already at the design stage, as it might be more difficult to fix if the new design calcifies the current behavior.

@laurmaedje
Copy link
Member

If there's a more appropriate place to put this, could you direct me to it?

Much of the development discussions happen on Discord. There's a dedicated thread for math there.

@gasche
Copy link

gasche commented May 6, 2025

(I think it's fairly unfortunate that many of the design discussions around Typst are enclosed on Discord. Various people cannot access it easily, and it is not easily searchable by web engines, does not have public archives, etc. This trove of content is going to be lost in a few years when Discord enshittifies in response to the next popular chat app, and the early history of Typst is going to be basically lost.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Single letter strings get parsed as italicized in math mode

4 participants