Create VarElem type replacing TextElem in math (WIP)#1779
damaxwell wants to merge 9 commits into typst:main
|
No more text spoofing. The shaper can get properties from either text or var in a run. |
|
This is really great work! I don't want to rush the review here since the topic is very intricate and I really want to think through the problem and your solution. Since I'm currently quite busy working on web app issues, I have to postpone it a bit, but I'll try to get to reviewing it soon. |
|
This is amazing work! Thanks for the detailed explanation as well! As a side-note, please don't worry too much about the custom syntax. Considering it involves changes to the syntax (duh), and not to the layouting logic, I'm fairly sure this should only be implemented in a future, distant PR (assuming we get to a consensus about that). Having a functional |
cleanup `VarElem` interface, introduce `base-weight`
|
No worries about the delay reviewing this, @laurmaedje. I had noticed that you've been focussed lately on the web app side. This is a big change (if indeed something like it gets accepted) and should be examined carefully. I've got a little more refining I want to do, and will document the updates with comments here. As for dedicated syntax, @PgBiel, I've come to the conclusion that there shouldn't be any. It's actually rare that you make a
It's only the last one that we don't have syntax for; we have It's not I could imagine a case to be made for syntax for
Today's update
Bug fix, so that
Some cleanup of the to get CM Math Book. It's the same as |
I think this is a fair point overall. As I see it, we should probably release it without the syntax when it's ready, and collect user feedback regarding usage of |
|
The last update made good on the principle that I'm happy with the parts of the PR that are solely about transferring back and forth between math and written text. The interface for this feels easy and natural. There is one major outstanding issue that needs to be addressed: regex. If it weren't for that, I'd be really happy with where this ended up. The issues (in the easiest case of
I think what's needed at this point is a concrete specification of how regex should work in the presence of two flavors of text, possibly with symbols that render to either depending on whether the context is math or not. What should be the rules? I don't have a solution in mind that's good enough that I'd want to implement it. One possibility is that bare strings and |
|
This reminds me of the problem that regex matches in raw/code blocks are also a problem that occurs from time to time. Also things like hyphenation affecting raw blocks. Is raw yet another flavor of text? One possible route would be regex flags for var/text/raw, and I guess normal strings would be just text, but I'm not sure I like it. Also |
|
Ok, so here's a potential approach. There are still things not to like, but this might be good enough for now.
Things I like:
Things I don't like:
While there is merit to I kinda like At any rate, if the principle going forward is that "math text and ordinary text have separate flavors, and regex is restricted to just one or the other", then I think this proposal allows it to happen with the most significant downside that simple var matching requires |
|
I'm closing this as there's still design work to do and PRs are not a great place to do it. It needs more quick discussion. I've opened a new topic in the contributors forum on Discord. |









This PR is a work in progress and is a partial implementation of option (2) of RFC #6, #1125 concerning text vs. math in formulas in an attempt to shake down what the core issues are. Sometimes you need to implement something to find out where the tricky parts lie. In this case, some of the (initial) hard stuff turns out to be considerations not discussed in #1125: symbols, and regex. The PR is a draft, and will likely just serve as guidance toward whatever the end solution is.
The core issue is that when displaying mathematics
Currently Typst has only one notion of text. In a formula environment, text consisting of a single character is treated very specially and corresponds to mathematical text. Runs of numbers have their own, analogous, special treatment as mathematical text, and everything else is essentially treated as ordinary text. Ordinary text is run through the same mechanism Typst uses to lay out text in a paragraph, except that it's using a suboptimal math font to do ordinary text typesetting.
One natural attempt to remedy this is to introduce two different kinds of text that can appear simultaneously: ordinary and math. Given `$ x "w" $`, we want `x` to be math text and `"w"` to be ordinary, and the first thing to settle is which of these corresponds to the currently existing `text` element function. However, if the math text occupies the role of `text`, I'm not sure how to get `"w"` to escape back to the `text` settings when the formula was started so that the document's text properties can be inherited (much less be altered on the fly).

So it seems that the right thing to do is have `text` mean the same thing inside and outside of a formula. This means there needs to be a different entity to represent mathematical text. This is option (2) of #1125, and the name of the entity for the sake of argument is `math.var`.

Basic design
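For orientation, here is a toy Rust model of the two text flavors and the context-dependent symbol that this section introduces. It is only a sketch: `Elem`, `Context`, and `realize` are hypothetical names made up for illustration, not the PR's actual `TextElem`/`VarElem`/`SymbolElem` definitions or any Typst API.

```rust
// Toy model only: hypothetical stand-ins, not the PR's real element types.

/// Where a piece of content is being realized.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Context {
    /// Ordinary document content, outside of a formula.
    Markup,
    /// Inside a formula.
    Math,
}

/// The two flavors of text, plus a symbol that has not picked a flavor yet.
#[derive(Debug, Clone, PartialEq)]
enum Elem {
    /// Ordinary text; means the same thing inside and outside a formula.
    Text(String),
    /// Mathematical text (exposed as `math.var` in the PR).
    Var(String),
    /// A raw symbol value whose flavor depends on where it is rendered.
    Symbol(char),
}

/// Resolve a symbol into text or var depending on the context.
/// Text and var pass through unchanged in either context.
fn realize(elem: Elem, ctx: Context) -> Elem {
    match (elem, ctx) {
        (Elem::Symbol(c), Context::Math) => Elem::Var(c.to_string()),
        (Elem::Symbol(c), Context::Markup) => Elem::Text(c.to_string()),
        (other, _) => other,
    }
}
```

In this model, a symbol realized in `Context::Math` comes out as `Elem::Var`, the same symbol in `Context::Markup` comes out as `Elem::Text`, and an `Elem::Text` placed in a formula stays text.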
The PR introduces two new element functions, `VarElem` and `SymbolElem`, though only `VarElem` is exposed (as `math.var`). When parsing a formula, material that used to parse as `SyntaxKind::Text` now becomes `SyntaxKind::MathVar`; that would be singleton characters and numbers. No other changes are made at the parser level and no syntax sugar is available for a `VarElem`.

Upon evaluation, a `SyntaxKind::MathVar` entity becomes a `VarElem`. Previously, raw symbol values (generated via the `symbol` function, e.g.) used to evaluate directly as `TextElem`. This isn't an option anymore because mathematical symbols would then be forever stuck as text. The mushy solution landed upon is the introduction of the `SymbolElem`. When rendered into a document outside of a formula, it converts itself into a `TextElem`, and inside a formula it becomes a `VarElem`. Sounds reasonable enough, but see below.

There were a number of places where `TextElem`s were inserted into formulas on the fly (e.g. `dif`). These seem to be all stamped out now and are all `VarElem`s.

When laying out a math formula, there are now two pathways for rendering text. When a `TextElem` is encountered one path is chosen; for now it is similar to what Typst did before, except that there is no special casing for single characters or runs of numbers, and no math styling is applied to the characters to make them, e.g., fraktur. This is just text. Note, however, that the text is still laid out with the math font. When a `VarElem` is encountered, there is a separate pathway that accounts for large operators changing size, math styling to calligraphic or fraktur or bold, etc., and conversion to optical sized glyphs. Singleton characters still get special treatment so as not to break too much all at once, and all other runs of text are run through Typst's standard mechanism, except that the font's script is set to `math` so that math features like optical sizing can be turned on.

Even with this partial progress, there are some wins.
- `"a"` vs `"aa"` doesn't bounce around between italic or not.

Regex
Ugh. I didn't see this one coming. Typst allows for regex-style show rules, and now there are two different kinds of text (three if you count the fact that symbols are a third entity that are neither fish nor fowl). In the current implementation, a regex pattern is neutral, and can be specified by a symbol, in which case the pattern is the character. When matching, `text`, `var` or a symbol can all be matches. When matching on a symbol, it's just replaced. When matching on a `text` or `var`, the remaining fragments have the same type as you started with.

I have not kicked the tires on this too much. The tests pass. I saw some evidence of fragility (infinite recursion), but I think that some of this was already present. For example,
breaks on `main` already.

Tests
A number of reference images changed. Many are duplicates of the changes from pending PR #1774 (there has been some accidental mixing of CM Regular and Book faces by default in Typst). Some are things like newly optically sized numbers in exponents. I saw no degradations -- the changes are either very slight or are improvements.
Five `.typ` test files needed small updates. The updates were all cases where a string (or something that became a string) needed to be wrapped in `var` so that it was laid out as math, not text. I didn't figure out how to get a `var` to construct from both a string and an int (I haven't wrapped my head around the cast system yet), so it only takes strings in this implementation. A couple of the changes to the `.typ` files would have been smaller if something like `var(3)` were possible, which presumably is easy enough to implement if you know what you are doing.

Thoughts
Having two kinds of text adds model complexity. Sprinkling a few `var`s around makes programming Typst just a bit harder. On the other hand, the cases where this was needed in the test files already had some sophistication.

I don't like the solution in this implementation for symbols. They are the only thing that bridges the gap between `text` and `var` and they introduce all kinds of complexity. For example, consider `$ [a #sym.sum b] $`. Is the sum going to be rendered as `text` or `var`? In fact, the `a` and `b` are text and the sum is `var`. Ugh.

The worst parts of this PR relate to the mixing of `text` and `var` via symbols and via regex. Given this experience, I think a not awful path forward would be to just embrace the fact that there are two kinds of text, and there are along with them two kinds of symbols. Already, we have `emoji` and `sym` which have little semantically in common. So introduce a math symbol as a primitive type. All the `sym` things are one of these, and all the `emoji` are the other. Upon evaluation, the `sym` things become `var` and the `emoji` things become `text` (as symbols do currently). There is a `math.symbol` function to create math symbols, and it's entirely analogous to the current `symbol` function, which would now create `text` symbols.

Once this is committed to, for regex I think there is `text` regex and `var` regex. Not 100% on this.

As for `var`, it needs to grow up. It should become its own flavor of `text` and wrap characters to render in an OpenType math font in the way that `text` wraps characters to render in a usual OpenType font. So `var` has some properties that `text` has (like `size` and `fill`, both of which can be set to `auto`, which falls back to the `text` values instead) and it doesn't have parameters that are irrelevant to math layout (looking at you, `hyphenate`), but in the course of time it is the seat of things like `cal` and `frak` and `bb`. With this in play, it would then be possible to do things like easily change the font used for calligraphic letters only.

Some of the big work needed to make `var` grow up would be to figure out how to repurpose the text rendering from `shaping.rs` and `par.rs` to the new setting. Maybe hijacking `shape_range` in `par.rs` would suffice.

Anyway, lots to consider here.
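To make the last few ideas a bit more concrete, here is a rough Rust sketch of two of them: symbols that come in a math kind and a text kind, and `var` properties that fall back to the `text` values when set to `auto`. Everything here (`Setting`, `TextStyle`, `VarStyle`, `SymbolKind`, `Flavor`, `flavor_of`) is a hypothetical name invented for the sketch; none of it is existing Typst code, and the real property types would certainly differ.

```rust
// Hypothetical sketch of the proposal above; not Typst's actual types.

/// A property that is either set explicitly or left as `auto`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Setting<T> {
    Auto,
    Custom(T),
}

impl<T> Setting<T> {
    /// Use the explicit value if there is one, otherwise the fallback.
    fn resolve(self, fallback: T) -> T {
        match self {
            Setting::Auto => fallback,
            Setting::Custom(value) => value,
        }
    }
}

/// The subset of `text` properties that `var` would share in this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TextStyle {
    size_pt: f64,
    fill: &'static str,
}

/// `var` properties: shared ones may be `auto`; nothing like `hyphenate` here.
#[derive(Debug, Clone, Copy, PartialEq)]
struct VarStyle {
    size_pt: Setting<f64>,
    fill: Setting<&'static str>,
}

impl VarStyle {
    /// Fill in every `auto` property from the surrounding `text` style.
    fn resolve(self, text: TextStyle) -> TextStyle {
        TextStyle {
            size_pt: self.size_pt.resolve(text.size_pt),
            fill: self.fill.resolve(text.fill),
        }
    }
}

/// The two proposed kinds of symbol: `sym`-like (math) and `emoji`-like (text).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SymbolKind {
    Math,
    Text,
}

/// The flavor of text a symbol evaluates to.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Flavor {
    Var,
    Text,
}

/// A symbol's flavor is fixed by its kind, not by where it is rendered.
fn flavor_of(kind: SymbolKind) -> Flavor {
    match kind {
        SymbolKind::Math => Flavor::Var,
        SymbolKind::Text => Flavor::Text,
    }
}
```

The point of the sketch is that, unlike the `SymbolElem` in this PR, nothing depends on the rendering context: a symbol's flavor is known when it is created, and a `var` only consults the `text` style for properties left at `auto`.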