Based on pulldown-cmark#622 and copied from https://github.com/ollpu/pulldown-cmark/tree/alt-math. Co-authored-by: rhysd <lin90162@yahoo.co.jp>
|
Wow, this spec is very complete! And I cannot disagree with any of the decisions that were made. The only choice that seems pretty arbitrary is the name of the CSS classes ( |
|
Thanks, it looks nice. I think any decision should be compared against the Pandoc and mdBook math extensions. In addition, the mdBook math code is written in Rust and uses pulldown-cmark, so it could be reused or serve as inspiration. I also think involving mdBook developers in this discussion would be very interesting; maybe opening an issue in the mdBook repository is the best way to reach them? |
|
I think one of the main points to discuss is whether or not pulldown_cmark will go further and compile math to HTML.
Pros
Cons
My take on this
I wrote the comparison for the sake of it, but IMO the drawbacks outweigh the pros. |
|
If I understand correctly, if math parsing is implemented here, then |
|
@rambip It should be relatively easy to change how the math is rendered by Given that it can be done just as well in a different crate, I see no reason to support further compilation in |
I'd like to add that displaying math is something that has no ideal solution:
Tectonic is a self-contained engine that could be used, but its HTML generation is a bit tricky. So I too agree that generating a span for further processing is the way to go, because other options would unnecessarily bloat the crate in cases where dependents want to go another route.
What about emitting |
|
Hey! I’m not using this project but do care about markdown. That’s how it works in I do also recommend for compiling to ( Ref |
Thanks for your advice. One of the points where your view disagrees with the spec is the following:
$$
x=
1
$$
This will not be rendered as math, as is the case with a lot of tools like Pandoc and StackEdit. |
|
Right, as there is no spec yet, all tools are different. Making a spec will always result in it being different from some tools. |
|
The main purpose of this project is parsing Markdown, with a simple HTML converter (to be moved into a sub-crate) doing only basic things, so rendering math elements is totally out of scope. There does not seem to be a standard way of rendering math in HTML, so this is a critical point to evaluate. Again, looking at mdBook and Pandoc is a good idea. Pandoc allows choosing the math engine and, for some of them, the URL of the JS included in the HTML, but that seems too complex to be a valid strategy for pulldown-cmark. |
|
I have checked the mdBook code and it does nothing with math elements; it just inserts the MathJax JavaScript if it is enabled. Therefore, a possible solution would be generating the math events (including support for the MathJax syntax) and leaving them untouched in the HTML generator. Another crate using pulldown-cmark could use KaTeX or anything else to transform math formulas into SVG or whatever. I have opened the issue rust-lang/mdBook#2222 in the mdBook repository to ask for feedback from mdBook developers and stakeholders. |
|
I have now updated the PR description with a longer write-up. |
|
Great summary of what's been going on! |
|
Hey folks! It's super cool to see math support being added! I come here from google/mdbook-i18n-helpers#105, which I believe is ultimately about how pulldown-cmark-to-cmark turns backslashes in the Markdown AST into Markdown text. So this made me look around and find this issue. I read Writing mathematical expressions and it seems to gloss over the handling of escaped characters and in particular over the handling of escaped backslashes? It mentions "escape" once and does not mention "backslash" at all 🙂 Worse, the GitHub page says
Does "same line" refer to the source file here? I hope not — but I wanted to double check with the people writing the implementation of the spec here. So, how does backslash work inside |
|
Yes, the intent with the spec in this PR is that backslash escapes are also disabled inside math. I should maybe add some prose comparing how code span vs math contents are treated. (They are almost the same, except newlines are preserved and the surrounding spaces rule ( GitHub's implementation is known to be buggy, because it just uses post-processing for the
We don't want to replicate this.
This I think just refers to the fact that dollar characters sometimes need to be escaped to prevent them from being recognized as math. If they're surrounded by spaces though, that's not a problem, even on GitHub.
|
|
Sorry, accidentally pressed the close button. |
|
This spec looks great. I am really hoping that it can be merged in with an implementation since not having math is blocking for many applications. Thanks for the great work. |
|
Thanks for confirming! I think it would be important to explicitly state how escapes work: LaTeX math normally has a lot of
Ah, I was worried that they were referring to the lines in the source file. A single newline in an inline code span is ignored. So
foo \` `x y z` bar
and
foo \`
`x y z` bar
have the same AST: without newline, with newline. In other words, I have to escape the backtick in both cases, regardless of how my source file is wrapped. I was afraid that GitHub made the escaping of a dollar sign depend on how the source file is wrapped. |
- Display math can no longer be empty (aligns with commonmark-hs)
- Add test case $x$$$$$$$y$$ from raphlinus#734
- Add description about how content is handled compared to code spans: newlines, surrounding spaces, and pipes inside tables.

(Almost) implemented in https://github.com/ollpu/pulldown-cmark/tree/alt-math
|
Hello. What problems should be solved to move this spec forward? And which PR (#622 or #734) should we expect to be merged? I'm trying to build an mdBook preprocessor that leverages typst to generate SVGs of math expressions (mdbook-typst). Thus, I only want to get the math expression and send it to typst. That is, I support this solution:
Maybe something like converting |
Before merging the math spec, the 0.10 version must be released. There are few remaining tasks for this release, but these are extremely busy weeks for maintainers, so it will not happen until the end of the month. This spec (and alternative considerations) and the pull requests must be carefully reviewed before taking a decision. Some problems may be detected, or some of these pull requests may be merged without changes, but nothing is clear now. |
|
Hello there, I maintain mdBook-KaTeX. Here are my opinions:
How does this help mdBook-KaTeX: I would probably just replace our current rudimentary heuristics to find math blocks with Regards, |
Opening this to discuss a spec for $math$.
I intend to write about the rationale for the decisions made in the initial spec written in this PR soon. EDIT: This is now written below.
For reference, my implementation of this spec lives in #622 / rhysd#1
TODO
commonmark-hs does not accept empty display math $$$$. I suppose we don't need to either.
Motivation
Support for embedding LaTeX math markup in documents has been discussed for a while. There are two types of math items in LaTeX, "inline mode" and "display mode". Display mode renders larger and is put (centered) on its own row by default.
Most syntaxes have to be recognized by the parser in the same phase as e.g. code spans are parsed. Pre- or post-processing is not sufficient, which is why support has to be implemented in pulldown-cmark itself. Consider these cases (written in the plain $...$ syntax):
Post-processing would not work properly for cases 1 and 2.
Case 3 highlights why pre-processing is problematic. You could try to evade this with heuristics, but not all cases can reasonably be covered that way.
General syntax
A great overview of the existing syntaxes can be found here.
The basic syntax proposed in this PR is $...$ for inline math, $$...$$ for display math.
The choice of syntax is (and has been) a topic of endless debate. My main arguments in favor of this syntax are that:
\[...\] is now preferred for display math in LaTeX.)
The main argument against this syntax is that dollar characters may be used in existing documents where math isn't intended. This is mitigated somewhat by whitespace/flanking rules, discussed later.
It is also worth noting that math parsing will be optional in pulldown-cmark, not enabled by default. Personally, making the syntax more complicated only to avoid some existing false positives seems fraught, because it solves a (mostly) temporary problem at the expense of making the feature permanently less convenient. Not to mention the existing markdown content written for other parsers that already uses this syntax.
Alternatives
Some remarks on potential alternative syntaxes that I consider worth bringing up here.
- $`...`$ for inline mode
To my understanding, this was initially used by GitLab. It is most prominently supported by GitHub and GitLab, though both now also support the $...$ syntax.
This is the only syntax that can almost be implemented in post-processing only: detect parsed code spans surrounded by $ text.
Almost, because CommonMark code spans don't preserve line breaks. TeX comments, for example, are sensitive to line breaks.
(Escaping the dollars poses a minor problem, too. Currently the pulldown-cmark parser output does not distinguish between a freestanding $ and an escaped \$.)
I am not opposed to supporting this in addition to $...$, but it could be done in an external crate.
GitHub seems to refer to this as a cop-out for when the plain $...$ syntax doesn't work for whatever reason (they parse math by post-processing). If we do things properly, it should essentially never be necessary to use $`...`$.
- Fenced ```math blocks for display mode.
This is quite simple to do in another crate or by the crate user directly.
- \(...\) for inline mode or \[...\] for display mode.
These are both also supported in LaTeX, though \(...\) isn't very commonly used. They clash quite badly with how CommonMark allows escaping punctuation, so I'm not fond of them within markdown.
- $$\n...\n$$ for display mode, mimicking fenced code blocks.
It is very common in LaTeX to write a display mode math item all on one line:
Also refer to the Display Is Inline section below.
Specifics
Given the choice to go with the $...$ and $$...$$ syntax, commonmark-hs's math extension seems like a good place to borrow specifics from. Some of it is described well in its spec as well.
commonmark-hs with math support is available in Pandoc under the commonmark+math and commonmark_x modes.
Surrounding rules for inline math
To avoid some false positives in inline math items, we require this:
Pandoc recently (?) transitioned to this slightly more refined version, which is also a viable option:
GitHub has some very inconsistent rules for this, which I haven't been able to fully decipher. Even if we aim for GitHub compatibility, I don't think we should replicate its rules exactly.
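As a rough illustration of what such a surrounding rule can look like in code, here is a minimal Rust sketch. It assumes the rule takes the form "the opening $ must not be followed by whitespace and the closing $ must not be preceded by whitespace", and models the refined variant as additionally rejecting a closing $ that is immediately followed by a digit. The function name and its parameters are hypothetical; this is not code from pulldown-cmark or from the implementation in this PR.

```rust
/// Hypothetical helper (not a pulldown-cmark API): decides whether the `$`
/// characters at byte offsets `open` and `close` in `text` may delimit an
/// inline math item under the assumed surrounding rule.
fn may_delimit_inline_math(
    text: &str,
    open: usize,
    close: usize,
    reject_trailing_digit: bool,
) -> bool {
    let bytes = text.as_bytes();
    // Both offsets must point at a `$`, with at least one byte in between.
    if open + 1 >= close || bytes.get(open) != Some(&b'$') || bytes.get(close) != Some(&b'$') {
        return false;
    }
    // `$` is ASCII, so these offsets are always valid char boundaries.
    let inner = &text[open + 1..close];
    // Assumed rule: no whitespace directly inside either delimiter.
    let starts_ok = inner.chars().next().map_or(false, |c| !c.is_whitespace());
    let ends_ok = inner.chars().next_back().map_or(false, |c| !c.is_whitespace());
    // Assumed refinement: a digit right after the closing `$` suggests a
    // currency amount rather than math.
    let digit_follows = text[close + 1..]
        .chars()
        .next()
        .map_or(false, |c| c.is_ascii_digit());
    starts_ok && ends_ok && !(reject_trailing_digit && digit_follows)
}
```

Under these assumptions, `$x$` is accepted, `$ x$` and `$x $` are rejected, and `$5-$10` is accepted by the plain rule but rejected once the trailing-digit check is enabled.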
Display Is Inline
Because of how display math is rendered centered as a block, one might assume that it should also be parsed as a block level item.
However, it is common in LaTeX to write even display math within paragraphs, as part of a sentence.
TeX/LaTeX also does not support empty lines within display math, as they signify a paragraph change:
Therefore it actually makes sense to parse display mode math as inline items as well. This is also easier to implement: inline and display mode math can reuse much of the same parsing logic.
Display math has a tendency to be split across multiple lines more liberally than inline math, however. CommonMark has a principle that all syntax that indicates block structure has higher priority than inline items. This does mean that display math can be broken by certain syntax:
This can be annoying, but I think it's an acceptable tradeoff. commonmark-hs does this, and it's not too surprising because it follows the same principle that applies to e.g. code spans as well.
Pandoc's own markdown flavor (different from commonmark-hs) has the best of both worlds: display math is an inline element, but also isn't preempted by block level syntax. Actually the same applies for code spans and inline math in the Pandoc flavor as well.
If we were to only make display math special and parse it in the first phase, cases like these would break:
So this is a wholesale deal: either all of the constructs are higher priority than block-level indicators, or none of them are.
What about special casing? If a paragraph were to only consist of one display math item, it is parsed as such in the first phase. Other kinds of display math items are parsed in the second phase. This is reminiscent of how HTML blocks work in CommonMark. The difference in behavior depending on subtle newline changes isn't so nice with this approach though, and HTML blocks already cause enough confusion.
Nested braces
commonmark-hs recently also supports having $s within math items, as long as they are inside nested, balanced braces. (An escaped \$ is not the same, because it would also be escaped in the LaTeX code, so rendered as a dollar symbol.)
This is not proposed in this PR as it currently stands. I feel it adds complexity to the implementation for negligible gain. Dollars can generally be replaced by alternative syntax like \( \) \[ \]. After seeing #734, I am on the fence about this.
Output
The markdown parser only needs to recognize where there are math items. Any further processing or compilation is not in scope for this proposal. It should be relatively easy for a user of the crate to map the resulting Item::Math to whatever HTML they please.
The default HTML rendering of math items is something we have to specify, however.
A common way to render LaTeX math in HTML is to do it on the browser side (MathJax/KaTeX). This is facilitated either by using heuristics to find $...$ / $$...$$ etc., or by having <span>s with special classes. Since we've already parsed the math items, it doesn't make sense to emit them back without semantics only to be parsed by another set of rules again. Spans with classes are likely going to be what most users want.
The choice of classes (math inline, math display) is a little arbitrary though. For the spec, I copied what Pandoc does in the "KaTeX" math emitting mode. I have no strong opinions on this.
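To make the span-with-classes output concrete, here is a minimal Rust sketch of such a renderer. The `MathKind` enum and both function names are hypothetical stand-ins rather than pulldown-cmark APIs. The sketch assumes the TeX source is emitted as HTML-escaped text inside the span, so that a browser-side renderer can pick it up by class.

```rust
/// Hypothetical stand-in for the two kinds of math item a parser could
/// report; not an actual pulldown-cmark type.
enum MathKind {
    Inline,
    Display,
}

/// Escapes the characters that are significant in HTML text content.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

/// Wraps the TeX source in a span carrying the classes discussed above
/// (`math inline` / `math display`), leaving the math itself uncompiled.
fn render_math_span(kind: MathKind, tex: &str) -> String {
    let class = match kind {
        MathKind::Inline => "math inline",
        MathKind::Display => "math display",
    };
    format!("<span class=\"{}\">{}</span>", class, escape_html(tex))
}
```

Calling `render_math_span(MathKind::Inline, "a < b")` yields `<span class="math inline">a &lt; b</span>`. A crate user who prefers server-side rendering could replace this function with a call into their renderer of choice.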