Correct handling of unbraced kerns followed by spaces.#751
Conversation
k4b7
left a comment
There was a problem hiding this comment.
Looks good. Small nit regarding how we consume spaces.
src/Parser.js
Outdated
| if (!optional && this.nextToken.text !== "{") { | ||
| res = this.parseRegexGroup( | ||
| /^[-+]? *(?:$|\d+|\d+\.\d*|\.\d*) *[a-z]{0,2}$/, "size"); | ||
| /^[-+]? *(?:$|\d+|\d+\.\d*|\.\d*) *[a-z]{0,2}\s*$/, "size"); |
There was a problem hiding this comment.
We use this.consumeSpaces(); in other places. It might be good to do the same here for consistency. It should be noted that consumeSpaces does not consume all whitespace, only plain old spaces. I wonder if it should be upgrade?
There was a problem hiding this comment.
There's no way to call consumeSpaces() here, this is before the tokens are created. It would need to be move into this.parseRegexGroup. And that would be a bigger project: ti would change the meaning. Please take commit as is.
There was a problem hiding this comment.
How will this \s* interact with newlines? I suppose KaTeX doesn't support the double newline -> \par conversion, but if it did, I suppose this regex might cause trouble...
There was a problem hiding this comment.
Agreed with @kohler on consumeSpaces -- that's only for when spaces have already been parsed by the tokenizer.
There was a problem hiding this comment.
To avoid further discussion on hypothetical future problems or inconsistencies, I will make the regex spaces only. :)
There was a problem hiding this comment.
And spaces-only is certainly more consistent with the rest of the regex! Thanks for pushing further.
There was a problem hiding this comment.
Actually, I think the behavior you've implemented with \s* is more consistent with the rest of KaTeX, which treats \n and spaces identically. (just did some tests on https://khan.github.io/KaTeX/ ) I think it would be confusing for \kern 1em\n... to fail.
There was a problem hiding this comment.
I would actually argue for \s* in both places. The following renders like x\kern 1em y in LaTeX.
x\kern
1em
yThere was a problem hiding this comment.
Ah, but whitespace conversion is already handled by KaTeX's lexer, which changes all [ \r\n\t]+ sequences to a single space character.
New commit has test making this clear.
There was a problem hiding this comment.
Ah, nice! (And that's where double-newline handling would go.)
| const muKern = "\\kern 1mu"; | ||
| const abKern1 = "a\\mkern1mub"; | ||
| const abKern2 = "a\\kern-1mub"; | ||
| const abKern3 = "a\\kern-1mu b"; |
There was a problem hiding this comment.
Thanks for filling out the tests. Only abKern3 would fail without the changes to Parser.js.
|
I didn't even realize that unbraced size arguments were supported by KaTeX! (because I always added extra spaces, it seems) Thanks for fixing this. |
Did not realize that `Parser.nextToken.text` can contain spaces (it can). Handle that.
07c527e to
922db7d
Compare
|
I approve this PR, but I'll leave it to @kevinbarabash to confirm and push the button. |
Example:
a\mkern-1mu bshould be parsed likea\mkern{-1mu}b. It wasn't, now it is.Did not realize that
Parser.nextToken.textcan contain spaces (it can). Handle that.This may allow, for instance, #727 to use the actual TeX definitions, rather than adding spurious braces.