Skip to content

Correct handling of unbraced kerns followed by spaces.#751

Merged
k4b7 merged 1 commit intoKaTeX:masterfrom
kohler:fix-unbraced-kern
Jun 30, 2017
Merged

Correct handling of unbraced kerns followed by spaces.#751
k4b7 merged 1 commit intoKaTeX:masterfrom
kohler:fix-unbraced-kern

Conversation

@kohler
Copy link
Copy Markdown
Collaborator

@kohler kohler commented Jun 30, 2017

Example: a\mkern-1mu b should be parsed like a\mkern{-1mu}b. It wasn't, now it is.

Did not realize that Parser.nextToken.text can contain spaces (it can). Handle that.

This may allow, for instance, #727 to use the actual TeX definitions, rather than adding spurious braces.

Copy link
Copy Markdown
Member

@k4b7 k4b7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Small nit regarding how we consume spaces.

src/Parser.js Outdated
if (!optional && this.nextToken.text !== "{") {
res = this.parseRegexGroup(
/^[-+]? *(?:$|\d+|\d+\.\d*|\.\d*) *[a-z]{0,2}$/, "size");
/^[-+]? *(?:$|\d+|\d+\.\d*|\.\d*) *[a-z]{0,2}\s*$/, "size");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use this.consumeSpaces(); in other places. It might be good to do the same here for consistency. It should be noted that consumeSpaces does not consume all whitespace, only plain old spaces. I wonder if it should be upgrade?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way to call consumeSpaces() here, this is before the tokens are created. It would need to be move into this.parseRegexGroup. And that would be a bigger project: ti would change the meaning. Please take commit as is.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this \s* interact with newlines? I suppose KaTeX doesn't support the double newline -> \par conversion, but if it did, I suppose this regex might cause trouble...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @kohler on consumeSpaces -- that's only for when spaces have already been parsed by the tokenizer.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid further discussion on hypothetical future problems or inconsistencies, I will make the regex spaces only. :)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And spaces-only is certainly more consistent with the rest of the regex! Thanks for pushing further.

Copy link
Copy Markdown
Member

@edemaine edemaine Jun 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think the behavior you've implemented with \s* is more consistent with the rest of KaTeX, which treats \n and spaces identically. (just did some tests on https://khan.github.io/KaTeX/ ) I think it would be confusing for \kern 1em\n... to fail.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually argue for \s* in both places. The following renders like x\kern 1em y in LaTeX.

x\kern
1em
y

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but whitespace conversion is already handled by KaTeX's lexer, which changes all [ \r\n\t]+ sequences to a single space character.

New commit has test making this clear.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice! (And that's where double-newline handling would go.)

const muKern = "\\kern 1mu";
const abKern1 = "a\\mkern1mub";
const abKern2 = "a\\kern-1mub";
const abKern3 = "a\\kern-1mu b";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for filling out the tests. Only abKern3 would fail without the changes to Parser.js.

@k4b7 k4b7 self-assigned this Jun 30, 2017
@edemaine
Copy link
Copy Markdown
Member

I didn't even realize that unbraced size arguments were supported by KaTeX! (because I always added extra spaces, it seems) Thanks for fixing this.

Did not realize that `Parser.nextToken.text` can contain spaces
(it can). Handle that.
@kohler kohler force-pushed the fix-unbraced-kern branch from 07c527e to 922db7d Compare June 30, 2017 15:28
@edemaine
Copy link
Copy Markdown
Member

I approve this PR, but I'll leave it to @kevinbarabash to confirm and push the button.

@k4b7
Copy link
Copy Markdown
Member

k4b7 commented Jun 30, 2017

@kohler thanks for the changes. @edemaine thanks for looking this over. Always good to have a second pair of eyes.

@k4b7 k4b7 merged commit 87b9123 into KaTeX:master Jun 30, 2017
@kohler kohler deleted the fix-unbraced-kern branch June 30, 2017 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants