Add `raw` string group, move comment parsing to Parser, change URL group parser by ylemkimon · Pull Request #1711 · KaTeX/KaTeX

ylemkimon · 2018-09-12T08:07:50Z

This adds a raw string group parser (parseStringGroup with the last argument true), which returns the raw string inside a group and allows:

a single-character string without braces, e.g., \command x, \command#
nested braces, e.g., \command{a={[b,{c}],d}}
percent signs (%)
delegating argument group parsing to individual functions, which is useful for Support \includegraphics #1620, Add \class and \cssId on non-strict mode #1437, and Adding \data function to attach data attributes to HTML elements #1698
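
As a rough illustration of the first three points, here is a minimal standalone sketch. It is not the `parseStringGroup` code from this PR: `readRawArg` is a hypothetical helper that works on a plain string instead of Lexer tokens, and it ignores escapes such as `\{` and `\%`.

```javascript
// Hypothetical helper (an assumption for illustration, not KaTeX's API):
// reads one raw argument from `src` starting at index `pos` and returns
// the raw text plus the index just past the argument.
function readRawArg(src, pos) {
    // Skip spaces before the argument.
    while (src[pos] === " ") {
        pos++;
    }
    if (pos >= src.length || src[pos] === "}") {
        throw new Error("Expected an argument");
    }
    if (src[pos] !== "{") {
        // Single character without braces, e.g. \command x or \command#
        return {text: src[pos], end: pos + 1};
    }
    let nested = 0; // depth of braces opened inside the group
    for (let i = pos + 1; i < src.length; i++) {
        const ch = src[i];
        if (ch === "{") {
            nested++;
        } else if (ch === "}") {
            if (nested === 0) {
                // Closing brace of the outermost group
                return {text: src.slice(pos + 1, i), end: i + 1};
            }
            nested--;
        }
        // Any other character, including "%", is kept verbatim.
    }
    throw new Error("Unbalanced braces in raw group");
}
```

With this sketch, `readRawArg("{a={[b,{c}],d}}", 0).text` is `a={[b,{c}],d}`, `readRawArg(" x", 0).text` is `x`, and `readRawArg("{50%}", 0).text` keeps the percent sign.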

This also moves comment parsing to the Parser, which makes it possible to allow a percent sign at the Parser level and keeps the Lexer contextless (settingless).

Furthermore, this changes the URL parser to use the raw string group, which makes

the Lexer and Parser much simpler, as it no longer requires defining special cases and duplicating function parsing code
it possible to use a URL argument in other places, like Support \includegraphics #1620, without modifying the code

codecov · 2018-09-12T08:11:23Z

Codecov Report

Merging #1711 into master will decrease coverage by <.01%.
The diff coverage is 95.08%.

@@            Coverage Diff             @@
##           master    #1711      +/-   ##
==========================================
- Coverage   93.89%   93.89%   -0.01%     
==========================================
  Files          78       78              
  Lines        4569     4585      +16     
  Branches      805      811       +6     
==========================================
+ Hits         4290     4305      +15     
  Misses        246      246              
- Partials       33       34       +1

Flag	Coverage Δ
#screenshotter	`88.5% <32.78%> (-0.26%)`	⬇️
#test	`85.27% <95.08%> (+0.02%)`	⬆️

Impacted Files	Coverage Δ
src/parseNode.js	`84.21% <ø> (ø)`	⬆️
src/MacroExpander.js	`95.68% <100%> (ø)`	⬆️
src/Lexer.js	`100% <100%> (ø)`	⬆️
src/SourceLocation.js	`100% <100%> (ø)`	⬆️
src/Parser.js	`97.04% <94.64%> (-0.09%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ba8e224...f17b571.

ronkok · 2018-09-15T15:24:27Z

I applaud this PR. I also think it will be possible to recover use of "%" as a URL escape character. I plan to write a PR which does that after this PR lands. My general intent is to:

Edit the Lexer RegEx so that it matches only a %, not %[^\n]*(?:\n|$).
Add a parseComment function to Parser.js to replace the functionality of the old RegEx pattern
Edit Parser's parseString function so that any non-URL groups that contain a % will throw an error. URL groups will not throw the error. That will enable the URL escape.

ylemkimon · 2018-09-15T15:46:15Z

@ronkok Nice idea! If you don't mind, I'd like to give it a try.

> Edit Parser's parseString function so that any non-URL groups that contain a % will throw an error. URL groups will not throw the error. That will enable the URL escape.

Instead of throwing an error, LaTeX's behavior seems to be to continue parsing on the next line.

ronkok · 2018-09-15T15:53:01Z

> I'd like to give it a try.

Please, feel free.

> LaTeX behavior seems to be to continue parsing on the next line.

I haven't written any of this idea into code yet. But I want to keep the behavior that @edemaine created for comments. So in an expression like \frac{a}{b % s}, the group does not close and it should throw an error. That's my intent. Any way we get there would be okay with me.
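
To make that intent concrete, here is a minimal sketch assuming a plain-string model rather than the actual Lexer/Parser. It reuses the comment pattern quoted above (`%[^\n]*(?:\n|$)`); `stripComments` and `bracesBalanced` are hypothetical names, and escapes such as `\%` and `\{` are deliberately ignored.

```javascript
// The comment pattern from the old Lexer rule quoted in this thread:
// a "%" followed by everything up to and including the next newline.
const commentPattern = /%[^\n]*(?:\n|$)/g;

// Hypothetical helper: remove comments the way that pattern would.
function stripComments(src) {
    return src.replace(commentPattern, "");
}

// Hypothetical helper: check that every "{" has a matching "}".
function bracesBalanced(src) {
    let depth = 0;
    for (const ch of src) {
        if (ch === "{") {
            depth++;
        } else if (ch === "}") {
            depth--;
            if (depth < 0) {
                return false; // closing brace without an opening one
            }
        }
    }
    return depth === 0;
}
```

Under these assumptions, `bracesBalanced(stripComments("\\frac{a}{b % s}"))` is `false` because the closing brace is swallowed by the comment, so the group does not close. With a newline before the brace, as in `"\\frac{a}{b % s\n}"`, the result is `true`.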

ylemkimon · 2018-09-15T19:44:14Z

@ronkok Thank you for the suggestion! I’ve implemented it, and it has made the code much simpler.

k4b7

Requesting more tests for the new raw arg type.

k4b7 · 2018-09-22T23:32:07Z

src/Parser.js

+                        firstToken.range(lastToken, str));
+                case "%":
+                    if (!raw) { // allow % in raw string group
+                        this.consumeComment();


I guess that's kind of hard to do at the lexer level.

@kevinbarabash Yes, it's context-dependent.

k4b7 · 2018-09-22T23:35:08Z

src/Parser.js

+        this.expect(groupBegin);
        let str = "";
        const firstToken = this.nextToken;
+        let nested = 0; // allow nested braces in raw string group


If it's truly "raw" we might want to allow mismatched braces. In practice that's probably not all that useful so maybe add a comment describing the limitations of raw arg types.

@kevinbarabash If we allow unmatched braces, it's impossible to determine where the group ends, without forbidding a right brace.

k4b7 · 2018-09-22T23:35:32Z

src/Parser.js

-        if (optional && this.nextToken.text !== "[") {
-            return null;
+        const groupBegin = optional ? "[" : "{";
+        const groupEnd = optional ? "]" : "}";


k4b7 · 2018-09-22T23:41:53Z

src/Parser.js

+    consumeComment() {
+        // the newline character is normalized in Lexer, check original source
+        while (this.nextToken.text !== "EOF" && this.nextToken.loc &&
+                this.nextToken.loc.getSource().indexOf("\n") === -1) {


Why not this.nextToken.text.indexOf("\n") === -1?

@kevinbarabash The Lexer normalizes whitespace to a single space ( ). I think it's better to look at the source than to separate whitespace tokens from newline tokens.

That makes sense. It might be worth adding a comment here about that.
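
For readers following this thread, a minimal sketch of why the token text alone is not enough. It assumes a toy tokenizer, not KaTeX's Lexer: `tokenize` and `skipComment` are hypothetical, and each token stores `start`/`end` offsets into the input in place of a `SourceLocation`.

```javascript
// Hypothetical mini-lexer: each whitespace run becomes a single " " token,
// but every token remembers its span in the original input.
function tokenize(input) {
    const tokens = [];
    const pattern = /\s+|\\[a-zA-Z]+|[^]/g;
    let match;
    while ((match = pattern.exec(input)) !== null) {
        const raw = match[0];
        tokens.push({
            text: /^\s+$/.test(raw) ? " " : raw, // newline is lost here
            start: match.index,
            end: match.index + raw.length,
        });
    }
    tokens.push({text: "EOF", start: input.length, end: input.length});
    return tokens;
}

// Starting at the "%" token at index `i`, skip tokens until one whose
// original source contains a newline, and return the index after it.
function skipComment(input, tokens, i) {
    while (tokens[i].text !== "EOF" &&
            input.slice(tokens[i].start, tokens[i].end).indexOf("\n") === -1) {
        i++;
    }
    // Also consume the whitespace token that holds the newline, if any.
    return tokens[i].text === "EOF" ? i : i + 1;
}
```

For the input `"a %c d\nb"`, the token covering `"\n"` has the text `" "`, so checking `text.indexOf("\n")` would never end the comment, while checking the source slice stops there and parsing resumes at `b`.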

k4b7 · 2018-09-22T23:44:43Z

src/types.js

 //                 argument is parsed normally)
 //   - Mode: Node group parsed in given mode.
-export type ArgType = "color" | "size" | "url" | "original" | Mode;
+export type ArgType = "color" | "size" | "url" | "raw" | "original" | Mode;


Now that we have raw, I wonder if we should move the parsing of color, size, and url into functions/*.js.

k4b7 · 2018-09-22T23:49:49Z

test/katex-spec.js

+        expect("\\kern{1 %kern\nem}").toParse();
+        expect("\\kern1 %kern\nem").toParse();
+        expect("\\color{#f00%red\n}").toParse();
+    });


We should probably have some tests for parsing raw arg types. It would be cool if we had a test that defines a function that accepts one raw arg and one optional raw arg; then we could test that including [] or {} in the raw string works as expected.

@kevinbarabash Currently, it seems there is no way to define a function cleanly in the tests. I've updated existing test case below.

Maybe that's something we can do in the future. I've opened an issue for it: #1740.
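
Until such a test helper exists, the behavior being asked about can be sketched outside KaTeX. This is a standalone approximation under stated assumptions: `readRawGroup` and `readCommandArgs` are hypothetical, they operate on a plain string, and only the `groupBegin`/`groupEnd` selection mirrors the diff above.

```javascript
// Hypothetical helper: read a raw group delimited by [] (optional) or {}
// (mandatory), allowing nested pairs of the same delimiter.
function readRawGroup(src, pos, optional) {
    const groupBegin = optional ? "[" : "{";
    const groupEnd = optional ? "]" : "}";
    if (src[pos] !== groupBegin) {
        if (optional) {
            return null; // optional argument is absent
        }
        throw new Error("Expected '" + groupBegin + "'");
    }
    let nested = 0;
    for (let i = pos + 1; i < src.length; i++) {
        if (src[i] === groupBegin) {
            nested++;
        } else if (src[i] === groupEnd) {
            if (nested === 0) {
                return {text: src.slice(pos + 1, i), end: i + 1};
            }
            nested--;
        }
    }
    throw new Error("Expected '" + groupEnd + "'");
}

// Hypothetical function taking one optional raw arg and one raw arg,
// e.g. the "[opt]{arg}" part of "\command[opt]{arg}".
function readCommandArgs(src) {
    const opt = readRawGroup(src, 0, true);
    const arg = readRawGroup(src, opt ? opt.end : 0, false);
    return {optional: opt ? opt.text : null, mandatory: arg.text};
}
```

In this sketch, `readCommandArgs("[a{b}]{c[d]}")` yields `a{b}` for the optional argument and `c[d]` for the mandatory one, and `readCommandArgs("{only}")` yields `null` for the optional argument.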

k4b7 · 2018-10-13T01:10:59Z

test/katex-spec.js


    it("should allow balanced braces in url", function() {
-        const url = "http://example.org/{too}";
+        const url = "http://example.org/{{}t{oo}}";


k4b7

Sorry for the delay. Looks great!

ylemkimon mentioned this pull request Sep 12, 2018

Add \class and \cssId on non-strict mode #1437

Closed

ylemkimon added the GH Review: review-needed label Sep 12, 2018

ronkok mentioned this pull request Sep 13, 2018

Support \includegraphics #1620

Merged

ylemkimon added 3 commits September 16, 2018 04:10

Add raw string group

f7a0ba0

Move comment parsing to Parser

6525f02

Use raw string group in URL group parser

80d6c1d

ylemkimon changed the title ~~Add raw string group, change URL group parser~~ Add raw string group, move comment parsing to Parser, change URL group parser Sep 15, 2018

Update types.js

f0b8adb

k4b7 requested changes Sep 22, 2018

Add multi-level nested url test

7121802

k4b7 reviewed Oct 13, 2018

k4b7 approved these changes Oct 13, 2018

Merge branch 'master' into raw-string-group

f17b571

k4b7 merged commit 3907545 into KaTeX:master Oct 13, 2018

ghost mentioned this pull request Oct 13, 2018

Adding \data function to attach data attributes to HTML elements #1698

Closed

ylemkimon deleted the raw-string-group branch March 21, 2019 14:27

snyk-bot mentioned this pull request Feb 23, 2020

[Snyk] Upgrade katex from 0.9.0 to 0.11.1 saurabharch/Rocket.Chat#1

Open
