Skip to content

Add raw string group, move comment parsing to Parser, change URL group parser#1711

Merged
k4b7 merged 6 commits intoKaTeX:masterfrom
ylemkimon:raw-string-group
Oct 13, 2018
Merged

Add raw string group, move comment parsing to Parser, change URL group parser#1711
k4b7 merged 6 commits intoKaTeX:masterfrom
ylemkimon:raw-string-group

Conversation

@ylemkimon
Copy link
Member

@ylemkimon ylemkimon commented Sep 12, 2018

This adds a raw string group parser (parseStringGroup with the last argument true), which returns the raw string inside a group and allows:

This also moves comment parsing to Parser, which makes possible to allow percent sign at the Parser level and contextless (settingless) Lexer.

Furthermore, this changes URL parser to use raw string group, which makes

  • Lexer and Parser much simpler, as it doesn't require defining special cases and duplicating function parsing code
  • possible to use URL argument without modifying the code, in other places like Support \includegraphics #1620

@codecov
Copy link

codecov bot commented Sep 12, 2018

Codecov Report

Merging #1711 into master will decrease coverage by <.01%.
The diff coverage is 95.08%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1711      +/-   ##
==========================================
- Coverage   93.89%   93.89%   -0.01%     
==========================================
  Files          78       78              
  Lines        4569     4585      +16     
  Branches      805      811       +6     
==========================================
+ Hits         4290     4305      +15     
  Misses        246      246              
- Partials       33       34       +1
Flag Coverage Δ
#screenshotter 88.5% <32.78%> (-0.26%) ⬇️
#test 85.27% <95.08%> (+0.02%) ⬆️
Impacted Files Coverage Δ
src/parseNode.js 84.21% <ø> (ø) ⬆️
src/MacroExpander.js 95.68% <100%> (ø) ⬆️
src/Lexer.js 100% <100%> (ø) ⬆️
src/SourceLocation.js 100% <100%> (ø) ⬆️
src/Parser.js 97.04% <94.64%> (-0.09%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ba8e224...f17b571. Read the comment docs.

@ronkok
Copy link
Collaborator

ronkok commented Sep 15, 2018

I applaud this PR. I also think it will be possible to recover use of "%" as a URL escape character. I plan to write a PR which does that after this PR lands. My general intent is to:

  1. Edit the Lexer RegEx so that it matches only a %, not %[^\n]*(?:\n|$).
  2. Add a parseComment function to Parser.js to replace the functionality of the old RegEx pattern
  3. Edit Parser's parseString function so that any non-URL groups that contain a % will throw an error. URL groups will not throw the error. That will enable the URL escape.

@ylemkimon
Copy link
Member Author

@ronkok Nice idea! If you don't mind, I'd like give it a try.

  1. Edit Parser's parseString function so that any non-URL groups that contain a % will throw an error. URL groups will not throw the error. That will enable the URL escape.

Instead of throwing an error, the LaTeX behavior seems to be continue parsing to the next line.

@ronkok
Copy link
Collaborator

ronkok commented Sep 15, 2018

I'd like give it a try.

Please, feel free.

LaTeX behavior seems to be continue parsing to the next line.

I haven't written any of this idea into code yet. But I want to keep the behavior that @edemaine created for comments. So in an expression like \frac{a}{b % s}, the group does not close and it should throw an error. That's my intent. Any way we get there would be okay with me.

@ylemkimon ylemkimon changed the title Add raw string group, change URL group parser Add raw string group, move comment parsing to Parser, change URL group parser Sep 15, 2018
@ylemkimon
Copy link
Member Author

@ronkok Thank you for the suggestion! I’ve implemented it and it have made code much simpler.

Copy link
Member

@k4b7 k4b7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting more tests for the new raw arg type.

firstToken.range(lastToken, str));
case "%":
if (!raw) { // allow % in raw string group
this.consumeComment();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's kind of hard to do that lever level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinbarabash Yes, it's context-dependent.

this.expect(groupBegin);
let str = "";
const firstToken = this.nextToken;
let nested = 0; // allow nested braces in raw string group
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's truly "raw" we might want to allow mismatched braces. In practice that's probably not all that useful so maybe add a comment describing the limitations of raw arg types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinbarabash If we allow unmatched braces, it's impossible to determine where the group ends, without forbidding a right brace.

if (optional && this.nextToken.text !== "[") {
return null;
const groupBegin = optional ? "[" : "{";
const groupEnd = optional ? "]" : "}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

consumeComment() {
// the newline character is normalized in Lexer, check original source
while (this.nextToken.text !== "EOF" && this.nextToken.loc &&
this.nextToken.loc.getSource().indexOf("\n") === -1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not this.nextToken.text.indexOf("\n") === -1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinbarabash Lexer normalizes whitespaces to a single space( ). I think it's better to look at the source, instead of separating whitespace token and newline token.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. It might be worth adding a comment here about that.

// argument is parsed normally)
// - Mode: Node group parsed in given mode.
export type ArgType = "color" | "size" | "url" | "original" | Mode;
export type ArgType = "color" | "size" | "url" | "raw" | "original" | Mode;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have raw, I wonder if we should move the parsing of color, size, and url into functions/*.js.

expect("\\kern{1 %kern\nem}").toParse();
expect("\\kern1 %kern\nem").toParse();
expect("\\color{#f00%red\n}").toParse();
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have some tests for parsing raw arg types. It would be cool if we had a test that defines a function that accepts one raw arg and one optional raw arg then we could test that including [] or {} in the raw string works as expected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinbarabash Currently, it seems there is no way to define a function cleanly in the tests. I've updated existing test case below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that's something we can do in the future. I've opened an issue for it #1740.


it("should allow balanced braces in url", function() {
const url = "http://example.org/{too}";
const url = "http://example.org/{{}t{oo}}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Copy link
Member

@k4b7 k4b7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Looks great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants