fix(grammar): allow text block ending in escaped backslash before closing """ #5017
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff             @@
##             master    #5017    +/-   ##
===========================================
  Coverage    58.711%  58.711%
  Complexity     2592     2592
===========================================
  Files           702      702
  Lines         40217    40217
  Branches       7327     7327
===========================================
  Hits          23612    23612
  Misses        13634    13634
  Partials       2971     2971
Is this an AI-generated PR?
     */
    @Test
    void textBlockEndingInDoubleBackslashOnSeparateLineParses() {
        TextBlockLiteralExpr textBlock = parseStatement("String s = \"\"\"\n" + " foo\\\\\n" + " \"\"\";")
The test can be simplified by parsing an expression directly.
     */
    @Test
    void textBlockEndingInDoubleBackslashAdjacentToCloserParses() {
        TextBlockLiteralExpr textBlock = parseStatement(
The test can be simplified by parsing an expression directly.
It is, partially. I had it check my work and make improvements, mostly to the comments, tests, and PR markdown, so that I do not miss anything.
Force-pushed from 35ce067 to fcb12f6.
// which might match that doublequote with following doublequotes. The ~[] is needed so the
// backslash escape ("\\" followed by any character) atomically so
// the escape's trailing character (most importantly, the " in \")
// cannot be re-used as part of the closing """ delimiter. The first alternative
// wins by JavaCC's longest-match rule whenever the current char is a backslash;
// the second alternative is the fallback for ordinary text-block characters.
<IN_TEXT_BLOCK> MORE :{ <TEXT_BLOCK_CONTENT: ( "\\" ~[] | ~[] ) > }
One thing to note: we could make this more explicit by listing all possible escape sequences instead of using ~[] to represent every possible second character of an escape sequence. I think what we have is simpler and less risky. The counterargument is that listing all escape sequences would be closer to the actual Java spec.
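To make that trade-off concrete, here is a rough sketch. It is not JavaCC: it uses java.util.regex as a stand-in for the two rule shapes, and the class name, method names, and the strict escape list are all assumptions of mine (the list is my reading of the JLS escape sequences and should be checked against the spec).

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: regex stand-ins for the two grammar shapes
// discussed in this thread. Nothing here is part of the PR.
class EscapeShapeSketch {

    // Loose shape, mirroring "\\" ~[]: a backslash followed by any one character.
    // DOTALL lets '.' match a line terminator as well.
    static final Pattern LOOSE = Pattern.compile("\\\\.", Pattern.DOTALL);

    // Strict shape: a backslash followed by one of an explicit list of escape bodies.
    // Assumed list: b s t n f r " ' \, a line terminator, or an octal escape.
    static final Pattern STRICT = Pattern.compile(
            "\\\\(?:[bstnfr\"'\\\\]|\\R|[0-3][0-7]{2}|[0-7]{1,2})");

    static boolean looseAccepts(String escape) {
        return LOOSE.matcher(escape).matches();
    }

    static boolean strictAccepts(String escape) {
        return STRICT.matcher(escape).matches();
    }

    public static void main(String[] args) {
        // Samples: escaped backslash, escaped quote, newline escape, and an unknown escape.
        String[] samples = { "\\\\", "\\\"", "\\n", "\\q" };
        for (String s : samples) {
            System.out.println(s + " loose=" + looseAccepts(s) + " strict=" + strictAccepts(s));
        }
    }
}
```

In this sketch both shapes keep \\ and \" atomic, which is all the closing-delimiter fix relies on. They differ only on inputs such as \q, which the loose shape accepts and the strict shape rejects.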
Force-pushed from 2bdfcda to c7652d8.
…kslash-is-parsed-incorrectly
Sorry for the spam and poor quality tests
Thank you for this PR.
Fixes #4894.
Summary
A text block whose content ends in an escaped backslash (\\) immediately before the closing """ failed to parse, producing Lexical error ... Encountered: <EOF> after : "". javac accepts this form per JLS §3.10.6.
Root cause
In javaparser-core/src/main/javacc/java.jj, the TEXT_BLOCK_CONTENT rule only treated \" as an atomic escape.

Given input \\""", JavaCC's longest-match consumed the first \ as a 1-char ~[] match, then matched the second \ together with the first " of the closing delimiter as a 2-char \" escape. That left only "", which can never satisfy the 3-quote TEXT_BLOCK_LITERAL token, so the lexer ran off the end of input.
Fix
Recognise any \X (backslash + any character) as a single atomic chunk:

<IN_TEXT_BLOCK> MORE :{ <TEXT_BLOCK_CONTENT: ( "\\" ~[] | ~[] ) > }

So \\ is now consumed as one unit and the closing """ stays intact. \" continues to work (still a 2-char match, still wins over the 1-char fallback), and bare " inside the block is unaffected because the 3-char TEXT_BLOCK_LITERAL token outranks ~[] at the closer.
Tests
Two regression tests added in javaparser-core-testing/.../ast/expr/TextBlockLiteralExprTest.java:
textBlockEndingInDoubleBackslashAdjacentToCloserParses: canonical failing form from the issue (\\ immediately before """).
textBlockEndingInDoubleBackslashOnSeparateLineParses: the \\ on its own line variant, asserting both raw value and translateEscapes() output.
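For readers unfamiliar with the difference between the raw value and the translated value, here is a sketch that uses the JDK's own String.stripIndent() and String.translateEscapes() (Java 15+) as a stand-in. It does not call JavaParser, and the class name, helper method, and sample content are assumptions for illustration.

```java
// Hypothetical illustration using only JDK string methods; not the PR's test code.
class TextBlockValueSketch {

    // Approximates how text block content is interpreted:
    // strip incidental indentation first, then interpret escape sequences.
    static String interpret(String rawContent) {
        return rawContent.stripIndent().translateEscapes();
    }

    public static void main(String[] args) {
        // Raw content as it appears in source: an indented line ending in \\,
        // followed by the indentation that precedes the closing delimiter.
        String raw = "    foo\\\\\n    ";
        String value = interpret(raw);

        System.out.println("raw length:   " + raw.length());
        System.out.println("value length: " + value.length());
        System.out.println("value ends with a single backslash and newline: "
                + value.equals("foo\\\n"));
    }
}
```

The raw content still contains two backslash characters; after translation they collapse into one, which is why a test that checks both forms guards against the lexer splitting the escape in the wrong place.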