fix(grammar): allow text block ending in escaped backslash before closing """ #5017

Merged
jlerbsc merged 2 commits into
javaparser:master from
ethan-godden:issue-4894-text-block-ending-in-double-backslash-is-parsed-incorrectly
May 19, 2026

@ethan-godden

Contributor

Fixes #4894.

Summary

A text block whose content ends in an escaped backslash (\\) immediately before
the closing """ failed to parse, e.g.:

String s = """
    arbitrary text
    \\""";

produced Lexical error ... Encountered: <EOF> after : "". javac accepts this
form per JLS §3.10.6.
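
As a quick cross-check of the claim that javac accepts this form, the following minimal sketch compiles the same text block with a plain JDK (Java 15 or later is assumed). The class and method names are illustrative only and are not part of the PR.

```java
public class TextBlockBackslashDemo {

    // The text block from the issue: the content ends in an escaped
    // backslash (\\) immediately before the closing delimiter.
    static String sample() {
        return """
            arbitrary text
            \\""";
    }

    public static void main(String[] args) {
        String s = sample();
        // After indentation stripping and escape processing, the value is
        // "arbitrary text", a newline, and one backslash.
        System.out.println(s.endsWith("\\")); // prints "true"
    }
}
```

If this compiles and runs, the compiler has treated \\ as one escape and the following """ as the closing delimiter, which is the behaviour the parser is expected to match.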

Root cause

In javaparser-core/src/main/javacc/java.jj, the TEXT_BLOCK_CONTENT rule only
treated \" as an atomic escape:

<IN_TEXT_BLOCK> MORE : { <TEXT_BLOCK_CONTENT: ( "\\" "\"" | ~[] ) > }

Given the input \\""", JavaCC's longest-match rule consumed the first \ as a 1-char
~[] match, then matched the second \ together with the first " of the
closing delimiter as a 2-char \" escape. That left only "", which can never
satisfy the 3-quote TEXT_BLOCK_LITERAL token, so the lexer ran off the end of
the input.

Fix

Recognise any \X (backslash + any character) as a single atomic chunk:

<IN_TEXT_BLOCK> MORE : { <TEXT_BLOCK_CONTENT: ( "\\" ~[] | ~[] ) > }

So \\ is now consumed as one unit and the closing """ stays intact. \"
continues to work (still a 2-char match, still wins over the 1-char fallback),
and bare " inside the block is unaffected because the 3-char
TEXT_BLOCK_LITERAL token outranks ~[] at the closer.
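
To make the difference between the two rules concrete, here is a small stand-alone simulation. It is a simplified model of the IN_TEXT_BLOCK state rather than the generated JavaCC lexer: at each position it tries the 3-char closer, then a 2-char escape, then a 1-char fallback, which mimics longest-match for these three candidates. The class name, method name, and boolean flag are hypothetical.

```java
public class TextBlockLexerSketch {

    /**
     * Scans the text that follows the opening delimiter and returns the index
     * of the closing delimiter, or -1 if the input ends before one is found.
     *
     * anyEscapeIsAtomic == false models the old rule ( "\\" "\"" | ~[] );
     * anyEscapeIsAtomic == true  models the new rule ( "\\" ~[]  | ~[] ).
     */
    static int findCloser(String body, boolean anyEscapeIsAtomic) {
        int i = 0;
        while (i < body.length()) {
            // The 3-char closer is the longest candidate whenever it applies.
            if (body.startsWith("\"\"\"", i)) {
                return i;
            }
            boolean hasNext = i + 1 < body.length();
            // A backslash plus its following character is consumed as one chunk
            // if the rule in force treats that pair as an escape.
            boolean atomicEscape = body.charAt(i) == '\\' && hasNext
                    && (anyEscapeIsAtomic || body.charAt(i + 1) == '"');
            i += atomicEscape ? 2 : 1;
        }
        return -1; // ran off the end of input, like the reported lexical error
    }

    public static void main(String[] args) {
        // Source text after the opening delimiter: two content lines,
        // the second ending in \\ followed by the closer and a semicolon.
        String body = "    arbitrary text\n    \\\\\"\"\";";
        System.out.println(findCloser(body, false)); // old rule: prints -1
        System.out.println(findCloser(body, true));  // new rule: prints 25
    }
}
```

Under the old rule the second backslash pairs with the first quote of the closer, so no run of three quotes remains; under the new rule the two backslashes are consumed together and the closer is found intact.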

Tests

Two regression tests added in
javaparser-core-testing/.../ast/expr/TextBlockLiteralExprTest.java:

  • textBlockEndingInDoubleBackslashAdjacentToCloserParses — canonical failing
    form from the issue (\\ immediately before """).
  • textBlockEndingInDoubleBackslashOnSeparateLineParses — the \\ on its own
    line variant, asserting both raw value and translateEscapes() output.
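
The distinction between the raw value and the translated value in the second test can be illustrated with the JDK alone. This sketch uses java.lang.String.translateEscapes() (Java 15+) as a stand-in; it is assumed, not confirmed by this PR, that TextBlockLiteralExpr.translateEscapes() follows the same escape rules. The class and helper names are illustrative.

```java
public class TranslateEscapesDemo {

    // Applies Java escape processing to raw text-block content.
    static String translate(String raw) {
        return raw.translateEscapes();
    }

    public static void main(String[] args) {
        // Raw content as written between the delimiters: foo followed by
        // two backslash characters (each doubled here for the Java literal).
        String raw = "foo\\\\";
        String translated = translate(raw);

        System.out.println(raw.length());        // prints 5
        System.out.println(translated.length()); // prints 4: \\ became \
    }
}
```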

@codecov

codecov Bot commented May 18, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.711%. Comparing base (b4e443b) to head (fbfd811).
⚠️ Report is 2 commits behind head on master.

@@             Coverage Diff             @@
##              master     #5017   +/-   ##
===========================================
  Coverage     58.711%   58.711%           
  Complexity      2592      2592           
===========================================
  Files            702       702           
  Lines          40217     40217           
  Branches        7327      7327           
===========================================
  Hits           23612     23612           
  Misses         13634     13634           
  Partials        2971      2971           
Flag Coverage Δ
AlsoSlowTests 58.711% <ø> (ø)
javaparser-core 58.711% <ø> (ø)
javaparser-symbol-solver 58.711% <ø> (ø)
jdk-10 58.284% <ø> (+0.002%) ⬆️
jdk-11 58.280% <ø> (-0.003%) ⬇️
jdk-12 58.280% <ø> (ø)
jdk-13 58.283% <ø> (ø)
jdk-14 58.514% <ø> (+0.002%) ⬆️
jdk-15 58.514% <ø> (ø)
jdk-16 58.489% <ø> (ø)
jdk-17 58.638% <ø> (ø)
jdk-18 58.638% <ø> (+0.002%) ⬆️
jdk-8 58.120% <ø> (ø)
jdk-9 58.279% <ø> (-0.003%) ⬇️
macos-latest 58.686% <ø> (ø)
ubuntu-latest 58.681% <ø> (ø)
windows-latest 58.694% <ø> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jlerbsc

jlerbsc commented May 18, 2026

Collaborator

Is this an AI-generated PR?

*/
@Test
void textBlockEndingInDoubleBackslashOnSeparateLineParses() {
TextBlockLiteralExpr textBlock = parseStatement("String s = \"\"\"\n" + " foo\\\\\n" + " \"\"\";")

Collaborator

The test can be simplified by parsing an expression directly.

@ethan-godden ethan-godden May 19, 2026

Contributor Author

Should be fixed in c7652d8

*/
@Test
void textBlockEndingInDoubleBackslashAdjacentToCloserParses() {
TextBlockLiteralExpr textBlock = parseStatement(

Collaborator

The test can be simplified by parsing an expression directly.

@ethan-godden ethan-godden May 19, 2026

Contributor Author

Should be fixed in c7652d8

@ethan-godden

ethan-godden commented May 18, 2026

Contributor Author

> Is this an AI-generated PR?

@jlerbsc It is, partially. I had it check my work and make improvements, mostly to the comments, tests, and PR description, so that I would not miss anything.

@ethan-godden ethan-godden force-pushed the issue-4894-text-block-ending-in-double-backslash-is-parsed-incorrectly branch 4 times, most recently from 35ce067 to fcb12f6, May 19, 2026 05:08
Comment on lines +843 to +849
// which might match that doublequote with following doublequotes. The "\\" ~[] alternative
// consumes a backslash escape ("\\" followed by any character) atomically so
// the escape's trailing character (most importantly, the " in \")
// cannot be re-used as part of the closing """ delimiter. The first alternative
// wins by JavaCC's longest-match rule whenever the current char is a backslash;
// the second alternative is the fallback for ordinary text-block characters.
<IN_TEXT_BLOCK> MORE :{ <TEXT_BLOCK_CONTENT: ( "\\" ~[] | ~[] ) > }

@ethan-godden ethan-godden May 19, 2026

Contributor Author

One thing to note: we could make this more explicit by listing every valid escape sequence instead of using ~[] to stand for any second character of an escape sequence. I think what we have is simpler and less risky. The counterargument is that listing the escape sequences would be closer to the actual Java spec.
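
For a sense of what the explicit alternative would have to enumerate, the sketch below lists the characters that can follow a backslash in a text block according to the JLS escape-sequence rules: b, s, t, n, f, r, ", ', \, an octal digit, or a line terminator. It is written as a plain Java predicate rather than JavaCC grammar, the names are hypothetical, and it does not model Unicode escapes, which are handled before escape sequences are considered.

```java
public class EscapeCheckSketch {

    /** Permissive check matching the merged rule: any character may follow the backslash. */
    static boolean permissive(char next) {
        return true;
    }

    /** Explicit check: only characters that start an escape sequence valid in a text block. */
    static boolean explicit(char next) {
        return "bstnfr\"'\\".indexOf(next) >= 0   // \b \s \t \n \f \r \" \' \\
                || (next >= '0' && next <= '7')     // first digit of an octal escape
                || next == '\n' || next == '\r';    // backslash followed by a line terminator
    }

    public static void main(String[] args) {
        System.out.println(explicit('\\')); // prints "true"
        System.out.println(explicit('"'));  // prints "true"
        System.out.println(explicit('q'));  // prints "false"
    }
}
```

For locating the closing delimiter, only the pairs \\ and \" matter, since no other escape ends in a backslash or a double quote; that is one way to read the point that the permissive form is simpler without changing where the block ends.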

@ethan-godden ethan-godden force-pushed the issue-4894-text-block-ending-in-double-backslash-is-parsed-incorrectly branch from 2bdfcda to c7652d8, May 19, 2026 05:23
@ethan-godden

Contributor Author

Sorry for the spam and poor quality tests

@jlerbsc jlerbsc merged commit 6962fa6 into javaparser:master May 19, 2026
35 checks passed
@jlerbsc jlerbsc added this to the next release milestone May 19, 2026
@jlerbsc jlerbsc added the "PR: Fixed" label (A PR that offers a fix or correction) May 19, 2026
@jlerbsc

jlerbsc commented May 19, 2026

Collaborator

Thank you for this PR.

@ethan-godden ethan-godden deleted the issue-4894-text-block-ending-in-double-backslash-is-parsed-incorrectly branch May 20, 2026 00:06
Labels

PR: Fixed A PR that offers a fix or correction

Development

Successfully merging this pull request may close these issues.

Text block ending in double backslash is parsed incorrectly

2 participants