feat: capture and improve lexer error messages for invalid escape sequences #420

Merged
nawforce merged 2 commits into main from feature/string-literal-escape-validation on Feb 22, 2026

@kjonescertinia

Summary

  • Add lexer error collection via CollectingErrorListener on both JVM and JS platforms
  • Transform cryptic ANTLR "token recognition error" messages into user-friendly "Invalid escape sequence '\x' in string" messages
  • Fix OuterBlock caching to not cache BlockContext when parsing has errors, ensuring syntax errors are consistently reported on re-validation
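The caching fix in the last bullet can be illustrated with a minimal sketch. This is not the actual Statements.scala code: `ParseResult` and `CachedBlock` are hypothetical stand-ins for BlockContext and OuterBlock, and the parse function is injected only to keep the example self-contained.

```scala
// Hypothetical stand-in for a parsed block plus the syntax errors found while parsing it.
final case class ParseResult(tree: String, errors: List[String])

// Hypothetical stand-in for OuterBlock: parses lazily and may cache the parsed tree.
final class CachedBlock(source: String, parse: String => ParseResult) {
  private var cachedTree: Option[String] = None

  /** Returns the syntax errors seen by this validation pass. */
  def validate(): List[String] = cachedTree match {
    case Some(_) =>
      Nil // cache hit: parsing is skipped, so no syntax errors can surface
    case None =>
      val result = parse(source)
      // Cache only clean parses, so a failing parse is repeated (and its
      // errors reported again) on the next validation.
      if (result.errors.isEmpty) cachedTree = Some(result.tree)
      result.errors
  }
}
```

With this shape, a block that fails to parse is parsed again on every validation, while a clean block is parsed once.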

Changes

  1. CodeParser (JVM & JS): Attach CollectingErrorListener to the lexer as well as the parser to capture lexer errors
  2. CollectingErrorListener (JVM & JS): Add regex-based message transformation to improve escape sequence error messages
  3. Statements.scala: Only cache BlockContext in OuterBlock when parsing succeeds without errors
  4. LiteralTypeTest: Add tests for valid and invalid escape sequences
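For readers unfamiliar with the message transformation in item 2, here is a rough sketch of the idea rather than the code in this PR. It assumes ANTLR's standard lexer message format (`token recognition error at: '<text>'`); the object name `EscapeMessages` and both regexes are illustrative assumptions.

```scala
object EscapeMessages {
  // ANTLR's lexer reports failures as: token recognition error at: '<text>'
  private val tokenError = """token recognition error at: '(.*)'""".r
  // A backslash followed by any single character, searched anywhere in the text.
  private val escape = """(\\.)""".r.unanchored

  /** Rewrites escape-related lexer errors; every other message passes through unchanged. */
  def improve(msg: String): String = msg match {
    case tokenError(content) =>
      content match {
        case escape(seq) => s"Invalid escape sequence '$seq' in string"
        case _           => msg
      }
    case _ => msg
  }
}
```

Messages that are not token recognition errors, or that contain no backslash, are returned as-is, so parser errors keep their original wording.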

Test plan

  • All existing tests pass (2513 JVM tests)
  • scalafmt check passes
  • Manual testing with ApexFlow to verify consistent error display on file re-save

Related to #409

🤖 Generated with Claude Code

…uences

- Add lexer error collection via CollectingErrorListener on both JVM and JS
- Transform cryptic "token recognition error" messages into user-friendly
  "Invalid escape sequence '\x' in string" messages
- Fix OuterBlock caching to not cache BlockContext when parsing has errors,
  ensuring syntax errors are consistently reported on re-validation
- Add tests for valid and invalid escape sequences
@kjonescertinia

Future improvement note

If we want to further improve escape sequence validation (e.g., more Salesforce-like error messages or better error recovery), we should first address apex-dev-tools/outline-parser#17, which tracks adding escape sequence validation to the OutlineParser.

The current implementation captures lexer errors from ANTLR parsing, but with the OutlineParser being the default (and intended to become the only parser), the long-term solution should be in the OutlineParser itself.

It would also be worth studying what switching to the combined Outline/ANTLR model from a pure ANTLR model means for error reporting in general.

@nawforce

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea1e996e2e

    val improvedMsg = msg match {
      case tokenErrorPattern(content) =>
        content match {
          case escapePattern(escape) =>

P2: Report the offending escape instead of the first one

The new escapePattern path in syntaxError captures the first \\. sequence in the lexer error payload, not the one that actually caused lexing to fail. ANTLR token-recognition messages include text from token start to failure, so a literal like '\\n\\q' can be reported as Invalid escape sequence '\\n' in string even though \\q is the invalid escape, which gives users a wrong diagnostic target. Please extract the escape nearest the failure point (or otherwise identify the offending escape) instead of the first match; the same logic is mirrored in the JS listener.
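One possible way to act on this suggestion is sketched below, under stated assumptions. It is not the fix that was merged: `LastEscape` and its regexes are hypothetical, and it relies on the reviewer's description that the error text runs from the token start to the failure point.

```scala
object LastEscape {
  // ANTLR's lexer reports failures as: token recognition error at: '<text>'
  private val tokenError = """token recognition error at: '(.*)'""".r
  // A backslash followed by any single character.
  private val escape = """\\.""".r

  /** Reports the last escape in the error text instead of the first one. */
  def improve(msg: String): String = msg match {
    case tokenError(content) =>
      // Matches are found left to right without overlap, so an escaped
      // backslash is consumed as one pair. The final match is then the
      // escape closest to the point where lexing failed.
      escape.findAllIn(content).toList.lastOption match {
        case Some(seq) => s"Invalid escape sequence '$seq' in string"
        case None      => msg
      }
    case _ => msg
  }
}
```

Taking the last match is a heuristic. It is only as accurate as the assumption that the error text ends at the failure point.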

@codex fix this

Summary

Testing

  • ⚠️ curl -L --max-time 20 "https://duckduckgo.com/html/?q=..." | head -n 40 (web search attempt blocked by network/proxy with CONNECT tunnel failed, response 403)
  • sbt scalafmtAll
  • sbt "jvm/testOnly com.nawforce.apexlink.cst.LiteralTypeTest" (invalid SBT project syntax)
  • sbt "jvmTest / testOnly com.nawforce.apexlink.cst.LiteralTypeTest" (invalid SBT project syntax)
  • sbt "apexlsJVM / testOnly com.nawforce.apexlink.cst.LiteralTypeTest"

@kjonescertinia kjonescertinia marked this pull request as ready for review February 22, 2026 12:22
@nawforce nawforce merged commit 540405c into main Feb 22, 2026
1 check passed
@nawforce nawforce deleted the feature/string-literal-escape-validation branch February 22, 2026 13:32