feat: capture and improve lexer error messages for invalid escape sequences#420
Conversation
…uences - Add lexer error collection via CollectingErrorListener on both JVM and JS - Transform cryptic "token recognition error" messages into user-friendly "Invalid escape sequence '\x' in string" messages - Fix OuterBlock caching to not cache BlockContext when parsing has errors, ensuring syntax errors are consistently reported on re-validation - Add tests for valid and invalid escape sequences
Future improvement noteIf we want to further improve escape sequence validation (e.g., more Salesforce-like error messages or better error recovery), we should first address apex-dev-tools/outline-parser#17 which tracks adding escape sequence validation to the OutlineParser. The current implementation captures lexer errors from ANTLR parsing, but with the OutlineParser being the default (and intended to become the only parser), the long-term solution should be in the OutlineParser itself. It would also be worth studying what switching to the combined Outline/ANTLR model from a pure ANTLR model means for error reporting in general. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea1e996e2e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| val improvedMsg = msg match { | ||
| case tokenErrorPattern(content) => | ||
| content match { | ||
| case escapePattern(escape) => |
There was a problem hiding this comment.
Report the offending escape instead of the first one
The new escapePattern path in syntaxError captures the first \\. sequence in the lexer error payload, not the one that actually caused lexing to fail. ANTLR token-recognition messages include text from token start to failure, so a literal like '\\n\\q' can be reported as Invalid escape sequence '\\n' in string even though \\q is the invalid escape, which gives users a wrong diagnostic target. Please extract the escape nearest the failure point (or otherwise identify the offending escape) instead of the first match; the same logic is mirrored in the JS listener.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Summary
- Updated JVM lexer error extraction to capture the last escape sequence in ANTLR token-recognition payloads, so diagnostics point at the offending escape nearest failure instead of the first escape in the string. jvm/src/main/scala/com/nawforce/runtime/parsers/CollectingErrorListener.scalaL26-L49
- Applied the same fix in the Scala.js listener to keep JVM/JS behavior aligned. js/src/main/scala/com/nawforce/runtime/parsers/CollectingErrorListener.scalaL26-L49
- Added a regression test for
'\\n\\q'to assert the reported invalid escape is\\qand not\\n. jvm/src/test/scala/com/nawforce/apexlink/cst/LiteralTypeTest.scalaL190-L202 - Committed as
fix: report offending invalid escape sequence(bb4933d) and created a follow-up PR message viamake_prsummarizing this delta.
Testing
⚠️ curl -L --max-time 20 "https://duckduckgo.com/html/?q=..." | head -n 40(web search attempt blocked by network/proxy withCONNECT tunnel failed, response 403)- ✅
sbt scalafmtAll - ❌
sbt "jvm/testOnly com.nawforce.apexlink.cst.LiteralTypeTest"(invalid SBT project syntax) - ❌
sbt "jvmTest / testOnly com.nawforce.apexlink.cst.LiteralTypeTest"(invalid SBT project syntax) - ✅
sbt "apexlsJVM / testOnly com.nawforce.apexlink.cst.LiteralTypeTest"
Summary
CollectingErrorListeneron both JVM and JS platformsOuterBlockcaching to not cacheBlockContextwhen parsing has errors, ensuring syntax errors are consistently reported on re-validationChanges
CollectingErrorListenerto the lexer as well as the parser to capture lexer errorsBlockContextinOuterBlockwhen parsing succeeds without errorsTest plan
Related to #409
🤖 Generated with Claude Code