feat: add lexer/parser support for Summer '26 multi-line strings (#102) #104
Add MultilineStringLiteral token for the Summer '26 triple-quoted string
syntax: '''<NL>...'''. The body must start on a new line after the opening
triple quote, matching platform behaviour confirmed against a Summer '26
pre-release org.
Strict newline enforcement is deliberate. Without it, malformed code like
'''abc''' would be silently accepted as a multi-line literal that the
platform later rejects. With strict enforcement, ANTLR falls back to
lexing it as three legacy StringLiteral tokens ('', 'abc', ''), preserving
a recognisable token pattern for downstream tooling. apex-ls#443 will
detect this pattern and surface a targeted "did you mean a multi-line
string?" diagnostic with quick fix.
The new token is accepted alongside StringLiteral at all 9 parser sites
(literal, whenLiteral, SOQL DISTANCE/value, SOSL WITH clauses,
networkList) via inline alternation, preserving parse-tree shape for
existing consumers.
Closes #102
Cross-reference: outline-parser is also affected; filed as outline-parser#27. The hand-written tokenizer in outline-parser pairs quotes greedily, two at a time, so multi-line bodies with an odd number of single quotes (e.g. …)
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 299f1d06b1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
// Declared before StringLiteral so longest-match prefers '''...''' over the
// degenerate ''+'...'+'' fallback when a newline follows the opening.
MultilineStringLiteral
    : '\'\'\'' [\r\n] ( EscapeSequence | . )*? '\'\'\''
Disallow invalid backslash escapes in multiline literals
MultilineStringLiteral currently uses ( EscapeSequence | . )*?, and . matches a raw backslash, so malformed escapes like '''\n\q''' (or any \<invalid>) are accepted as a valid token with no lexer error. That is looser than existing StringLiteral behavior and undermines the commit’s stated goal of matching Apex string semantics, because invalid escapes in multiline strings will now silently parse instead of being flagged. Restrict the fallback branch so backslashes are only consumed via EscapeSequence.
Good catch — fixed in 8e1958d.
Verified against the Summer '26 pre-release org: the platform rejects \q inside a '''<NL>...<NL>''' body with the same "Illegal character sequence" error it raises for single-quoted strings. So strict matches platform behaviour.
Rule changed from ( EscapeSequence | . )*? to ( EscapeSequence | '\'' | ~['\\] )*?. Backslashes are now only consumable via EscapeSequence (matching StringLiteral semantics); single quotes are handled separately since ~['\\] excludes them; the *? plus '\'\'\'' terminator still allows 1- or 2-quote runs in the body. Added a lexer test for an invalid escape in a multi-line body.
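The difference between the two rule bodies can be checked with a small regex model. This is only a sketch, not the grammar itself: the Python regexes approximate ANTLR's non-greedy matching, and `ESC` is an assumed stand-in for `EscapeSequence` (a backslash followed by one of `b t n f r " ' \` or a `uXXXX` hex form), which may differ from the real fragment. The names `OLD`, `NEW` and `accepts` are hypothetical.

```python
import re

# Assumed approximation of the grammar's EscapeSequence fragment.
ESC = r"""\\(?:[btnfr"'\\]|u[0-9a-fA-F]{4})"""

# Original body: ( EscapeSequence | . )*?   ('.' also matches a raw backslash)
OLD = re.compile(r"'''[\r\n](?:" + ESC + r"|.)*?'''", re.DOTALL)

# Fixed body: ( EscapeSequence | '\'' | ~['\\] )*?
NEW = re.compile(r"'''[\r\n](?:" + ESC + r"|'|[^'\\])*?'''")


def accepts(rule, text):
    """Return True if the whole text is a single token under the given rule."""
    return rule.fullmatch(text) is not None


bad_escape = "'''\n\\q'''"                            # body contains \q
quote_runs = "'''\nit's a ''quoted'' word\\t!\n'''"   # 1- and 2-quote runs, valid \t

print(accepts(OLD, bad_escape))   # old rule lets the backslash through '.'
print(accepts(NEW, bad_escape))   # new rule has no way to consume '\q'
print(accepts(NEW, quote_runs))   # quote runs shorter than three stay in the body
```

Under these assumptions the old pattern accepts the `\q` body and the new one rejects it, while one- and two-quote runs still lex as part of the literal.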
Codex Review: Didn't find any major issues. Delightful!
Codex review on #104 flagged that the rule used `( EscapeSequence | . )*?`, where `.` matches any char including a raw backslash. For a malformed escape like `\q` inside a `'''<NL>...<NL>'''` body, the backslash was silently consumed as a regular char. That is looser than `StringLiteral`, which rejects bad escapes via `~['\\] | EscapeSequence`.

Verified against the Summer '26 pre-release org: the platform rejects `\q` in multi-line bodies with the same "Illegal character sequence" error it raises for single-quoted strings. Our lexer should match that.

Replace `.` with `'\'' | ~['\\]` so backslashes are only consumable via EscapeSequence. Single quotes are handled separately because `~['\\]` excludes them; the surrounding `*?` plus the `'\'\'\''` terminator ensures 1- or 2-quote runs in the body still parse cleanly while a 3-quote run terminates the literal.

Add a lexer test covering an invalid escape inside a multi-line body.
Summary
Adds `MultilineStringLiteral` token support for Salesforce Summer '26 triple-quoted string syntax (`'''<NL>...'''`). Closes #102.

Empirical findings (verified against Summer '26 pre-release org)
- `'''abc'''` (no newline after open)
- `''''''` (six quotes)
- `'''<NL>'''` (empty body): `""`
- `'''<NL>...<NL>'''`
- `'` and double `''` in body
- `\t`, `\'`, `\\`, `ç` escapes
- `'''${var}` + `.template(map)`: `${...}` is plain literal text in the parser; no parser work needed

Indent stripping is a runtime string semantic, not a lexer concern: the token captures raw multi-line text and downstream consumers can resolve as needed.
Design: strict newline enforcement
The lexer rule requires `[\r\n]` immediately after the opening `'''`.

This is deliberately strict to match platform behaviour. The alternative, a permissive lexer accepting `'''abc'''`, would silently lex it as a single multi-line literal that the parser accepts but the platform later rejects, hiding the user's mistake.

With strict enforcement, ANTLR falls back gracefully on malformed input: `'''abc'''` lexes as three legacy `StringLiteral` tokens (`''`, `'abc'`, `''`). That's a recognisable, unambiguous token pattern; apex-ls#443 will detect it and surface a targeted "did you mean a multi-line string?" diagnostic with quick fix.

apex-parser stays minimal: just the grammar, no special-case diagnostics. The helpful UX is layered in the language server where it has user context.
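The fallback described above can be illustrated with a toy longest-match tokenizer. This is a sketch under assumptions, not the ANTLR-generated lexer: it models only the two string rules as Python regexes, `ESC` is an assumed approximation of `EscapeSequence`, the `StringLiteral` pattern follows the `~['\\] | EscapeSequence` body quoted in the fix commit, and `RULES` and `tokenize` are hypothetical names.

```python
import re

# Assumed approximation of the grammar's EscapeSequence fragment.
ESC = r"""\\(?:[btnfr"'\\]|u[0-9a-fA-F]{4})"""

# Rules in declaration order; on equal-length matches the earlier rule wins.
RULES = [
    ("MultilineStringLiteral",
     re.compile(r"'''[\r\n](?:" + ESC + r"|'|[^'\\])*?'''")),
    ("StringLiteral",
     re.compile(r"'(?:[^'\\]|" + ESC + r")*'")),
]


def tokenize(text):
    """Split text into (rule name, lexeme) pairs using longest-match."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, rule in RULES:
            m = rule.match(text, pos)
            # Strict '>' keeps the earlier-declared rule when lengths tie.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no token matches at offset {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


print(tokenize("'''abc'''"))        # no newline after the opening quotes
print(tokenize("'''\nabc\n'''"))    # well-formed multi-line literal
```

In this model the malformed input yields three `StringLiteral` tokens (`''`, `'abc'`, `''`), while the well-formed input yields a single `MultilineStringLiteral`, which is the token pattern a downstream diagnostic could look for.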
Parser integration
`MultilineStringLiteral` is accepted alongside `StringLiteral` at all 9 sites via inline alternation:
- Apex: `literal`, `whenLiteral`
- SOQL: `value`, `DISTANCE(...)`
- SOSL: `WITH DIVISION`, `WITH NETWORK`, `WITH PRICEBOOKID`, `WITH METADATA`, `networkList`

This preserves parse-tree shape: no rule restructure, no breaking change for tree-walking consumers.
Test plan
- `'''abc'''`, six-quote fallback, unterminated literal
- `literal` rule, in class body (JSON example), in concatenation expression
- `sf apex run`
Follow-up