Attempt error recovery in the TerminatedBy parsers#567
Attempt error recovery in the TerminatedBy parsers#567AntonyBlakey merged 4 commits intoNomicFoundation:mainfrom
Conversation
|
950d580 to
c9cad99
Compare
.changeset/wicked-beds-buy.md
Outdated
| "@nomicfoundation/slang": minor | ||
| --- | ||
|
|
||
| We attempt simple CST error recovery for common "terminated-by" and "delimited-by" groups instead of bailing out on incomplete/invalid input |
There was a problem hiding this comment.
nit: "terminated-by" and "delimited-by" are internal grammar concepts. I suggest using user-facing terms. Maybe something like "terminators (semicolon)" and "delimiters (brackets)" instead?
| expect(errors).toHaveLength(1); | ||
|
|
||
| const report = errors[0]?.toErrorReport("test.sol", source, /* withColor */ false); | ||
| let reports = errors.map((error: any) => error.toErrorReport("test.sol", source, /* withColor */ false)); |
There was a problem hiding this comment.
I wonder why the change here? we already asserted it has a length of 1 above. Also, why would we use any here?
There was a problem hiding this comment.
The change was here as at some points there were two errors emitted, and I blessed the snapshots everytime for consistency; didn't revert that after it was only one error. The any was because the type was not inferred but now I see that it's unresolved in Node.js rather than being pointed to parse_output.ParseOutput. I can fix that in a follow-up and I can revert the code change here.
There was a problem hiding this comment.
because the type was not inferred but now I see that it's unresolved in Node.js
Is the generated .d.ts is broken somehow? not blocking to this PR, but I would have expected the earlier test to fail if it cannot resolve the types, since we compile with struct: true?
| Rule(RuleKind), | ||
| Token(TokenKind), | ||
| Trivia(TokenKind), | ||
| Error, |
There was a problem hiding this comment.
I suggest storing/rendering the error message from ErrorNode as well, instead of dropping it.
There was a problem hiding this comment.
I understand the consensus is to drop the Node, so this probably won't end up in a CST
|
|
||
| Tree: | ||
| - Block (Rule): # 0..24 "{ unchecked { x = 1; } }" | ||
| - Block (Rule): # 0..22 "{ unchecked { x = 1; }" |
There was a problem hiding this comment.
The Block changed here from 24 to 22 bytes, as we dropped the last }. Is that intentional?
The CST should always be complete (covers the entire input).
There was a problem hiding this comment.
I can bring back the SKIPPED token handling for the incomplete parse
| - FunctionDefinition (Rule): # 18..28 " function" | ||
| - FunctionKeyword (Token): "function" # 20..28 | ||
| - SKIPPED (Token): "" # 28..28 | ||
| - Error: "" # 28..28 |
There was a problem hiding this comment.
Two empty error nodes here. Is that expected?
Do we need to create them in this case?
There was a problem hiding this comment.
If this a WIP result - maybe we should split this into the commits before and after. I'd prefer not to publish this CST result.
OmarTawfik
left a comment
There was a problem hiding this comment.
Left a few questions on the structure/test output.
| if_true | ||
| } else { | ||
| let flags = self.iter().map(|vqr| { | ||
| let flag = format_ident!( |
There was a problem hiding this comment.
Not keen on this change - the existing code has the version checks in the scanner/parser 'inner loop' as simple bool tests.
In any case, I don't think this change should be in this PR.
There was a problem hiding this comment.
Wanted to reduce ad-hoc identifier generation since it makes the harness/code harder to follow and adds unnecessary state to the language IMO (these should be initialized and read-only) but I can revert that; I don't have strong feelings about this
| @@ -0,0 +1,165 @@ | |||
| use crate::{ | |||
There was a problem hiding this comment.
I very much like this.
| /// a full match if possible, otherwise on the best incomplete match. | ||
| pub struct ChoiceHelper { | ||
| result: ParserResult, | ||
| is_done: bool, |
There was a problem hiding this comment.
The fact that the result value does double duty as the done sentinel might be more efficient in some way, but is less easy to comprehend at a glance.
There was a problem hiding this comment.
I didn't think about efficiency here but more about being impossible to encode invalid state (e.g. it's impossible to have is_done == true && matches!(result, IncompleteResult(..)))).
The code is ultimately a reduction that settles on a particular ParserResult state, so I wanted to make it clear that is_done is derived from the result and not used alongside it. Maybe the finished_state! is a bit of an overkill, I can try to make it more obvious but I'd rather not reintroduce is_done as state as it made me harder to follow what uses/mutates/depends on what.
| let mut helper = SequenceHelper::new(); | ||
| loop { | ||
| // Poor man's try block | ||
| std::iter::once(()).try_for_each(|_| { |
There was a problem hiding this comment.
IMO should be a macro. It might better than loop, but the hack is uglier. At it still abuses the concept of repetition.
There was a problem hiding this comment.
This was cleaned up later on
|
|
||
| impl Language { | ||
| const VERSIONS: &'static [&'static str] = &[ | ||
| pub const SUPPORTED_VERSIONS: &[Version] = &[ |
| - FunctionDefinition (Rule): # 18..28 " function" | ||
| - FunctionKeyword (Token): "function" # 20..28 | ||
| - SKIPPED (Token): "" # 28..28 | ||
| - Error: "" # 28..28 |
There was a problem hiding this comment.
If this a WIP result - maybe we should split this into the commits before and after. I'd prefer not to publish this CST result.
| - FunctionKeyword (Token): "function" # 95..103 | ||
| - Error: "" # 103..103 | ||
| - Error: "nobody() public\n\nfunction noargs public {\n\tbreak" # 104..152 | ||
| - Semicolon (Token): ";" # 152..153 |
There was a problem hiding this comment.
We can't have this on main if this doesn't produce a complete CST covering every char of the input.
The 59645e6 change is unrelated but I hope it's simple enough that I can squeeze it in as well here. This includes some smaller, some bigger refactorings while I worked on #567. I hope they are useful on their own and this will help with later work related to #567, as I'll need more lexer-related utilities (i.e. skipping or consuming until a token is found) and it got somewhat unwieldy when I left it in the language.tera. Next up, I can separate lexer-related bits out of the language.tera into a dedicated lexer.tera that may contain `LexicalContext` (cc @AntonyBlakey wrt #567 (comment)) if if you think it's worthwhile.
Most importantly, this documents and tries to simplify the parser helpers and improves legibility of the generated parse functions. Done as part of the #567
d0f601d to
b13bdce
Compare
|
Updated the PR, see OP #567 (comment). |
|
|
||
| use super::super::text_index::TextIndex; | ||
|
|
||
| pub struct Stream<'s> { |
There was a problem hiding this comment.
Given that this is now more than a stream, it should be renamed to ParserContext - either including a Stream or having trait Stream. And wherever &mut Stream is used the param name changed as well.
The 59645e669c72a8468f510531268bb6ed57ae1a29 change is unrelated but I hope it's simple enough that I can squeeze it in as well here. This includes some smaller, some bigger refactorings while I worked on #567. I hope they are useful on their own and this will help with later work related to #567, as I'll need more lexer-related utilities (i.e. skipping or consuming until a token is found) and it got somewhat unwieldy when I left it in the language.tera. Next up, I can separate lexer-related bits out of the language.tera into a dedicated lexer.tera that may contain `LexicalContext` (cc @AntonyBlakey wrt NomicFoundation/slang#567 (comment)) if if you think it's worthwhile.
Based on #564
EDIT:
As asked, this was split into supporting PRs (#579, #580, merged now), doesn't use the Error CST node anymore and only contains terminated-by recovery.
To support error side-channel in a backtracking scenario,
Vec<ParseError>was added to theStreamwhich serves more as a parse context now. To backtrack, we first record the position of the stream and how many errors are there; then, we reset the errors and the position to the first recordedMarkerstruct (basically copying whatchumskydoes).This wasn't changed everywhere as
Stream::set_positionin the lexer will never emit errors (only parsers do), nor will the optional trivia parsers. I'm happy to polish the lexer interface a bit to accommodate for this change later, if that's okay. First, I wanted to make sure the approach is fine and accepted, until we proceed with the DelimitedBy error recovery.