Implement initial delimited error recovery #586

AntonyBlakey merged 44 commits into NomicFoundation:main from
```rust
/// Which token the stream was pointing at when we bailed
pub found: TokenKind,
/// Token we expected to find
pub expected_tokens: Vec<TokenKind>,
```
This is a `Vec` to deduplicate code in `parser_function.rs`, but I can rework it so that it's only a single `TokenKind`.
Better to have correct types constraining the data than optimising for deduplicated code, IMO. Like the recent `is_done` in helpers. I'm a very strong believer in type constraints (in spite of the fact that I was on the wrong side of `is_done`).
```rust
if is_single_token_with_trivia
    && found_token.map_or(false, |t| running.expected_tokens.contains(&t.kind))
{
```
This is a bit hacky, as it couples the recovery and the sequence helper logic here. Also, trivia handling seems wrong when I tried to plug this same mechanism into `TerminatedBy`. I will need to investigate further, but maybe it's good enough for just `DelimitedBy` for now?
Fixed the trivia handling, and the termination recovery now uses the same recovery mechanism. The only thing left is the coupling between the `SequenceHelper` and this case.
Decided to leave this here, as both terminated-by and delimited-by use this recovery mechanism, and separated-by will likely follow.
Previously, the body could have short-circuited (e.g. when we got an `IncompleteMatch`), so we use the RAII pattern to ensure that we always pop the closing delimiter when escaping the sequence scope.
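The RAII idea above can be sketched roughly as follows. This is a hypothetical illustration (the names `DelimiterGuard`, `TokenKind`, and `depth` are invented here, not the actual slang API): the guard pushes the expected closing delimiter on creation and pops it in `Drop`, so the pop also runs when the body returns early.

```rust
// Hypothetical sketch, not the actual slang types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenKind {
    CloseParen,
    CloseBrace,
}

pub struct DelimiterGuard<'a> {
    stack: &'a mut Vec<TokenKind>,
}

impl<'a> DelimiterGuard<'a> {
    /// Push the expected closing delimiter; it stays on the stack
    /// exactly as long as the guard is alive.
    pub fn push(stack: &'a mut Vec<TokenKind>, closing: TokenKind) -> Self {
        stack.push(closing);
        DelimiterGuard { stack }
    }

    pub fn depth(&self) -> usize {
        self.stack.len()
    }
}

impl Drop for DelimiterGuard<'_> {
    fn drop(&mut self) {
        // Runs on every scope exit, including early returns
        // (e.g. a short-circuiting IncompleteMatch).
        self.stack.pop();
    }
}
```

The point of the guard over explicit push/pop calls is that a `?`-style early exit from the sequence body cannot leave a stale delimiter on the stack.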
Updated, see the OP.
…#591) Does what it says on the tin: helps recover past unrecognized characters while still stopping at EOF; see the test change in a9777a5 (#586). cc @AntonyBlakey, since we talked about this.
58feead to 696a869
This reverts commit 6bad240. Separated this into NomicFoundation#592.
This also adds coverage to the sequence `SkippedUntil` + `Match` case, where the `Match` is empty.
643772b to 0469c69
This should be ready for review - I polished the PR and added a few tests and explanatory comments. This is the initial delimited error recovery, which falls flat when recovering from no matches at all, e.g.
This changed because of NomicFoundation#586, which was merged in the meantime.
Fixes #553
This roughly corresponds to the idea we talked about:

a. We keep a local stack in the `ParserContext`, pushing and popping when we see the corresponding open/close delimiters
b. We only recover when the local stack is empty and the expected delimiter is encountered
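Steps (a) and (b) can be sketched as a forward scan, under assumed names (`Tok`, `recovery_offset`, etc. are illustrative, not the actual slang code): maintain a local delimiter stack and only report a recovery point when the expected closing delimiter appears at depth zero.

```rust
// Illustrative sketch of the local-delimiter-stack recovery check.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    OpenParen,
    CloseParen,
    OpenBrace,
    CloseBrace,
    Other,
}

fn opening_of(t: Tok) -> Option<Tok> {
    match t {
        Tok::CloseParen => Some(Tok::OpenParen),
        Tok::CloseBrace => Some(Tok::OpenBrace),
        _ => None,
    }
}

/// How many tokens to skip so that we stop at `expected` while the
/// local stack is empty, or `None` if no such recovery point exists.
fn recovery_offset(input: &[Tok], expected: Tok) -> Option<usize> {
    let mut local_stack: Vec<Tok> = Vec::new();
    for (i, &t) in input.iter().enumerate() {
        if local_stack.is_empty() && t == expected {
            return Some(i);
        }
        match t {
            Tok::OpenParen | Tok::OpenBrace => local_stack.push(t),
            close if opening_of(close).is_some() => {
                // Pop only on a matching close; ignore mismatched ones.
                if local_stack.last().copied() == opening_of(close) {
                    local_stack.pop();
                }
            }
            _ => {}
        }
    }
    None
}
```

For example, with the expected delimiter `)` and the remaining input `x ( y ) )`, the scan skips past the inner balanced pair and stops at the final `)`.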
This helps us recover from some of the simpler cases; it'd be great to follow up with better recovery in the terminated/delimited scenario, and I need to put more thought into how we can best recover from cases like `{ ( } )`.

EDIT:
After noticing that the recovery skips the `1 + 2 * 3` in the final CST, I had to go back and fix how eager we are when attempting the recovery. The main fix is in e647ba8 and the rest is mainly fixing the resulting regressions.
When backtracking/in a choice, we now pick parses that skip less input. Without recovery, we had the following ranking, and so the choice was obvious: match > incomplete match (a longer prefix is better) > no match.
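That pre-recovery ranking can be encoded directly in a derived ordering; this is a sketch (the `Outcome` enum is invented here for illustration), relying on the fact that derived `Ord` on a Rust enum compares variants in declaration order first, then fields.

```rust
// Sketch of the pre-recovery ranking between alternatives in a choice:
// Match beats any IncompleteMatch; among incomplete matches, the one
// that consumed the longer prefix wins; NoMatch loses to everything.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Outcome {
    NoMatch,
    IncompleteMatch { consumed: usize },
    Match,
}
```

Picking the best alternative is then just taking the maximum of the outcomes.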
However, this PR introduces a new kind of match: a recovered match. For the purposes of the old system, we treat it as a match, but we disambiguate by recursively checking whether the node contains a `TokenKind::SKIPPED` (in which case, we attempted a recovery). This is necessary, since the recovered matches can now have skipped tokens at any point inside, and we need to pick the best matching one, so that we still always pick the "correct" parse over a recovered one.

This is suboptimal, and we may optimize it in a follow-up by introducing a `RecoveredMatch` variant and propagating it upwards, and/or by also keeping the non-skipped text length of the descendant tokens to avoid recalculating it with each choice consideration. I hope this can be simplified once we cut down on the backtracking at some point.

I checked with the smoke test suite, and it looks like we no longer have misparses (although it would be great to get a second set of eyes on the results) and that the remaining errors largely stem from other issues (e.g. the `from`/`error` contextual keywords not being handled, or incorrect `pragma solidity` annotations in the source).
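The recursive check described above might look roughly like this. The `Node` tree here is an invented stand-in, not the actual slang CST types: one helper detects whether a recovery was attempted anywhere in a subtree, and another totals the non-skipped text length so the parse that skips the least input can win.

```rust
// Illustrative stand-in for a CST node.
#[derive(Debug)]
enum Node {
    Token { skipped: bool, text: String },
    Rule(Vec<Node>),
}

/// Did we attempt a recovery anywhere in this subtree?
fn contains_skipped(node: &Node) -> bool {
    match node {
        Node::Token { skipped, .. } => *skipped,
        Node::Rule(children) => children.iter().any(contains_skipped),
    }
}

/// Total length of non-skipped token text in the subtree.
fn unskipped_len(node: &Node) -> usize {
    match node {
        Node::Token { skipped: false, text } => text.len(),
        Node::Token { .. } => 0,
        Node::Rule(children) => children.iter().map(unskipped_len).sum(),
    }
}
```

Caching `unskipped_len` on the nodes (or propagating a `RecoveredMatch` variant upwards, as suggested above) would avoid recomputing these recursive walks at every choice point.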