Implement initial delimited error recovery by Xanewok · Pull Request #586 · NomicFoundation/slang

Xanewok · 2023-08-31T14:52:07Z

Fixes #553

This roughly corresponds to our idea that we talked about:

1. We keep the stack of closing delimiters in the ParserContext.
2. We push the expected closing delimiter (context-dependent) after we enter the delimited group.
3. We attempt to recover from the delimited body parse by using a greedy scan, in case it's incomplete or there's no closing delimiter immediately following:
   a. We keep a local stack, pushing and popping as we see the corresponding open/close delimiters.
   b. We only recover when the local stack is empty and the expected delimiter is encountered.
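
The greedy scan from the last step can be sketched roughly as below. This is a minimal illustration, not the code from this PR: the `TokenKind` variants, `closing_for`, `is_closing`, and `recover_until` are hypothetical names, and giving up on a mismatched closer is an assumption made to keep the sketch short.

```rust
/// Hypothetical token kinds; the generated `TokenKind` enum in the real
/// parser is much larger.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TokenKind {
    OpenBrace,
    CloseBrace,
    OpenParen,
    CloseParen,
    Other,
}

/// Returns the matching closing delimiter if `kind` opens a group.
fn closing_for(kind: TokenKind) -> Option<TokenKind> {
    match kind {
        TokenKind::OpenBrace => Some(TokenKind::CloseBrace),
        TokenKind::OpenParen => Some(TokenKind::CloseParen),
        _ => None,
    }
}

/// Returns true if `kind` closes a group.
fn is_closing(kind: TokenKind) -> bool {
    matches!(kind, TokenKind::CloseBrace | TokenKind::CloseParen)
}

/// Greedily scans `tokens` from index `start` and returns the index of the
/// `expected` closing delimiter, but only if it is seen while no nested
/// group is open (i.e. the local stack is empty).
pub fn recover_until(
    tokens: &[TokenKind],
    start: usize,
    expected: TokenKind,
) -> Option<usize> {
    // Closing delimiters of the groups opened during this scan.
    let mut local_stack: Vec<TokenKind> = Vec::new();

    for (index, &kind) in tokens.iter().enumerate().skip(start) {
        if local_stack.is_empty() && kind == expected {
            return Some(index);
        }
        if let Some(close) = closing_for(kind) {
            local_stack.push(close);
        } else if is_closing(kind) {
            if local_stack.last() == Some(&kind) {
                local_stack.pop();
            } else {
                // Mismatched closer, e.g. `( }`: this sketch simply gives up.
                return None;
            }
        }
    }
    None
}
```

For example, scanning `Other ( Other ) Other }` for `CloseBrace` skips the balanced parentheses and stops at the final brace, while `( } )` yields no recovery point.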

This helps us recover from some of the simpler cases; it'd be great to follow up with better recovery in the terminated/delimited scenario, and I need to put more thought into how we can best recover from cases like { ( } )

EDIT:
After noticing that

function func() {
  uint a = 1 + 2 * 3;
}

skips the 1 + 2 * 3 in the final CST, I had to go back and fix how eager we are when attempting the recovery.

The main fix is in e647ba8 and the rest is mainly fixing the resulting regressions.
When backtracking or inside a choice, we now pick the parse that skips less input. Without recovery, we had the following outcomes:

match (a full match)
incomplete match (partial prefix match)
no match

and so, the choice is obvious - match > incomplete match (longer prefix is better) > no match.

However, this PR introduces a new kind of outcome: a recovered match. For the purposes of the old system, we treat it as a match, but we disambiguate by recursively checking whether the node contains a TokenKind::SKIPPED (in which case we attempted a recovery). This is necessary because recovered matches can now have skipped tokens at any point inside, and we need to pick the best-matching one so that we always pick the "correct" parse over a recovered one.
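
As a rough illustration of that ordering, the sketch below ranks candidate outcomes of a choice. It is a simplification under stated assumptions, not the ChoiceHelper from this PR: `Outcome`, `rank`, `is_better`, and `pick` are hypothetical names, and a recovered match is modelled as a `Match` with a non-zero `skipped` length instead of being detected by walking the node for skipped tokens.

```rust
/// Hypothetical, simplified parse outcomes for a single choice alternative.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Outcome {
    /// A match; `skipped` is the amount of input covered by skipped tokens
    /// (0 means no recovery was attempted).
    Match { skipped: usize },
    /// A partial prefix match that consumed `consumed` units of input.
    IncompleteMatch { consumed: usize },
    /// Nothing matched.
    NoMatch,
}

/// Coarse ordering between the kinds: match > incomplete match > no match.
fn rank(outcome: &Outcome) -> u8 {
    match outcome {
        Outcome::Match { .. } => 2,
        Outcome::IncompleteMatch { .. } => 1,
        Outcome::NoMatch => 0,
    }
}

/// Returns true if `candidate` should replace `current` as the best pick.
pub fn is_better(candidate: &Outcome, current: &Outcome) -> bool {
    match (candidate, current) {
        // Between two matches, prefer the one that skips less input.
        (Outcome::Match { skipped: a }, Outcome::Match { skipped: b }) => a < b,
        // Between two incomplete matches, prefer the longer prefix.
        (
            Outcome::IncompleteMatch { consumed: a },
            Outcome::IncompleteMatch { consumed: b },
        ) => a > b,
        // Otherwise fall back to the coarse ordering of the kinds.
        _ => rank(candidate) > rank(current),
    }
}

/// Picks the best outcome among the alternatives of a choice.
pub fn pick(outcomes: Vec<Outcome>) -> Outcome {
    let mut best = Outcome::NoMatch;
    for outcome in outcomes {
        if is_better(&outcome, &best) {
            best = outcome;
        }
    }
    best
}
```

With this ordering, a full match with `skipped: 0` always wins over a recovered one, which is the property the fix is after. The sketch deliberately ignores the finer rules that later commits in this PR add for comparing recoveries against partial matches.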

This is suboptimal, and we may optimize it in a follow-up by introducing a RecoveredMatch variant and propagating it upwards, and/or by keeping the non-skipped text length of the descendant tokens to avoid recalculating it with each choice consideration. I hope that this can be simplified once we cut down on the backtracking at some point.

I checked with the smoke test suite and it looks like we no longer have misparses (although it would be great to get a second set of eyes on the results), and the remaining errors largely stem from other issues (e.g. from/error contextual keywords not being handled, incorrect pragma solidity annotations in source).

changeset-bot · 2023-08-31T14:52:10Z

⚠️ No Changeset found

Latest commit: 0469c69

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


crates/codegen/parser/generator/src/code_generator.rs

crates/codegen/parser/runtime/src/cst.rs

crates/codegen/parser/runtime/src/kinds.rs

Xanewok · 2023-08-31T14:54:02Z

crates/codegen/parser/runtime/src/support/parser_result.rs

+    /// At which token was the stream pointing at when we bailed
+    pub found: TokenKind,
+    /// Token we expected to find
+    pub expected_tokens: Vec<TokenKind>,


This is a Vec to deduplicate code in the parser_function.rs but I can rework it so that it's only a single TokenKind

AntonyBlakey · Aug 31, 2023

Better to have correct types constraining the data than optimising for dedup code IMO. Like recent is_done in helpers. Very strong believer in type constraint (in spite of the fact that I was on the wrong side of is_done).

Xanewok · Sep 1, 2023

Agreed; changed in 662693b

crates/codegen/parser/runtime/src/support/recovery.rs

Xanewok · 2023-08-31T14:57:54Z

crates/codegen/parser/runtime/src/support/sequence_helper.rs

+                    if is_single_token_with_trivia
+                        && found_token.map_or(false, |t| running.expected_tokens.contains(&t.kind))
+                    {


This is a bit hacky, as it couples the recovery and the sequence helper logic a bit here; ~~Also trivia handling seems wrong when I tried to plug this same mechanism to the TerminatedBy. Will need to investigate further, but maybe it's good enough for just DelimitedBy for now?~~

~~Fixed the trivia handling and the termination recovery uses the same recovery mechanism. The only thing that's left is the coupling between the SequenceHelper and this case.~~

Decided to leave this here, as both terminated-by and delimited-by use this recovery mechanism and separated-by will likely follow.


Xanewok · 2023-09-04T21:10:15Z

Updated, see the OP.


…#591) Does what it says on the tin. Helps recover past unrecognized characters while still stopping at EOF, see test change in a9777a5 (#586). cc @AntonyBlakey since we talked about this
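
A toy lexer with that behaviour could look like the sketch below. Everything in it is hypothetical (`Token`, `Lexer`, `next_token`), and surfacing an unrecognized character as its own token is an assumption of this sketch; the point is only that `None` is reserved for end of input, so a recovery scan can keep going past bad characters.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical tokens of a toy lexer.
#[derive(Debug, PartialEq, Eq)]
pub enum Token {
    Identifier(String),
    Number(String),
    /// A character the lexer does not recognize; it is surfaced as a token
    /// instead of stopping the token stream.
    Unrecognized(char),
}

pub struct Lexer<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Lexer<'a> {
    pub fn new(input: &'a str) -> Self {
        Self { chars: input.chars().peekable() }
    }

    /// Returns the next token, or `None` only when the input is exhausted.
    pub fn next_token(&mut self) -> Option<Token> {
        // Skip leading whitespace.
        while self.chars.next_if(|c| c.is_whitespace()).is_some() {}

        // `?` returns `None` here only at end of input.
        let first = self.chars.next()?;

        if first.is_ascii_alphabetic() {
            let mut text = String::from(first);
            while let Some(c) = self.chars.next_if(|c| c.is_ascii_alphanumeric()) {
                text.push(c);
            }
            Some(Token::Identifier(text))
        } else if first.is_ascii_digit() {
            let mut text = String::from(first);
            while let Some(c) = self.chars.next_if(|c| c.is_ascii_digit()) {
                text.push(c);
            }
            Some(Token::Number(text))
        } else {
            // Anything else is reported rather than ending the stream.
            Some(Token::Unrecognized(first))
        }
    }
}
```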


Xanewok · 2023-09-09T00:53:00Z

This should be ready for review - I polished the PR and added a few tests and explanatory comments.

This is the initial delimited error recovery, which falls flat when recovering from no matches at all, e.g. if (KEYWORD == ...) where the inner expression doesn't even return an incomplete parse. I left that for a subsequent PR.


Xanewok added 7 commits August 31, 2023 15:50

Collect delimiters for lexical contexts

6d4680c

Inline DelimitedBy parse function

de1bc65

Add ParserContext::closing_delimiters

1ec07da

WIP: Added recovery function

e48e604

Introduce new ParserResult::SkippedUntil variant

f67fe20

Properly recover skipped + match in sequences

5e8e299

Start recovering from full delim. parses followed by unexpected tokens

dd4f965

Xanewok requested a review from a team as a code owner August 31, 2023 14:52


Xanewok added 20 commits September 1, 2023 16:10

Remove context trivia TODO

458f8af

Per NomicFoundation#586 (comment)

Remove unused cst::Node::unwrap_* APIs

c195d35

Note about expanded RuleKind::is_trivia

31dd8e3

Use a single token in SkippedUntil::expected

662693b

Simplify a check in sequence skipped recovery

989196a

fix: Include trailing trivia when delim-recovering

22a7182

Add simple YulBlock CST tests

708616c

lexer: Skip over unrecognized characters and only return None for EOF

a9777a5

tests: Add a regression test for greedy terminator scan

b1154c6

tests: Add few more failing tests

8c7e187

tests: Add some regression tests

b3932fe

feat(infra): Run smoke tests in --release mode

6bad240

feat: Respect delimiters in terminator error recovery

97edb90

fix: Don't pick recovered matches over full ones in Choice

e647ba8

refactor(choice): Better convey invariants of the control flow

2286095

fix(choice): Replace with skipped only if is a better match

9af3303

fix: Don't replace recovered matches with no matches in choice

412649f

fix: Retain recovered errors even if we backtrack in Choice

510785c

refactor: impl Sum for TextIndex

95841b1

fix: Properly advance stream when picking ParserResult::SkippedUntil

5a98437

Xanewok added 3 commits September 4, 2023 22:29

fix: Don't include parsed trivia when recovery failed

a7b484d

fix: Use RAII to ensure that the closing delimiter is properly popped

8c9da16

Previously, the body could've short-circuited (e.g. when we got an IncompleteMatch), so use RAII pattern to ensure that we will always pop the closing delimiter when escaping the sequence scope.
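
The RAII idea can be sketched as a guard whose `Drop` implementation pops the delimiter. This is an illustration only: `ParserContext::closing_delimiters` is named in this PR, but its element type here (`char`), the `DelimiterGuard` type, and the `open_delimiter`, `depth`, and `parse_body` functions are hypothetical.

```rust
/// Simplified stand-in for the parser context; the real one stores token
/// kinds rather than `char`s.
pub struct ParserContext {
    closing_delimiters: Vec<char>,
}

/// Guard that pops the closing delimiter when it goes out of scope.
pub struct DelimiterGuard<'a> {
    context: &'a mut ParserContext,
}

impl ParserContext {
    pub fn new() -> Self {
        Self { closing_delimiters: Vec::new() }
    }

    /// Pushes `closing` and returns a guard that pops it again on drop.
    pub fn open_delimiter(&mut self, closing: char) -> DelimiterGuard<'_> {
        self.closing_delimiters.push(closing);
        DelimiterGuard { context: self }
    }

    /// Number of delimited groups currently open.
    pub fn depth(&self) -> usize {
        self.closing_delimiters.len()
    }
}

impl DelimiterGuard<'_> {
    /// Gives access to the context while the group is open.
    pub fn context(&mut self) -> &mut ParserContext {
        &mut *self.context
    }
}

impl Drop for DelimiterGuard<'_> {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns.
        self.context.closing_delimiters.pop();
    }
}

/// Hypothetical body parser that may bail out early.
pub fn parse_body(context: &mut ParserContext, fail_early: bool) -> Option<usize> {
    let mut guard = context.open_delimiter('}');
    if fail_early {
        // Short-circuit, e.g. on an incomplete match; the guard still pops.
        return None;
    }
    Some(guard.context().depth())
}
```

Whether `parse_body` returns early or runs to completion, the stack is back to its previous depth afterwards, which is the invariant the commit restores.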

Add sanity check for successful parses

4149683

Xanewok mentioned this pull request Sep 4, 2023

lexer: Skip over unrecognized characters and only return None for EOF #591

Merged

Xanewok added 3 commits September 5, 2023 09:30

chore: Remove unused now ParserResult::try_recover_with

c913277

chore: Cleanup and deduplicate recover_until_with_nested_delims

75fa300

Merge remote-tracking branch 'upstream/main' into error-recovery-deli…

195adb9

…mited

tests: Remove irrelevant or duplicate CST tests

696a869

Xanewok force-pushed the error-recovery-delimited branch from 58feead to 696a869 Compare September 6, 2023 13:43

Xanewok added 10 commits September 6, 2023 19:55

Revert "feat(infra): Run smoke tests in --release mode"

4ca5050

This reverts commit 6bad240. Separated this into NomicFoundation#592.

cleanup: Merge imports in sequence_helper.rs

01e370e

fix: Don't overwrite partial matches with less complete recoveries

4da07c7

refactor: Clean up the ChoiceHelper::attempt_pick

b88e068

tests: Add a CST regression test for delimiter recovery

4cf8d79

tests: Add struct definition CST test

bf00c93

fix: Return early if the first item in RepetitionHelper is recovering

d38d4a5

refactor: Simplify ChoiceHelper wrt flow and backtracking

943f35d

refactor: Clarify error recovery invariants in SequenceHelper

42fc38a

tests: Add a mismatched delimiter recovery test

0469c69

This also adds coverage to the sequence SkippedUntil + Match case, where the Match is empty.

Xanewok force-pushed the error-recovery-delimited branch from 643772b to 0469c69 Compare September 9, 2023 00:42

Xanewok mentioned this pull request Sep 9, 2023

Attempt error recovery for no-matches as well #594

Closed

AntonyBlakey approved these changes Sep 15, 2023


AntonyBlakey added this pull request to the merge queue Sep 15, 2023

Merged via the queue into NomicFoundation:main with commit 5276e4e Sep 15, 2023

Xanewok added a commit to Xanewok/slang that referenced this pull request Sep 15, 2023

Bless tests

f23047b

This changed because of the NomicFoundation#586 that was merged in the meantime.

Xanewok deleted the error-recovery-delimited branch September 15, 2023 09:41

AntonyBlakey merged 44 commits into NomicFoundation:main from Xanewok:error-recovery-delimited
