Skip to content

Implement initial delimited error recovery#586

Merged
AntonyBlakey merged 44 commits intoNomicFoundation:mainfrom
Xanewok:error-recovery-delimited
Sep 15, 2023
Merged

Implement initial delimited error recovery#586
AntonyBlakey merged 44 commits intoNomicFoundation:mainfrom
Xanewok:error-recovery-delimited

Conversation

@Xanewok
Copy link
Copy Markdown
Contributor

@Xanewok Xanewok commented Aug 31, 2023

Fixes #553

This roughly corresponds to our idea that we talked about:

  1. we keep the stack of closing delimiters in the ParserContext
  2. push the expected closing delimiter (context-dependent) after we enter the delimited group
  3. Attempt to recover from the delimited body parse by using greedy scan, in case it's incomplete or there's no closing delimiter immediately following
    a. We keep a local stack, push and pop if we see the corresponding open/close delimiters
    b. Only recover when the local stack is empty and the expected delimiter is encountered

This helps us recover from some of the simpler cases; it'd be great to follow up with better recovery in the terminated/delimited scenario and I need to put more thought how we can best recover from cases like { ( } )

EDIT:
After noticing that

function func() {
  uint a = 1 + 2 * 3;
}

skips the 1 + 2 * 3 in the final CST, I had to go back and fix how eager we are when attempting the recovery.

The main fix is in e647ba8 and the rest is mainly fixing the resulting regressions.
When backtracking/in a choice, we now will pick parses that skip less input. Without recovery, we had the following:

  • match (a full match)
  • incomplete match (partial prefix match)
  • no match

and so, the choice is obvious - match > incomplete match (longer prefix is better) > no match.

However, this PR introduces a new kind of a recovered match. For the purposes of the old system, we treat it as a match, but we disambiguate by recursively checking if the node contains a TokenKind::SKIPPED (in which case, we attempted a recovery). This is necessary, since now the recovered matches can have skipped tokens at any point inside and we need to pick the best matching one, so that we always still pick the "correct" parse over a recovered one.

This is suboptimal and we may optimize that in a follow-up by introducing a RecoveredMatch variant and propagate it upwards and/or also keep the non-skipped text length of the descendent tokens to avoid recalculating that with each choice consideration. I hope that this can be simplified once we cut down on the backtracking at some point.

I checked with the smoke test suite and it looks that we don't seem to have misparses anymore (although it would be great to get a second set of eyes on the results) and that the errors greatly stem from other issues (e.g. from/error contextual keywords not being handled, incorrect pragma solidity annotations in source).

@Xanewok Xanewok requested a review from a team as a code owner August 31, 2023 14:52
@changeset-bot
Copy link
Copy Markdown

changeset-bot bot commented Aug 31, 2023

⚠️ No Changeset found

Latest commit: 0469c69

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

/// At which token was the stream pointing at when we bailed
pub found: TokenKind,
/// Token we expected to find
pub expected_tokens: Vec<TokenKind>,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a Vec to deduplicate code in the parser_function.rs but I can rework it so that it's only a single TokenKind

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to have correct types constraining the data than optimising for dedup code IMO. Like recent is_done in helpers. Very strong believer in type constraint (in spite of the fact that I was on the wrong side of is_done).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; changed in 662693b

Comment on lines +133 to +135
if is_single_token_with_trivia
&& found_token.map_or(false, |t| running.expected_tokens.contains(&t.kind))
{
Copy link
Copy Markdown
Contributor Author

@Xanewok Xanewok Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit hacky as this couples a bit the recovery and the sequence helper logic here; Also trivia handling seems wrong when I tried to plug this same mechanism to the TerminatedBy. Will need to investigate further, but maybe it's good enough for just DelimitedBy for now?

Fixed the trivia handling and the termination recovery uses the same recovery mechanism. The only thing that's left is the coupling between the SequenceHelper and this case.

Decided to leave this here, as both terminated-by and delimited-by uses this recovery mechanism and separated-by will likely follow.

Previously, the body could've short-circuited (e.g. when we got an
IncompleteMatch), so use RAII pattern to ensure that we will always pop
the closing delimiter when escaping the sequence scope.
@Xanewok
Copy link
Copy Markdown
Contributor Author

Xanewok commented Sep 4, 2023

Updated, see the OP.

github-merge-queue bot pushed a commit that referenced this pull request Sep 6, 2023
…#591)

Does what it says on the tin.

Helps recover past unrecognized characters while still stopping at EOF,
see test change in
a9777a5
(#586).

cc @AntonyBlakey since we talked about this
@Xanewok Xanewok force-pushed the error-recovery-delimited branch from 58feead to 696a869 Compare September 6, 2023 13:43
@Xanewok Xanewok force-pushed the error-recovery-delimited branch from 643772b to 0469c69 Compare September 9, 2023 00:42
@Xanewok
Copy link
Copy Markdown
Contributor Author

Xanewok commented Sep 9, 2023

This should be ready for review - I polished the PR, added few tests and explanatory comments.

This is the initial delimited error recovery, which falls flat when recovering from no matches at all, e.g. if (KEYWORD == ...) where the inner expression doesn't even return an incomplete parse. I left that for a subsequent PR.

@AntonyBlakey AntonyBlakey added this pull request to the merge queue Sep 15, 2023
Merged via the queue into NomicFoundation:main with commit 5276e4e Sep 15, 2023
Xanewok added a commit to Xanewok/slang that referenced this pull request Sep 15, 2023
This changed because of the NomicFoundation#586 that was merged in the meantime.
@Xanewok Xanewok deleted the error-recovery-delimited branch September 15, 2023 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

implement Delimited Error Recovery

2 participants