Avoid consuming trailing whitespace during re-lexing#11933
Merged
dhruvmanila merged 1 commit intomainfrom Jun 19, 2024
Merged
Conversation
dhruvmanila
commented
Jun 19, 2024
Comment on lines
1377
to
1391
| for ch in reverse_chars { | ||
| if is_python_whitespace(ch) { | ||
| new_position -= ch.text_len(); | ||
| current_position -= ch.text_len(); | ||
| } else if matches!(ch, '\n' | '\r') { | ||
| has_newline |= true; | ||
| new_position -= ch.text_len(); | ||
| current_position -= ch.text_len(); | ||
| newline_position = Some(current_position); | ||
| } else { | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| // The lexer should only be moved if there's a newline character which needs to be | ||
| // re-lexed. | ||
| if new_position != current_position && has_newline { | ||
| if let Some(newline_position) = newline_position { | ||
| // Earlier we reduced the nesting level unconditionally. Now that we know the lexer's |
Member
Author
There was a problem hiding this comment.
Now, we encode both information in a single variable:
- The lexer needs to be moved
- The position the lexer needs to be moved to
Contributor
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR updates the re-lexing logic to avoid consuming the trailing whitespace and move the lexer explicitly to the last newline character encountered while moving backwards.
Consider the following code snippet as taken from the test case highlighted with whitespace (
.) and newline (\n) characters:The parser is at
ywhen it's trying to recover from an unclosed{, so it calls into the re-lexing logic which tries to move the lexer back to the end of the previous line. But, as it consumed all whitespaces it moved the lexer to the location marked by^in the above code snippet. But, those whitespaces are part of the comment token. This means that the range for the two tokens were overlapping which introduced the panic.Note that this is only a bug when there's a comment with a trailing whitespace otherwise it's fine to move the lexer to the whitespace character. This is because the lexer would just skip the whitespace otherwise. Nevertheless, this PR updates the logic to move it explicitly to the newline character in all cases.
fixes: #11929
Test Plan
Add test cases and update the snapshot. Make sure that it doesn't panic on the code snippet in the linked issue.