Skip to content

Avoid consuming trailing whitespace during re-lexing#11933

Merged
dhruvmanila merged 1 commit intomainfrom
dhruv/re-lex-comment-trailing-whitespace
Jun 19, 2024
Merged

Avoid consuming trailing whitespace during re-lexing#11933
dhruvmanila merged 1 commit intomainfrom
dhruv/re-lex-comment-trailing-whitespace

Conversation

@dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Jun 19, 2024

Summary

This PR updates the re-lexing logic to avoid consuming the trailing whitespace and move the lexer explicitly to the last newline character encountered while moving backwards.

Consider the following code snippet as taken from the test case highlighted with whitespace (.) and newline (\n) characters:

# There are trailing whitespace before the newline character but those whitespaces are
# part of the comment token
f"""hello {x # comment....\n
#                     ^
y = 1\n

The parser is at y when it's trying to recover from an unclosed {, so it calls into the re-lexing logic which tries to move the lexer back to the end of the previous line. But, as it consumed all whitespaces it moved the lexer to the location marked by ^ in the above code snippet. But, those whitespaces are part of the comment token. This means that the range for the two tokens were overlapping which introduced the panic.

Note that this is only a bug when there's a comment with a trailing whitespace otherwise it's fine to move the lexer to the whitespace character. This is because the lexer would just skip the whitespace otherwise. Nevertheless, this PR updates the logic to move it explicitly to the newline character in all cases.

fixes: #11929

Test Plan

Add test cases and update the snapshot. Make sure that it doesn't panic on the code snippet in the linked issue.

@dhruvmanila dhruvmanila added bug Something isn't working parser Related to the parser labels Jun 19, 2024
@dhruvmanila dhruvmanila requested a review from MichaReiser as a code owner June 19, 2024 02:52
Comment on lines 1377 to 1391
for ch in reverse_chars {
if is_python_whitespace(ch) {
new_position -= ch.text_len();
current_position -= ch.text_len();
} else if matches!(ch, '\n' | '\r') {
has_newline |= true;
new_position -= ch.text_len();
current_position -= ch.text_len();
newline_position = Some(current_position);
} else {
break;
}
}

// The lexer should only be moved if there's a newline character which needs to be
// re-lexed.
if new_position != current_position && has_newline {
if let Some(newline_position) = newline_position {
// Earlier we reduced the nesting level unconditionally. Now that we know the lexer's
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, we encode both information in a single variable:

  1. The lexer needs to be moved
  2. The position the lexer needs to be moved to

@github-actions
Copy link
Contributor

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

@dhruvmanila dhruvmanila merged commit cdc7c71 into main Jun 19, 2024
@dhruvmanila dhruvmanila deleted the dhruv/re-lex-comment-trailing-whitespace branch June 19, 2024 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working parser Related to the parser

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rule E302 cause panic

2 participants