Reset cursor at dedent after whitespace#11630
Reset cursor at dedent after whitespace#11630dhruvmanila merged 1 commit intodhruv/parser-phase-2from
Conversation
CodSpeed Performance ReportMerging #11630 will create unknown performance changesComparing Summary
Benchmarks breakdown
|
|
| code | total | + violation | - violation | + fix | - fix |
|---|---|---|---|---|---|
| E117 | 152372 | 0 | 152372 | 0 | 0 |
| E113 | 73488 | 0 | 73488 | 0 | 0 |
| E116 | 6080 | 0 | 6080 | 0 | 0 |
| RUF100 | 1 | 1 | 0 | 0 | 0 |
Formatter (stable)
✅ ecosystem check detected no format changes.
Formatter (preview)
✅ ecosystem check detected no format changes.
2d34e13 to
e8ab801
Compare
4faed6a to
b600b42
Compare
MichaReiser
left a comment
There was a problem hiding this comment.
There are in total 5 usages of TextRange::empty(offset)) in the existing Lexer code. Would you mind reviewing if we need to make the same change in other places?
| // | ||
| // Here, the cursor is at `^` and the `indentation` contains the whitespaces before | ||
| // the `pass` token. | ||
| self.cursor.start_token(); |
There was a problem hiding this comment.
I must admit that reading self.cursor.start_token() is hard to understand. Why are we starting a token when we're just about to return it. Your comment certainly helps. That makes me wonder if we can instead add a method on Lexer with a more meaningful name.
self.update_token_start_to_current_offset() // A bi tlong
self.reset_token_start()but I must admit, I don't find them much better than what we have. But maybe you have an idea for a better name?
There was a problem hiding this comment.
Yeah, I agree. Let me think about this.
I did review them, sorry for not mentioning in the PR description. This isn't needed in all places, it's only require if the lexer skips whitespaces. For posterity, the usages are: ruff/crates/ruff_python_parser/src/lexer.rs Lines 853 to 863 in 685d11a ⬆️ This emits any pending dedent tokens. There are no whitespaces consumed by the lexer before this, so no need to reset the cursor. ruff/crates/ruff_python_parser/src/lexer.rs Lines 1031 to 1041 in 685d11a ⬆️ We already reset the cursor before this function is called ⬇️ ruff/crates/ruff_python_parser/src/lexer.rs Lines 877 to 898 in b600b42 |
## Summary This PR fixes a bug to reset the cursor before emitting the `Dedent` token which had preceding whitespaces. Previously, we would just use `TextRange::empty(self.offset())` because the range for each token would be computed at the place where it was emitted. Now, the lexer handles this centrally in `next_token` which means we require special handling at certain places. This is one such case. Computing the range centrally in `next_token` has a benefit that the return type of lexer methods is a simple `TokenKind` but it does add a small complexity to handle such cases. I think it's worth doing this but if anyone thinks otherwise, feel free to comment. ## Test Plan Add a test case and update the snapshot.
Summary
This PR fixes a bug to reset the cursor before emitting the
Dedenttoken which had preceding whitespaces.Previously, we would just use
TextRange::empty(self.offset())because the range for each token would be computed at the place where it was emitted. Now, the lexer handles this centrally innext_tokenwhich means we require special handling at certain places. This is one such case.Computing the range centrally in
next_tokenhas a benefit that the return type of lexer methods is a simpleTokenKindbut it does add a small complexity to handle such cases. I think it's worth doing this but if anyone thinks otherwise, feel free to comment.Test Plan
Add a test case and update the snapshot.