Use speculative parsing for with-items#11770
Conversation
|
992da43 to
cbd6962
Compare
| | | ||
| 9 | with (item := 10 as f): ... | ||
| 10 | with (item1, item2 := 10 as f): ... | ||
| 11 | with (x for x in range(10), item): ... | ||
| | ^ Syntax Error: Expected ',', found ')' | ||
| 12 | with (item, x for x in range(10)): ... | ||
| | ^^^^^^^^^^^^^^^^^^^^ Syntax Error: Unparenthesized generator expression cannot be used here |
There was a problem hiding this comment.
The problem here is that the speculative parsing failed so we fallback to considering parenthesized expression. The terminator token for with-items parsing with the kind being parenthesized expression is :, so it'll expect a , after a with item (item) but not at the end (:).
I think I want to improve the logic of parsing comma-separated list from (element comma) (element comma) ... to (element) (comma element) (comma element) ....
| | | ||
| 1 | # `)` followed by a newline | ||
| 2 | with (item1, item2) | ||
| | ^ Syntax Error: Expected ':', found newline | ||
| 3 | pass | ||
| | |
There was a problem hiding this comment.
This is an improvement: https://play.ruff.rs/3c1fd581-e5df-4aa4-8528-7550e6d63e4f
| 3 | | ||
| 4 | with (item1, item2),: ... | ||
| | ^ Syntax Error: Expected an expression | ||
| | ^ Syntax Error: Trailing comma not allowed | ||
| 5 | with (item1, item2), as f: ... |
There was a problem hiding this comment.
I haven't included this logic (
ruff/crates/ruff_python_parser/src/parser/statement.rs
Lines 1910 to 1940 in 9b2cf56
with (item1, item2), item3,: ...There was a problem hiding this comment.
I couldn't find a simple way to do it and I'm not sure if it's worth it. The "trailing comma" error is raised by list parsing while the "expect an expression" is raised by the with-item parsing logic. Even if I'm able to add the logic, both errors will be displayed because they're raised at separate location but they both contradict each other. So, we should only raise one of them.
There was a problem hiding this comment.
I think both errors are equally good and ask the same fo the user. Remove the comma or add an expression.
There was a problem hiding this comment.
Nice! I find the new code much easier to reason about!
The only thought I have is if we should restrict error recovery while we're doing speculative parsing to avoid that the error recovery breaks our validation if the with items is what we think it is... I do think it isn't necessary because the error recovery never eats over a character that we use to determine whether our assumption was correct () or : both exit the recovery logic).
Edit: I recommend you to run the fuzzer for a while in the background. It proved useful last time ;)
I should read the summary more carefully. You already did that.
| }, | ||
| ], | ||
| body: [], | ||
| body: [ |
| items: [ | ||
| WithItem { | ||
| range: 6..6, | ||
| range: 5..6, |
There was a problem hiding this comment.
It seems that this PR changed whether the with item includes the range of the parentheses. Is this expected?
There was a problem hiding this comment.
Yes. This is for the following code:
with (Here, the speculative parsing fails and thus it becomes a parenthesized expression.
| ExprName { | ||
| range: 149..154, | ||
| id: "item2", | ||
| range: 141..154, |
There was a problem hiding this comment.
The old parse tree here was probably slightly better.
There was a problem hiding this comment.
Yeah, missing closing parenthesis is difficult to recover from because it's occurring in list parsing which skips over the : token.
There was a problem hiding this comment.
Should the recovery skip over the :? Shouldn't it stop when seeing a : because it is part of the with items recovery set?
There was a problem hiding this comment.
For the initial assumption of parenthesized with-items, the : is not part of the recovery set (maybe it should?) and so we don't stop at : which means we can't reliably detect whether our assumption was correct. This is why it falls back to parenthesized expression and parsing it as a tuple expression.
There was a problem hiding this comment.
Yeah, i think it probably should. Or we risk running over the end of the case header
## Summary This PR is a follow-up to this discussion (#11770 (comment)) which adds the `:` token in the terminator set for parenthesized with items. The main motivation is to avoid parsing too much in speculative mode. This is evident with the following _before_ and _after_ parsed with items list for the following code: ```py with (item1, item2: foo ``` <table> <tr> <th>Before (3 items)</th> <th>After (2 items)</th> </tr> <tr> <td> <pre> parsed_with_items: [ ParsedWithItem { item: WithItem { range: 6..11, context_expr: Name( ExprName { range: 6..11, id: "item1", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ParsedWithItem { item: WithItem { range: 13..18, context_expr: Name( ExprName { range: 13..18, id: "item2", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ParsedWithItem { item: WithItem { range: 24..27, context_expr: Name( ExprName { range: 24..27, id: "foo", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ] </pre> </td> <td> <pre> parsed_with_items: [ ParsedWithItem { item: WithItem { range: 6..11, context_expr: Name( ExprName { range: 6..11, id: "item1", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ParsedWithItem { item: WithItem { range: 13..18, context_expr: Name( ExprName { range: 13..18, id: "item2", ctx: Load, }, ), optional_vars: None, }, is_parenthesized: false, }, ] </pre> </td> </tr> </table> ## Test Plan `cargo insta test`
Summary
This PR updates the with-items parsing logic to use speculative parsing instead.
Existing logic
First, let's understand the previous logic:
(, it doesn't know whether it's part of a parenthesized with items or a parenthesized expressionHere, in (3) there are lots of edge cases which we've to deal with:
ifexpressions after (3)Consider other cases like
And, this is all possible only if we allow parsing these expressions in the with item parsing logic.
Speculative parsing
With #11457 merged, we can simplify this logic by changing the step (3) from above to just rewind the parser back to the
(if our assumption (parenthesized with-items) was incorrect and then continue parsing it considering parenthesized expression.This also behaves a lot similar to what a PEG parser does which is to consider the first grammar rule and if it fails consider the second grammar rule and so on.
resolves: #11639
Test Plan