Implement ForVariableUsedAfterBlock (based on wemake_python_styleguide ControlVarUsedAfterBlockViolation WPS441)#11769
Conversation
``` cargo run -p ruff -- check crates/ruff_linter/resources/test/fixtures/pylint/control_var_used_after_block.py --no-cache --select WPS441 ```
Also we can rely on undefined detection rules for the case where people use a variable before a loop that is also defined as a loop var below.
|
| code | total | + violation | - violation | + fix | - fix |
|---|---|---|---|---|---|
| RUF032 | 117 | 117 | 0 | 0 | 0 |
Fix loop var being used outside block. Before this change, we were always using the last room_id's retention policy for all events being filtered. I found this with astral-sh/ruff#11769
…lVarUsedAfterBlockViolation` `WPS441`) (#17272) Fix loop var being used outside block. Before this change, we were always using the last room_id's retention policy for all events being filtered. I found this bug with the [new lint rule, `ControlVarUsedAfterBlockViolation` `WPS441`](astral-sh/ruff#11769), that I re-implemented in `ruff`. Shout-out to @reivilibre for all the help in the beginning! ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
…lVarUsedAfterBlockViolation` `WPS441`) (element-hq#17272) Fix loop var being used outside block. Before this change, we were always using the last room_id's retention policy for all events being filtered. I found this bug with the [new lint rule, `ControlVarUsedAfterBlockViolation` `WPS441`](astral-sh/ruff#11769), that I re-implemented in `ruff`. Shout-out to @reivilibre for all the help in the beginning! ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
| @@ -0,0 +1,90 @@ | |||
| for global_var in []: | |||
There was a problem hiding this comment.
Friendly ping @charliermarsh 🙇
Just looking for the next steps here (whether that be in a queue, pinging another person, etc)
MichaReiser
left a comment
There was a problem hiding this comment.
Thanks for the contribution.
What's the motivation for limiting the rule to with and for only? Should the rule apply to all compound statements?
@AlexWaygood mentioned that there reasons to access variables from inside a with block. So maybe we should exclude with statements.
...ruff_linter/resources/test/fixtures/wemake_python_styleguide/control_var_used_after_block.py
Show resolved
Hide resolved
|
We discussed this internally and we agreed on:
The fact that |
A commit merged to the CPython repo earlier this morning has several examples in it: python/cpython@0697188. Here's a slightly simpler example than any of the functions that commit touched, from CPython's import unittest
from dataclasses import dataclass
class FrozenTests(unittest.TestCase):
def test_non_frozen_normal_derived_from_empty_frozen(self):
@dataclass(frozen=True)
class D:
pass
class S(D):
pass
s = S()
self.assertFalse(hasattr(s, 'x'))
s.x = 5
self.assertEqual(s.x, 5)
del s.x
self.assertFalse(hasattr(s, 'x'))
with self.assertRaises(AttributeError) as cm:
del s.x
self.assertNotIsInstance(cm.exception, FrozenInstanceError)The I think tests are the places where this comes up most often. You also see this pattern a lot with |
Conflicts: crates/ruff_linter/src/codes.rs crates/ruff_python_semantic/src/model.rs
Updated ✅ |
CodSpeed Performance ReportMerging #11769 will degrade performances by 5.26%Comparing Summary
Benchmarks breakdown
|
crates/ruff_linter/src/rules/ruff/rules/for_variable_used_after_block.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/for_variable_used_after_block.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/for_variable_used_after_block.rs
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/for_variable_used_after_block.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/ruff/rules/for_variable_used_after_block.rs
Outdated
Show resolved
Hide resolved
The [`known_good_reference_node_ids`](astral-sh@eaf2a35) evolved first to solve the shadowed cases but my [fix later on to allow previous references](astral-sh@a7de1cb) also ended up covering the shadowed cases without me realizing (much more elegant too). See: - astral-sh#11769 (comment) - astral-sh#11769 (comment)
Conflicts: crates/ruff_linter/src/codes.rs
ControlVarUsedAfterBlockViolation (based on wemake_python_styleguide WPS441)ForVariableUsedAfterBlock (based on wemake_python_styleguide WPS441)
ForVariableUsedAfterBlock (based on wemake_python_styleguide WPS441)ForVariableUsedAfterBlock (based on wemake_python_styleguide ControlVarUsedAfterBlockViolation WPS441)
| /// The statement in which the [`Binding`] was defined. `None` if the [`Binding`] | ||
| /// comes from a built-in. |
There was a problem hiding this comment.
Please double-check the accuracy of these comments. In any case, I'd like to document the cases so we don't have to run through this uncertainty in the future.
|
Some interesting cases from the ecosystem check Variable declared outside the loop I think the code here is correct because the variable is first defined outside the loop as being (many more) An easy fix here could be to skip the rule if a binding for the same name was already defined before the for loop. But it may also be intentional that we catch this as part of this rule. What do you think? Use in combination with try..except It seems to be a somewhat common pattern to use the for loop variable outside the for-block when combined with Multiple usages in the same statement It would be great if we only flagged each variable once (at least per statement?)
**Delete statements ** Control flow This here seems like a clear false positive but detecting it would be hard because it requires some form of control flow analysis https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/pandas/plotting/_matplotlib/boxplot.py#L520-L561 |
MichaReiser
left a comment
There was a problem hiding this comment.
The code changes look good to me but there are some common patterns that the rule isn't handling well which makes me a bit wary of shipping the rule as is. @AlexWaygood any thoughts on the examples?
MadLittleMods
left a comment
There was a problem hiding this comment.
Split the conversation from #11769 up so we can better discuss each issue and resolve.
| # Assign a variable before the loop | ||
| room_id = 3 | ||
| _ = room_id | ||
| # Use the same variable name in a loop | ||
| for room_id in []: | ||
| _ = room_id | ||
| pass | ||
|
|
||
| # ❌ After the loop is not ok because the value is probably not what you expect | ||
| _ = room_id | ||
|
|
||
| # ❌ Augmented assignment is not allowed because the value is probably not what you expect | ||
| room_id += 1 | ||
|
|
||
| # Assigning again after the loop is ok | ||
| room_id = 5 | ||
| room_id += 1 | ||
| _ = room_id |
There was a problem hiding this comment.
Variable declared outside the loop
But it may also be intentional that we catch this as part of this rule. What do you think?
I think it's intentional that we catch the indico example as part of the rule. We have an example in the test fixture file for this.
The other example from pandas could be addressed with some fancy control flow analysis (see other discussion)
There was a problem hiding this comment.
So, I think there are two different reasons why you want to avoid using a loop variable after the loop has finished:
- The first reason is that the loop might be empty, which means that the variable is never assigned:
Catching these feels like a "correctness" issue, that we should hopefully be able to do without very many false positives.
iterable = [] for x in iterable: pass print(x) # NameError
- The second is cases where we can be sure that the variable is defined, but we still forbid this pattern, perhaps on grounds of readability/style (it might not be obvious to the reader that the variable is mutated; there might be clearer ways of writing the code), or because there's a chance that the author of the code might not be familiar with Python's scoping rules, and might not have intended to reassign the variable by using the loop:
Sometimes code like this can indicate a bug, but it also might be working exactly as the author of the code intended; there's a lot of code out there that deliberately mutates a variable using a loop like this.
x = 42 for x in range(5): pass print(x) # oops, now it's 4?? The author of the code expected it to still be 42
The conclusion I draw from this is that we in fact need two rules here:
- One rule to catch instances where a loop variable is used after the loop and was not defined before the loop. This is almost always bad practice, as the loop variable might not be defined.
- One more opinionated rule that catches instances where the loop variable is definitely defined, but the mutation of the variable via the loop might not be desired. This might be a bug and might not be, depending on how experienced the developer is with Python's semantics.
For now I think I'd implement the first rule; we can consider the second rule as a followup.
| room_id = 5 | ||
| room_id += 1 | ||
| _ = room_id | ||
|
|
There was a problem hiding this comment.
Use in combination with try..except
It seems to be a somewhat common pattern to use the for loop variable outside the for-block when combined with
try..except
The trio example really lends itself well to the convenience that the non-block scoping provides. I think this is probably best resolved by adding a # noqa: RUF032 comment for the lint rule to explicitly mark the case as fine.
| room_id = 5 | ||
| room_id += 1 | ||
| _ = room_id | ||
|
|
There was a problem hiding this comment.
Multiple usages in the same statement
It would be great if we only flagged each variable once (at least per statement?)
- src/_pytest/assertion/rewrite.py:492:40: RUF032
forloop variable i is used outside of block- src/_pytest/assertion/rewrite.py:492:53: RUF032
forloop variable i is used outside of block- src/_pytest/assertion/rewrite.py:492:66: RUF032
forloop variable i is used outside of block
Seems like a decent improvement. Do we do this sort of deduplication for other rules? For the violation Diagnostic, do we combine the TextRange to cover the whole area?
| room_id = 5 | ||
| room_id += 1 | ||
| _ = room_id | ||
|
|
There was a problem hiding this comment.
Delete statements
We may explicitly want to allow usages in delete statements (example)
Seems harmless to allow 👍
There was a problem hiding this comment.
Usages in del statements are just as problematic as other usages, since they also result in NameErrors if the variable is undefined:
>>> iterable = []
>>> for item in iterable:
... pass
...
>>> del item
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'item' is not defined. Did you mean: 'iter'?| room_id = 5 | ||
| room_id += 1 | ||
| _ = room_id | ||
|
|
There was a problem hiding this comment.
Control flow
This here seems like a clear false positive but detecting it would be hard because it requires some form of control flow analysis
The pandas example makes sense to allow if we can detect the context via control flow analysis. The other pandas example makes sense for this as well since it has a return statement to guard it every leaking below.
But I think it's good that we catch bokeh examples as sketchy. I don't see how we could use control flow analysis to see that they're fine?
I'm assuming we don't want to actually dive into implementing the control flow analysis in this first iteration?
Summary
Add
ForVariableUsedAfterBlock(RUF032) based onControlVarUsedAfterBlockViolation(WPS441) from thewemake_python_styleguide. This rule prevents you from usingfor-loop variables outside of the loop block(or a.withcontext variable)Example:
withstatementPart of #3845
This is spawning from a real-world pain where I accidentally used one for-loop control variable in another loop, see element-hq/synapse#17187 (comment)
Relevant wtfPython around loop variables leaking out, https://github.com/satwikkansal/wtfPython?tab=readme-ov-file#-loop-variables-leaking-out
Please bear in mind that this is my first Rust code so I'm probably doing a few things wrong 🙇. Shout out to @reivilibre who helped me in the beginning!
Test Plan
Tested against the fixture file:
Previous command before the rename
Also tested against the https://github.com/element-hq/synapse codebase. It does trigger in several spots that work because Python works this way but could be refactored to align with this rule. Some of the usages seem slightly sketchy but it requires some more context/time to determine whether they are bugs in the Synapse code.
Here is a real world bug that it caught: element-hq/synapse#17272
Dev notes
Running
ruffagainst your test file with your new rule:Getting started
CONTRIBUTING.md#prerequisitesCONTRIBUTING.md#example-adding-a-new-lint-ruleRunning the
pre-commithooks without installing to your system:python -m venv /home/eric/Downloads/ruff-venv source /home/eric/Downloads/ruff-venv/bin/activate pip install pre-commitDebug
NodeId/node_idAdd the following method to
crates/ruff_python_semantic/src/model.rsand useprintln!("nodes {:#?}", checker.semantic().nodes());. It would be really nice if the node index/ID was printed next to each one but you can manually number things to get by.Inspect the AST
Helpful related
ruffrulesunused_variableredefined_loop_nametoo_many_nested_blocksunused_loop_control_variable