Skip to content

[refurb] Fix FURB101 and FURB103 false positives when I/O variable is used later#23542

Merged
amyreese merged 20 commits intoastral-sh:mainfrom
chirizxc:FURB10_{1,3}
Mar 4, 2026
Merged

[refurb] Fix FURB101 and FURB103 false positives when I/O variable is used later#23542
amyreese merged 20 commits intoastral-sh:mainfrom
chirizxc:FURB10_{1,3}

Conversation

@chirizxc
Copy link
Contributor

Summary

Fix #21483

Test Plan

cargo nextest run furb

@astral-sh-bot
Copy link

astral-sh-bot bot commented Feb 24, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+5 -6 violations, +0 -0 fixes in 4 projects; 52 projects unchanged)

apache/superset (+1 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview --select ALL

+ superset/db_engine_specs/lint_metadata.py:676:18: FURB103 [*] `open` and `write` should be replaced by `Path(args.output).write_text(output_text)`
- superset/db_engine_specs/lint_metadata.py:676:18: FURB103 [*] `open` and `write` should be replaced by `Path(args.output).write_text(output_text)`

ibis-project/ibis (+1 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview

+ .github/workflows/algolia/upload-algolia-api.py:165:14: FURB101 `open` and `read` should be replaced by `Path(qmd).read_text()`
- .github/workflows/algolia/upload-algolia-api.py:165:14: FURB101 `open` and `read` should be replaced by `Path(qmd).read_text()`

zulip/zulip (+1 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview --select ALL

+ zerver/lib/test_helpers.py:766:10: FURB103 [*] `open` and `write` should be replaced by `Path(filepath).write_text("zulip!")`
- zerver/lib/test_helpers.py:766:10: FURB103 [*] `open` and `write` should be replaced by `Path(filepath).write_text("zulip!")`

astropy/astropy (+2 -3 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview

+ astropy/io/ascii/tests/test_read.py:1776:30: FURB101 `open` and `read` should be replaced by `Path(fpath).read_text()`
- astropy/io/ascii/tests/test_read.py:1776:30: FURB101 `open` and `read` should be replaced by `Path(fpath).read_text()`
+ astropy/io/ascii/tests/test_read.py:2164:14: FURB101 [*] `open` and `read` should be replaced by `Path(filename).read_text()`
- astropy/io/ascii/tests/test_read.py:2164:14: FURB101 [*] `open` and `read` should be replaced by `Path(filename).read_text()`
- astropy/utils/tests/test_data.py:605:14: FURB103 [*] `open` and `write` should be replaced by `Path(f_name).write_text("old")`

Changes by rule (2 rules affected)

code total + violation - violation + fix - fix
FURB101 6 3 3 0 0
FURB103 5 2 3 0 0

@chirizxc chirizxc changed the title [refurb] Fix FURB101, FURB103 [refurb] Fix FURB101 and FURB103 false positives when I/O variable is used later Feb 24, 2026
@chirizxc chirizxc changed the title [refurb] Fix FURB101 and FURB103 false positives when I/O variable is used later [refurb] Fix FURB101 and FURB103 false positives when I/O variable is used later Feb 24, 2026
@chirizxc

This comment was marked as resolved.

Copy link
Member

@amyreese amyreese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Overall I think this is a very large amount of code. Is all of this actually necessary just to see if a variable is used somewhere else? Can we skip checking if it's rebound elsewhere? Erring on the side of a false negative would generally be better than false positives, and if that could greatly reduce the amount of logic necessary, that may be worth it.

Comment on lines +92 to +97
let following_statements = following_statements_after_with(
with,
checker.semantic().current_statement_parent(),
checker.python_ast(),
);
let candidates = find_file_opens(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than doing up-front work to find all the statements for every with block, even if it has nothing to do with opening a file, this should first see if there are any candidates, and only then fetch the following statements and filter the list of candidates afterwards.

Or at the very least, the call to get following statements should happen further inside of say, resolve_file_open where the data will actually be used, at the point that it will actually be needed and no sooner.

@amyreese
Copy link
Member

amyreese commented Feb 24, 2026

Also, looking at the ecosystem results, it seems we're already pushing towards false negatives here, so I'm not sure how much this is actually benefiting the rule. Were there any false positives in the ecosystem report that should have been resolved with this PR?

@amyreese amyreese added the fixes Related to suggested fixes for violations label Feb 24, 2026
@chirizxc
Copy link
Contributor Author

Also, looking at the ecosystem results, it seems we're already pushing towards false negatives here, so I'm not sure how much this is actually benefiting the rule. Were there any false positives in the ecosystem report that should have been resolved with this PR?

I looked at the changes in the ecosystem from the bot's post, none of these changes are false negatives, since the variables below are used in f.name or f.is_file()

@amyreese
Copy link
Member

amyreese commented Feb 24, 2026

I looked at the changes in the ecosystem from the bot's post, none of these changes are false negatives, since the variables below are used in f.name or f.is_file()

maybe I'm missing something, but f in the airflow ecosystem examples are rebound as part of the comprehensions, so they were legit positives that have been silenced with this PR:

with open(file_path, "w") as f:
    f.write("hello world")

...

files_in_repo = {f.name for f in bundle.path.iterdir() if f.is_file()}

note the for f in ... part of the comprehension means f.name and f.is_file() are not referring to the same f from the context manager above.

@amyreese amyreese self-assigned this Feb 25, 2026
@chirizxc chirizxc marked this pull request as draft February 25, 2026 07:08
@chirizxc
Copy link
Contributor Author

chirizxc commented Feb 25, 2026

I looked at the changes in the ecosystem from the bot's post, none of these changes are false negatives, since the variables below are used in f.name or f.is_file()

maybe I'm missing something, but f in the airflow ecosystem examples are rebound as part of the comprehensions, so they were legit positives that have been silenced with this PR:

with open(file_path, "w") as f:
    f.write("hello world")

...

files_in_repo = {f.name for f in bundle.path.iterdir() if f.is_file()}

note the for f in ... part of the comprehension means f.name and f.is_file() are not referring to the same f from the context manager above.

Yes, you were right, when I looked at these bot ecosystem results, I didn't pay attention

@chirizxc chirizxc marked this pull request as ready for review February 26, 2026 18:55
@amyreese
Copy link
Member

Overall I'm still concerned about the amount of code here, especially when it resolves only a single false positive in the ecosystem report. Would prefer that this is simplified, and lean more on the existing features of the semantic model rather than reimplementing its own version of semantic model and control flow. Brent suggested that only_binding might be helpful here, or maybe consider just return early if the binding is used for anything beyond the read call, without trying to map control flow?

@chirizxc
Copy link
Contributor Author

Overall I'm still concerned about the amount of code here, especially when it resolves only a single false positive in the ecosystem report. Would prefer that this is simplified, and lean more on the existing features of the semantic model rather than reimplementing its own version of semantic model and control flow. Brent suggested that only_binding might be helpful here, or maybe consider just return early if the binding is used for anything beyond the read call, without trying to map control flow?

I will try to reduce the amount of code, but ecosystem analysis projects do not involve all Python code, so I think there is more than one case.

@ntBre
Copy link
Contributor

ntBre commented Mar 2, 2026

I was mostly surprised that we need a new Visitor here. Would moving the rule to a different analysis phase help? I thought we had pretty nice helpers for checking if a variable is used later without tracking that information in a new visitor.

@chirizxc
Copy link
Contributor Author

chirizxc commented Mar 2, 2026

I was mostly surprised that we need a new Visitor here. Would moving the rule to a different analysis phase help? I thought we had pretty nice helpers for checking if a variable is used later without tracking that information in a new visitor.

Hmm, I didn't even know about that. It seems more logical now. Thank you very much.

Copy link
Member

@amyreese amyreese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better to me, but I'm not quite familiar with the analysis bits.

@amyreese amyreese requested a review from ntBre March 3, 2026 20:11
@amyreese
Copy link
Member

amyreese commented Mar 3, 2026

Will wait for @ntBre to take a look.

Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Yes, this makes sense to me. I'm slightly wary of deferring too many checks, especially since I made a similar suggestion twice in quick succession after not using it much before, but this does seem better than trying to restructure the rules to fit into deferred_scopes. I also don't see any negative perf effects of the deferral, which makes sense to me given how lightweight the snapshots are.

I think this is fine as-is, but if we defer any more types of statements, it probably makes sense to have a single deferred_statements with a match on the statement kind instead of adding more deferred_for_loops, deferred_with_statements type functions and fields.

@ntBre ntBre added the preview Related to preview mode features label Mar 3, 2026
@amyreese amyreese merged commit d740e84 into astral-sh:main Mar 4, 2026
44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fixes Related to suggested fixes for violations preview Related to preview mode features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FURB101 and FURB103 shouldn’t apply when the I/O variable is used later

3 participants