Fix RUF027 false negative with escape sequences in format specs#23525
Fix RUF027 false negative with escape sequences in format specs#23525FrederickStempfle wants to merge 2 commits intoastral-sh:mainfrom
Conversation
| let spec_str = if spec.elements.interpolations().next().is_none() { | ||
| // Use decoded literal values so escape sequences like `\x64` | ||
| // are resolved before parsing. | ||
| spec_text = spec.elements.literals().map(|lit| &*lit.value).collect::<String>(); |
There was a problem hiding this comment.
I think this is over-fitting on the specific case I included in the issue. How does this PR handle a case like:
f'{1:"\n"}'I suspect that this is an issue with backslashes in the format specifier not unique to hex escapes. This case obviously causes a runtime error as written, but classes can override __format__ to make this work:
>>> class C:
... def __format__(self, spec):
... return str(self) + spec
...
>>> c = C()
>>> print(f"{c:'\n'}")
<__main__.C object at 0x7f3ad2ff4980>'
'
>>>There was a problem hiding this comment.
Good point. Worth clarifying: the fix isn't specific to hex escapes. It decodes all escape sequences by using the AST's already-decoded literal values instead of slicing the raw source so it handles \n, \t, \x64, etc. uniformly.
For the f'{1:"\n"}' case since RUF027 only applies to non-f-strings the relevant scenario would be a regular string like '{c:"\n"}'. With my fix the format spec gets decoded to an actual newline which FormatSpec::parse rejects so the string doesn't get flagged. No false positive.
As for custom __format__ overrides those are hard to handle without type information. A custom spec could be arbitrary so I think it's acceptable to leave that case undetected rather than introduce false positives trying to guess.
That said if you think the right fix is broader (e.g., skipping format spec validation entirely when the spec contains any backslash escape) I'm happy to explore that. The trade-off would be potentially flagging things like "{c:\n}" that aren't actually convertible to f-strings in most real-world code.
There was a problem hiding this comment.
I think this is a false negative not a false positive, so this spec shouldn't be rejected as invalid and RUF027 should be emitted.
If RUF027 wants additional heuristics that would be orthogonal to the issue, which is that these specs should be parsed as valid like in CPython.
There was a problem hiding this comment.
Good point. Removed the FormatSpec::parse check entirely — since Python doesn't validate spec content at the f-string level and just passes it straight to format, any string is technically a valid spec. The check was stricter than CPython and causing false negatives like the newline case. Added a test for it too.
There was a problem hiding this comment.
This is definitely not what I meant. We use the spec parsing as a way to avoid false positives for the rule, as one of the modified test cases shows. We absolutely should not delete that from RUF027.
I'm somewhat inclined to close this PR because it's not really addressing the issue at all, which doesn't even mention RUF027.
There was a problem hiding this comment.
Sorry i missed that. Should the fix just be in FormatSpec::parse itself to decode escape sequences before parsing? And then keep the RUF027 like before?
There was a problem hiding this comment.
I would check how our actual parser handles format specs and try to bring FormatSpec into alignment with that. Decoding escape sequences before parsing doesn't sound right to me, but again I'd have to check what our parser does.
But yes, I was not picturing any RUF027-specific changes when I filed the issue, although the behavior of FormatSpec obviously affects RUF027.
Python does not validate format spec content at the f-string level; it delegates that entirely to __format__. Using FormatSpec::parse to gate whether a string is an f-string candidate was therefore too strict and caused false negatives for specs with unusual but valid content (e.g. escape sequences that decode to non-standard characters like newlines). Remove the check entirely and add a test case covering a newline in a format spec, which should now correctly trigger RUF027.
Fixes #23360
RUF027 was extracting format specs as raw source text, so something like
"{a:\x64}"would pass\x64toFormatSpec::parseinstead of the decodedd. This meant it wouldn't recognize the string as a valid f-string candidate.The parser already decodes escape sequences in the AST literal elements, so I switched to using those instead of slicing the source directly.
Added a test case covering this.