Start tracking quoting style in the AST#10298
Conversation
|
Performance is neutral on all benchmarks according to Codspeed. |
|
MichaReiser
left a comment
There was a problem hiding this comment.
Nice. It would be nice to use the new string flags in the formatter to avoid parsing them from source. But that's nothing for this PR.
ruff/crates/ruff_python_formatter/src/string/mod.rs
Lines 121 to 144 in 72bf1c2
I posted a solution that avoids the need to change FStringStart. I would prefer if we keep it as is to reduce the diff size.
crates/ruff_python_ast/src/nodes.rs
Outdated
| } | ||
|
|
||
| /// Does the string have a `u` or `U` prefix? | ||
| pub const fn is_u_string(&self) -> bool { |
There was a problem hiding this comment.
Nit,
| pub const fn is_u_string(&self) -> bool { | |
| pub const fn is_unicode(&self) -> bool { |
Potentially without the string postfix because that's given by the enclosing type. E.g it's also is_triple_quoted and not is_triple_quoted_string. E.g let's compare some call-sites: string_literal.is_unicode_string() seems repetivie in comparison to string_literal.is_unicode()
| impl From<ruff_python_ast::str::QuoteStyle> for Quote { | ||
| fn from(value: ruff_python_ast::str::QuoteStyle) -> Self { | ||
| match value { | ||
| ruff_python_parser::QuoteStyle::Double => Self::Double, | ||
| ruff_python_parser::QuoteStyle::Single => Self::Single, | ||
| ruff_python_ast::str::QuoteStyle::Double => Self::Double, | ||
| ruff_python_ast::str::QuoteStyle::Single => Self::Single, | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Can we replace the Quote type with QuoteStyle? I would prefer to minimize the number of Quotelike types we have.
There was a problem hiding this comment.
We could do, but it would require making the ruff_python_literal crate have a dependency on the ruff_python_ast crate (or vice versa), due to this:
ruff/crates/ruff_python_codegen/src/stylist.rs
Lines 123 to 130 in 965adbe
If we used ruff_python_ast::str::QuoteStyle there instead of ruff_python_codegen::stylist::Quote, we wouldn't be able to have the implementation of that trait in the ruff_python_codegen crate, since the QuoteStyle enum comes from the ruff_python_literal crate. ruff_python_literal does not currently depend on ruff_python_ast (nor vice versa).
| .with_value(capital_env_var.into_boxed_str()) | ||
| .with_prefix({ | ||
| if env_var.is_unicode() { | ||
| StringLiteralPrefix::UString |
There was a problem hiding this comment.
Nit: I think I would prefer Unicode, it avoids the string repetition
There was a problem hiding this comment.
I dislike Unicode, because all strings in Python are unicode... the u prefix does nothing at runtime, it's purely there for Python 2 compatibility. (In fact, it was originally removed in Python 3.0, but they added it back in a later Python 3 version because it was so difficult to write code that worked on both Python 2 and Python 3 if the prefix was gone in Python 3.)
There was a problem hiding this comment.
That's fair. But it is the unicode flag ;) Can you do a quick check on what we call it in other places in the code base? It would be nice if it is consistent.
There was a problem hiding this comment.
I did a grep -- there's a pyupgrade rule called UnicodeKindPrefix, but other than that, I can't see much use of the term Unicode across the codebase to describe this prefix...
I'd still rather keep this as-is -- I personally would find it really confusing to refer to these strings as "unicode strings", given that all strings are unicode strings, even without the prefix
c8bdaf0 to
7ab7c5a
Compare
7ab7c5a to
703dcf3
Compare
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information: - The quoting style used (double or single quotes) - Whether the string is triple-quoted or not - Whether the string is raw or not This PR is a followup to astral-sh#10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the `ExprGenerator` in a followup PR, which will allow us to provide a fix for astral-sh#7799. The information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on `Tok::String`, `Tok::FStringStart` and `Tok::FStringMiddle` tokens in astral-sh#10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node. Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a `u` prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.
Summary
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information:
This PR is a followup to #10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the
ExprGeneratorin a followup PR, which will allow us to provide a fix for #7799.The PR should be easiest to review commit-by-commit:
ast::StringLiteralnodesast::BytesLiteralnodesast::FStringnodesThe information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on
Tok::String,Tok::FStringStartandTok::FStringMiddletokens in #10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node.Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a
uprefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.Test Plan
cargo test