[refurb] Count codepoints not bytes for slice-to-remove-prefix-or-suffix (FURB188)#13631
[refurb] Count codepoints not bytes for slice-to-remove-prefix-or-suffix (FURB188)#13631MichaReiser merged 9 commits intoastral-sh:mainfrom
slice-to-remove-prefix-or-suffix (FURB188)#13631Conversation
|
|
Another subtlety worth testing is strings with surrogates. In Python, each surrogate counts as 1 and surrogate pairs are not special so they count as 2; for example, |
TIL @dscorbett - neat! Added a test for this, and it appears to be handled correctly (I think this happens in the guts of the parser, so by the time I'm looking at |
|
I think the reason it works is that Ruff’s representation of a Python string as a Rust string replaces surrogates with replacement characters. That is fine for counting the code points but could be a problem for other rules. |
MichaReiser
left a comment
There was a problem hiding this comment.
Nice, thanks. I only have two nit comments.
| .and_then(ast::Int::as_u32) | ||
| .and_then(|x| usize::try_from(x).ok()) |
There was a problem hiding this comment.
I suggest converting to a u64 considering that you have to use usize::try_from anyways (for 32 bit platforms)
| .and_then(ast::Int::as_u32) | |
| .and_then(|x| usize::try_from(x).ok()) | |
| .and_then(ast::Int::as_u64) | |
| .and_then(|x| usize::try_from(x).ok()) |
There was a problem hiding this comment.
Or you could consider adding a as_usize method to ast::Int
There was a problem hiding this comment.
went with the latter
| // Only support prefix removal for size at most `u32::MAX` | ||
| .and_then(ast::Int::as_u32) | ||
| .and_then(|x| usize::try_from(x).ok()) | ||
| .is_some_and(|x| x == string_val.to_str().chars().count()), |
There was a problem hiding this comment.
| .is_some_and(|x| x == string_val.to_str().chars().count()), | |
| .is_some_and(|x| x == string_val.chars().count()), |
| .and_then(ast::Int::as_u32) | ||
| .is_some_and(|x| x == string_val.to_str().text_len().to_u32()) | ||
| .and_then(|x| usize::try_from(x).ok()) | ||
| .is_some_and(|x| x == string_val.to_str().chars().count()) |
There was a problem hiding this comment.
| .is_some_and(|x| x == string_val.to_str().chars().count()) | |
| .is_some_and(|x| x == string_val.chars().count()) |
This PR fixes the calculation of string length for the purposes of verifying when to suggest
removeprefix/removesuffix(FURB188). Before, we usedtext_lenwhich was counting bytes rather than codepoints (chars) and therefore disagreed with Python'slenfor non-ASCII text.Closes #13620