Apply NFKC normalization to unicode identifiers when storing bindings in the semantic model#10381
Closed
AlexWaygood wants to merge 6 commits into astral-sh:main
CodSpeed Performance Report: Merging #10381 will degrade performances by 5.17%
Ouch. I guess I need to start running benchmarks before filing PRs :)
1915fd2 to 6749d86
ddc1d3a to d79042e
… in the semantic model
This reverts commit d79042e.
7e55231 to 0385a62
Closing in favour of #10412
Summary
Fixes #5003.
Python applies NFKC normalization to identifiers that use Unicode characters. That means F821 should not be emitted when ruff encounters a snippet that spells one identifier with different but NFKC-equivalent characters (on main, it currently is), because from Python's perspective these are all the same identifier.
I fixed this false positive by changing the bindings field in ruff_python_semantic/scope.rs so that identifiers are always Unicode-normalized according to NFKC before being stored in the hashmap. An alternative approach I played around with was to Unicode-normalize identifiers in the AST itself. However, this would have had undesirable consequences: the formatter would have started eagerly normalizing Unicode characters in identifiers whenever it reformatted a Python file.
Test Plan
cargo test
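For readers who want to see the behaviour described in the summary, here is a minimal sketch in Python (not the Rust code from this PR). It first shows that CPython treats a fullwidth "x" (U+FF58) and an ASCII "x" as the same name, then mimics the fix with a toy bindings map whose keys are normalized on insertion and lookup. The `Scope` class and `normalize_identifier` helper are hypothetical names for illustration only and do not reflect the actual structure of ruff_python_semantic/scope.rs.

```python
import unicodedata
from typing import Dict, Optional


def normalize_identifier(name: str) -> str:
    """Return the NFKC form of an identifier (the form Python compares names by)."""
    return unicodedata.normalize("NFKC", name)


class Scope:
    """Hypothetical, simplified stand-in for a scope's name-to-binding map."""

    def __init__(self) -> None:
        self._bindings: Dict[str, int] = {}

    def add(self, name: str, binding_id: int) -> None:
        # Normalize before storing, so every spelling shares one key.
        self._bindings[normalize_identifier(name)] = binding_id

    def get(self, name: str) -> Optional[int]:
        # Normalize on lookup too, so any equivalent spelling resolves.
        return self._bindings.get(normalize_identifier(name))


FULLWIDTH_X = "\uff58"  # FULLWIDTH LATIN SMALL LETTER X

# 1. CPython itself: assign via the fullwidth spelling, read via ASCII.
namespace: Dict[str, object] = {}
exec(FULLWIDTH_X + " = 1\nresult = x", namespace)

# 2. The toy scope: bind with one spelling, look up with the other.
scope = Scope()
scope.add(FULLWIDTH_X, 0)
found = scope.get("x")

# Without normalization, a plain dict sees two unrelated keys.
naive_bindings = {FULLWIDTH_X: 0}
naive_found = naive_bindings.get("x")
```

Here `found` resolves to the binding while `naive_found` is `None`, which is roughly the situation that produces the F821 false positive when names are stored unnormalized. Normalizing only at the map boundary leaves the source text untouched, which is the property the summary cares about for the formatter.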