Fix NFKC normalization bug when removing unused imports#12571
Fix NFKC normalization bug when removing unused imports#12571AlexWaygood merged 3 commits intomainfrom
Conversation
8644e30 to
41192cb
Compare
|
|
Thanks for looking into this bug. This looks interesting. It would help me review if you could update the summary and include a short summary of what the issue was/how this PR fixes the issue. |
Sorry, my bad. I provided an analysis of the bug in the issue, but forgot to copy it over in the PR summary. I'll do so first thing tomorrow. |
There was a problem hiding this comment.
If I understand the change correctly, the issue is that libCST doesn't perform nfkc normalization. Is that correct?
Sorry, my bad. I provided an analysis of the bug in the issue, but forgot to copy it over in the PR summary. I'll do so first thing tomorrow.
Linking to the analysis from the issue in the comment should be sufficient. I only want to avoid that we merge the PR with a stale summary (commit message)
Yup, that's correct. Our lexer eagerly performs NFKC normalization to match Python's semantics, whereas libCST's lexer leaves NFKC confusables untouched. I don't think that's a bug in libCST, since it's a CST rather than an AST; but it is a difference that we need to account for when building |
CodSpeed Performance ReportMerging #12571 will improve performances by ×2Comparing Summary
Benchmarks breakdown
|
Fixes #12570.
Summary
Our lexer eagerly applies NFKC normalization to NKFC-confusable unicode characters; this is done to match Python's semantics at runtime, where the interpreter views these characters as the same. When removing unused imports, however, we use
libCST, which doesn't apply the same normalization (it would be incorrect forlibCSTto do so, since it's a CST rather than an AST). This can lead to aQualifiedNameconstructed from alibCSTnode not comparing equal to aQualifiedNameconstructed from aruff_python_astnode that represents the same sapn of source code as thelibCSTnode. The difference here between the two parsers is the root cause of #12570.This PR therefore takes care to apply NFKC normalization when constructing
QualifiedNames fromlibCSTnodes, so that theseQualifiedNames are always comparable toQualifiedNames constructed fromruff_python_astnodes.Test plan
cargo test