
Remove unicodedata dependency from idna-data tool #220

Merged
kjd merged 2 commits into master from unicode-next on Apr 22, 2026

@kjd kjd commented Apr 22, 2026

Owner

Summary

  • Replace the tool's use of unicodedata.normalize("NFKC", ...) and unicodedata.unidata_version with pre-computed NFKC_Casefold mappings from DerivedNormalizationProps.txt, downloaded from unicode.org alongside the other data files.
  • This ensures the tool always uses data matching the requested Unicode version rather than whatever version is bundled with the Python runtime.
  • Fixes a misclassification of U+A7F1 (MODIFIER LETTER CAPITAL S), which was incorrectly classified as PVALID because the system Python bundled Unicode 16.0 data, in which this codepoint is unassigned.
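
For readers unfamiliar with the data file, here is a minimal sketch of how NFKC_Casefold mappings could be read from text in the DerivedNormalizationProps.txt layout. It is not the tool's actual code: `parse_nfkc_cf`, `is_stable`, and `SAMPLE` are hypothetical names, the sample lines are hand-written in the file's `code ; NFKC_CF ; mapping # comment` format rather than copied from a specific Unicode release, and the stability test is a simplification of the check the tool performs.

```python
# Hand-written sample in the DerivedNormalizationProps.txt layout (assumption:
# "code-or-range ; NFKC_CF ; space-separated hex mapping # comment").
SAMPLE = """\
# illustrative excerpt only
0041          ; NFKC_CF; 0061           # L&       LATIN CAPITAL LETTER A
00A8          ; NFKC_CF; 0020 0308      # Sk       DIAERESIS
00AD          ; NFKC_CF;                # Cf       SOFT HYPHEN
180B..180D    ; NFKC_CF;                # Mn   [3] MONGOLIAN FREE VARIATION SELECTORS
00C0          ; NFKC_QC; M              # a different property, skipped
"""


def parse_nfkc_cf(text):
    """Return {codepoint: mapped string} for every NFKC_CF line in text."""
    mappings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 3 or fields[1] != "NFKC_CF":
            continue  # some other derived property
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        # An empty mapping means the codepoint is removed entirely.
        target = "".join(chr(int(cp, 16)) for cp in fields[2].split())
        for cp in range(first, last + 1):
            mappings[cp] = target
    return mappings


def is_stable(cp, mappings):
    """A codepoint with no NFKC_CF entry maps to itself and counts as stable."""
    return mappings.get(cp, chr(cp)) == chr(cp)
```

Because the mappings come from a file fetched for the requested Unicode version, a lookup like `is_stable(0xA7F1, parse_nfkc_cf(file_text))` would no longer depend on which Unicode version the interpreter ships with.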

kjd added 2 commits April 21, 2026 19:27
…odule

Replaces the dependency on unicodedata.normalize() and
unicodedata.unidata_version with the pre-computed NFKC_Casefold
property from DerivedNormalizationProps.txt, ensuring the tool always
uses data matching the requested Unicode version rather than whatever
version is bundled with the Python runtime.

U+A7F1 (MODIFIER LETTER CAPITAL S) is now correctly classified as
DISALLOWED rather than PVALID, as its NFKC_CF mapping makes it
unstable. Previously misclassified because the system Python had
older Unicode data that didn't include this codepoint.
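
The failure mode described above can be observed directly with a small sketch like the following. `runtime_view` is a hypothetical helper, not part of the tool; it only reports what the interpreter's bundled `unicodedata` tables say about a codepoint. The key point is that a codepoint the runtime has no data for (general category `Cn`) passes through `unicodedata.normalize` unchanged, so it looks stable.

```python
import unicodedata


def runtime_view(cp):
    """Summarise what the interpreter's bundled Unicode data says about cp."""
    ch = chr(cp)
    return {
        # Unicode version compiled into this Python build.
        "unidata_version": unicodedata.unidata_version,
        # "Cn" is the general category reported for codepoints without data.
        "assigned": unicodedata.category(ch) != "Cn",
        # Codepoints without data are returned unchanged by normalization.
        "nfkc_unchanged": unicodedata.normalize("NFKC", ch) == ch,
    }


if __name__ == "__main__":
    # The result for U+A7F1 depends on the Unicode version of the runtime.
    print(runtime_view(0xA7F1))
```

On an interpreter whose `unidata_version` predates the assignment of U+A7F1, this would be expected to report the codepoint as unassigned and unchanged under NFKC, which matches the misclassification described in the commit message.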
@kjd kjd merged commit 5f20d1e into master Apr 22, 2026
37 checks passed
@kjd kjd deleted the unicode-next branch April 22, 2026 03:01