Remove unicodedata dependency from idna-data tool by kjd · Pull Request #220 · kjd/idna

kjd · 2026-04-22T02:59:27Z

Summary

Replace the tool's use of unicodedata.normalize("NFKC", ...) and unicodedata.unidata_version with pre-computed NFKC_Casefold mappings from DerivedNormalizationProps.txt, downloaded from unicode.org alongside the other data files.
This ensures the tool always uses data matching the requested Unicode version rather than whatever version is bundled with the Python runtime.
Fixes a misclassification of U+A7F1 (MODIFIER LETTER CAPITAL S), which was incorrectly PVALID because the system Python had Unicode 16.0 data where this codepoint was unassigned.
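
For readers unfamiliar with the data file, here is a minimal sketch of how NFKC_Casefold mappings could be read from `DerivedNormalizationProps.txt`. The function name `parse_nfkc_cf` and the embedded `SAMPLE` excerpt are illustrative assumptions, not the tool's actual code. The sketch relies on the UCD convention of semicolon-separated fields with `#` comments, and on codepoints that are not listed mapping to themselves.

```python
# Hypothetical helper, not taken from the idna-data tool itself.
SAMPLE = """\
# Excerpt in the style of DerivedNormalizationProps.txt (illustrative)
0340..0341    ; NFD_QC; N # Mn   [2] COMBINING GRAVE TONE MARK..
0041          ; NFKC_CF; 0061          # L&       LATIN CAPITAL LETTER A
00AD          ; NFKC_CF;               # Cf       SOFT HYPHEN
00DF          ; NFKC_CF; 0073 0073     # L&       LATIN SMALL LETTER SHARP S
180B..180D    ; NFKC_CF;               # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..
"""


def parse_nfkc_cf(lines):
    """Return {codepoint: mapped string} for every NFKC_CF entry in lines."""
    mapping = {}
    for raw in lines:
        # Drop trailing comments and skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        # Keep only three-field NFKC_CF records; other properties are ignored.
        if len(fields) < 3 or fields[1] != "NFKC_CF":
            continue
        # The first field is a single codepoint or a START..END range.
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        # The mapping may be empty, meaning the codepoint maps to nothing.
        target = "".join(chr(int(cp, 16)) for cp in fields[2].split())
        for cp in range(first, last + 1):
            mapping[cp] = target
    return mapping


mapping = parse_nfkc_cf(SAMPLE.splitlines())
```

Because the mappings come from the downloaded file, the result depends only on the Unicode version of that file and not on the `unicodedata` module bundled with the interpreter.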

…odule Replaces the dependency on unicodedata.normalize() and unicodedata.unidata_version with the pre-computed NFKC_Casefold property from DerivedNormalizationProps.txt, ensuring the tool always uses data matching the requested Unicode version rather than whatever version is bundled with the Python runtime.

U+A7F1 (MODIFIER LETTER CAPITAL S) is now correctly classified as DISALLOWED rather than PVALID, as its NFKC_CF mapping makes it unstable. Previously misclassified because the system Python had older Unicode data that didn't include this codepoint.
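
The stability check described above can be sketched as follows, assuming the tool treats a codepoint as unstable when its NFKC_CF mapping differs from the codepoint itself. The names `is_unstable`, `runtime_matches`, and `SAMPLE_MAP` are hypothetical; only `unicodedata.unidata_version` is a real standard-library attribute, shown here to illustrate how the runtime's data version can differ from the requested one.

```python
import unicodedata

# Illustrative subset of NFKC_CF mappings (codepoint -> mapped string).
# Codepoints absent from the table are assumed to map to themselves.
SAMPLE_MAP = {
    0x0041: "a",   # LATIN CAPITAL LETTER A folds to "a"
    0x00AD: "",    # SOFT HYPHEN maps to nothing
}


def is_unstable(cp, nfkc_cf):
    """True when the NFKC_CF mapping of cp is not cp itself."""
    return nfkc_cf.get(cp, chr(cp)) != chr(cp)


def runtime_matches(requested_version):
    """True when the interpreter's bundled Unicode data matches the request."""
    return unicodedata.unidata_version == requested_version


unstable_upper_a = is_unstable(0x0041, SAMPLE_MAP)
unstable_lower_a = is_unstable(0x0061, SAMPLE_MAP)
```

A table built from older data has no entry for a codepoint that was unassigned in that version, so a check like this would report the codepoint as stable. That is the kind of mismatch the PR describes for U+A7F1.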

kjd added 2 commits April 21, 2026 19:27

kjd merged commit 5f20d1e into master Apr 22, 2026
37 checks passed

kjd deleted the unicode-next branch April 22, 2026 03:01

kjd merged 2 commits into master from unicode-next
