Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3930      +/-   ##
==========================================
+ Coverage   97.84%   97.88%   +0.04%
==========================================
  Files          74       96      +22
  Lines        8152     8315     +163
==========================================
+ Hits         7976     8139     +163
  Misses        176      176
Pull request overview
This PR adds support for multi-codepoint emoji (grapheme clusters) to Rich, using Unicode data from wcwidth to calculate how many terminal cells each character occupies. This enables correct handling of complex emoji built from several codepoints joined together, such as 👩‍🔧 (woman mechanic: 👩 + U+200D ZERO WIDTH JOINER + 🔧).
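To see why per-codepoint widths are not enough, here is a minimal sketch using only Python's standard `unicodedata` module (not Rich's or wcwidth's code). The helpers `codepoint_width` and `naive_width` are hypothetical, and the width rule is a simplification: zero cells for combining marks and format characters such as ZWJ, two for East Asian Wide/Fullwidth characters, one otherwise.

```python
import unicodedata


def codepoint_width(ch: str) -> int:
    """Simplified cell width of a single codepoint."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks, variation selectors, ZWJ (category Cf)
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide/Fullwidth, which includes most emoji
    return 1


def naive_width(text: str) -> int:
    """Sum per-codepoint widths, ignoring how codepoints combine into one glyph."""
    return sum(codepoint_width(ch) for ch in text)


mechanic = "\U0001F469\u200d\U0001F527"  # 👩 + ZWJ + 🔧
print(len(mechanic))  # 3 codepoints
print(naive_width(mechanic))  # 4, although a ZWJ-aware terminal draws one 2-cell glyph
```

Anything that pads or aligns by the naive count would be off by two cells for this emoji, which is why widths have to be measured per cluster rather than per codepoint.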
Changes:
- Replaced simple width calculation with grapheme-aware Unicode handling using data from the wcwidth library
- Added Unicode version-specific width tables (4.1.0 through 17.0.0) for consistent rendering across different Unicode versions
- Updated text wrapping, cropping, and cell manipulation functions to work on grapheme boundaries
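The cropping and wrapping changes can be illustrated with a simplified, self-contained sketch: split the text into clusters first, then measure and cut only between clusters. The clustering rule here (combining marks, emoji skin-tone modifiers and ZWJ joins attach to the preceding character) is a hypothetical approximation of Unicode's grapheme rules, not Rich's implementation, and `crop`/`chop` are illustrative names.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def codepoint_width(ch: str) -> int:
    """Simplified cell width of one codepoint (0, 1 or 2)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks, variation selectors, ZWJ and other format chars
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide/fullwidth, which covers most emoji
    return 1


def split_graphemes(text: str) -> list[str]:
    """Hypothetical simplified clustering: marks, skin tones and ZWJ joins stick to the previous char."""
    clusters: list[str] = []
    join_next = False  # True right after a ZWJ
    for ch in text:
        extends = (
            unicodedata.category(ch) in ("Mn", "Me")
            or 0x1F3FB <= ord(ch) <= 0x1F3FF  # emoji skin-tone modifiers
            or ch == ZWJ
        )
        if clusters and (join_next or extends):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        join_next = ch == ZWJ
    return clusters


def cluster_width(cluster: str) -> int:
    # Simplification: a cluster is as wide as its widest codepoint.
    return max(codepoint_width(ch) for ch in cluster)


def crop(text: str, max_cells: int) -> str:
    """Keep whole clusters from the start of text while they fit in max_cells."""
    kept: list[str] = []
    used = 0
    for cluster in split_graphemes(text):
        width = cluster_width(cluster)
        if used + width > max_cells:
            break
        kept.append(cluster)
        used += width
    return "".join(kept)


def chop(text: str, max_cells: int) -> list[str]:
    """Break text into lines of at most max_cells without splitting a cluster
    (a single cluster wider than max_cells still gets a line of its own)."""
    lines: list[str] = []
    line, used = "", 0
    for cluster in split_graphemes(text):
        width = cluster_width(cluster)
        if line and used + width > max_cells:
            lines.append(line)
            line, used = "", 0
        line += cluster
        used += width
    if line:
        lines.append(line)
    return lines


mechanic = "\U0001F469\u200d\U0001F527"  # 👩 + ZWJ + 🔧
text = "ab" + mechanic + "c"
print(crop(text, 3) == "ab")  # True: the 2-cell emoji does not fit after "ab"
print(chop(text, 3) == ["ab", mechanic + "c"])  # True
```

Slicing by codepoint instead, for example `text[:4]`, would leave a dangling ZWJ at the end of the line, which is the kind of breakage that working on cluster boundaries avoids.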
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/make_width_tables.py | New script to generate Unicode width tables from wcwidth data |
| tools/make_terminal_widths.py | Removed old width table generation script |
| rich/cells.py | Core implementation of grapheme splitting and cell width calculations |
| rich/_unicode_data/__init__.py | Unicode version loader with auto-detection and fallback logic |
| rich/_unicode_data/_versions.py | List of supported Unicode versions |
| rich/_unicode_data/unicode*.py | Generated width tables for each Unicode version (14 files) |
| tests/test_unicode_data.py | Tests for Unicode data loading and version selection |
| tests/test_text.py | Added tests for wrapping multi-codepoint emoji |
| tests/test_cells.py | Added comprehensive tests for grapheme splitting and cell operations |
| rich/__main__.py | Updated sponsor message styling (unrelated cosmetic change) |
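The table mentions a Unicode version loader with auto-detection and fallback. The PR's actual selection rules are not described here, so the following is only a hypothetical sketch of one plausible fallback policy: use the newest supported table that is not newer than the requested version, the newest table when nothing (or something unparseable) is requested, and the oldest table when the request predates all of them. `SUPPORTED_VERSIONS` is an illustrative subset, not the contents of `_versions.py`, and detecting which Unicode version a terminal implements is left out.

```python
from typing import Optional, Sequence, Tuple

# Illustrative subset only; the PR ships its own list of supported versions.
SUPPORTED_VERSIONS = ("4.1.0", "9.0.0", "12.1.0", "15.1.0", "17.0.0")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'X', 'X.Y' or 'X.Y.Z' into a 3-tuple, padding missing parts with 0."""
    parts = [int(part) for part in version.split(".")]  # ValueError if not numeric
    major, minor, micro = (parts + [0, 0, 0])[:3]
    return (major, minor, micro)


def select_unicode_version(
    requested: Optional[str], supported: Sequence[str] = SUPPORTED_VERSIONS
) -> str:
    """Hypothetical fallback: newest supported version <= requested."""
    ordered = sorted(supported, key=parse_version)
    if not requested:
        return ordered[-1]  # nothing requested: use the newest table
    try:
        target = parse_version(requested)
    except ValueError:
        return ordered[-1]  # unparseable request: use the newest table
    candidates = [v for v in ordered if parse_version(v) <= target]
    return candidates[-1] if candidates else ordered[0]


print(select_unicode_version("15.0.0"))  # 12.1.0
print(select_unicode_version(None))  # 17.0.0
```

Padding to three components matters: comparing the raw tuples `(17,)` and `(17, 0, 0)` would treat "17" as older than "17.0.0".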
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Rich has never handled multi-codepoint emoji, because terminals were highly inconsistent about how they rendered them, and there was no way to know in advance how wide a given sequence of codepoints would be.
This is technically still the case. However, modern terminals are now more consistent and tend to follow the Unicode rules, which means Rich can follow those rules too and get widths right more of the time.
This PR uses the data from https://github.com/jquast/wcwidth to calculate widths the same way wcwidth does. I didn't use wcwidth directly, because Rich has more complex requirements: calculating the width of a string is not enough if you want to do wrapping, cropping, and so on.
This PR adds the ability to split strings into sequences that each produce a single glyph in the terminal, and uses that in a few low-level functions.
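A deliberately simplified version of that splitting step is sketched below. It is neither the full Unicode grapheme cluster algorithm (UAX #29) nor Rich's code: `split_graphemes` is a hypothetical name, and it only joins combining marks (including variation selectors), emoji skin-tone modifiers and ZWJ sequences, ignoring cases such as regional-indicator flag pairs and Hangul syllable sequences.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def _extends(ch: str) -> bool:
    """True if ch attaches to the preceding character instead of starting a new cluster."""
    return (
        unicodedata.category(ch) in ("Mn", "Me")  # combining marks, incl. variation selectors
        or 0x1F3FB <= ord(ch) <= 0x1F3FF  # emoji skin-tone modifiers
        or ch == ZWJ
    )


def split_graphemes(text: str) -> list[str]:
    """Split text into (approximate) single-glyph clusters."""
    clusters: list[str] = []
    join_next = False  # set right after a ZWJ, so the next codepoint joins too
    for ch in text:
        if clusters and (join_next or _extends(ch)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        join_next = ch == ZWJ
    return clusters


sample = "e\u0301" + "\U0001F469\u200d\U0001F527" + "\U0001F44D\U0001F3FD"
print(len(sample))  # 7 codepoints
print(len(split_graphemes(sample)))  # 3: e + acute accent, the ZWJ sequence, thumbs-up + skin tone
```

Once text is in this form, width, cropping and wrapping can all operate on whole clusters.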
The data is stored in a way that keeps startup time low and makes width calculations more efficient than in wcwidth.
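The PR description does not spell out the storage format. One common layout that is cheap to import and fast to query is a sorted list of non-overlapping `(start, end, width)` codepoint ranges searched with `bisect`; only the exceptions need storing, because anything outside every range defaults to width 1. The sketch below assumes that layout, and `WIDTH_RANGES` is a tiny hand-written, hypothetical table rather than generated data.

```python
from bisect import bisect_right

# Hypothetical hand-written table: sorted, non-overlapping (start, end, width)
# ranges of codepoints. Real tables would be generated from Unicode data.
WIDTH_RANGES = [
    (0x0300, 0x036F, 0),  # Combining Diacritical Marks
    (0x1100, 0x115F, 2),  # Hangul Jamo leading consonants
    (0x200D, 0x200D, 0),  # ZERO WIDTH JOINER
    (0x1F600, 0x1F64F, 2),  # Emoticons block
]
# Range start codepoints, computed once at import so each lookup is one bisect.
_STARTS = [start for start, _end, _width in WIDTH_RANGES]


def codepoint_width(codepoint: int) -> int:
    """Look up a codepoint's cell width; codepoints outside every range default to 1."""
    index = bisect_right(_STARTS, codepoint) - 1  # last range starting at or before codepoint
    if index >= 0:
        start, end, width = WIDTH_RANGES[index]
        if start <= codepoint <= end:
            return width
    return 1


print(codepoint_width(ord("a")))  # 1 (not in any range)
print(codepoint_width(0x0301))  # 0 (combining acute accent)
print(codepoint_width(0x1F600))  # 2 (grinning face)
```

Each lookup is O(log n) in the number of ranges, and the table itself is a plain literal, so importing it does no parsing work.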
This iteration should hopefully offer similar performance to earlier versions. In the near future there will be a Rust library to speed up some of the lower-level operations.
Fixes #3897