Handle graphemes #3930

Merged
willmcgugan merged 29 commits into master from cell-string
Jan 22, 2026

@willmcgugan willmcgugan commented Jan 20, 2026

Rich has never handled multi-codepoint emoji, because terminals were highly inconsistent about how they rendered them, and there was no way to know in advance how wide a given sequence of codepoints was going to be.

This is technically still the case. However, modern terminals are more consistent now and tend to follow Unicode rules, which means that Rich can follow those rules too and get it right more of the time.

This PR uses the data from https://github.com/jquast/wcwidth to calculate widths in the same way. I didn't use wcwidth directly, as Rich has more complex requirements: calculating the width of a string is not enough if you want to do wrapping, cropping, etc.
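To illustrate why a width total alone is not sufficient, here is a small sketch in plain Python (not code from this PR). It shows that cropping a string by codepoint index can cut through the middle of an emoji sequence.

```python
# A woman-mechanic emoji is three codepoints:
# WOMAN + ZERO WIDTH JOINER + WRENCH.
text = "\U0001F469\u200d\U0001F527 at work"

# Cropping by codepoint index cuts the sequence in half,
# leaving a dangling joiner at the end of the result.
cropped = text[:2]
codepoints = [f"U+{ord(ch):04X}" for ch in cropped]
print(codepoints)  # ['U+1F469', 'U+200D']
```

A crop or wrap routine therefore needs to know where each glyph starts and ends, not just the total width.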

This PR adds the capability to split strings into sequences that each produce a single glyph in the terminal. That capability is used in a few low-level functions.
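The splitting idea can be sketched roughly as follows. This is a much simplified approximation and not the implementation in rich/cells.py: `is_extender` and `split_clusters` are hypothetical names, and the rules ignore parts of the Unicode grapheme-cluster algorithm (regional-indicator flags and Hangul syllables, for example).

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def is_extender(ch: str) -> bool:
    """Return True if ch attaches to the previous codepoint (simplified rule)."""
    return (
        ch == ZWJ
        # Combining marks; variation selectors such as U+FE0F are category Mn.
        or unicodedata.category(ch) in ("Mn", "Me", "Mc")
        # Emoji skin-tone modifiers.
        or "\U0001F3FB" <= ch <= "\U0001F3FF"
    )


def split_clusters(text: str) -> list:
    """Split text into runs of codepoints that render as one glyph."""
    clusters = []
    for ch in text:
        # Extend the current cluster if ch is an extender, or if the
        # previous codepoint was a joiner waiting for its right-hand side.
        if clusters and (is_extender(ch) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


print(split_clusters("a\U0001F469\u200d\U0001F527b"))
```

With the clusters in hand, wrapping and cropping can operate on whole glyphs rather than on individual codepoints.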

The data is stored in a way that keeps startup time low and makes width calculations faster than wcwidth.
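One common way to keep width data compact and lookups fast is a sorted table of codepoint ranges searched by bisection, with a cache in front of it. Whether the generated tables in this PR use this exact layout is an assumption. The table below is a tiny hand-written excerpt for illustration, and `WIDTH_RANGES` and `codepoint_width` are hypothetical names.

```python
from bisect import bisect_right
from functools import lru_cache

# (first codepoint, last codepoint, cell width), sorted by first codepoint.
# Illustrative excerpt only; real tables are generated from Unicode data.
WIDTH_RANGES = (
    (0x0300, 0x036F, 0),    # combining diacritical marks
    (0x1100, 0x115F, 2),    # Hangul Jamo initial consonants
    (0x4E00, 0x9FFF, 2),    # CJK unified ideographs
    (0x1F600, 0x1F64F, 2),  # emoticons
)
_STARTS = [start for start, _end, _width in WIDTH_RANGES]


@lru_cache(maxsize=4096)
def codepoint_width(ch: str) -> int:
    """Look up the cell width of a single codepoint, defaulting to 1."""
    codepoint = ord(ch)
    # Find the last range that starts at or before this codepoint.
    index = bisect_right(_STARTS, codepoint) - 1
    if index >= 0:
        _start, end, width = WIDTH_RANGES[index]
        if codepoint <= end:
            return width
    return 1


print([codepoint_width(ch) for ch in "a\u0301\u4e2d\U0001F600"])  # [1, 0, 2, 2]
```

A tuple of ranges is cheap to import, and the cache means repeated characters skip the search entirely.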

This iteration should hopefully offer similar performance to the earlier versions. In the nearish future there will be a Rust library to speed up some of the lower level operations.

Fixes #3897

@willmcgugan willmcgugan marked this pull request as draft January 20, 2026 15:52
codecov-commenter commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 99.53917% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 97.88%. Comparing base (56855a6) to head (cbae922).
⚠️ Report is 98 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rich/cells.py | 99.23% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3930      +/-   ##
==========================================
+ Coverage   97.84%   97.88%   +0.04%     
==========================================
  Files          74       96      +22     
  Lines        8152     8315     +163     
==========================================
+ Hits         7976     8139     +163     
  Misses        176      176              
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 97.88% <99.53%> (+0.04%) | ⬆️ |

@willmcgugan willmcgugan changed the title from "WIP Handle graphemes" to "Handle graphemes" Jan 22, 2026

Copilot AI left a comment

Pull request overview

This PR adds support for multi-codepoint emoji (grapheme clusters) to Rich by using Unicode data from wcwidth to calculate character widths in terminals correctly. This enables proper handling of complex emoji that consist of multiple codepoints joined together (like 👩‍🔧, woman mechanic).
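For readers unfamiliar with such sequences, this short standard-library sketch (not part of the PR) lists the codepoints behind the woman mechanic emoji:

```python
import unicodedata

emoji = "\U0001F469\u200d\U0001F527"  # woman mechanic

# One glyph on screen, but three codepoints in the string.
names = [unicodedata.name(ch) for ch in emoji]
print(len(emoji), names)  # 3 ['WOMAN', 'ZERO WIDTH JOINER', 'WRENCH']
```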

Changes:

  • Replaced simple width calculation with grapheme-aware Unicode handling using data from the wcwidth library
  • Added Unicode version-specific width tables (4.1.0 through 17.0.0) for consistent rendering across different Unicode versions
  • Updated text wrapping, cropping, and cell manipulation functions to work on grapheme boundaries
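Working on grapheme boundaries means a crop never cuts through a cluster. The sketch below assumes the text has already been split into (cluster, width) pairs. `crop_clusters` is a hypothetical helper, and padding with spaces when a wide cluster does not fit is a design choice of this sketch rather than a statement about Rich's functions.

```python
from typing import Iterable, Tuple


def crop_clusters(clusters: Iterable[Tuple[str, int]], max_width: int) -> str:
    """Keep whole clusters that fit in max_width cells, then pad with spaces."""
    parts = []
    used = 0
    for cluster, width in clusters:
        if used + width > max_width:
            # The next cluster would overflow, so stop before it
            # instead of splitting it.
            break
        parts.append(cluster)
        used += width
    return "".join(parts) + " " * (max_width - used)


pairs = [("a", 1), ("\U0001F469\u200d\U0001F527", 2), ("b", 1)]
print(repr(crop_clusters(pairs, 2)))  # 'a ' (the emoji needs 2 cells, only 1 is left)
```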

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| tools/make_width_tables.py | New script to generate Unicode width tables from wcwidth data |
| tools/make_terminal_widths.py | Removed old width table generation script |
| rich/cells.py | Core implementation of grapheme splitting and cell width calculations |
| rich/_unicode_data/__init__.py | Unicode version loader with auto-detection and fallback logic |
| rich/_unicode_data/_versions.py | List of supported Unicode versions |
| rich/_unicode_data/unicode*.py | Generated width tables for each Unicode version (14 files) |
| tests/test_unicode_data.py | Tests for Unicode data loading and version selection |
| tests/test_text.py | Added tests for wrapping multi-codepoint emoji |
| tests/test_cells.py | Added comprehensive tests for grapheme splitting and cell operations |
| rich/__main__.py | Updated sponsor message styling (unrelated cosmetic change) |
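As a rough sketch of what version fallback could look like, the snippet below picks the newest supported table that is not newer than the requested Unicode version. The fallback rule, the names `SUPPORTED_VERSIONS` and `pick_version`, and the shortened version list are all assumptions for illustration; the actual loader in rich/_unicode_data may behave differently.

```python
from bisect import bisect_right

# Illustrative subset only; the PR ships 14 tables from 4.1.0 through 17.0.0.
SUPPORTED_VERSIONS = [(4, 1, 0), (9, 0, 0), (13, 0, 0), (15, 1, 0), (17, 0, 0)]


def parse_version(text: str) -> tuple:
    """Turn a string such as '15.1.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def pick_version(requested: str) -> tuple:
    """Pick the newest supported version that is <= the requested one.

    Requests older than every table fall back to the oldest table.
    """
    wanted = parse_version(requested)
    index = bisect_right(SUPPORTED_VERSIONS, wanted) - 1
    return SUPPORTED_VERSIONS[max(index, 0)]


print(pick_version("14.0.0"))  # (13, 0, 0)
```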

willmcgugan and others added 2 commits January 22, 2026 14:55
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@willmcgugan willmcgugan marked this pull request as ready for review January 22, 2026 15:04
@willmcgugan willmcgugan merged commit f000c31 into master Jan 22, 2026
24 checks passed
@willmcgugan willmcgugan deleted the cell-string branch January 22, 2026 16:08
Development

Successfully merging this pull request may close these issues.

[BUG] Variation Selector U+FE0F not accounted for in cell width calculation
