Handle graphemes by willmcgugan · Pull Request #3930 · Textualize/rich

willmcgugan · 2026-01-20T15:52:10Z

Rich has never handled multi-codepoint emoji, because terminals were highly inconsistent in how they rendered them, and there was no way to know in advance how wide a given sequence of codepoints would be.

This is technically still the case. However, modern terminals are more consistent now and tend to follow Unicode rules, which means that Rich can follow the same rules and get it right more of the time.

This PR uses the data from https://github.com/jquast/wcwidth to calculate widths in the same way. I didn't use wcwidth directly, as Rich has more complex requirements. Calculating the width of a string is not enough if you also want to do wrapping, cropping, etc.
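To illustrate why per-codepoint arithmetic falls short, here is a minimal sketch that uses only the standard library. The helper `naive_codepoint_width` is hypothetical (it is not part of Rich or wcwidth) and simply sums a width for each codepoint in isolation.

```python
import unicodedata


def naive_codepoint_width(text: str) -> int:
    """Sum per-codepoint widths, ignoring how codepoints combine on screen."""
    total = 0
    for char in text:
        # Combining marks and format characters (such as ZWJ) occupy no cells.
        if unicodedata.category(char) in ("Mn", "Me", "Cf"):
            continue
        # Wide and fullwidth characters occupy two cells, everything else one.
        total += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    return total


# Woman + zero width joiner + wrench: one "woman mechanic" glyph.
mechanic = "\U0001F469\u200D\U0001F527"

print(len(mechanic))                    # 3 codepoints
print(naive_codepoint_width(mechanic))  # 4 cells by the naive estimate
```

A terminal that follows the Unicode rules would typically draw this sequence as a single glyph two cells wide, so the naive estimate is off by two cells, and any wrapping or cropping based on it could cut the sequence in the middle.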

This PR adds the capability to split strings into sequences that each produce a single glyph in the terminal. That capability is used in a few low-level functions.
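As a rough idea of what such splitting involves, the sketch below groups codepoints into clusters using a deliberately simplified rule set. It is not the code from this PR and not a full implementation of the Unicode segmentation rules: the names `split_clusters` and `_extends_cluster` are hypothetical, and cases such as flag sequences (regional indicators) and Hangul syllables are not handled.

```python
import unicodedata
from typing import List

ZWJ = "\u200d"  # zero width joiner


def _extends_cluster(char: str) -> bool:
    """True if `char` attaches to the preceding character (simplified rules)."""
    return (
        char == ZWJ
        or "\ufe00" <= char <= "\ufe0f"          # variation selectors
        or "\U0001F3FB" <= char <= "\U0001F3FF"  # emoji skin-tone modifiers
        or unicodedata.category(char) in ("Mn", "Me", "Mc")  # combining marks
    )


def split_clusters(text: str) -> List[str]:
    """Split text into runs of codepoints that are expected to render as one glyph."""
    clusters: List[str] = []
    for char in text:
        # Join onto the previous cluster if this codepoint extends it,
        # or if the previous cluster ends with a joiner waiting for a partner.
        if clusters and (_extends_cluster(char) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += char
        else:
            clusters.append(char)
    return clusters


print(split_clusters("a\U0001F469\u200D\U0001F527b"))  # three clusters
print(split_clusters("e\u0301x"))                      # two clusters
```

Once text is held as a list of clusters, operations such as wrapping and cropping can move whole clusters around instead of individual codepoints.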

The data is stored in a way that keeps startup time low and makes width calculations more efficient than wcwidth's.
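The description does not spell out the storage layout, so the following is only a sketch of one common approach for this kind of data: a sorted list of codepoint ranges searched with binary search. The table `WIDTH_RANGES` is a tiny illustrative excerpt, not the generated data, and `codepoint_width` is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Tuple

# Illustrative excerpt only: (first codepoint, last codepoint, width in cells).
# Ranges are sorted and non-overlapping; unlisted codepoints are treated as width 1.
WIDTH_RANGES: List[Tuple[int, int, int]] = [
    (0x0300, 0x036F, 0),    # combining diacritical marks
    (0x1100, 0x115F, 2),    # Hangul Jamo initial consonants
    (0x4E00, 0x9FFF, 2),    # CJK unified ideographs
    (0x1F600, 0x1F64F, 2),  # emoticons
]
# Range starts, kept separately so bisect can search them directly.
_STARTS = [start for start, _end, _width in WIDTH_RANGES]


def codepoint_width(char: str) -> int:
    """Look up the cell width of a single codepoint with a binary search."""
    codepoint = ord(char)
    # Index of the last range whose start is <= codepoint, or -1 if none.
    index = bisect_right(_STARTS, codepoint) - 1
    if index >= 0:
        _start, end, width = WIDTH_RANGES[index]
        if codepoint <= end:
            return width
    return 1


print(codepoint_width("a"))           # 1
print(codepoint_width("\u4e2d"))      # 2
print(codepoint_width("\u0301"))      # 0
```

A table of plain tuples like this is cheap to import and each lookup costs a logarithmic number of comparisons, which is the kind of property the paragraph above is describing.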

This iteration should hopefully offer similar performance to the earlier versions. In the nearish future there will be a Rust library to speed up some of the lower-level operations.

Fixes #3897

codecov-commenter · 2026-01-20T17:32:15Z


Codecov Report

❌ Patch coverage is 99.53917% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 97.88%. Comparing base (56855a6) to head (cbae922).
⚠️ Report is 98 commits behind head on master.

Files with missing lines	Patch %	Lines
rich/cells.py	99.23%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3930      +/-   ##
==========================================
+ Coverage   97.84%   97.88%   +0.04%     
==========================================
  Files          74       96      +22     
  Lines        8152     8315     +163     
==========================================
+ Hits         7976     8139     +163     
  Misses        176      176

Flag	Coverage Δ
unittests	`97.88% <99.53%> (+0.04%)`	⬆️


Copilot

Pull request overview

This PR adds support for multi-codepoint emoji (graphemes) to Rich by using Unicode data from wcwidth to correctly calculate character widths in terminals. This enables proper handling of complex emoji that consist of multiple codepoints joined together (like 👩‍🔧, woman mechanic).

Changes:

Replaced simple width calculation with grapheme-aware Unicode handling, using data from the wcwidth library
Added Unicode version-specific width tables (4.1.0 through 17.0.0) for consistent rendering across different Unicode versions
Updated text wrapping, cropping, and cell manipulation functions to work on grapheme boundaries
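
Working on grapheme boundaries means a crop never cuts through a multi-codepoint sequence. The sketch below assumes the text has already been split into `(cluster, width)` pairs; the function name `crop_clusters` and the choice to pad leftover cells with spaces are assumptions made for illustration, not a description of the code in rich/cells.py.

```python
from typing import List, Tuple


def crop_clusters(clusters: List[Tuple[str, int]], max_cells: int) -> str:
    """Keep whole clusters that fit in `max_cells`, then pad to exactly that width."""
    output: List[str] = []
    used = 0
    for cluster, width in clusters:
        if used + width > max_cells:
            break  # the next cluster would overflow, so stop before it
        output.append(cluster)
        used += width
    # Fill any cells left over (for example, half of a wide glyph) with spaces.
    return "".join(output) + " " * (max_cells - used)


mechanic = "\U0001F469\u200D\U0001F527"  # one glyph, two cells wide
line = [("a", 1), (mechanic, 2), ("b", 1)]

print(repr(crop_clusters(line, 2)))  # 'a ' : the emoji does not fit, so pad
print(repr(crop_clusters(line, 3)))  # 'a' followed by the whole emoji
```

Padding rather than emitting half a glyph keeps the output at the requested width, which matters when the result is placed in a table cell or panel.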

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 3 comments.


File	Description
tools/make_width_tables.py	New script to generate Unicode width tables from wcwidth data
tools/make_terminal_widths.py	Removed old width table generation script
rich/cells.py	Core implementation of grapheme splitting and cell width calculations
rich/_unicode_data/__init__.py	Unicode version loader with auto-detection and fallback logic
rich/_unicode_data/_versions.py	List of supported Unicode versions
rich/_unicode_data/unicode*.py	Generated width tables for each Unicode version (14 files)
tests/test_unicode_data.py	Tests for Unicode data loading and version selection
tests/test_text.py	Added tests for wrapping multi-codepoint emoji
tests/test_cells.py	Added comprehensive tests for grapheme splitting and cell operations
rich/__main__.py	Updated sponsor message styling (unrelated cosmetic change)
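
The summary mentions version selection with fallback but does not show the rules. As one plausible reading, the sketch below picks the newest supported table that does not exceed the requested Unicode version, falling back to the oldest table for very old requests and to the newest when nothing is requested. The function `pick_version` and the `SUPPORTED` list are hypothetical; only 4.1.0 and 17.0.0 are taken from the summary above, and the versions in between are placeholders.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

Version = Tuple[int, ...]


def parse_version(version: str) -> Version:
    """Turn '12.1.0' into (12, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_version(requested: Optional[str], supported: Sequence[str]) -> str:
    """Choose a supported version for `requested` (assumed fallback rules)."""
    ordered = sorted(supported, key=parse_version)
    if requested is None:
        return ordered[-1]  # nothing requested: use the newest table
    keys = [parse_version(version) for version in ordered]
    # Newest supported version that is <= the requested one.
    index = bisect_right(keys, parse_version(requested)) - 1
    # Requests older than every table fall back to the oldest table.
    return ordered[max(index, 0)]


# Placeholder list; the real set of supported versions lives in the PR's data files.
SUPPORTED = ["4.1.0", "9.0.0", "13.0.0", "17.0.0"]

print(pick_version(None, SUPPORTED))      # 17.0.0
print(pick_version("14.0.0", SUPPORTED))  # 13.0.0
print(pick_version("3.0.0", SUPPORTED))   # 4.1.0
```

Comparing parsed tuples instead of raw strings avoids the ordering bug where "9.0.0" would sort after "13.0.0".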


willmcgugan added 5 commits January 17, 2026 17:08

cell string class

0ffe796

cell tables

1dde8ba

f string path

72b0a9e

split text

5ffec68

move to cells.py

7001a52

willmcgugan marked this pull request as draft January 20, 2026 15:52

willmcgugan added 4 commits January 20, 2026 15:53

tables and tests

494ce86

remove reference to cell strings

69cd818

fix typing

7d4a115

test

230fdac

willmcgugan added 11 commits January 20, 2026 17:36

tests for unicode data

de76258

future

263d25b

py3.8 fix

b63cb57

typing issues

c7778bc

alternative typing

0d7f2ab

typing

ee796e1

typing

19a6fad

mypy fix

786e9de

typing

d09cc4f

tests

f924a21

old code

3c8d89c

willmcgugan changed the title ~~WIP Handle graphemes~~ Handle graphemes Jan 22, 2026

willmcgugan added 3 commits January 22, 2026 14:24

optimizations

079a6a5

restore caching behavior

9904f14

changelog

3b533b0

willmcgugan requested a review from Copilot January 22, 2026 14:41

Copilot started reviewing on behalf of willmcgugan January 22, 2026 14:41

changelog

16c5575

Copilot AI reviewed Jan 22, 2026


tests/test_text.py (resolved)

rich/_unicode_data/__init__.py (outdated, resolved)

tests/test_unicode_data.py (outdated, resolved)

willmcgugan and others added 2 commits January 22, 2026 14:55

Update rich/_unicode_data/__init__.py

2836d72

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update tests/test_unicode_data.py

199a839

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

willmcgugan marked this pull request as ready for review January 22, 2026 15:04

willmcgugan added 3 commits January 22, 2026 15:08

typp

49647d2

credit

219b4f2

update wcwidth data, add consistant sorting

cbae922

willmcgugan merged commit f000c31 into master Jan 22, 2026
24 checks passed

willmcgugan deleted the cell-string branch January 22, 2026 16:08

johnthagen mentioned this pull request Mar 2, 2026

Missing hidden import for rich._unicode_data pyinstaller/pyinstaller#9387

Closed

6 tasks

willmcgugan merged 29 commits into master from cell-string


3 participants
