Update specs and zero-width tables regarding Mc by jquast · Pull Request #200 · jquast/wcwidth

jquast · 2026-01-28T23:17:12Z

Closes #155, "Should Combining characters of Category 'Mc' be width 1?" Yes.

As I wrote then, "Probably the largest obstacle has been finding 1 or 2 Terminal emulators that correctly display languages that include Mc characters." I felt this change would be too progressive, but I have since found some terminal emulators capable of measuring and rendering complex scripts like Devanagari correctly.

I expect ucs-detect results for Sanskrit-based languages to improve in the next release with this update. I'm still doing some testing and integration.

Before this change:

Shell test using printf(1), '|' should align in output:

$ printf "\xe0\xa4\xae\xe0\xa4\xbe\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa4\xbe\xe0\xa4\xa7\xe0\xa4\xbf\xe0\xa4\x95\xe0\xa4\xbe\xe0\xa4\xb0\xe0\xa4\xbe\xe0\xa4\xa3\xe0\xa4\xbe\xe0\xa4\x82|\\n1234567|\\n"
मानवाधिकाराणां|
1234567|

After integration with ucs-detect, I expect:

Shell test using printf(1), '|' should align in output:

$ printf "\xe0\xa4\xae\xe0\xa4\xbe\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa4\xbe\xe0\xa4\xa7\xe0\xa4\xbf\xe0\xa4\x95\xe0\xa4\xbe\xe0\xa4\xb0\xe0\xa4\xbe\xe0\xa4\xa3\xe0\xa4\xbe\xe0\xa4\x82|\\n1234567890123|\\n"
मानवाधिकाराणां|
1234567890123|
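The two expected widths can be checked by hand: the word decodes to 14 code points, namely 7 letters (category Lo), 6 vowel signs U+093E/U+093F (category Mc), and 1 anusvara U+0902 (category Mn). Counting Mc as zero gives 7 columns; counting it as one gives 13. A minimal sketch of that arithmetic, assuming a hypothetical `simple_width()` helper with a deliberately simplified rule (this is not wcwidth's implementation, which handles wide characters, controls, sequences and more):

```python
import unicodedata


def simple_width(text: str, mc_is_spacing: bool) -> int:
    # Hypothetical, simplified per-code-point rule for illustration only:
    # it ignores wide (CJK) characters, controls, ZWJ sequences, etc.
    width = 0
    for ch in text:
        category = unicodedata.category(ch)
        if category in ("Mn", "Me"):
            continue  # nonspacing and enclosing marks: zero columns
        if category == "Mc" and not mc_is_spacing:
            continue  # previous behavior: spacing marks treated as zero width
        width += 1
    return width


# Same bytes as the printf(1) test above, decoded as UTF-8.
word = (b"\xe0\xa4\xae\xe0\xa4\xbe\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa4\xbe"
        b"\xe0\xa4\xa7\xe0\xa4\xbf\xe0\xa4\x95\xe0\xa4\xbe\xe0\xa4\xb0"
        b"\xe0\xa4\xbe\xe0\xa4\xa3\xe0\xa4\xbe\xe0\xa4\x82").decode("utf-8")

print(simple_width(word, mc_is_spacing=False))  # 7, like "1234567|"
print(simple_width(word, mc_is_spacing=True))   # 13, like "1234567890123|"
```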

codspeed-hq · 2026-01-28T23:49:06Z

CodSpeed Performance Report

Merging this PR will degrade performance by 15.11%

Comparing jq/category-mc (b8fd762) with master (8d27c08)

Summary

❌ 2 regressed benchmarks
✅ 52 untouched benchmarks
🆕 2 new benchmarks
⏩ 3 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Status | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|
| 🆕 | `test_width_fastpath_integrity_udhr` | N/A | 10.8 s | N/A |
| ❌ | `test_width_decomposed` | 2.1 ms | 2.4 ms | -13.32% |
| 🆕 | `test_width_wcswidth_consistency_udhr` | N/A | 6.8 s | N/A |
| ❌ | `test_wcswidth_decomposed` | 1.8 ms | 2.2 ms | -15.11% |

¹ 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.


codecov · 2026-01-29T00:14:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (8d27c08) to head (4769a8a).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##            master      #200   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           14        15    +1     
  Lines          885       896   +11     
  Branches       225       227    +2     
=========================================
+ Hits           885       896   +11



It's hard to describe, or even to justify, why we have a fast path like this. It's because most use is many lines of a few words: headlines, messages, and so on. If this library processed only udhr data, or in the worst case never-ending lines of combining death, it wouldn't be worth it; but when integrating downstream, I see the most common use case is one to a few words.
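To make the idea concrete, here is a minimal sketch of that kind of fast path, assuming a hypothetical `text_width()` function and a deliberately simplified fallback (`_slow_width()`, also hypothetical); it is not wcwidth's actual code. Short, printable-ASCII strings skip per-character Unicode lookups entirely, which is where most real-world calls land:

```python
import unicodedata


def _slow_width(text: str) -> int:
    # Hypothetical, simplified fallback (not wcwidth's algorithm):
    # combining marks (Mn, Me) take 0 columns, East Asian Wide/Fullwidth
    # take 2, and everything else takes 1.
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def text_width(text: str) -> int:
    # Fast path: printable ASCII (letters, digits, punctuation, space)
    # always occupies exactly one column per character.
    if text.isascii() and text.isprintable():
        return len(text)
    return _slow_width(text)


print(text_width("Breaking news"))  # 13, via the fast path
print(text_width("日本語"))          # 6, via the fallback
```

The trade-off is that a worst-case input (long lines of combining marks) pays for one extra `isascii()` scan, while the common case of a few ASCII words returns almost immediately.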

Very small change that respects memory and usage a bit, though it may show as an "impact" in benchmarking because our benchmark exercises hundreds of languages. We should target "2-3" languages, which is really just 1 Western (ASCII) + 1 Eastern language (CJK) + graphics and emoji.

jquast · 2026-01-29T04:20:37Z

The benchmark shows the expected reduction in performance. The performance increase comes from breaking the udhr table into parts and only measuring every Nth line ("skip every N line"), because total benchmark time in CI is far greater than locally: CI uses valgrind to "count every instruction" rather than measure wall time, which is fine, but I have learned it runs about 10x slower than a wall-time measurement.
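A minimal sketch of splitting a large dataset into parts and sampling every Nth line to keep instruction-counted benchmarks affordable. The helper names `split_into_parts()` and `sample_every()` are hypothetical, not the names used in the wcwidth test suite; the part size of 10,000 and step of 20 echo the "udhr by 10k lines" and "change UDHR EVERY to 20" commits below:

```python
from itertools import islice
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_parts(lines: Sequence[T], part_size: int) -> List[List[T]]:
    """Split `lines` into consecutive parts of at most `part_size` items."""
    return [list(lines[i:i + part_size]) for i in range(0, len(lines), part_size)]


def sample_every(lines: Iterable[T], every: int) -> List[T]:
    """Keep the first item and then every `every`-th item after it."""
    if every < 1:
        raise ValueError("every must be >= 1")
    return list(islice(lines, 0, None, every))


corpus = [f"line {n}" for n in range(25_000)]   # stand-in for udhr text
parts = split_into_parts(corpus, 10_000)         # 3 parts: 10k, 10k, 5k
sampled = [sample_every(part, 20) for part in parts]
```

Sampling keeps coverage across every language in the corpus while cutting the work per benchmark run by roughly the sampling factor.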

The boundary pattern in ``get_kitty_keyboard_state()`` used ``.+``, which matched prematurely! Existing tests didn't catch this because they use ungetch(), which buffers the entire response. jquast/wcwidth#200 jquast/wcwidth#202
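One way a ``.+`` boundary can fire too early, and why fully-buffered test input hides it, is sketched below. The patterns and the `read_reply()` helper are hypothetical illustrations, not the actual blessed source; the only protocol fact assumed is that a kitty keyboard protocol query is answered with `CSI ? <flags> u`:

```python
import re
from typing import Iterable, Optional

# Hypothetical patterns for illustration; not the actual blessed source.
LOOSE_BOUNDARY = re.compile(r"\x1b\[\?.+")     # ".+" accepts a partial reply
STRICT_BOUNDARY = re.compile(r"\x1b\[\?\d+u")  # requires the final "u"


def read_reply(chunks: Iterable[str], boundary: re.Pattern) -> Optional[str]:
    """Accumulate input chunks until `boundary` matches; return the buffer."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if boundary.search(buffer):
            return buffer
    return None


# Reply arriving in two reads, as it can from a real terminal:
split = ["\x1b[?1", "5u"]
print(repr(read_reply(split, LOOSE_BOUNDARY)))   # '\x1b[?1' (stopped too early)
print(repr(read_reply(split, STRICT_BOUNDARY)))  # '\x1b[?15u'

# Whole reply buffered at once (as with ungetch() in tests): both look correct.
whole = ["\x1b[?15u"]
```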

jquast marked this pull request as ready for review January 28, 2026 23:55

Base automatically changed from jq/do-not-distribute-data-files to master January 29, 2026 00:04

jquast changed the base branch from master to 0.1.5 January 29, 2026 00:05

jquast changed the base branch from 0.1.5 to master January 29, 2026 00:05

jquast added 2 commits January 28, 2026 19:05

Update specs and zero-width tables regarding Mc

7ed5218

Find useful URLs linking terms in the specification to Unicode's specifications. Also reference "Rendering Nonspacing Marks".

changelog

97cad51

jquast force-pushed the jq/category-mc branch from 6529f0b to 97cad51 January 29, 2026 00:06

"Spacing Mark" or "Spacing Combining Mark"?

b9ad20c

I've seen it written both ways ..

Start with failing tests, integrating with ghostty branch

d810971

jquast marked this pull request as draft January 29, 2026 00:31

jquast added 14 commits January 28, 2026 19:39

specs and benchmark refactor, udhr by 10k lines

bd3ea48

CATEGORY_MC table set Mc in zero table match new spec

9808659

stand-alone Mc test also

51f9049

update tables, add table_mc.py

69b447c

please welcome the new all time rapper category mc, knows all the signs and hes got a lot to say

This logic passes test, still reviewing

fa38ec3

on deprecation

cab8c85

tox -eformat

abd2862

marble madness

a9efce8

thank you to wrap() tests of udhr data for discovering this very strange side-effect

what's the opposite of TDD? T's

178e351

I regret I wasn't aware of any practical language test until an unrelated error in the udhr data benchmarks exposed differences between wcswidth() and width(). So we reverse-engineered the results into tests.

match origin/master

4160f78

trying still to make udhr tests useful but not so long!

b8fd762

change UDHR EVERY to 20 ..

4769a8a
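The commits above add a `table_mc.py` and change which code points are listed as zero width. As a rough illustration of what a category table involves, here is a sketch that builds sorted, inclusive (start, end) ranges of every Mc code point and looks code points up by binary search. The function names are hypothetical, and it uses the Unicode database bundled with the running Python, whose version may differ from the one wcwidth's tables target; it is not necessarily how wcwidth generates or searches its tables:

```python
import bisect
import sys
import unicodedata
from typing import List, Optional, Tuple


def build_ranges(category: str) -> List[Tuple[int, int]]:
    """Return sorted, inclusive (start, end) ranges of code points in `category`."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == category:
            if start is None:
                start = cp
        elif start is not None:
            ranges.append((start, cp - 1))
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    return ranges


def in_ranges(cp: int, ranges: List[Tuple[int, int]]) -> bool:
    """Binary search: find the last range whose start <= cp, then check its end."""
    i = bisect.bisect_right(ranges, (cp, sys.maxunicode)) - 1
    return i >= 0 and ranges[i][0] <= cp <= ranges[i][1]


MC_RANGES = build_ranges("Mc")
print(in_ranges(0x093E, MC_RANGES))  # True: DEVANAGARI VOWEL SIGN AA
print(in_ranges(0x0902, MC_RANGES))  # False: DEVANAGARI SIGN ANUSVARA is Mn
```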

jquast marked this pull request as ready for review January 29, 2026 04:18

jquast merged commit a198889 into master Jan 29, 2026
42 of 43 checks passed

jquast deleted the jq/category-mc branch January 29, 2026 04:25

jquast mentioned this pull request Jan 31, 2026

bugfix kitty keyboard protocol detection jquast/blessed#348

Merged

jquast merged 18 commits into master from jq/category-mc

1 participant
