Update specs and zero-width tables regarding Mc #200

Merged
jquast merged 18 commits into master from jq/category-mc
Jan 29, 2026

@jquast jquast commented Jan 28, 2026

Closes #155, "Should Combining characters of Category 'Mc' be width 1?" The answer is yes.

As I wrote then, "Probably the largest obstacle has been finding 1 or 2 Terminal emulators that correctly display languages that include Mc characters." I felt the change would be too progressive, but I have since found some terminal emulators capable of measuring and rendering complex scripts like Devanagari correctly.

I expect ucs-detect results for Sanskrit-based languages to improve in the next release with this update. I am still doing some testing and integration.

Before this change:

Shell test using printf(1), '|' should align in output:

$ printf "\xe0\xa4\xae\xe0\xa4\xbe\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa4\xbe\xe0\xa4\xa7\xe0\xa4\xbf\xe0\xa4\x95\xe0\xa4\xbe\xe0\xa4\xb0\xe0\xa4\xbe\xe0\xa4\xa3\xe0\xa4\xbe\xe0\xa4\x82|\\n1234567|\\n"
मानवाधिकाराणां|
1234567|

After integration with ucs-detect, I expect:

Shell test using printf(1), '|' should align in output:

$ printf "\xe0\xa4\xae\xe0\xa4\xbe\xe0\xa4\xa8\xe0\xa4\xb5\xe0\xa4\xbe\xe0\xa4\xa7\xe0\xa4\xbf\xe0\xa4\x95\xe0\xa4\xbe\xe0\xa4\xb0\xe0\xa4\xbe\xe0\xa4\xa3\xe0\xa4\xbe\xe0\xa4\x82|\\n1234567890123|\\n"
मानवाधिकाराणां|
1234567890123|
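
The two expected column counts above (7 before, 13 after) can be sanity-checked with a rough sketch that uses only Python's standard `unicodedata` module. This is not wcwidth's table-driven implementation: `sketch_width` and its `mc_width` parameter are hypothetical names, and the sketch assumes every character that is not a combining mark occupies one cell, which holds for this Devanagari word but not in general.

```python
import unicodedata

# The same word as in the printf(1) test, written as code points.
WORD = (
    "\u092e\u093e\u0928\u0935\u093e\u0927\u093f"
    "\u0915\u093e\u0930\u093e\u0923\u093e\u0902"
)


def sketch_width(text, mc_width):
    """Approximate cell width; hypothetical helper, not the wcwidth API.

    Mn and Me marks count as zero cells, Mc (spacing combining marks)
    count as mc_width cells, and everything else is assumed to be 1 cell.
    """
    total = 0
    for ch in text:
        category = unicodedata.category(ch)
        if category in ("Mn", "Me"):
            continue
        total += mc_width if category == "Mc" else 1
    return total


print(sketch_width(WORD, mc_width=0))  # 7, Mc treated as zero-width
print(sketch_width(WORD, mc_width=1))  # 13, Mc treated as width 1
```

The word has 7 base letters, 6 Mc vowel signs, and 1 Mn sign (anusvara), which is where the difference between 7 and 13 comes from.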


codspeed-hq bot commented Jan 28, 2026

CodSpeed Performance Report

Merging this PR will degrade performance by 15.11%

Comparing jq/category-mc (b8fd762) with master (8d27c08)

Summary

❌ 2 regressed benchmarks
✅ 52 untouched benchmarks
🆕 2 new benchmarks
⏩ 3 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| 🆕 test_width_fastpath_integrity_udhr | N/A | 10.8 s | N/A |
| test_width_decomposed | 2.1 ms | 2.4 ms | -13.32% |
| 🆕 test_width_wcswidth_consistency_udhr | N/A | 6.8 s | N/A |
| test_wcswidth_decomposed | 1.8 ms | 2.2 ms | -15.11% |

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

@jquast jquast marked this pull request as ready for review January 28, 2026 23:55
Base automatically changed from jq/do-not-distribute-data-files to master January 29, 2026 00:04
@jquast jquast changed the base branch from master to 0.1.5 January 29, 2026 00:05
@jquast jquast changed the base branch from 0.1.5 to master January 29, 2026 00:05
Find useful URLs for various terms in the specification, linking to Unicode's
specifications. Also reference "Rendering Nonspacing Marks".
I've seen it written both ways ..

codecov bot commented Jan 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (8d27c08) to head (4769a8a).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #200   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           14        15    +1     
  Lines          885       896   +11     
  Branches       225       227    +2     
=========================================
+ Hits           885       896   +11     


@jquast jquast marked this pull request as draft January 29, 2026 00:31
jquast added 14 commits January 28, 2026 19:39
please welcome the new all-time rapper, category mc, knows all the signs
and he's got a lot to say

thank you to the wrap() tests of udhr data for discovering this very strange
side-effect

I regret I wasn't aware of any practical language test until I had an
unrelated error in the udhr data benchmarks about related differences between
wcswidth() and width(). So we reverse engineered the results into tests.

hard to describe even why we have a fast path like this: most use is many
lines of a few words, such as headlines, messages, and so on. It would not
pay off if this library processed only udhr data or, worst case, never-ending
lines of combining death, but when integrating downstream I see the most
common use case is one to a few words.

very small change, respects memory and usage a bit, though it may show as
an "impact" in benchmarking, because our benchmark is exercising
hundreds of languages. We should target "2-3" languages, which is
really just 1 western (ascii) + 1 eastern language (CJK) + graphics + emojis
@jquast jquast marked this pull request as ready for review January 29, 2026 04:18

jquast commented Jan 29, 2026

The benchmark shows the expected reduction in performance. Any performance increase is because I broke the udhr table into parts and "skip every N lines", since the total benchmark time in CI is far greater than it is locally. Valgrind is used there to "count every instruction" rather than measure wall time, which is fine, but I have learned it is about 10x slower to process than a wall-time run.
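
As an illustration only, the "break into parts" and "skip every N lines" idea from the comment above could look something like the following. The function names and slicing strategy are assumptions for this sketch and are not taken from the wcwidth benchmark suite.

```python
def sample_every_nth(lines, n, offset=0):
    """Keep every n-th line starting at offset (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return lines[offset::n]


def split_into_parts(lines, parts):
    """Split lines into interleaved groups so each line lands in exactly one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    return [lines[i::parts] for i in range(parts)]


corpus = [f"line {i}" for i in range(10)]

# Benchmark a tenth-to-a-third of the corpus instead of all of it.
print(sample_every_nth(corpus, 3))

# Or spread the whole corpus across several smaller benchmark cases.
for part in split_into_parts(corpus, 2):
    print(len(part))
```

Sampling trades coverage for run time, which matters more under instruction counting than under a local wall-time run.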

@jquast jquast merged commit a198889 into master Jan 29, 2026
42 of 43 checks passed
@jquast jquast deleted the jq/category-mc branch January 29, 2026 04:25
jquast added a commit to jquast/blessed that referenced this pull request Feb 1, 2026
The boundary pattern in ``get_kitty_keyboard_state()`` used ``.+`` which matched prematurely!

Existing tests didn't catch this because they use ungetch() which buffers the entire response

jquast/wcwidth#200
jquast/wcwidth#202