Update specs and zero-width tables regarding Mc #200
CodSpeed Performance Report: merging this PR will degrade performance by 15.11%.
Find useful URLs for various terms in the specification, linking to Unicode's specifications. Also reference "Rendering Nonspacing Marks".
Branch updated from 6529f0b to 97cad51.
I've seen it written both ways...
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff            @@
##           master      #200    +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           14        15      +1
  Lines          885       896     +11
  Branches       225       227      +2
=========================================
+ Hits           885       896     +11
Please welcome the new all-time rapper, category Mc: knows all the signs and he's got a lot to say.
Thank you to the wrap() tests of UDHR data for discovering this very strange side effect.
I regret I wasn't aware of any practical language test until I hit an unrelated error in the UDHR data benchmarks involving differences between wcswidth() and width(). So we reverse-engineered the results into tests.
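One way to turn such differences into regression tests is sketched below, under stated assumptions: the measuring function is a toy stand-in (not wcwidth's wcswidth() or width()), and the hypothetical `mc_width` parameter only switches whether a spacing combining mark (category Mc) counts as 0 or 1 cell. Comparing the two settings over sample text and pinning the disagreements yields test cases.

```python
import unicodedata


def toy_width(text: str, mc_width: int) -> int:
    """Hypothetical per-code-point width, NOT wcwidth's implementation:
    Mn/Me/Cf = 0 cells, Mc = mc_width, East Asian Wide/Fullwidth = 2, else 1."""
    total = 0
    for ch in text:
        category = unicodedata.category(ch)
        if category in ("Mn", "Me", "Cf"):
            continue
        if category == "Mc":
            total += mc_width
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 2
        else:
            total += 1
    return total


def find_disagreements(samples):
    """Return {text: (old, new)} for samples whose width changes when Mc
    goes from zero-width (old behavior) to width 1 (new behavior)."""
    cases = {}
    for text in samples:
        old = toy_width(text, mc_width=0)
        new = toy_width(text, mc_width=1)
        if old != new:
            cases[text] = (old, new)
    return cases


# Escapes keep each code point explicit: Hindi "namaste" and "hindi".
NAMASTE = "\u0928\u092e\u0938\u094d\u0924\u0947"  # vowel sign E is Mn, no Mc
HINDI = "\u0939\u093f\u0928\u094d\u0926\u0940"    # vowel signs I and II are Mc

# Only HINDI changes: 3 cells with Mc=0, 5 cells with Mc=1.
print(find_disagreements(["hello", NAMASTE, HINDI]))
```

Each recorded pair can then be frozen as an expected value, so a future table change that moves it fails loudly.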
This is hard to describe, even why we have a fast path like this at all: most use is many lines of a few words (headlines, messages, and so on), not only UDHR data or, worst case, never-ending lines of combining death. When integrating downstream, I see the most common use case is one to a few words.
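A minimal sketch of such a fast path, assuming the common input is short printable ASCII. The names `measure` and `slow_width` are hypothetical, and `slow_width` is a simplified category-based stand-in, not the library's real general path. Printable ASCII is always one cell per character, so `len()` is exact there and skips all table lookups.

```python
import unicodedata


def slow_width(text: str) -> int:
    """Hypothetical general path: per-code-point widths by Unicode
    category (Mn/Me/Cf = 0 cells, East Asian Wide/Fullwidth = 2, else 1)."""
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def measure(text: str) -> int:
    """Fast path first: printable ASCII (headlines, messages, a few words)
    is one cell per character, so len() is exact and cheap."""
    if text.isascii() and text.isprintable():
        return len(text)
    return slow_width(text)


print(measure("Hello, world"))  # 12, via the fast path
print(measure("日本語"))          # 6, via the slow path
```

The design trade-off is one cheap check per call; it pays off when most calls are short Latin strings and costs little otherwise.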
Very small change that respects memory and usage a bit, though it may show as an "impact" in benchmarking, because our benchmark exercises hundreds of languages. We should target 2-3 languages, which is really just 1 Western (ASCII) + 1 Eastern language (CJK) + graphics and emojis.
The benchmark shows the expected reduction in performance. The performance increase shown is because I broke the UDHR table into parts and "skip every N lines", because the total CodSpeed time is far more than local: it uses valgrind to "count every instruction" rather than measure wall time, which is fine, but I have learned it is 10x slower to process than wall time.
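Sampling a large corpus can be sketched with `itertools.islice`. The helper name and the choice of a fixed stride are assumptions for illustration, not the project's benchmark code. Keeping one line in every `n` cuts instruction-counted runtime roughly by a factor of `n` while still touching every script spread through the file.

```python
from itertools import islice


def sample_every_nth(lines, n: int, offset: int = 0):
    """Hypothetical helper: keep one line out of every ``n``, starting at
    ``offset``, to shrink a large corpus (e.g. UDHR text) for benchmarks."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return list(islice(lines, offset, None, n))


corpus = [f"line {i}" for i in range(10)]
print(sample_every_nth(corpus, 3))  # ['line 0', 'line 3', 'line 6', 'line 9']

# Offsets 0..n-1 split the corpus into n disjoint parts that together cover it.
parts = [sample_every_nth(corpus, 3, offset) for offset in range(3)]
```

Using distinct offsets is one way to "break the table into parts": each part can back a separate, smaller benchmark.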
The boundary pattern in ``get_kitty_keyboard_state()`` used ``.+``, which matched prematurely! Existing tests didn't catch this because they use ``ungetch()``, which buffers the entire response. (jquast/wcwidth#200, jquast/wcwidth#202)
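The original pattern isn't shown here, so the sketch below is a hypothetical reconstruction of the failure mode. It assumes the kitty keyboard reply has the form ESC [ ? flags u and that the reply arrives a character at a time, as from a real terminal. A pattern without a required terminator matches as soon as one character follows "?", while a stricter pattern waits for the full reply. A test that pushes the whole reply into the buffer at once never exercises the partial case.

```python
import re

# Hypothetical patterns, not the library's actual code.
LOOSE = re.compile(r"\x1b\[\?(.+)")     # no terminator required
STRICT = re.compile(r"\x1b\[\?(\d+)u")  # digits, then the final "u"


def first_match_offset(pattern, response):
    """Feed ``response`` one character at a time and return
    (characters_seen, captured_group) at the first successful match."""
    buffer = ""
    for ch in response:
        buffer += ch
        match = pattern.match(buffer)
        if match:
            return len(buffer), match.group(1)
    return None


reply = "\x1b[?15u"  # flags=15; the complete reply is 6 characters
print(first_match_offset(LOOSE, reply))   # (4, '1'): matched on a partial reply
print(first_match_offset(STRICT, reply))  # (6, '15'): waited for the terminator
```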
Closes #155, "Should Combining characters of Category 'Mc' be width 1?" Yes.
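The distinction can be seen with Python's `unicodedata`. The `cells` rule below is a toy assumption mirroring the answer above (Mn/Me take 0 cells, Mc takes 1), not wcwidth's implementation. A spacing vowel sign such as U+093E (category Mc) is drawn to the right of its consonant and occupies a column, whereas U+0941 (category Mn) sits beneath it.

```python
import unicodedata


def cells(text: str) -> int:
    """Toy rule: nonspacing (Mn) and enclosing (Me) marks take 0 cells,
    spacing combining marks (Mc) and the letters used here take 1."""
    return sum(0 if unicodedata.category(ch) in ("Mn", "Me") else 1 for ch in text)


KA_AA = "\u0915\u093e"  # KA + VOWEL SIGN AA (Mc), drawn to the right of KA
KA_U = "\u0915\u0941"   # KA + VOWEL SIGN U (Mn), drawn beneath KA

for text in (KA_AA, KA_U):
    print(text, [unicodedata.category(ch) for ch in text], cells(text))
```

Under this rule KA_AA measures 2 cells and KA_U measures 1, matching how the two syllables occupy the line.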
As I wrote then, "Probably the largest obstacle has been finding 1 or 2 Terminal emulators that correctly display languages that include Mc characters." I felt it would be too progressive, but I have since found some terminal emulators capable of measuring and rendering complex scripts like Devanagari correctly.
I expect ucs-detect results for Sanskrit-based languages to improve in the next release with this update. I'm still doing some testing and integration.
Prior to this change:
After integration with ucs-detect, I expect: