Fixes widechar table lookup#2943
Conversation
|
Hey there! Thanks for helping Mudlet improve. 🌟 Test versionsYou can directly test the changes here:
No need to install anything - just unzip and run. |
|
Nice find! Thank you, we'll look at this. The original code is from https://github.com/ridiculousfish/widecharwidth so we should tell them about this too. |
|
Tested, for me is good on windows 7 (black/white emoji). Version 4.0.3 was buggy Thanks @SoulSpirit79 |
… pixels rather than in "characters"
|
In an effort of getting better results for wide characters, I changed the way the engine creates the QRect for each glyph. For example, if the font size is set to 8, a standard lating character is 8 pixels wide. The drawback is that where non-standard-width characters are displayed, the text alignment isn't perfect anymore. If this solution is acceptable, widechar_wcwidth.h becomes obsolete and could be deleted along with all of its usages. |
|
Aha - nice find. Can you check if this new non-even characters affects line wrapping code? |
src/TTextEdit.cpp
Outdated
|
|
||
| auto textRect = QRect(mFontWidth * cursor.x(), mFontHeight * cursor.y(), mFontWidth * charWidth, mFontHeight); | ||
| //auto metrics = QFontMetrics(painter.font()); | ||
| auto advance = mFontMetrics.horizontalAdvance(grapheme); |
There was a problem hiding this comment.
This is only available in Qt 5.11 while we support 5.7 at minimum - right now. However I'd like to up the mininum to 5.11 because Debian stable now supports it, so this shouldn't be a problem.
There was a problem hiding this comment.
Didn't know sorry, I'll check if I can achieve the same result using QFontMetrics::boundingRect
|
Hm, try connecting to |
|
@Eraene what do you think? |
I see bad characters both in 4.0.3 and this branch, but they render differently. I think there's something broken on the server anyway, I tried a couple of server encodings but the result doesn't change. |
|
You need to set the encoding to |
Ok, got it. The explanation is in how whitespaces are handled: I think 4.0.3 knows that server has gbk encoding and renders whitespaces (0x20) as double-width. (but I can't find any reference to this in the code). Current branch treats 0x20 always the same, and it's thinner than any other chinese symbol. I think this is the correct way of handling it. Asian servers should not use 0x20 for formatting text, but a wider whitespace character: |
Found a better reference that "demonstrate" that the full-width whitespace for asian languages is 0x3000 and not 0x20: The first character of their unicode block is that full-width whitespace, and I doubt it is a coincidence. |
src/TConsole.cpp
Outdated
| splitter->setParent(layer); | ||
|
|
||
| mUpperPane = new TTextEdit(this, splitter, &buffer, mpHost, false); | ||
| mUpperPane = new TTextEdit(this, splitter, &buffer, mpHost, false, QFontMetrics(mpHost->mDisplayFont)); |
There was a problem hiding this comment.
mpHost->mDisplayFont is NOT appropriate for all instances of TTextEdit you will need to inspect the parent TConsole's (i.e. mpConsole) mType and work out which font to pass.
There was a problem hiding this comment.
Ok, I'll try to understand and fix it
src/TTextEdit.cpp
Outdated
| void TTextEdit::updateScreenView() | ||
| { | ||
| if (isHidden()) { | ||
| mFontMetrics = QFontMetrics(mpHost->mDisplayFont); |
There was a problem hiding this comment.
Why are you using mpHost->mDisplayFont and not mDisplayFont here?
And when you do use the latter the following lines can access the mFontMetrics member instead of each of them each retrieving a fresh instance of the relevant QFontMetrics as well...! 😉
There was a problem hiding this comment.
Sorry my bad, cut&paste misprint. I'll fix it + I was trying to minimize the modified code. I'm fixing it.
src/TTextEdit.cpp
Outdated
| // and mIsMiniConsole is true for user created Mini Consoles and User Windows | ||
| if (mpConsole->getType() == TConsole::MainConsole) { | ||
| mFontMetrics = QFontMetrics(mpHost->mDisplayFont); | ||
| mFontWidth = QFontMetrics(mpHost->mDisplayFont).width(QChar('W')); |
There was a problem hiding this comment.
Someone else (possibly me) also has a PR pending that will change this to:
mFontWidth = QFontMetrics(mpHost->mDisplayFont).averageCharWidth();
so there may be git conflicts to resolve in the near future...
There was a problem hiding this comment.
And of course this and the following lines can all reuse the newly found mFontMetrics class instance and refer to that instead of each of them fetching a new copy...
There was a problem hiding this comment.
My idea is to completely remove mFontWidth: it's an approssimation of the char width but we could use the exact width instead. I just can't completely understand some usages of that variable and thus kept the original code.
The solution I'm proposing in this branch isn't perfect yet, but I'll work on it if it could be accepted.
There was a problem hiding this comment.
I agree, mFontWidth does not make too much sense anymore, however I'm not sure about converting everything to work with pixels either just yet - because that breaks a lot of assumptions we've made so far... 😅 this will require some thinking!
src/TTextEdit.cpp
Outdated
| { | ||
| uint unicode = getGraphemeBaseCharacter(grapheme); | ||
| int charWidth; | ||
| if (unicode == '\t') { |
There was a problem hiding this comment.
How well does the new code work with tab characters given that you are doing away with the current faking of (Left) Tabs...
There was a problem hiding this comment.
That's an open point I was working on.
The code you quoted was not symmetric to the code in convertMouseXToBufferX, which was:
if (unicode == '\t') { characterWidth = 8; } else { characterWidth = getGraphemeWidth(unicode); }
and it confused me. I'll fix it as soon as possible.
src/widechar_width.h
Outdated
|
|
||
| /* An inclusive range of characters. */ | ||
| struct widechar_range { | ||
| int32_t lo; |
There was a problem hiding this comment.
👎 DO NOT MAKE ANY CHANGES TO THIS FILE - IT IS IMPORTED WHOLESALE FROM UPSTREAM AT: https://github.com/ridiculousfish/widecharwidth AND MAY WELL GET OVERWRITTEN THE NEXT TIME THAT THE UNICODE VERSION IS INCREMENTED.
There was a problem hiding this comment.
Oh, sorry I was reviewing without looking at any past feedback from anyone else - yes if there are argument/variable type errors then they need to be passed upstream - though remember that this file is designed to work on multiple OSes and so may use some types to ensure compatibility which might not seem to be the optimum ones to us.
The reason that a few pixels may be being chopped off of the right for some Emojis and other wide glyphs is, I suspect, because it is not the same font being used for all the characters, if the specified font does not contain a wanted glyph then the Qt and Font subsystems (Font Config on Linux for instance) will (normally) work out which other fonts do have the wanted glyph for that grapheme and pull in the best (for some value of best) one they can find - however there is no guarantee that it will have the exact same font metrics +. The best way around that would be to measure the actual widths used when the glyphs are painted and to remember them for later when working out where the mouse is when it is clicked on or moved over them - mayhaps I am going to have to dust off that prototype code I cobbled together to do it like that...
+ - I would point out one exception to that: in the code for the 2D Map room symbols - where I have made it is possible to force only the specified font to be used!
There was a problem hiding this comment.
This file was removed entirely and not used anymore @SlySven.
There was a problem hiding this comment.
👎 DO NOT MAKE ANY CHANGES TO THIS FILE - IT IS IMPORTED WHOLESALE FROM UPSTREAM AT: https://github.com/ridiculousfish/widecharwidth AND MAY WELL GET OVERWRITTEN THE NEXT TIME THAT THE UNICODE VERSION IS INCREMENTED.
I'm not sure whether the problem lies in the library on in some compiling flag for Mudlet.
I'm not so expert with C, but reading here and there seems like wchar_t is the correct type to handle 2-bytes or 4-bytes character. But in Mudlet dev environment it is interpreted as a 2-bytes type and cannot handle many unicode ranges.
I'd open an issue to https://github.com/ridiculousfish/widecharwidth if I'd know for sure.
There was a problem hiding this comment.
This file was removed entirely and not used anymore @SlySven.
No it isn't - in fact I recently updated it to support Unicode 12.1 ! 😕
It's purpose is to tell the caller whether the grapheme is supposed to be drawn in a normal or wide (double the width of a normal) width - taking into account whether the character is defined as a normal or wide one or is an ambiguous width or whether it changed from being narrow in Unicode 8 to be wide in Unicode 9. If it is an ambiguous width then our code considers the choice the user made from them - or if they set it to auto - the server encoding the user has selected...
The bottom line is that it is trying to approximate the duo-space font we need for MUD type text displays...
That's not an option, we can't break rendering and tell them to fix it even if we might be right 😕 I don't think there is special treatment of pkuxkx but I'll verify in the code. |
|
Yeah, there is no special treatment. In current code So horizontal advance is perhaps not the correct number to use in 100% of the cases? It works for emoji's since we want them to be bigger than normal text, but not for these characters which we'd like to be actually monospace. |
To solve this, only when |
|
Let's think on a better solution - exceptions are bound to fail. See what I mentioned above in regards to character width - it is not just related to the space, it is related to the different way we are measuring now which works for emoji's, but breaks asian text. |
|
Thanks so much for working on this! It looks like it'll be harder than expected, but it was to happen either way when we'd get around to this. |
Like @SlySven said, that problem rises when the selected font ( According to the documentation (https://doc.qt.io/qt-5/qfontmetrics.html#horizontalAdvance-1) |
|
Sorry, can you explain what you mean? Ubuntu Mono doesn't have emoji's either, they are 100% certainly drawn using the Noto font fallback I've added - and you're implying that |
I think |
|
Oh, I see what you mean now. You basically turned off monospace rendering of characters - which is why emoji's work - because they aren't drawn monospace anymore and bigger than normal - and other things don't, like special characters - because they aren't drawn monospace anymore, either - they are drawn smaller than intended. That won't work, we can't fix one thing and break another. How do other applications draw emoji's when using monospace fonts? We still have to stick with monospace, not monospace as you've seen just breaks a lot of things. |
|
What if we treat emoji's as 2 width characters and draw them in the middle of the space we have for 2 characters? Would that look acceptable, without breaking all text display in Mudlet? |
Sorry if I'm a bit late, but I was out for vacation. I checked the changes in the code made in the last weeks and I've found nothing that could have fixed the problem. I cannot understand how you can obtain the result highlighted by your screenshot.
For example, taking the character 😂 Line 574 in b3133ef ... converts the character to 0xFFFFF602 as it reaches that library:Line 516 in b3133ef |
|
Thanks for reopening the issue.
As I said, I'm not confortable on reporting the issue on the ridiculousfish project because I don't fully understand what's behind the |
|
I am not sure that just because the file being referred is Certainly when I updated the code in 67449e5 it seems that later Unicode data was being used from the other two files used to generate |
|
Of course I could be wrong 🤦♂ - there are differences between https://unicode.org/Public/emoji/5.0/emoji-data.txt and https://unicode.org/Public/emoji/12.0/emoji-data.txt - so apologies to you @SoulSpirit79 ... Lets see if upstream pick up on this: ridiculousfish/widecharwidth#5 |
|
After ridiculousfish/widecharwidth#6 (comment), I think we should draw emoji's as 2x width. It'll help not screw up the formatting of everything, so monospaced fonts still stay monospace, and our emoji's won't be getting cut off. |
|
@SoulSpirit79 sorry, this kind of fell off the radar - but I'll look into picking this work up and finishing it, if you or nobody else can't. I think we should draw emoji's as 2x width and still keep the display consistent: making it inconsistent is just doing to break a lot of things and people would also be unhappy about the formatting. |
I agree with your point of view, but we have to choose a strategy to make emojis work. For a definitive solution instead, there are various points to consider:
If this was my project, I'd move this way, in order:
What do you think? |
|
Um, sorry for not responding to this before but I've finally run into part of this whilst doing some work on a Windows 64 bit build - it turns out that the primary problem with Anyhow I have referred a fix (well it worked for me) for this upstream as ridiculousfish/widecharwidth#7 - I intend to incorporate those changes into our codebase and see what effect that has on the wrong widths particularly on Windows... @SoulSpirit79 Thanks for reporting the issue, even if the solution we adopt is not this one. |
|
As for the variations - we consider the whole grapheme at a time so the width is pretty much dependent on the classification of the first (the base) codepoint {though the Fitzpatrick modifiers might mess with that if the font renderer does not understand them and adopts the fall-back of a colour or grey scale square}; another consideration ought to be the text/emoji variation selectors - but we are only just getting the full-color emoji stuff working - and the Qt libraries totally fail to handle those at all as far as I can tell... |
Thanks @SlySven , much appreciated :) |
This is a bugfix that I got into upstream and fixes some issues on Windows where some Emojis (off of the BMP) were not being correctly typed - they were being identified as private use characters on Windows because the argument to a function was not wide enough (did not have enough bits) to convay the values required. I believe this may be a simpler fix for the issue that PR: Mudlet#2943 was trying to fix. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
This is a bugfix that I got into upstream and fixes some issues on Windows where some Emojis (off of the BMP) were not being correctly typed - they were being identified as private use characters on Windows because the argument to a function was not wide enough (did not have enough bits) to convay the values required. I believe this may be a simpler fix for the issue that PR: #2943 was trying to fix. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
|
@SoulSpirit79 How does the test versions in #3207 look compared to the situation you were trying to fix in this PR - i.e. how do they compare to the before this PR case? I suspect that the widths will be better but you might still see some shaving off the right hand side of some of those emojis... |
The result is really good, even if all emojis are shaved off, here's a screenshot of the #3207 build on Win10 64: Thumbs up :) |
|
What should we do next with this? See how emoji's render as in 2 character width, to observe if the shaving is gone? |
|
Saying the emoji's render as 2 characters wide is sadly a bit misleading and mono-space-ist! Unfortunately when more than one font is used there is no guarantee that even Ultimately we may have to check the width of each character when painted and then divide that by how many "normal" character spaces we think it should be and then reset our idea of what the "character width" to use is for every grapheme still in the Oh, and we also must ensure that kerning is switched off - it is not appropriate for a mone/duospaced text layout. OTOH once we do the above we can easily (IMHO) allow the use of proportional fonts - we will just render them in a monospaced manner...! |
|
I think we should explore sticking with mono-space-ist as we have been, because I suspect it's pretty vital to how MUDs work, especially for the text formatting. If we throw that out I think it'll make a mess. That's why I want to see how emoji's look look like if we try them as 2 space-characters, do they fit in then or no... |
|
Transparent backgrounds are a non-starter I think - remember that MUD text has foreground and background colours that are BOTH adjustable (except that foreground colours do not apply to colour emojis - on OS Versions that support them - i.e. not Windows 7 or FreeBSD). And there is no certainty that emojis are transmitted with spaces padding them, either from other characters (or more likely in my opinion, from each other...:roll_eyes::disappointed:!) |
This is a bugfix that I got into upstream and fixes some issues on Windows where some Emojis (off of the BMP) were not being correctly typed - they were being identified as private use characters on Windows because the argument to a function was not wide enough (did not have enough bits) to convay the values required. I believe this may be a simpler fix for the issue that PR: Mudlet#2943 was trying to fix. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
|
Just encountered this issue in April 2020, hoping for a resolution and some .cpp conflict resolution :) |
|
From what I remember this PR fixes one thing and breaks the other. I'd be happy to see it fixed! |
|
Closing for now as we haven't had any movement on this, but please do re-open if you'd like to work on it again. |













Brief overview of PR changes/additions
Related to issue #2379
Rendering width of characters in TConsole was wrong for some glyphs because the lookup algorithm of unicode ranges was faulty (it converted unsigned ints to signed ints). This patch fixes the algorithm.
Motivation for adding to Mudlet
Some wide characters (like emojis) were interpreted as single-width instead of double-width.
Output before the patch:

Output after the patch:
