Add text analyser to consoles#1579
Conversation
This will allow the user to determine the exact characters present in a line of MUD (or locally generated) text which could be useful in working out what to use to construct triggers and also how the displayed characters are constructed in UTF-8 bytes. To use it the user should control click on a line of text in a console and then right click to bring up the context menu. By moving the cursor over the "analyser characters" action and hovering there a generated tool-tip will be brought up that contains HTML tables with the break down of Unicode code-points which is then broken down into the corresponding UTF-16 (which Qt uses internally to store all text in QStrings) and UTF-8 which is what the Lua sub-system gets as its form of strings. For characters that are not on the Basic Multi-Plane and which thus need two QChar (and thus UTF-16 surrogate pairs) to represent the pairs of UTF-16 values (in the range 0xD800 to 0xDBFF for the first, high surrogate followed by one in the range 0xDC00 to 0xDFFF for the second, low surrogate) those numbers are also displayed. Note: this works on (only) the whole first line of the selected text even if more than one line is selected - or only a fraction of that first line is selected. The context menu entry is NOT shown if there is no selection. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
|
This is one part of a significant PR I am assembling to sort out the display/selection of non-BMP or text with combining diacriticals or other unusual features that mean that there is not a straight one-to-one mapping of
{OK, the capital A in Braille needs the preceding dot-6 to turn it from the lower case dot-1 of 'a' into the upper case 'A'...!} Anyhow this is one part that can be done completely separately to the rest so I pulled it out as a separate PR to make the other one smaller. 😇 |
|
Great tool! Since this is mostly aimed at writing triggers, what do you think about moving the analyze option to the trigger pattern field where it's more appropriate for the context? |
|
Do you mean duplicating the functionality - to show exactly what is in the patterns so that they can be checked - or, well, what? 😕 The functionality is keyed to the mouse event in the Note that it is done using the tooltip to that "analyze characters" |
|
The idea behind moving the editor is that since this is a very, very, advanced tool that most people will not understand - we might as well have it where it's relevant, in the editor, and not fill up the right-click menu for the main console. That way both the super technical users needing to decode the Unicode for some very specific triggers can find it where it's relevant and the newbies won't be confused by it! |
|
I'm not sure that it would be simple to put the functionality in the editor then because you would have to find a way to navigate to the particular <aside>N.B. If you are ever debugging a hover or onHover type slot consider changing it temporarily to an clicked/activate type behaviour before setting breakpoints in it - under some circumstances {for me a FreeBSD OS and Qt 5.7.1, running inside Qt Creator} it can hang the Application/Qt Creator, not entirely sure why but I suspect the stopping of execution within the temporary hover code messes up/locks up the Qt Creator/Debugger parent process' interactions which use the same methods to display details in the source code at the same time... 😮 </aside> |
|
Hmm, why would the |
|
Oh! You mean as a dialogue that the user can (choose one of):
If method 1 was chosen it would be possible to use a scrollable, read-only text display type widget which could handle more than one line of text (mind you, if we go in that direction we might have to do something to better track how each line of text is actually ended - we currently massage the end of the lines but if we start trying to show the raw text the user will expect to see that information IMHO; i.e. did Mudlet wrap that line because it exceeded the wrap limit or did the MUD server put a line-feed there, and if it was the MUD server, did it use a line-feed, a carriage-return & line-feed pair - or what? I've seen some odd things in the code base with Mudlet stuffing either Method 2 could actually be done as well as what is here now and would, IMHO complement this PR so far. The current "pop-up" can show what is in the "incoming" text - and - for the right sub-set of the trigger patterns (beginning of line/substring/exact match) replicating the action on those would show what the match would be - if we really wanted to be fancy we could have a second "test" context menu action for the pattern item to say whether it would fire on the (whole of the first line of the) selection in the main I do like the idea of 2 as well but are you really against the current proposal as it is now as a pop-up tooltip on an item in the context menu of any Does anyone else, having tried this {Windows; macOs or Linux} have any opinions? |
|
Yeah, because I think this is a very advanced option that'll seldomly be used (even you won't use it that much once done with Unicode work?), so having it in the right-click menu for everything is concerning me for reasons of screen real state and impression of the client, don't want to have a tool that'll confuse most people and make them feel stupid that they don't understand it right in their face all the time. |
|
Would it be acceptable to hide it unless a "Special Options" item is checked to enable it? I am currently finding it useful but I can see where you are coming from in that it is only going to be used in situations where the end-user is not quite sure what they are seeing and want to check the details more carefully than normal - or because they are getting a character that they are not expecting! Fair enough that is not that likely for Western Latinate languages but I suspect it might be an issue for some other idographic type writing systems with hundreds if not thousands of glyphs...! |
|
OK! |
# By Vadim Peretokin (81) and others * development: (162 commits) BugFix: set Server Encoding correctly on auto-loading profiles Install xz-utils to guarantee xz is available Create and upload source tarballs on release BugFix: use Alternative OpenSSLBinary for Windows (Mudlet#1850) Fix auto-save kicking in and blocking save profile as Upgrade a few classes to newer Qt connect style (Mudlet#1846) Fix package exporter to work with async save (Mudlet#1832) Pulled out actions into a mudlet member class variables (Mudlet#1839) Create module zip if it's not already created when syncing (Mudlet#1842) i18n-ise GUI label creation (Mudlet#1838) Align "no map"message in centre propely (Mudlet#1837) New Crowdin translations (Mudlet#1756) Delete old version checks (Mudlet#1833) Re-set dev. 3.11.1 bugfix release. Fix iterator to actually iterate BugFix: fix faulty log options for new profiles or old profile save files Big5 support (Mudlet#1808) BugFix: include a fail-back icon for the auto-saved profile in Con. Dialog Back to development we go! ... Conflicts resolved in: * src/TTextEdit.cpp * src/TTextEdit.h Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
I found there are some non-printing Unicode code-points that it is sensible
to report - especially as they cannot currently be shown {the FitzPatrick
Emoji modifiers that give a skin colour tint to human related Emojis},
however as they are not on the BMP a revision was needed to:
`inline QString TTextEdit::convertWhiteSpaceToVisual(...)`
Some undefined code sequencing (using multiple operator++ calls in the
multiple operators of a method meant the order in which they were used
was arbitrary) and produced errors when done in the wrong order.
Added per profile option to enable (disabled by default) this analyser as
requested.
Added an additional line to the information table which gives the decimal
escape codes that would be needed to specify the individual bytes that
make up a non-ASCII grapheme in Lua for characters that are not printable
ASCII characters.
Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
|
Have added another line which gives the decimal escapes (the |
|
... and I got it (slightly) wrong! 😢 |
… typo Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
|
Okay, this should be good enough - |
src/dlgProfilePreferences.cpp
Outdated
| "encodings of <b>Big5</b>, <b>GBK</b> or <b>GBK18030</b> and 'narrow' for all others.</li></ul></p>" | ||
| "<p><i>This is a temporary arrangement and will probably change when Mudlet gains " | ||
| "full support for languages other than English.</i></p>")); | ||
| checkBox_enableTextAnalyzer->setToolTip(tr("<p>Check this option to activate a context (right click) menu action on any " |
There was a problem hiding this comment.
"Check this option to activate" is superflous, of course checking this option will do something 😉
There was a problem hiding this comment.
I guess so - though some of your Discord options do sort of work in reverse! I.e. check them to NOT do something which keeps coders on their toes when comparing the code to what the user sees...
There was a problem hiding this comment.
I will review this text.
There was a problem hiding this comment.
Have tweaked the text, lets see if that will do.
src/TTextEdit.cpp
Outdated
| "<tr><th>%12</th>%13</tr>" | ||
| "</table></small></body></html>") | ||
| .arg(completedRows) | ||
| .arg(tr("Index (UTF-16)", "1st Row heading for Text analyser output, table item is the count into the QChars/TChars that make up the text {this translation used 2 times}"), indexes) |
There was a problem hiding this comment.
I'd recommend moving the two index rows side by side for better clarity.
There was a problem hiding this comment.
Well, at present the UTF-16 is all above the visual representation and the UTF-8 is all below and the separation is useful when the two are not in lock-step across the line e.g. consider this example I made for another PR where the differences in the indexes really shows up:

It is even more obvious when you stray off the Basic Multi-plane because then you start getting UTF-16 surrogate pairs and by definition at least three if not four bytes for the UTF-8 representations.
| } | ||
|
|
||
| if (rowItems > rowLimit) { | ||
| if (isFirstRow) { |
There was a problem hiding this comment.
That isFirstRow is what is used to only show those sidings (horizontal headings? 😜) on the first row of the table. It'd be interested to check the code that works out what part of the TBuffer to analyse is deciding in that situation - perhaps I broke something when I revised it to only consider the selected part of the first line... 😢
There was a problem hiding this comment.
That shouldn't be happening - I'll have to take another look...
There was a problem hiding this comment.
I can't reproduce this - have you got a replay with that text in it?
|
I've also found that I need to fix things to handle the display of |
…(...)
In the new TTextEdit::convertWhitespaceToVisual(...) {renamed from
"convertWhiteSpaceToVisual" !) I've moved the comments that describe each
"code" for each white space (for which there are thus little or no visible
means to tell them apart otherwise) characters into the translation tr(...)
call so that it is available to translators. For most languages it will be
best not to attempt to translated these codes but for ideographic
lanaguages where each grapheme can be a complete word it is possible that
a complete explaination will be of comparible space on screen as the "code"
that explains what each white-space character is so is worth providing for
those languages only!
Also fixes a bug where the less-than character is seen as the code in Qt
Rich-Text/HTML that opens a formatting tag and is not displayed as a '<'.
Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
Ran clang-format on the whole of ./src/TTextEdit.cpp Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
Although wanted by reviewer this might not be a good idea if large numbers of Emoji characters are involved with FitzPatrick modifiers as the display widget might need to be a bit large even though it is limited to showing sixteen glyphs per row in the tabular layout. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>



This will allow the user to determine the exact characters present in a line of MUD (or locally generated) text which could be useful in working out what to use to construct triggers and also how the displayed characters are constructed in UTF-8 bytes. To use it the user should control click on a line of text in a console and then right click to bring up the context menu. By moving the cursor over the "analyse characters" action and hovering there a generated tool-tip will be brought up that contains HTML tables with the break down of Unicode code-points which is then broken down into the corresponding UTF-16 (which Qt uses internally to store all text in
QStrings) and UTF-8 which is what the Lua sub-system gets as its form of strings. For characters that are not on the Basic Multi-Plane and which thus need twoQChars (and thus UTF-16 surrogate pairs) to represent the pairs of UTF-16 values (in the range 0xD800 to 0xDBFF for the first, high surrogate followed by one in the range 0xDC00 to 0xDFFF for the second, low surrogate) those numbers are also displayed.Note: this works on (only) the whole first line of the selected text even if more than one line is selected - or only a fraction of that first line is selected. The context menu entry is NOT shown if there is no selection.
Signed-off-by: Stephen Lyons slysven@virginmedia.com