Add text analyser to consoles by SlySven · Pull Request #1579 · Mudlet/Mudlet

SlySven · 2018-03-09T20:31:31Z

This will allow the user to determine the exact characters present in a line of MUD (or locally generated) text which could be useful in working out what to use to construct triggers and also how the displayed characters are constructed in UTF-8 bytes. To use it the user should control click on a line of text in a console and then right click to bring up the context menu. By moving the cursor over the "analyse characters" action and hovering there a generated tool-tip will be brought up that contains HTML tables with the break down of Unicode code-points which is then broken down into the corresponding UTF-16 (which Qt uses internally to store all text in QStrings) and UTF-8 which is what the Lua sub-system gets as its form of strings. For characters that are not on the Basic Multi-Plane and which thus need two QChars (and thus UTF-16 surrogate pairs) to represent the pairs of UTF-16 values (in the range 0xD800 to 0xDBFF for the first, high surrogate followed by one in the range 0xDC00 to 0xDFFF for the second, low surrogate) those numbers are also displayed.

Note: this works on (only) the whole first line of the selected text even if more than one line is selected - or only a fraction of that first line is selected. The context menu entry is NOT shown if there is no selection.

Signed-off-by: Stephen Lyons slysven@virginmedia.com

This will allow the user to determine the exact characters present in a line of MUD (or locally generated) text which could be useful in working out what to use to construct triggers and also how the displayed characters are constructed in UTF-8 bytes. To use it the user should control click on a line of text in a console and then right click to bring up the context menu. By moving the cursor over the "analyser characters" action and hovering there a generated tool-tip will be brought up that contains HTML tables with the break down of Unicode code-points which is then broken down into the corresponding UTF-16 (which Qt uses internally to store all text in QStrings) and UTF-8 which is what the Lua sub-system gets as its form of strings. For characters that are not on the Basic Multi-Plane and which thus need two QChar (and thus UTF-16 surrogate pairs) to represent the pairs of UTF-16 values (in the range 0xD800 to 0xDBFF for the first, high surrogate followed by one in the range 0xDC00 to 0xDFFF for the second, low surrogate) those numbers are also displayed. Note: this works on (only) the whole first line of the selected text even if more than one line is selected - or only a fraction of that first line is selected. The context menu entry is NOT shown if there is no selection. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>

SlySven · 2018-03-09T21:02:26Z

This is one part of a significant PR I am assembling to sort out the display/selection of non-BMP or text with combining diacriticals or other unusual features that mean that there is not a straight one-to-one mapping of QChar/TChar to on-screen glyphs. It will be really useful to do things like telling the difference between say all of these - which are (almost) all representations of things that could be held to look like:

a: aаɑᵃₐ⒜ⓐ⠁ⲁａ𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝗮
e: eɐᵉₑ℮ℯⅇ⒠ⓔ⠊ⲉｅ𝈡𝐞𝑒𝒆𝓮𝔢𝕖𝖊𝗲
A: `ΑAᗅᗋᗩᴀᴬ₳Ⓐ⠠⠁ⲀꓮＡ𐌀𝐀𝐴𝑨𝚨𝛢𝜜𝝖𝞐𝒜𝔸𝕬𝖠𝗔𝘈

{OK, the capital A in Braille needs the preceding dot-6 to turn it from the lower case dot-1 of 'a' into the upper case 'A'...!}

Anyhow this is one part that can be done completely separately to the rest so I pulled it out as a separate PR to make the other one smaller. 😇

Some examples:

vadi2 · 2018-03-13T12:23:07Z

Great tool! Since this is mostly aimed at writing triggers, what do you think about moving the analyze option to the trigger pattern field where it's more appropriate for the context?

SlySven · 2018-03-14T00:03:46Z

Do you mean duplicating the functionality - to show exactly what is in the patterns so that they can be checked - or, well, what? 😕

The functionality is keyed to the mouse event in the TTextEdit class that is incorporated into any TConsole so it even works in the Central Debug Console; so moving it to the editor does not make too much sense to me. 🙁

Note that it is done using the tooltip to that "analyze characters" QAction so that it only persists while the mouse hovers over it; this has the unfortunate side-effect that it is not then possible to select and copy a part of the details but it does mean that it disappears nicely afterwards and no extra code to close/clean up/prevent mutiple copies of the display is needed. It is intended to be informational - which I think it is - and of course, once the user identifies what is being shown, there is always the option to select and copy the text directly from the TConsole to paste it in the editor...

vadi2 · 2018-03-14T06:56:02Z

The idea behind moving the editor is that since this is a very, very, advanced tool that most people will not understand - we might as well have it where it's relevant, in the editor, and not fill up the right-click menu for the main console. That way both the super technical users needing to decode the Unicode for some very specific triggers can find it where it's relevant and the newbies won't be confused by it!

SlySven · 2018-03-14T12:46:16Z

I'm not sure that it would be simple to put the functionality in the editor then because you would have to find a way to navigate to the particular TConsole and then hope that the line of text that you are interested in that console is the (first) currently selected one when you get there. It might be doable but it would make things more complex to try and do something like that when this, implemented as a single SLOT method (with a helper to identify those code-points that are obviously not going to have a printable representation), is relatively straightforward.

<aside>N.B. If you are ever debugging a hover or onHover type slot consider changing it temporarily to an clicked/activate type behaviour before setting breakpoints in it - under some circumstances {for me a FreeBSD OS and Qt 5.7.1, running inside Qt Creator} it can hang the Application/Qt Creator, not entirely sure why but I suspect the stopping of execution within the temporary hover code messes up/locks up the Qt Creator/Debugger parent process' interactions which use the same methods to display details in the source code at the same time... 😮 </aside>

vadi2 · 2018-03-14T13:02:01Z

Hmm, why would the TConsole need any involvement? I am thinking the person would copy/paste the line into the trigger pattern field, and then analyze it.

SlySven · 2018-03-14T20:57:55Z

Oh! You mean as a dialogue that the user can (choose one of):

bring up (from the TConsole console menu, or a submenu item on the "Tools" main menu) a dialogue and drop some copied text to be analysed into - note that the code here only works on the whole of the first line in the selection (even if only part of that line is selected) - this is currently because at the time of pushing this code out there the selection of part of a line containing either combining diacriticals/variation selectors/non-BMP characters (anything that breaks the previous assumption of one QChar/TChar per visible glyph) is broken (fix pending in the outstanding PR I'm progressing locally) - so that partial line copying is faulty. Also, there is a physical limit in terms of screen real estate that means handling multiple lines with the current HTML in a tool-tip is not going to fly/fit!
activate from within a pattern to display whatever has been placed there - I suspect this would not want to be activate-able for colour and possibly one or two ( line count , spacer ?) other types where the content of the pattern box does not really "match" a text as such.

If method 1 was chosen it would be possible to use a scrollable, read-only text display type widget which could handle more than one line of text (mind you, if we go in that direction we might have to do something to better track how each line of text is actually ended - we currently massage the end of the lines but if we start trying to show the raw text the user will expect to see that information IMHO; i.e. did Mudlet wrap that line because it exceeded the wrap limit or did the MUD server put a line-feed there, and if it was the MUD server, did it use a line-feed, a carriage-return & line-feed pair - or what? I've seen some odd things in the code base with Mudlet stuffing either \r or \0xff on the end of some lines and I'm not sure what on earth is going on there!)

Method 2 could actually be done as well as what is here now and would, IMHO complement this PR so far. The current "pop-up" can show what is in the "incoming" text - and - for the right sub-set of the trigger patterns (beginning of line/substring/exact match) replicating the action on those would show what the match would be - if we really wanted to be fancy we could have a second "test" context menu action for the pattern item to say whether it would fire on the (whole of the first line of the) selection in the main TConsole - such a go/no-go indicator might be helpful for those trying to debug the more complex PCRE patterns (although not as much, perhaps, as the Regex101 website)...

I do like the idea of 2 as well but are you really against the current proposal as it is now as a pop-up tooltip on an item in the context menu of any TConsole?

Does anyone else, having tried this {Windows; macOs or Linux} have any opinions?

vadi2 · 2018-03-16T05:00:08Z

Yeah, because I think this is a very advanced option that'll seldomly be used (even you won't use it that much once done with Unicode work?), so having it in the right-click menu for everything is concerning me for reasons of screen real state and impression of the client, don't want to have a tool that'll confuse most people and make them feel stupid that they don't understand it right in their face all the time.

SlySven · 2018-03-26T15:46:41Z

Would it be acceptable to hide it unless a "Special Options" item is checked to enable it?

I am currently finding it useful but I can see where you are coming from in that it is only going to be used in situations where the end-user is not quite sure what they are seeing and want to check the details more carefully than normal - or because they are getting a character that they are not expecting! Fair enough that is not that likely for Western Latinate languages but I suspect it might be an issue for some other idographic type writing systems with hundreds if not thousands of glyphs...!

vadi2 · 2018-03-26T16:47:46Z

OK!

# By Vadim Peretokin (81) and others * development: (162 commits) BugFix: set Server Encoding correctly on auto-loading profiles Install xz-utils to guarantee xz is available Create and upload source tarballs on release BugFix: use Alternative OpenSSLBinary for Windows (Mudlet#1850) Fix auto-save kicking in and blocking save profile as Upgrade a few classes to newer Qt connect style (Mudlet#1846) Fix package exporter to work with async save (Mudlet#1832) Pulled out actions into a mudlet member class variables (Mudlet#1839) Create module zip if it's not already created when syncing (Mudlet#1842) i18n-ise GUI label creation (Mudlet#1838) Align "no map"message in centre propely (Mudlet#1837) New Crowdin translations (Mudlet#1756) Delete old version checks (Mudlet#1833) Re-set dev. 3.11.1 bugfix release. Fix iterator to actually iterate BugFix: fix faulty log options for new profiles or old profile save files Big5 support (Mudlet#1808) BugFix: include a fail-back icon for the auto-saved profile in Con. Dialog Back to development we go! ... Conflicts resolved in: * src/TTextEdit.cpp * src/TTextEdit.h Signed-off-by: Stephen Lyons <slysven@virginmedia.com>

I found there are some non-printing Unicode code-points that it is sensible to report - especially as they cannot currently be shown {the FitzPatrick Emoji modifiers that give a skin colour tint to human related Emojis}, however as they are not on the BMP a revision was needed to: `inline QString TTextEdit::convertWhiteSpaceToVisual(...)` Some undefined code sequencing (using multiple operator++ calls in the multiple operators of a method meant the order in which they were used was arbitrary) and produced errors when done in the wrong order. Added per profile option to enable (disabled by default) this analyser as requested. Added an additional line to the information table which gives the decimal escape codes that would be needed to specify the individual bytes that make up a non-ASCII grapheme in Lua for characters that are not printable ASCII characters. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>

SlySven · 2018-08-03T11:55:43Z

Have added another line which gives the decimal escapes (the \### where ### = 001 to 031; 127 to 255) for Lua for non-ASCII characters so that UTF-8 sequences can be reproduced in the editor (I've also started the indexing at 1 not 0 to reflect that lua does that for arrays and string indexing).

SlySven · 2018-10-25T23:07:59Z

... and I got it (slightly) wrong! 😢

src/Host.h

src/TTextEdit.cpp

… typo Signed-off-by: Stephen Lyons <slysven@virginmedia.com>

SlySven · 2018-10-26T00:22:02Z

Okay, this should be good enough -

src/TTextEdit.cpp

vadi2 · 2018-10-26T05:24:43Z

src/dlgProfilePreferences.cpp

+                                                            "encodings of <b>Big5</b>, <b>GBK</b> or <b>GBK18030</b> and 'narrow' for all others.</li></ul></p>"
+                                                            "<p><i>This is a temporary arrangement and will probably change when Mudlet gains "
+                                                            "full support for languages other than English.</i></p>"));
+    checkBox_enableTextAnalyzer->setToolTip(tr("<p>Check this option to activate a context (right click) menu action on any "


"Check this option to activate" is superflous, of course checking this option will do something 😉

I guess so - though some of your Discord options do sort of work in reverse! I.e. check them to NOT do something which keeps coders on their toes when comparing the code to what the user sees...

I will review this text.

Have tweaked the text, lets see if that will do.

src/dlgProfilePreferences.cpp

src/TTextEdit.cpp

vadi2 · 2018-10-26T05:38:46Z

src/TTextEdit.cpp

+                                                             "<tr><th>%12</th>%13</tr>"
+                                                             "</table></small></body></html>")
+                                              .arg(completedRows)
+                                              .arg(tr("Index (UTF-16)", "1st Row heading for Text analyser output, table item is the count into the QChars/TChars that make up the text {this translation used 2 times}"), indexes)


I'd recommend moving the two index rows side by side for better clarity.

Well, at present the UTF-16 is all above the visual representation and the UTF-8 is all below and the separation is useful when the two are not in lock-step across the line e.g. consider this example I made for another PR where the differences in the indexes really shows up:

It is even more obvious when you stray off the Basic Multi-plane because then you start getting UTF-16 surrogate pairs and by definition at least three if not four bytes for the UTF-8 representations.

src/TTextEdit.cpp

vadi2 · 2018-10-26T05:44:53Z

src/TTextEdit.cpp

+        }
+
+        if (rowItems > rowLimit) {
+            if (isFirstRow) {


Selecting multiple lines breaks the table:

That isFirstRow is what is used to only show those sidings (horizontal headings? 😜) on the first row of the table. It'd be interested to check the code that works out what part of the TBuffer to analyse is deciding in that situation - perhaps I broke something when I revised it to only consider the selected part of the first line... 😢

That shouldn't be happening - I'll have to take another look...

I can't reproduce this - have you got a replay with that text in it?

SlySven · 2018-10-28T12:48:01Z

I've also found that I need to fix things to handle the display of < (and I guess > and possibly &) as it is not being shown in the "visual" or "lua character or code" boxes...

…(...) In the new TTextEdit::convertWhitespaceToVisual(...) {renamed from "convertWhiteSpaceToVisual" !) I've moved the comments that describe each "code" for each white space (for which there are thus little or no visible means to tell them apart otherwise) characters into the translation tr(...) call so that it is available to translators. For most languages it will be best not to attempt to translated these codes but for ideographic lanaguages where each grapheme can be a complete word it is possible that a complete explaination will be of comparible space on screen as the "code" that explains what each white-space character is so is worth providing for those languages only! Also fixes a bug where the less-than character is seen as the code in Qt Rich-Text/HTML that opens a formatting tag and is not displayed as a '<'. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>

Ran clang-format on the whole of ./src/TTextEdit.cpp Signed-off-by: Stephen Lyons <slysven@virginmedia.com>

Although wanted by reviewer this might not be a good idea if large numbers of Emoji characters are involved with FitzPatrick modifiers as the display widget might need to be a bit large even though it is limited to showing sixteen glyphs per row in the tabular layout. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>

vadi2

Let's suck it and see 👍

SlySven added the 1 - Ready label Mar 9, 2018

SlySven assigned vadi2 Mar 9, 2018

SlySven requested a review from a team March 9, 2018 20:31

vadi2 assigned SlySven and unassigned vadi2 Mar 26, 2018

vadi2 added 0 - In progress and removed 1 - Ready for review labels May 18, 2018

vadi2 force-pushed the development branch from 7f1af80 to fcc37d5 Compare June 9, 2018 12:14

SlySven mentioned this pull request Aug 2, 2018

Refactor: clean up TBuffer text format aspects #1840

Merged

SlySven requested a review from a team August 2, 2018 22:49

SlySven added 1 - Ready for review and removed 0 - In progress labels Aug 2, 2018

SlySven assigned vadi2 and Kebap and unassigned SlySven Aug 2, 2018

SlySven added this to the 3.12 milestone Aug 2, 2018

SlySven commented Oct 25, 2018

View reviewed changes

src/Host.h Show resolved Hide resolved

SlySven commented Oct 25, 2018

View reviewed changes

src/TTextEdit.cpp Outdated Show resolved Hide resolved

SlySven commented Oct 25, 2018

View reviewed changes

src/TTextEdit.cpp Outdated Show resolved Hide resolved

BugFix: remove commented dummy method that had been implemented + fix…

17bbc51

… typo Signed-off-by: Stephen Lyons <slysven@virginmedia.com>

SlySven added 1 - Ready for review and removed 0 - Work in progress labels Oct 26, 2018

SlySven changed the title ~~WIP. Add text analyser to consoles~~ Add text analyser to consoles Oct 26, 2018

SlySven assigned vadi2 and unassigned SlySven Oct 26, 2018

vadi2 reviewed Oct 26, 2018

View reviewed changes

vadi2 assigned SlySven and unassigned vadi2 Oct 26, 2018

vadi2 modified the milestones: 3.12, 3.15 Oct 26, 2018

SlySven added 0 - Work in progress and removed 1 - Ready for review labels Oct 28, 2018

SlySven added 3 commits October 28, 2018 23:21

Merge branch 'development' into Enhance_addMudTextLineGraphemeAnalyser

2ce4c7b

Revise: a couple of details prompted by peer-review

5721f10

Ran clang-format on the whole of ./src/TTextEdit.cpp Signed-off-by: Stephen Lyons <slysven@virginmedia.com>

SlySven assigned vadi2 and unassigned SlySven Dec 16, 2018

vadi2 approved these changes Dec 17, 2018

View reviewed changes

SlySven merged commit e286d1e into Mudlet:development Dec 17, 2018

SlySven deleted the Enhance_addMudTextLineGraphemeAnalyser branch December 17, 2018 19:07

SlySven restored the Enhance_addMudTextLineGraphemeAnalyser branch June 22, 2020 18:02

SlySven deleted the Enhance_addMudTextLineGraphemeAnalyser branch June 22, 2020 18:25

Uh oh!

Conversation

SlySven commented Mar 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SlySven commented Mar 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vadi2 commented Mar 13, 2018

Uh oh!

SlySven commented Mar 14, 2018

Uh oh!

vadi2 commented Mar 14, 2018

Uh oh!

SlySven commented Mar 14, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vadi2 commented Mar 14, 2018

Uh oh!

SlySven commented Mar 14, 2018

Uh oh!

vadi2 commented Mar 16, 2018

Uh oh!

SlySven commented Mar 26, 2018

Uh oh!

vadi2 commented Mar 26, 2018

Uh oh!

SlySven commented Aug 3, 2018

Uh oh!

SlySven commented Oct 25, 2018

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SlySven commented Oct 26, 2018

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SlySven Oct 28, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SlySven commented Oct 28, 2018

Uh oh!

vadi2 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

SlySven commented Mar 9, 2018 •

edited

Loading

SlySven commented Mar 9, 2018 •

edited

Loading

SlySven commented Mar 14, 2018 •

edited

Loading

SlySven Oct 28, 2018 •

edited

Loading