Skip to content

Add text analyser to consoles#1579

Merged
SlySven merged 16 commits intoMudlet:developmentfrom
SlySven:Enhance_addMudTextLineGraphemeAnalyser
Dec 17, 2018
Merged

Add text analyser to consoles#1579
SlySven merged 16 commits intoMudlet:developmentfrom
SlySven:Enhance_addMudTextLineGraphemeAnalyser

Conversation

@SlySven
Copy link
Copy Markdown
Member

@SlySven SlySven commented Mar 9, 2018

This will allow the user to determine the exact characters present in a line of MUD (or locally generated) text which could be useful in working out what to use to construct triggers and also how the displayed characters are constructed in UTF-8 bytes. To use it the user should control click on a line of text in a console and then right click to bring up the context menu. By moving the cursor over the "analyse characters" action and hovering there a generated tool-tip will be brought up that contains HTML tables with the break down of Unicode code-points which is then broken down into the corresponding UTF-16 (which Qt uses internally to store all text in QStrings) and UTF-8 which is what the Lua sub-system gets as its form of strings. For characters that are not on the Basic Multi-Plane and which thus need two QChars (and thus UTF-16 surrogate pairs) to represent the pairs of UTF-16 values (in the range 0xD800 to 0xDBFF for the first, high surrogate followed by one in the range 0xDC00 to 0xDFFF for the second, low surrogate) those numbers are also displayed.

Note: this works on (only) the whole first line of the selected text even if more than one line is selected - or only a fraction of that first line is selected. The context menu entry is NOT shown if there is no selection.

Signed-off-by: Stephen Lyons slysven@virginmedia.com

This will allow the user to determine the exact characters present in a
line of MUD (or locally generated) text which could be useful in working
out what to use to construct triggers and also how the displayed characters
are constructed in UTF-8 bytes.  To use it the user should control click
on a line of text in a console and then right click to bring up the context
menu.  By moving the cursor over the "analyser characters" action and
hovering there a generated tool-tip will be brought up that contains HTML
tables with the break down of Unicode code-points which is then broken down
into the corresponding UTF-16 (which Qt uses internally to store
all text in QStrings) and UTF-8 which is what the Lua sub-system gets as
its form of strings.  For characters that are not on the Basic Multi-Plane
and which thus need two QChar (and thus UTF-16 surrogate pairs) to
represent the pairs of UTF-16 values (in the range 0xD800 to 0xDBFF for the
first, high surrogate followed by one in the range 0xDC00 to 0xDFFF for the
second, low surrogate) those numbers are also displayed.

Note: this works on (only) the whole first line of the selected text even
if more than one line is selected - or only a fraction of that first line
is selected.  The context menu entry is NOT shown if there is no selection.

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
@SlySven SlySven requested a review from a team March 9, 2018 20:31
@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Mar 9, 2018

This is one part of a significant PR I am assembling to sort out the display/selection of non-BMP or text with combining diacriticals or other unusual features that mean that there is not a straight one-to-one mapping of QChar/TChar to on-screen glyphs. It will be really useful to do things like telling the difference between say all of these - which are (almost) all representations of things that could be held to look like:

  • a: aаɑᵃₐ⒜ⓐ⠁ⲁa𝐚𝑎𝒂𝒶𝓪𝔞𝕒𝖆𝗮
  • e: eɐᵉₑ℮ℯⅇ⒠ⓔ⠊ⲉe𝈡𝐞𝑒𝒆𝓮𝔢𝕖𝖊𝗲
  • A: `ΑAᗅᗋᗩᴀᴬ₳Ⓐ⠠⠁ⲀꓮA𐌀𝐀𝐴𝑨𝚨𝛢𝜜𝝖𝞐𝒜𝔸𝕬𝖠𝗔𝘈

{OK, the capital A in Braille needs the preceding dot-6 to turn it from the lower case dot-1 of 'a' into the upper case 'A'...!}

Anyhow this is one part that can be done completely separately to the rest so I pulled it out as a separate PR to make the other one smaller. 😇

Some examples:
analysing screen content
analysing screen content 1

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Mar 13, 2018

Great tool! Since this is mostly aimed at writing triggers, what do you think about moving the analyze option to the trigger pattern field where it's more appropriate for the context?

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Mar 14, 2018

Do you mean duplicating the functionality - to show exactly what is in the patterns so that they can be checked - or, well, what? 😕

The functionality is keyed to the mouse event in the TTextEdit class that is incorporated into any TConsole so it even works in the Central Debug Console; so moving it to the editor does not make too much sense to me. 🙁

Note that it is done using the tooltip to that "analyze characters" QAction so that it only persists while the mouse hovers over it; this has the unfortunate side-effect that it is not then possible to select and copy a part of the details but it does mean that it disappears nicely afterwards and no extra code to close/clean up/prevent mutiple copies of the display is needed. It is intended to be informational - which I think it is - and of course, once the user identifies what is being shown, there is always the option to select and copy the text directly from the TConsole to paste it in the editor...

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Mar 14, 2018

The idea behind moving the editor is that since this is a very, very, advanced tool that most people will not understand - we might as well have it where it's relevant, in the editor, and not fill up the right-click menu for the main console. That way both the super technical users needing to decode the Unicode for some very specific triggers can find it where it's relevant and the newbies won't be confused by it!

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Mar 14, 2018

I'm not sure that it would be simple to put the functionality in the editor then because you would have to find a way to navigate to the particular TConsole and then hope that the line of text that you are interested in that console is the (first) currently selected one when you get there. It might be doable but it would make things more complex to try and do something like that when this, implemented as a single SLOT method (with a helper to identify those code-points that are obviously not going to have a printable representation), is relatively straightforward.

<aside>N.B. If you are ever debugging a hover or onHover type slot consider changing it temporarily to an clicked/activate type behaviour before setting breakpoints in it - under some circumstances {for me a FreeBSD OS and Qt 5.7.1, running inside Qt Creator} it can hang the Application/Qt Creator, not entirely sure why but I suspect the stopping of execution within the temporary hover code messes up/locks up the Qt Creator/Debugger parent process' interactions which use the same methods to display details in the source code at the same time... 😮 </aside>

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Mar 14, 2018

Hmm, why would the TConsole need any involvement? I am thinking the person would copy/paste the line into the trigger pattern field, and then analyze it.

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Mar 14, 2018

Oh! You mean as a dialogue that the user can (choose one of):

  1. bring up (from the TConsole console menu, or a submenu item on the "Tools" main menu) a dialogue and drop some copied text to be analysed into - note that the code here only works on the whole of the first line in the selection (even if only part of that line is selected) - this is currently because at the time of pushing this code out there the selection of part of a line containing either combining diacriticals/variation selectors/non-BMP characters (anything that breaks the previous assumption of one QChar/TChar per visible glyph) is broken (fix pending in the outstanding PR I'm progressing locally) - so that partial line copying is faulty. Also, there is a physical limit in terms of screen real estate that means handling multiple lines with the current HTML in a tool-tip is not going to fly/fit!
  2. activate from within a pattern to display whatever has been placed there - I suspect this would not want to be activate-able for colour and possibly one or two ( line count , spacer ?) other types where the content of the pattern box does not really "match" a text as such.

If method 1 was chosen it would be possible to use a scrollable, read-only text display type widget which could handle more than one line of text (mind you, if we go in that direction we might have to do something to better track how each line of text is actually ended - we currently massage the end of the lines but if we start trying to show the raw text the user will expect to see that information IMHO; i.e. did Mudlet wrap that line because it exceeded the wrap limit or did the MUD server put a line-feed there, and if it was the MUD server, did it use a line-feed, a carriage-return & line-feed pair - or what? I've seen some odd things in the code base with Mudlet stuffing either \r or \0xff on the end of some lines and I'm not sure what on earth is going on there!)

Method 2 could actually be done as well as what is here now and would, IMHO complement this PR so far. The current "pop-up" can show what is in the "incoming" text - and - for the right sub-set of the trigger patterns (beginning of line/substring/exact match) replicating the action on those would show what the match would be - if we really wanted to be fancy we could have a second "test" context menu action for the pattern item to say whether it would fire on the (whole of the first line of the) selection in the main TConsole - such a go/no-go indicator might be helpful for those trying to debug the more complex PCRE patterns (although not as much, perhaps, as the Regex101 website)...

I do like the idea of 2 as well but are you really against the current proposal as it is now as a pop-up tooltip on an item in the context menu of any TConsole?

Does anyone else, having tried this {Windows; macOs or Linux} have any opinions?

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Mar 16, 2018

Yeah, because I think this is a very advanced option that'll seldomly be used (even you won't use it that much once done with Unicode work?), so having it in the right-click menu for everything is concerning me for reasons of screen real state and impression of the client, don't want to have a tool that'll confuse most people and make them feel stupid that they don't understand it right in their face all the time.

@vadi2 vadi2 assigned SlySven and unassigned vadi2 Mar 26, 2018
@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Mar 26, 2018

Would it be acceptable to hide it unless a "Special Options" item is checked to enable it?

I am currently finding it useful but I can see where you are coming from in that it is only going to be used in situations where the end-user is not quite sure what they are seeing and want to check the details more carefully than normal - or because they are getting a character that they are not expecting! Fair enough that is not that likely for Western Latinate languages but I suspect it might be an issue for some other idographic type writing systems with hundreds if not thousands of glyphs...!

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Mar 26, 2018

OK!

# By Vadim Peretokin (81) and others
* development: (162 commits)
  BugFix: set Server Encoding correctly on auto-loading profiles
  Install xz-utils to guarantee xz is available
  Create and upload source tarballs on release
  BugFix: use Alternative OpenSSLBinary for Windows (Mudlet#1850)
  Fix auto-save kicking in and blocking save profile as
  Upgrade a few classes to newer Qt connect style (Mudlet#1846)
  Fix package exporter to work with async save (Mudlet#1832)
  Pulled out actions into a mudlet member class variables (Mudlet#1839)
  Create module zip if it's not already created when syncing (Mudlet#1842)
  i18n-ise GUI label creation (Mudlet#1838)
  Align "no map"message in centre propely (Mudlet#1837)
  New Crowdin translations (Mudlet#1756)
  Delete old version checks (Mudlet#1833)
  Re-set dev.
  3.11.1 bugfix release.
  Fix iterator to actually iterate
  BugFix: fix faulty log options for new profiles or old profile save files
  Big5 support (Mudlet#1808)
  BugFix: include a fail-back icon for the auto-saved profile in Con. Dialog
  Back to development we go!
  ...

Conflicts resolved in:
* src/TTextEdit.cpp
* src/TTextEdit.h

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
I found there are some non-printing Unicode code-points that it is sensible
to report - especially as they cannot currently be shown {the FitzPatrick
Emoji modifiers that give a skin colour tint to human related Emojis},
however as they are not on the BMP a revision was needed to:
`inline QString TTextEdit::convertWhiteSpaceToVisual(...)`

Some undefined code sequencing (using multiple operator++ calls in the
multiple operators of a method meant the order in which they were used
was arbitrary) and produced errors when done in the wrong order.

Added per profile option to enable (disabled by default) this analyser as
requested.

Added an additional line to the information table which gives the decimal
escape codes that would be needed to specify the individual bytes that
make up a non-ASCII grapheme in Lua for characters that are not printable
ASCII characters.

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
@SlySven SlySven requested a review from a team August 2, 2018 22:49
@SlySven SlySven assigned vadi2 and Kebap and unassigned SlySven Aug 2, 2018
@SlySven SlySven added this to the 3.12 milestone Aug 2, 2018
@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Aug 3, 2018

Have added another line which gives the decimal escapes (the \### where ### = 001 to 031; 127 to 255) for Lua for non-ASCII characters so that UTF-8 sequences can be reproduced in the editor (I've also started the indexing at 1 not 0 to reflect that lua does that for arrays and string indexing).

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Oct 25, 2018

... and I got it (slightly) wrong! 😢

… typo

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
@SlySven SlySven changed the title WIP. Add text analyser to consoles Add text analyser to consoles Oct 26, 2018
@SlySven SlySven assigned vadi2 and unassigned SlySven Oct 26, 2018
@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Oct 26, 2018

Okay, this should be good enough - :shipit:

"encodings of <b>Big5</b>, <b>GBK</b> or <b>GBK18030</b> and 'narrow' for all others.</li></ul></p>"
"<p><i>This is a temporary arrangement and will probably change when Mudlet gains "
"full support for languages other than English.</i></p>"));
checkBox_enableTextAnalyzer->setToolTip(tr("<p>Check this option to activate a context (right click) menu action on any "
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Check this option to activate" is superflous, of course checking this option will do something 😉

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so - though some of your Discord options do sort of work in reverse! I.e. check them to NOT do something which keeps coders on their toes when comparing the code to what the user sees...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will review this text.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have tweaked the text, lets see if that will do.

"<tr><th>%12</th>%13</tr>"
"</table></small></body></html>")
.arg(completedRows)
.arg(tr("Index (UTF-16)", "1st Row heading for Text analyser output, table item is the count into the QChars/TChars that make up the text {this translation used 2 times}"), indexes)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend moving the two index rows side by side for better clarity.

Copy link
Copy Markdown
Member Author

@SlySven SlySven Oct 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, at present the UTF-16 is all above the visual representation and the UTF-8 is all below and the separation is useful when the two are not in lock-step across the line e.g. consider this example I made for another PR where the differences in the indexes really shows up:
indexing into utf16 and utf8 text example
It is even more obvious when you stray off the Basic Multi-plane because then you start getting UTF-16 surrogate pairs and by definition at least three if not four bytes for the UTF-8 representations.

}

if (rowItems > rowLimit) {
if (isFirstRow) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Selecting multiple lines breaks the table:

image

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That isFirstRow is what is used to only show those sidings (horizontal headings? 😜) on the first row of the table. It'd be interested to check the code that works out what part of the TBuffer to analyse is deciding in that situation - perhaps I broke something when I revised it to only consider the selected part of the first line... 😢

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't be happening - I'll have to take another look...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't reproduce this - have you got a replay with that text in it?

@vadi2 vadi2 assigned SlySven and unassigned vadi2 Oct 26, 2018
@vadi2 vadi2 modified the milestones: 3.12, 3.15 Oct 26, 2018
@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Oct 28, 2018

I've also found that I need to fix things to handle the display of < (and I guess > and possibly &) as it is not being shown in the "visual" or "lua character or code" boxes...

…(...)

In the new TTextEdit::convertWhitespaceToVisual(...) {renamed from
"convertWhiteSpaceToVisual" !) I've moved the comments that describe each
"code" for each white space (for which there are thus little or no visible
means to tell them apart otherwise) characters into the translation tr(...)
call so that it is available to translators.  For most languages it will be
best not to attempt to translated these codes but for ideographic
lanaguages where each grapheme can be a complete word it is possible that
a complete explaination will be of comparible space on screen as the "code"
that explains what each white-space character is so is worth providing for
those languages only!

Also fixes a bug where the less-than character is seen as the code in Qt
Rich-Text/HTML that opens a formatting tag and is not displayed as a '<'.

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
Ran clang-format on the whole of ./src/TTextEdit.cpp

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
@SlySven SlySven assigned vadi2 and unassigned SlySven Dec 16, 2018
Although wanted by reviewer this might not be a good idea if large numbers
of Emoji characters are involved with FitzPatrick modifiers as the display
widget might need to be a bit large even though it is limited to showing
sixteen glyphs per row in the tabular layout.

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
Copy link
Copy Markdown
Member

@vadi2 vadi2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's suck it and see 👍

@SlySven SlySven merged commit e286d1e into Mudlet:development Dec 17, 2018
@SlySven SlySven deleted the Enhance_addMudTextLineGraphemeAnalyser branch December 17, 2018 19:07
@SlySven SlySven restored the Enhance_addMudTextLineGraphemeAnalyser branch June 22, 2020 18:02
@SlySven SlySven deleted the Enhance_addMudTextLineGraphemeAnalyser branch June 22, 2020 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants