Refactoring/reimplementation of MXP support#3625
Conversation
* These refactorings are intended to simplify TBuffer class, extracting responbilities that aren't directly related to the buffer mgmt and support extracting MXP protocol handling from TBuffer. - Extracted TEncodingTable, this class is now responsible for managing a map of computer-friendly encoding names as keys. - Extract TLinkStore to manage links and associated hints to be displayed.
|
Hey there! Thanks for helping Mudlet improve. 🌟 Test versionsYou can directly test the changes here:
No need to install anything - just unzip and run. |
- Extracted code responsible for detecting entities such as > < etc from the main loop in translateToPlainText to a specialized class that keeps track of identifying an entity and replacing it into the text stream
- Extracted TMxpTagDetector class to process input and extract MXP tags for later processing
|
🚨 Please be aware that the encoding tables you are manipulating are in the process of significant reorganisation in PR #3579 . That will mess with this one somewhat. |
vadi2
left a comment
There was a problem hiding this comment.
These changes are looking good 👍 unfortunately as mentioned #3579 doesn't just add a new encoding but also re-engineers a fair bit of things so it'll conflict with this re-engineering as well. It's just about to make it in, so let's wait for that first and continue with this?
Sorry that the 2nd thing you picked was already worked on! I hope this won't happen often. Ideally we like to keep PRs small and scoped to one topic.
vadi2
left a comment
There was a problem hiding this comment.
Yep, looks great, let's just get the outstanding PR in first and this is good to go.
src/TMxpTagDetector.cpp
Outdated
| ***************************************************************************/ | ||
| #include "TMxpTagDetector.h" | ||
|
|
||
| bool TMxpTagDetector::handle(char& ch, size_t& localBufferPosition) |
src/TEncodingTable.cpp
Outdated
|
|
||
| #include "TEncodingTable.h" | ||
|
|
||
| const TEncodingTable TEncodingTable::defaultInstance = TEncodingTable(csmEncodings); |
There was a problem hiding this comment.
Also happy to see this not bloat up TBuffer 👍
src/TEntityHandler.cpp
Outdated
| } | ||
|
|
||
| return false; | ||
| } No newline at end of file |
src/TEntityHandler.cpp
Outdated
|
|
||
| #include "TEntityHandler.h" | ||
|
|
||
| bool TEntityHandler::handle(char& ch, std::string& localBuffer, size_t& localBufferPosition, size_t localBufferLength) |
There was a problem hiding this comment.
👍 nice class too. Unrelated, but Evennia and Mudlet at times still struggle to cooperate on what is a < and what is a <
|
|
||
| #include "TLinkStore.h" | ||
|
|
||
| int TLinkStore::addLinks(const QStringList& links, const QStringList& hints) |
|
Ok, we can try to handle the merging when #3579 is finished. I have some other changes to make, I am almost managing to fully extract MXP from TBuffer. |
- Extracted TMxpTagProcessor class, responsible for parsing MXP tags and executing the expected behavior
- TMxpProcessor keeps the state of MXP and delegates tag detection and processing to TMxpTagDetector and TMxpTagProcessor
|
I have just cancelled the current MacOS CI as that has been broken for a couple of days - and I am trying a bugfix which was queued behind it... ... hopefully if it flies and goes in then you (or I) will be able to go into Travis and restart the CI build jobs for Macs. |
- Includes a table of 64 standard codes (uses a hash table instead of previous if/else comparisons) - Supports unicode codes in decimal and hexdecimal such as   and   - Supports registering custom entities (useful for implementing MXP spec) - Includes tests for the resolver and the MXP entity handler
vadi2
left a comment
There was a problem hiding this comment.
Nice to start on tests for the C++ side 👍
src/TEntityResolver.cpp
Outdated
|
|
||
| QString TEntityResolver::getResolution(const QString& entity) const | ||
| { | ||
| if (entity.front() != '&' || entity.back() != ';') |
There was a problem hiding this comment.
Could you still wrap single line ifs with braces - we've adopted that codestyle across the codebase. Apple's Goto fail was enlightening here.
There was a problem hiding this comment.
Could you still wrap single line ifs with braces - we've adopted that codestyle across the codebase. Apple's Goto fail was enlightening here.
Ok, I'll ty to stick to it. What do you use for helping to enforce the coding style?
Is there any naming convention you use in the project for classes. I am putting these Ts in front of class names just because they are already used around, but I didn get what is the meaning.
There was a problem hiding this comment.
clang-format - we have a .clang-format file at the root of the repo with the style. Unfortunately clang-format doesn't handle adding braces - there was some other tool that does, but we keep it simple with 1 formatter.
Naming convention - for variables it's m for a member variable, s for static and c for constant, hence, scm in a few places. I'm not sure about classes, to be honest :O
There was a problem hiding this comment.
And a p for pointer.
*quoting many a teenage contestant on the UK ITV Quiz show from the late 80s "BlockBusters" to the host Bob Holness as in this Celebratory reprisal hosted by Dara O'Brean*:
"Can I have a P(ee) please Bob."
|
Hey we've fixed up the failing macOS builds - I've updated your branch with the fix so it builds. |
|
Sorry it's taking a while - it'll be ready in 1-2 days max. |
- Extracts MXP support from TBuffer - Support for parsing MXP tags, replacing direct string manipulation by accessing MxpTag attributes - Class hierarchy of MxpTagHandler for implementing support for each tag. Allow for a clear separation of the implementation of each tag and makes it easier for adding new support - Defines a clear interface separating the MXP implementation from other parts of the code - Solves some limitations of previous implementation such as supporting element definitions not only for <SEND> tags, and interpolating named and positional attributes - Supports COLOR tag
…nto mxptbufrefactoring
- Removes the MxpTagDetector, previously used to identify opening/closing <> tag symbols - Replaced in previous commits by TMxpNodeBuilder, which detects and parse tags into a MxpTag object
…nto mxptbufrefactoring
- Events are added in a queue from TMxpMudlet (MXP client interface) - TConsole sends the event to the TLuaInterpreter on printOnDisplay
SlySven
left a comment
There was a problem hiding this comment.
This is a massive PR - in other circumstances I would be proud (and Vadim would, I guess, be cringing at the thought) of it. MXP is not something I want to touch with a 10 foot pole so for someone else to pick it up and give it a good going over is brilliant. 🙂
However the gains from factoring out the encoding tables are marginal IMVHO - fair enough it moves them to a separate class - however they (apart from my recent excursion into the TTextCodec class) are only really of relevance to this class. I was careful in the design to ensure that there is only ever one set of the tables in the whole application and that by using the QVector form for each table that lookups could be as good as possible. I note that you have incorporated the changes that I made recently to factor out the one potentially non-const element of the tables (their translated names for presentation in the preferences dialogue). TBH I would be happier - and this PR would be smaller if those tables were left as they were for the moment.
I guess the MXP will need a thorough testing - but does anyone (apart from Zugg) know exactly it is supposed to behave - i.e. how can we actually test this in practice? 😜
src/CMakeLists.txt
Outdated
| TScript.cpp | ||
| TSplitter.cpp | ||
| TSplitterHandle.cpp | ||
| TStringUtils.cpp |
src/CMakeLists.txt
Outdated
| TScript.h | ||
| TSplitter.h | ||
| TSplitterHandle.h | ||
| TStringUtils.h |
| #include <QTextStream> | ||
| #include "post_guard.h" | ||
|
|
||
| #include "TMxpMudlet.h" |
There was a problem hiding this comment.
I'd suggest moving these two up into the other Mudlet ones above the guarded Qt ones...
| const MxpTagAttribute& MxpStartTag::getAttribute(const QString& attrName) const | ||
| { | ||
| const auto ptr = mAttrsMap.find(attrName.toUpper()); | ||
| return *ptr; |
There was a problem hiding this comment.
Does one need the local ptr value - I am not sure about return value optimisations and other things - and as you are not returning a reference the const on the return value is meaningless - same thing actually applies elsewhere in this class BTW...
There was a problem hiding this comment.
Strange - i thought I should have spotted that '&' - guess I've been staring at too much code. 🕶️
There was a problem hiding this comment.
Ah, hang on - you are returning a reference, and it refers to, let me think, a result from using QMap<T1, T2>::find(const T1 &) which returns an iterator pointing to the T2 type value with the given key. Maybe that is safe after all. {I've heard nasty tales of returning references to variables with local scope but that doesn't seem to be the case here, right?} 😖
There was a problem hiding this comment.
Ah, hang on - you are returning a reference, and it refers to, let me think, a result from using
QMap<T1, T2>::find(const T1 &)which returns an iterator pointing to the T2 type value with the given key. Maybe that is safe after all. {I've heard nasty tales of returning references to variables with local scope but that doesn't seem to be the case here, right?}
Exactly, this is to avoid returning a local reference. But I think the class can be redesigned to hold a value instead of reference.
| for (const auto& attrName : mAttrsNames) { | ||
| result.append(' '); | ||
| if (attrName.contains(" ") || attrName.contains("<")) { | ||
| result.append('"'); |
There was a problem hiding this comment.
Use QStringLiteral(...) here as well.
| { | ||
| public: | ||
| TMxpVersionTagHandler() : TMxpSingleTagHandler("VERSION") {} | ||
| inline static const QString scmVersionString = QStringLiteral("\n\x1b[1z<VERSION MXP=1.0 CLIENT=Mudlet VERSION=%1>\n"); |
There was a problem hiding this comment.
Are we really at the level of MXP 1.0 with all these changes? We certainly weren't with the previous code!
There was a problem hiding this comment.
I didn't change this part, the previous code already claimed it was MXP 1.0.
| ***************************************************************************/ | ||
| #include "TStringUtils.h" | ||
|
|
||
| QStringRef TStringUtils::trimmedRef(const QStringRef& ref) |
There was a problem hiding this comment.
Unlike some of the above classes and their methods, these really are working on references and need to return const values (erm, references)! Indeed they ought to be marked as const method as well shouldn't they?
src/TStringUtils.cpp
Outdated
| return isOneOf(ch, "\'\""); | ||
| } | ||
|
|
||
| bool TStringUtils::isOneOf(QChar ch, const char* str) |
There was a problem hiding this comment.
Remember QStrings are UTF-16 based and that, like UTF-8, is a variable length encoding QChar to convey their meaning (in which case ch.isSurrogate() will be true for both QChar that make them up) .
src/TStringUtils.h
Outdated
|
|
||
| static void toUpper(QString& str) | ||
| { | ||
| for (auto& ch : str) { |
There was a problem hiding this comment.
How can this work for graphemes not in the BMP? I.e. ones that need two QChars to convey their meaning?
There was a problem hiding this comment.
All this MXP code deals only with tags and their attributes, the content is just "ignored" and handled by the other parts of the code. But this function is not used anymore, I'll delete it
| const QVector<QChar>& TEncodingTable::getLookupTable(const QByteArray& encoding) const | ||
| { | ||
| const auto ptr = mEncodingMap.find(encoding); | ||
| return ptr != mEncodingMap.end() ? ptr.value() : csmEmptyLookupTable; |
There was a problem hiding this comment.
The original code that used a QMap<T1, T2>::value(const T1&) is actually clearer and simpler as it automatically returned a suitable null result (well, an empty table) if the encoding key is not found - so no need to have to declare a dummy one like you have to with this STL style getLookupTable(...) method.
There was a problem hiding this comment.
I think there's a semantic improvement. It's clear that this is not just a Map from a byte array to a vector of chars datastructure, but something for holding encodings and their names.
But the idea was to be a starting point, now you can move the code in TBuffer that deals with encoding to this class.
Basic principle of separation of concerns. Right now TBuffer is doing too many things. It handles protocols, char coloring, encodings, ... maybe some of these responsibilities could be better delegated to different classes.
|
I think it's good to move those tables out - it makes the huge TBuffer class smaller and more focused on what it should do 👍 Same with the big tables in |
|
But does it slow things down with extra layers of function calls? This one is giving me indigestion as it is a lot to take in - OTOH I suppose it is sauce for the gander! |
| const QMap<QByteArray, QVector<QChar>> getEncodings() const { return mEncodingMap; } | ||
| QList<QByteArray> getEncodingNames() const; | ||
|
|
||
| const QVector<QChar>& getLookupTable(const QByteArray& encoding) const; |
There was a problem hiding this comment.
This function is pointless IMHO - the data is a single constant static QMap that is public and a Qt QMap<QByteArray, QVector<QChar>>::value(encoding) at the single point of use will return a suitable null (empty) table automagically without the need for csmEmptyLookupTable... 😝
I agree. I think it is a lot better to have more smaller more focused classes than huge files with lots of members. Besides the gains in comprenhension, it makes it easier to handle merges as people are less likely to be changing the same file, reduces compile time when you change something, makes IDEs fasters (CLion gets laggy in some files of this project). As for the performance issue of the "extra call", the compiler is smart enough to optimize things and inline it. A GUI app is not the kind of application where we should look for optimizing function calls. |
|
Awesome! Is this good to go guys? |
|
I guess we will see how this all works in practice - if it does break anything I guess we will just have to glue the bits back together again. I cannot say that I approve it because there is too much to take in - but I realise that there is no reason to disapprove it either. Whilst I thought that de coding Extended ASCII streams is something that should be kept in here if there is no significant harm to moving that to a separate class then factoring the code out to there is not so unreasonable after all. When it comes to the If @gcms can go through this and post up a cleaned up version then the person who does put it into the development code can use the that to replace the text that would be used by default! 😀 |
I think the PR description gives a good overview of what's in it. Maybe we can use it as commit message? If you think it's still too long I can try to make it more succinct. |
|
The description is fine. Agree on the mega PR, they are painful - splitting up into bitesize pieces makes them much more manageable. But it's OK for the first time :) Mind resolving the merge conflict? |
|
It's in - brilliant, thanks a lot :) |
|
Coverity found a few minor issues, mind fixing them? |
I pushed it to this branch. Do I need to create a separate PR? |
|
Hm, try. If it doesn't work, try braching off |
Refactoring/reimplementation of MXP protocol
Extracts MXP support from TBuffer
Separates input parsing from protocol processing: input is parsed into MxpTag objects by the MxpNodeBuilder and are passed to be handled by the MxpTagProcessor
Class hierarchy of MxpTagHandler for implementing support for each tag: allows for a clear separation of the implementation of each tag and makes it easier for adding support for new tags
Defines a clear interface (MxpClient) separating the MXP implementation from other parts of the code
Solves some limitations of previous implementation such as supporting element definitions not only for<SEND> tags, interpolating named and positional attributes and interpolating &text; placeholder by tag content
Includes support for COLOR tag
Handles FONT, U, I, B, VAR and !ENTITY tags, but so far no defined behavior (can be implemented in MxpMudlet)
Attributes that hold MXP state are now in the Host class to avoid needless copies when TBuffer is copied around
Introduces QtTests for some of the classes
Other refactorings intended to simplify TBuffer class, extracting responbilities that aren't directly related to the buffer mgmt
Extracted TEncodingTable from TBuffer, this class is responsible for mapping encoding names to tables. It also removes some bloat from TBuffer.
TLinkStore to manage links and associated hints to be displayed.
TEntityResolver to map character entities, such as > &Mudlet#32;   to their associated string values; supports interpolating strings replacing the entity placeholders by their values
#### Brief overview of PR changes/additions Internal refactoring, no functional changes. - use `.arg()` - ~~use `QStringBuilder` with % operator~~ didn't work as expected with multiple code lines #### Motivation for adding to Mudlet Fix #8306 Enhance #3625 by @gcms #### Other info (issues closed, discussion etc) Increasing readability, shorting code ~50%.

<SEND>tags, interpolating named and positional attributes and interpolating &text; placeholder by tag content>  to their associated string values; supports interpolating strings replacing the entity placeholders by their valuesBrief overview of PR changes/additions
Motivation for adding to Mudlet
Other info (issues closed, discussion etc)