Skip to content

Enhance: add support for CP437 encoding#3579

Merged
SlySven merged 14 commits intoMudlet:developmentfrom
SlySven:Enhance_addSupportForCP437AndOtherEncodings
Apr 21, 2020
Merged

Enhance: add support for CP437 encoding#3579
SlySven merged 14 commits intoMudlet:developmentfrom
SlySven:Enhance_addSupportForCP437AndOtherEncodings

Conversation

@SlySven
Copy link
Copy Markdown
Member

@SlySven SlySven commented Apr 8, 2020

This encoding is often used within the font within the ROM of PC Display cards and - with a lot of box drawing characters - is popular in some MUDs for inline map displays.

Additionally I have also managed to include the following extra encodings that are derived from CP437:

  • CP667 - "Mazovia" Used in MSDOS for Polish.
  • CP737 - Used in MSDOS for Greek, more popular than CP869 but is missing a couple of characters.
  • CP869 - Used in MSDOS for Greek, less popular than CP737 but more complete.

Signed-off-by: Stephen Lyons slysven@virginmedia.com

This encoding is often used within the font within the ROM of PC Display
cards and - with a lot of box drawing characters - is popular in some
MUDs for inline map displays.

Additionally I have also managed to include the following extra encodings
that are derived from CP437:
+ CP667 - "Mazovia" Used in MSDOS for Polish.
+ CP737 - Used in MSDOS for Greek, more popular than CP869 but is missing
  a couple of characters.
+ CP869 - Used in MSDOS for Greek, less popular than CP737 but more
  complete.

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
@SlySven SlySven requested a review from a team as a code owner April 8, 2020 19:13
@SlySven SlySven requested a review from a team April 8, 2020 19:13
@add-deployment-links
Copy link
Copy Markdown

add-deployment-links bot commented Apr 8, 2020

Hey there! Thanks for helping Mudlet improve. 🌟

Test versions

You can directly test the changes here:

No need to install anything - just unzip and run.
Let us know if it works well, and if it doesn't, please give details.

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Apr 9, 2020

@SlySven this is crashing because the mpOutOfBandDataIncomingCodec is null when this echo is being printed from ctelnet.cpp:252:

qDebug().nospace() << "cTelnet::encodingChanged(" << encoding << ") INFO - Installing a codec for OOB protocols that can handle: " << mpOutOfBandDataIncomingCodec->aliases();

@vadi2 vadi2 assigned SlySven and unassigned vadi2 Apr 9, 2020
@SlySven SlySven marked this pull request as draft April 9, 2020 13:27
@thoth-medievia
Copy link
Copy Markdown

thoth-medievia commented Apr 9, 2020

The new definitions in TBuffer.cpp look right, but this is a limitation in QTextCodec, which has a list of supported codecs here: https://doc.qt.io/qt-5/qtextcodec.html - you will have to create a new codec class for this. I found a GPL2 example of someone doing that for CP437, here: https://github.com/openmoko/openmoko-svn/blob/master/src/host/qemu-neo1973/phonesim/lib/serial/qatutils.cpp - search down for "437" and you will see it.

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 9, 2020

Ah, and reading the documentation:

[static] QTextCodec *QTextCodec::codecForName(const QByteArray &name)

Searches all installed QTextCodec objects and returns the one which best matches name; the match is case-insensitive. Returns 0 if no codec matching the name name could be found.

Note: This function is thread-safe.

The emphasis is mine - but it seems that with CP437 (or one of the other ones) we are getting a case where there is NOT an encoder - which is a concern...

SlySven added 2 commits April 9, 2020 18:42
Hadn't arisen in the past but two of the new ones in the last commit are
not supported by the codecs that Qt provide in the current (5.14.2) version
of it's libraries.

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
This is the CP667 (Mazovia) and CP737 (DOS Greek) ones, the other Greek one
with more characters but supposedly less used is okay.

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 9, 2020

Of course https://doc.qt.io/qt-5/qtextcodec.html mentions:

If Qt is compiled with ICU support enabled, most codecs supported by ICU will also be available to the application.

I believe the Qt libraries I am using have ICU support included.

@thoth-medievia
Copy link
Copy Markdown

That's right - there's no encoder. Here are the one's QT supports by default. For everything else you would have to create the class yourself. I put a link above where someone did that for IBM437/CP437

QT Default Supported Encodings:

Big5, Big5-HKSCS, CP949, EUC-JP, EUC-KR, GB18030, HP-ROMAN8, IBM 850, IBM 866, IBM 874, ISO 2022-JP, ISO 8859-1 to 10, ISO 8859-13 to 16, Iscii-Bng, Dev, Gjr, Knd, Mlm, Ori, Pnj, Tlg, and Tml, KOI8-R, KOI8-U, Macintosh, Shift-JIS, TIS-620, TSCII, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, Windows-1250 to 1258

@thoth-medievia
Copy link
Copy Markdown

If ICU Supports them, perhaps we're just calling them by the wrong names? For example, I think the common name for CP437 may be IBM437, and while using QT you can use the name they use, but if relying on ICU for the codec, we may need to look up what They call it...

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Apr 9, 2020

The latest build is able to change the encoding fine now, try it out.

@thoth-medievia
Copy link
Copy Markdown

It looks like we are able to change the encoding to CP437 now :) However, the system now crashes once connected to a server with either of the new encodings active, on sending any data.

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 9, 2020

Well the aliases that Qt knows for the CP437 codec in the (WIndows 64Bit) system I have in front of me are:

  • ibm-437_P100-1995
  • ibm-437
  • cp437
  • 437
  • csPC8CodePage437
  • windows-437

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 9, 2020

As for still crashing - I have an idea which I will check out in the next couple of hours...

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 9, 2020

Which MUD are you using this encoding for BTW?

@thoth-medievia
Copy link
Copy Markdown

thoth-medievia commented Apr 9, 2020 via email

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 9, 2020

🤦‍♀️

Ah, if your version (or rather the one built by us on AppVeyor) does not have the same encoders as mine does then yeah - it looks as though it will explode as soon as you try to send data - there will be the warning message:
cTelnet::encodingChanged("CP437") WARNING - Unable to locate a codec for OOB protocols that can handle: "CP437"
but that won't stop the Application shooting itself in the foot by trying to use a pointer that points to nothing afterwards...

@thoth-medievia
Copy link
Copy Markdown

thoth-medievia commented Apr 9, 2020 via email

@thoth-medievia
Copy link
Copy Markdown

As for still crashing - I have an idea which I will check out in the next couple of hours...
Did you have any luck with this? BTW - it seems to me like the local encoding actually works. So long as you don't send data - the received data appears to be encoded. I can't say for sure that it's encoded PROPERLY though.

:)

SlySven added 2 commits April 12, 2020 00:48
Converts the names of the encodings (used internally) from `QString`s to
`QByteArrays` as that is what Qt uses for it's own management of them.

Take the translated Server Encoding name out of the static and thus
immutable TBuffer::csmEncodingTable and put them in a shared
`(QMap<QByteArray, QString>) mudlet::mEncodingNameMap`

Convert the preference dialogue to store the encoder required as a
QDataRole::User data item in each item rather than using the (translated)
text itself. This is more robust and simpler to handle. It also eliminared
the need for a couple of convertors to change between what is being used
internally to identify the encoders and what a user might use as the latter
are not actually used now.

As the way we provide the multi-byte decoding of Big5 is done it turns out
that we can supply both Big5-ETen (effectively what Qt provides as Big5+)
and Big5-HKCS, just by asking Qt for the correct one. So both are now
offered.

Check for each 8-bit Extended ASCII encoder on startup and do not offer
encoders that we do not provide - this should eliminate problems between
what the coder/developer has in their Qt libraries compared to what the
CI (installer builder) uses...

Four custom codecs are included here that I have written to support
the encodings listed - which is needed for Windows CI builds which uses
only Qt and NOT IUC libraries to provide encoders. These are integrated
into the above start-up checks and they are inserted into the system only
if what they can replace is not present.

Prepend an 'm ' to the displayed codec in the preferences to make it clear
that it is a Mudlet provided one (and might be slighly different {wrong!}
compared to an offical one). The `getServerEncoding()` result will give a
`M_` prefix to the same data so that they can be distinguished.

Replace the "CP874" encoding as it was erronously being called
"ISO 8859-3 (South Europe)" in the preferences and was probably clashing
with the "ISO 8859-3" real encoding. As it happens the proper CP874,
according to Wikipedia, has three codes that required TWO unicode points
to encode. This is not something that can be supported by the
`QVector<QChar>` based sysmtem that I have developed here {at least not
without some exceptional extra code) however the encoding IS very similiar
to CP1161 {a.k.a. Windows-874} and that CAN be done in this manner!

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
…errors

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Apr 13, 2020

The internal details of the encodings are still exposed to the user and will need to be exposed in the documentation... my experience says if you do something like "here's a thing, but you're supposed to ignore this thing", people won't be better off and informed, but more confused and wondering why did they need to learn something not relevant to them.

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 13, 2020

The internal details of the encodings are still exposed to the user and will need to be exposed in the documentation...

Do you mean from the flag to the getServerEncodeXXXX() pair of functions that will only pop up in the error message if they provide an argument which is not suitable and we are not documenting at present, and in the UI in the encoding selector.

The former we could mention in a note at the bottom in the wiki:
"To support the use of some encodings on some OSes Mudlet has to provide its own, what is called an encoder to stand-in for what in other circumstances would be provided by the Qt system of libraries, in some limited cases it is useful to know when that is happening and providing a boolean true argument to this function will cause that encoder to be identified with a M_ prefix."

and the latter in a tooltip:
"To support the use of some encodings on some OSes Mudlet has to provide its own, what is called an encoder to stand-in for what in other circumstances would be provided by the Qt system of libraries, in some cases it is useful to know when that is happening and those are identified in the lists with an m prefix."

If we do not do this then it will be very difficult to determine from the end user if an encoding defect is coming from bugs in the codecs I have created or something else...

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Apr 16, 2020

That's a fair point, but still, it's confusing information for the non-techies. You know how a well-written technical explanation still goes over people's heads - at best they don't understand it, at worst they think the client is confusing because it uses all sorts of terminology they don't understand. This is how Mudlet becomes "confusing and complicated" for them, and we can't have that. That's my half of @Mudlet/ui-review, perhaps @Kebap has a take on this as well.

But still, we do need to debug things. How about we make a special function getMudletDebug() that returns the name of these internal codecs actually used and other important internal information?

That way we don't 'litter' every function with a debug flag and keep it all in one tidy spot.

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Apr 17, 2020

SlySven#10 removes the debug flag and #3633 adds the function. With that this'll be all set, and we gain a new pattern for showing important debug info!

image

@vadi2 vadi2 mentioned this pull request Apr 17, 2020
@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 17, 2020

Are you okay with the m prefix in the GUI selector for the encoder selection with a tooltip...

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Apr 17, 2020

I don't think we should show it there either, for the same reasons - more confusion than help 😀

@thoth-medievia
Copy link
Copy Markdown

We've been doing some testing on Medievia - and this latest version works perfectly! When do you expect this will be merged in?

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Apr 19, 2020

It's just about ready to go and will be in the next release (https://github.com/Mudlet/Mudlet/milestones).

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 21, 2020

I am on this now - got diverted into a related matter which I realise I will have to re-insert into my task queue a bit further back. I aim to have this done in the next 24 hours.

Revised the Lua getMudletInfo() function to handle the encoder details
being `QByteArray`s instead of `QString`s. Also revised the text display
format better so that it can be copied-and-pasted somewhere else and
actually fit on the screen - this required including some code to wrap
the text at the comma separator. I think the code is good enough but a
second opinion in case there are any corner-cases would be welcomed.

Added a short explanation about the "m " prefix in the tool-tip for the
encoder selection widget in the preferences.

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 21, 2020

Okay I have a workable setup now - including some reworking of the getMudletInfo() - I have more extensive plans for that but they can wait for another day... 😈

image

Somehow though my repo has gotten out of step with the main one so I am going to have to merge in and resolve conflicts from the development branch... 🤷‍♂️

…odings

Conflicts resolved in:
* src/TBuffer.h
* src/TLuaInterpreter.cpp

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
</property>
<property name="toolTip">
<string>&lt;p&gt;If you are playing a non-English game and seeing � instead of text, or special letters like &lt;span style=&quot; font-weight:600;&quot;&gt;ñ&lt;/span&gt; aren't showing right - try changing the encoding to UTF-8 or to one suggested by your game.&lt;/p&gt;</string>
<string>&lt;p&gt;If you are playing a non-English game and seeing � instead of text, or special letters like &lt;span style=&quot; font-weight:600;&quot;&gt;ñ&lt;/span&gt; aren't showing right - try changing the encoding to UTF-8 or to one suggested by your game.&lt;/p&gt;&lt;p&gt;For some encodings on some Operating Systems Mudlet itself has to provide the codec needed; if that is the case for this Mudlet then there will be a &lt;tt&gt;m &lt;/tt&gt; prefixed applied to those encoding names (so if they have errors the blame can be applied correctly!)&lt;/p&gt;</string>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like this tooltip to stay as-is because the new information does not really provide any useful information for the player, just gives more text to read which to a non-techie might be confusing.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you will have people saying WTF does that "m" mean. Also it give the hint that if they are getting odd characters when they have a need for an encoder in that group - like the Medival people - they can appreciate that we ought to be able to fix it if they get in touch.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the purpose of getMudletInfo() and the internal codec should not leak to the tooltip, if it is!

knownEncodings.append(adjustEncoding(encoding));
}

QString encodingNames{QLatin1String(knownEncodings.join(", "))};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's this PR's scope to rework how getMudletInfo() does the display.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous one had to be reworked to handle the different way the data was changed and it had some short-comings - using the Host::postMessage(...) actually produced a poor layout for the information (the hanging indent meant it took up more space and the colouration didn't help and there wasn't sensible line breaking for what is a long line of text - as postMessage(...) expects multi-line texts to be pre-broken into lines).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different problem unrelated to adding CP437, I think. The old layout is far more compact than the new one, and more compact = good, less text to read.

@SlySven SlySven merged commit 3e5d480 into Mudlet:development Apr 21, 2020
@SlySven SlySven deleted the Enhance_addSupportForCP437AndOtherEncodings branch April 21, 2020 22:59
@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Apr 23, 2020

@thoth-medievia can we get a screenshot of the CP437 encoding in action with your game's map for the release post?

@thoth-medievia
Copy link
Copy Markdown

thoth-medievia commented Apr 23, 2020 via email

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Apr 23, 2020

I don't think they came through on Github.

To list your game, check out https://wiki.mudlet.org/w/Listing_Your_MUD :)

We'll mention the game anyway when we add it to the defaults list - spend it instead on promoting it to the outside world :)

@thoth-medievia
Copy link
Copy Markdown

mudlet-medievia-screepcap-3
mudlet-medievia-screepcap-2
mudlet-medievia-screepcap-1

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented Apr 23, 2020

That looks amazing, could you also do .png not .jpg? That negatively impacts quality

@SlySven
Copy link
Copy Markdown
Member Author

SlySven commented Apr 23, 2020

What would be useful would be a 4:1 width:height ration .png (or maybe .svg) logo icon (ideally with any detail in the leftmost quarter if that is relevant) along with a few paragraphs as an introduction/tease/description {for example consider the WoTMUD one - I wrote that myself!!! 😊 }. We have the connection details (medievia.com:4000) and you do NOT use SSL (do you). Obviously Medievia is the name to use for it so I think that is all that is needed.

<aside>Do you have an Application Id for Discord which we can use so that Discord RPC says "Playing Medievia" and shows your Discord guild icon (if it is stored on Medievia's Discord Guild with the key/name server-icon) rather than it saying "Playing Mudlet" - if we asked Vadim nicely I feel sure he would upload your Icon into our Guilds' resources {though of course we are limited to a total of 150} so it could show the Medievia icon even if using the Mudlet Guild resources...</aside>

@thoth-medievia
Copy link
Copy Markdown

medievia-icon

Here is our 4:1 ratio icon for Mudlet. We don't use SSL, just telnet to port 4000. We don't presently have a Discord ID, but we will get one for inclusion in the next release.

Here are our teaser paragraphs:

"
If you take the deepest and most advanced game possible, and mix it with the best chat world around, what you have is Medievia. Imagine...

Endless player-created areas to explore for the first time. Ships you can take out on the open seas. You can hunt serpents, kill monsters, attack other ships, explore new islands. We have a 4-million room expansive wilderness that you can travel through for hours in one direction, and an under ocean world that is even more vast. We have helpful dragons that fly you high above the wilderness to far off places, and evil dragons whose lairs you can defeat with teams of dozens of other players.

We have a custom font that provides beautifully rendered overhead maps, blending the best of text gaming with graphical navigation.

You can make your fortune trading via covered wagons across the wilderness, through adventure after adventure. You can fight for dominance in PvP action in zones, ships, wilderness, herobattles, arenas and quests. You can create your own bloodline for future generations of heroes to carry on your name. You can even build your own homes and castles! This is a small sample of our features. The complete list would scroll for pages, and is still growing. Medievia has been continuously developed for over 25 years.

This is a game that has the intuition to track your happiness, pride, sadness, and fear while changing itself to fit your needs! Weather, storms, wind, fire, floods, disease, even asteroids. This is literally the most dynamic game world ever attempted. This is Medievia.

Do you dare enter?
"
Thank you so much to the mudlet team for helping us with all of this. The player base is very excited to be making the transition to mudlet. :)

Warm Regards,
Thoth

@vadi2
Copy link
Copy Markdown
Member

vadi2 commented May 2, 2020

The the reqs I mentioned in #3579 (comment) all good to go? (sorry if the follow-up comment was misleading)

Chris7 pushed a commit to Chris7/Mudlet that referenced this pull request Jan 2, 2022
* Enhance: add support for CP437 encoding

This encoding is often used within the font within the ROM of PC Display
cards and - with a lot of box drawing characters - is popular in some
MUDs for inline map displays.

Additionally I have also managed to include the following extra encodings
that are derived from CP437:
+ CP667 - "Mazovia" Used in MSDOS for Polish.
+ CP737 - Used in MSDOS for Greek, more popular than CP869 but is missing
  a couple of characters.
+ CP869 - Used in MSDOS for Greek, less popular than CP737 but more
  complete.

Hadn't arisen in the past but two of the new ones in the last commit are
not supported by the codecs that Qt provide in the current (5.14.2) version
of it's libraries.

* Enhance: add own encoders for CP437/667/737/869 for use on Win CI builds

Converts the names of the encodings (used internally) from `QString`s to
`QByteArrays` as that is what Qt uses for it's own management of them.

Take the translated Server Encoding name out of the static and thus
immutable TBuffer::csmEncodingTable and put them in a shared
`(QMap<QByteArray, QString>) mudlet::mEncodingNameMap`

Convert the preference dialogue to store the encoder required as a
QDataRole::User data item in each item rather than using the (translated)
text itself. This is more robust and simpler to handle. It also eliminated
the need for a couple of convertors to change between what is being used
internally to identify the encoders and what a user might use as the latter
are not actually used now.

Because of the way we provide the multi-byte decoding of Big5 is done it
turns out that we can supply both Big5-ETen (effectively what Qt provides
as Big5+) and Big5-HKCS, just by asking Qt for the correct one. So both
are now offered.

Check for each 8-bit Extended ASCII encoder on startup and do not offer
encoders that we do not provide - this should eliminate problems between
what the coder/developer has in their Qt libraries compared to what the
CI (installer builder) uses...

Four custom codecs are included here that I have written to support
the encodings listed - which is needed for Windows CI builds which uses
only Qt and NOT IUC libraries to provide encoders. These are integrated
into the above start-up checks and they are inserted into the system only
if what they can replace is not present.

Prepend an 'm ' to the displayed codec in the preferences to make it clear
that it is a Mudlet provided one (and might be slightly different {wrong!}
compared to an official one).

Replace the "CP874" encoding as it was erroneously being called
"ISO 8859-3 (South Europe)" in the preferences and was probably clashing
with the "ISO 8859-3" real encoding. As it happens the proper CP874,
according to Wikipedia, has three codes that required TWO Unicode points
to encode. This is not something that can be supported by the
`QVector<QChar>` based system that I have developed here {at least not
without some exceptional extra code) however the encoding IS very similar
to CP1161 {a.k.a. Windows-874} and that CAN be done in this manner!

Finish off hiding the fact that we have differently named substitute
`TTextCodec`s with modified names compared to the Qt `QTextCodec`s that we
hope (but are not always rewarded with) will be present. The modified names
will not show up in the relevant Lua API functions but will be accepted if they
are explicitly asked for to set the encoding.

Revise: update original PR to fit in current development branch

Revised the Lua getMudletInfo() function to handle the encoder details
being `QByteArray`s instead of `QString`s. Also revised the text display
format better so that it can be copied-and-pasted somewhere else and
actually fit on the screen - this required including some code to wrap
the text at the comma separator. I think the code is good enough but a
second opinion in case there are any corner-cases would be welcomed.

Added a short explanation about the "m " prefix in the tool-tip for the
encoder selection widget in the preferences.

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
Co-authored-by: Vadim Peretokin <vperetokin@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants