Enhance: add support for CP437 encoding#3579
Conversation
This encoding is often used within the font within the ROM of PC Display cards and - with a lot of box drawing characters - is popular in some MUDs for inline map displays. Additionally I have also managed to include the following extra encodings that are derived from CP437: + CP667 - "Mazovia" Used in MSDOS for Polish. + CP737 - Used in MSDOS for Greek, more popular than CP869 but is missing a couple of characters. + CP869 - Used in MSDOS for Greek, less popular than CP737 but more complete. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
|
Hey there! Thanks for helping Mudlet improve. 🌟 Test versionsYou can directly test the changes here:
No need to install anything - just unzip and run. |
|
@SlySven this is crashing because the |
|
The new definitions in TBuffer.cpp look right, but this is a limitation in QTextCodec, which has a list of supported codecs here: https://doc.qt.io/qt-5/qtextcodec.html - you will have to create a new codec class for this. I found a GPL2 example of someone doing that for CP437, here: https://github.com/openmoko/openmoko-svn/blob/master/src/host/qemu-neo1973/phonesim/lib/serial/qatutils.cpp - search down for "437" and you will see it. |
|
Ah, and reading the documentation:
The emphasis is mine - but it seems that with CP437 (or one of the other ones) we are getting a case where there is NOT an encoder - which is a concern... |
Hadn't arisen in the past but two of the new ones in the last commit are not supported by the codecs that Qt provide in the current (5.14.2) version of it's libraries. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
This is the CP667 (Mazovia) and CP737 (DOS Greek) ones, the other Greek one with more characters but supposedly less used is okay. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
|
Of course https://doc.qt.io/qt-5/qtextcodec.html mentions:
I believe the Qt libraries I am using have ICU support included. |
|
That's right - there's no encoder. Here are the one's QT supports by default. For everything else you would have to create the class yourself. I put a link above where someone did that for IBM437/CP437 QT Default Supported Encodings: Big5, Big5-HKSCS, CP949, EUC-JP, EUC-KR, GB18030, HP-ROMAN8, IBM 850, IBM 866, IBM 874, ISO 2022-JP, ISO 8859-1 to 10, ISO 8859-13 to 16, Iscii-Bng, Dev, Gjr, Knd, Mlm, Ori, Pnj, Tlg, and Tml, KOI8-R, KOI8-U, Macintosh, Shift-JIS, TIS-620, TSCII, UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, Windows-1250 to 1258 |
|
If ICU Supports them, perhaps we're just calling them by the wrong names? For example, I think the common name for CP437 may be IBM437, and while using QT you can use the name they use, but if relying on ICU for the codec, we may need to look up what They call it... |
|
The latest build is able to change the encoding fine now, try it out. |
|
It looks like we are able to change the encoding to CP437 now :) However, the system now crashes once connected to a server with either of the new encodings active, on sending any data. |
|
Well the aliases that Qt knows for the CP437 codec in the (WIndows 64Bit) system I have in front of me are:
|
|
As for still crashing - I have an idea which I will check out in the next couple of hours... |
|
Which MUD are you using this encoding for BTW? |
|
Medievia :)
medievia.com:4000
We need this encoding primarily so that we can pass all 128 ASCII
characters through to our inline map font.
…On Thu, Apr 9, 2020 at 3:05 PM Stephen Lyons ***@***.***> wrote:
Which MUD are you using this encoding for BTW?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#3579 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/APANYRJSPPVPMIZZGX7CSXLRLYMA3ANCNFSM4MEFLS4Q>
.
|
|
🤦♀️ Ah, if your version (or rather the one built by us on AppVeyor) does not have the same encoders as mine does then yeah - it looks as though it will explode as soon as you try to send data - there will be the warning message: |
|
I think we're depending on QT being compiled with -icu in order for the
non-default encoders to be supported. The default list of supported codecs
on QT is listed above. It might be worth mentioning that QT is planning on
discontinuing support for ICU. I am just using the binary from your latest
build to test.
…On Thu, Apr 9, 2020 at 3:13 PM Stephen Lyons ***@***.***> wrote:
🤦♀️
Ah, if *your* version (or rather the one built by us on AppVeyor) does
not have the same encoders as mine does then yeah - it looks as though it
will explode as soon as you try to send data - there will be the warning
message:
cTelnet::encodingChanged("CP437") WARNING - Unable to locate a codec for
OOB protocols that can handle: "CP437"
but that won't stop the Application shooting itself in the foot by trying
to use a pointer that points to nothing afterwards...
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#3579 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/APANYRKE3B32DABGC3LOPNLRLYM33ANCNFSM4MEFLS4Q>
.
|
:) |
Converts the names of the encodings (used internally) from `QString`s to
`QByteArrays` as that is what Qt uses for it's own management of them.
Take the translated Server Encoding name out of the static and thus
immutable TBuffer::csmEncodingTable and put them in a shared
`(QMap<QByteArray, QString>) mudlet::mEncodingNameMap`
Convert the preference dialogue to store the encoder required as a
QDataRole::User data item in each item rather than using the (translated)
text itself. This is more robust and simpler to handle. It also eliminared
the need for a couple of convertors to change between what is being used
internally to identify the encoders and what a user might use as the latter
are not actually used now.
As the way we provide the multi-byte decoding of Big5 is done it turns out
that we can supply both Big5-ETen (effectively what Qt provides as Big5+)
and Big5-HKCS, just by asking Qt for the correct one. So both are now
offered.
Check for each 8-bit Extended ASCII encoder on startup and do not offer
encoders that we do not provide - this should eliminate problems between
what the coder/developer has in their Qt libraries compared to what the
CI (installer builder) uses...
Four custom codecs are included here that I have written to support
the encodings listed - which is needed for Windows CI builds which uses
only Qt and NOT IUC libraries to provide encoders. These are integrated
into the above start-up checks and they are inserted into the system only
if what they can replace is not present.
Prepend an 'm ' to the displayed codec in the preferences to make it clear
that it is a Mudlet provided one (and might be slighly different {wrong!}
compared to an offical one). The `getServerEncoding()` result will give a
`M_` prefix to the same data so that they can be distinguished.
Replace the "CP874" encoding as it was erronously being called
"ISO 8859-3 (South Europe)" in the preferences and was probably clashing
with the "ISO 8859-3" real encoding. As it happens the proper CP874,
according to Wikipedia, has three codes that required TWO unicode points
to encode. This is not something that can be supported by the
`QVector<QChar>` based sysmtem that I have developed here {at least not
without some exceptional extra code) however the encoding IS very similiar
to CP1161 {a.k.a. Windows-874} and that CAN be done in this manner!
Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
…errors Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
|
The internal details of the encodings are still exposed to the user and will need to be exposed in the documentation... my experience says if you do something like "here's a thing, but you're supposed to ignore this thing", people won't be better off and informed, but more confused and wondering why did they need to learn something not relevant to them. |
Do you mean from the flag to the The former we could mention in a note at the bottom in the wiki: and the latter in a tooltip: If we do not do this then it will be very difficult to determine from the end user if an encoding defect is coming from bugs in the codecs I have created or something else... |
|
That's a fair point, but still, it's confusing information for the non-techies. You know how a well-written technical explanation still goes over people's heads - at best they don't understand it, at worst they think the client is confusing because it uses all sorts of terminology they don't understand. This is how Mudlet becomes "confusing and complicated" for them, and we can't have that. That's my half of @Mudlet/ui-review, perhaps @Kebap has a take on this as well. But still, we do need to debug things. How about we make a special function That way we don't 'litter' every function with a debug flag and keep it all in one tidy spot. |
|
SlySven#10 removes the debug flag and #3633 adds the function. With that this'll be all set, and we gain a new pattern for showing important debug info! |
|
Are you okay with the |
|
I don't think we should show it there either, for the same reasons - more confusion than help 😀 |
|
We've been doing some testing on Medievia - and this latest version works perfectly! When do you expect this will be merged in? |
|
It's just about ready to go and will be in the next release (https://github.com/Mudlet/Mudlet/milestones). |
|
I am on this now - got diverted into a related matter which I realise I will have to re-insert into my task queue a bit further back. I aim to have this done in the next 24 hours. |
Revised the Lua getMudletInfo() function to handle the encoder details being `QByteArray`s instead of `QString`s. Also revised the text display format better so that it can be copied-and-pasted somewhere else and actually fit on the screen - this required including some code to wrap the text at the comma separator. I think the code is good enough but a second opinion in case there are any corner-cases would be welcomed. Added a short explanation about the "m " prefix in the tool-tip for the encoder selection widget in the preferences. Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
|
Okay I have a workable setup now - including some reworking of the Somehow though my repo has gotten out of step with the main one so I am going to have to merge in and resolve conflicts from the development branch... 🤷♂️ |
…odings Conflicts resolved in: * src/TBuffer.h * src/TLuaInterpreter.cpp Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
| </property> | ||
| <property name="toolTip"> | ||
| <string><p>If you are playing a non-English game and seeing � instead of text, or special letters like <span style=" font-weight:600;">ñ</span> aren't showing right - try changing the encoding to UTF-8 or to one suggested by your game.</p></string> | ||
| <string><p>If you are playing a non-English game and seeing � instead of text, or special letters like <span style=" font-weight:600;">ñ</span> aren't showing right - try changing the encoding to UTF-8 or to one suggested by your game.</p><p>For some encodings on some Operating Systems Mudlet itself has to provide the codec needed; if that is the case for this Mudlet then there will be a <tt>m </tt> prefixed applied to those encoding names (so if they have errors the blame can be applied correctly!)</p></string> |
There was a problem hiding this comment.
I'd like this tooltip to stay as-is because the new information does not really provide any useful information for the player, just gives more text to read which to a non-techie might be confusing.
There was a problem hiding this comment.
Then you will have people saying WTF does that "m" mean. Also it give the hint that if they are getting odd characters when they have a need for an encoder in that group - like the Medival people - they can appreciate that we ought to be able to fix it if they get in touch.
There was a problem hiding this comment.
That is the purpose of getMudletInfo() and the internal codec should not leak to the tooltip, if it is!
| knownEncodings.append(adjustEncoding(encoding)); | ||
| } | ||
|
|
||
| QString encodingNames{QLatin1String(knownEncodings.join(", "))}; |
There was a problem hiding this comment.
I don't think it's this PR's scope to rework how getMudletInfo() does the display.
There was a problem hiding this comment.
The previous one had to be reworked to handle the different way the data was changed and it had some short-comings - using the Host::postMessage(...) actually produced a poor layout for the information (the hanging indent meant it took up more space and the colouration didn't help and there wasn't sensible line breaking for what is a long line of text - as postMessage(...) expects multi-line texts to be pre-broken into lines).
There was a problem hiding this comment.
A different problem unrelated to adding CP437, I think. The old layout is far more compact than the new one, and more compact = good, less text to read.
|
@thoth-medievia can we get a screenshot of the CP437 encoding in action with your game's map for the release post? |
|
Here are some screencaps to show the Medievia Map Font working with CP437
on Mudlet. I am sure you will agree it looks BEAUTIFUL!
How can we get a Medievia profile on that default connect screen?
Also, is there any way to coordinate having a Medievia banner on your
website to coincide with the launch of support for Medievia on the 05/02
release?
We have a small marketing budget available for things like that.
Regards,
Thoth
…On Thu, Apr 23, 2020 at 3:30 AM Vadim Peretokin ***@***.***> wrote:
@thoth-medievia <https://github.com/thoth-medievia> can we get a
screenshot of the CP437 encoding in action with your game's map for the
release post?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3579 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/APANYRJKP5TXFICPCPP7TKDRN7VAFANCNFSM4MEFLS4Q>
.
|
|
I don't think they came through on Github. To list your game, check out https://wiki.mudlet.org/w/Listing_Your_MUD :) We'll mention the game anyway when we add it to the defaults list - spend it instead on promoting it to the outside world :) |
|
That looks amazing, could you also do .png not .jpg? That negatively impacts quality |
|
What would be useful would be a 4:1 width:height ration .png (or maybe .svg) logo icon (ideally with any detail in the leftmost quarter if that is relevant) along with a few paragraphs as an introduction/tease/description {for example consider the WoTMUD one - I wrote that myself!!! 😊 }. We have the connection details (medievia.com:4000) and you do NOT use SSL (do you). Obviously Medievia is the name to use for it so I think that is all that is needed. <aside>Do you have an Application Id for Discord which we can use so that Discord RPC says "Playing Medievia" and shows your Discord guild icon (if it is stored on Medievia's Discord Guild with the key/name |
|
Here is our 4:1 ratio icon for Mudlet. We don't use SSL, just telnet to port 4000. We don't presently have a Discord ID, but we will get one for inclusion in the next release. Here are our teaser paragraphs: " Endless player-created areas to explore for the first time. Ships you can take out on the open seas. You can hunt serpents, kill monsters, attack other ships, explore new islands. We have a 4-million room expansive wilderness that you can travel through for hours in one direction, and an under ocean world that is even more vast. We have helpful dragons that fly you high above the wilderness to far off places, and evil dragons whose lairs you can defeat with teams of dozens of other players. We have a custom font that provides beautifully rendered overhead maps, blending the best of text gaming with graphical navigation. You can make your fortune trading via covered wagons across the wilderness, through adventure after adventure. You can fight for dominance in PvP action in zones, ships, wilderness, herobattles, arenas and quests. You can create your own bloodline for future generations of heroes to carry on your name. You can even build your own homes and castles! This is a small sample of our features. The complete list would scroll for pages, and is still growing. Medievia has been continuously developed for over 25 years. This is a game that has the intuition to track your happiness, pride, sadness, and fear while changing itself to fit your needs! Weather, storms, wind, fire, floods, disease, even asteroids. This is literally the most dynamic game world ever attempted. This is Medievia. Do you dare enter? Warm Regards, |
|
The the reqs I mentioned in #3579 (comment) all good to go? (sorry if the follow-up comment was misleading) |
* Enhance: add support for CP437 encoding
This encoding is often used within the font within the ROM of PC Display
cards and - with a lot of box drawing characters - is popular in some
MUDs for inline map displays.
Additionally I have also managed to include the following extra encodings
that are derived from CP437:
+ CP667 - "Mazovia" Used in MSDOS for Polish.
+ CP737 - Used in MSDOS for Greek, more popular than CP869 but is missing
a couple of characters.
+ CP869 - Used in MSDOS for Greek, less popular than CP737 but more
complete.
Hadn't arisen in the past but two of the new ones in the last commit are
not supported by the codecs that Qt provide in the current (5.14.2) version
of it's libraries.
* Enhance: add own encoders for CP437/667/737/869 for use on Win CI builds
Converts the names of the encodings (used internally) from `QString`s to
`QByteArrays` as that is what Qt uses for it's own management of them.
Take the translated Server Encoding name out of the static and thus
immutable TBuffer::csmEncodingTable and put them in a shared
`(QMap<QByteArray, QString>) mudlet::mEncodingNameMap`
Convert the preference dialogue to store the encoder required as a
QDataRole::User data item in each item rather than using the (translated)
text itself. This is more robust and simpler to handle. It also eliminated
the need for a couple of convertors to change between what is being used
internally to identify the encoders and what a user might use as the latter
are not actually used now.
Because of the way we provide the multi-byte decoding of Big5 is done it
turns out that we can supply both Big5-ETen (effectively what Qt provides
as Big5+) and Big5-HKCS, just by asking Qt for the correct one. So both
are now offered.
Check for each 8-bit Extended ASCII encoder on startup and do not offer
encoders that we do not provide - this should eliminate problems between
what the coder/developer has in their Qt libraries compared to what the
CI (installer builder) uses...
Four custom codecs are included here that I have written to support
the encodings listed - which is needed for Windows CI builds which uses
only Qt and NOT IUC libraries to provide encoders. These are integrated
into the above start-up checks and they are inserted into the system only
if what they can replace is not present.
Prepend an 'm ' to the displayed codec in the preferences to make it clear
that it is a Mudlet provided one (and might be slightly different {wrong!}
compared to an official one).
Replace the "CP874" encoding as it was erroneously being called
"ISO 8859-3 (South Europe)" in the preferences and was probably clashing
with the "ISO 8859-3" real encoding. As it happens the proper CP874,
according to Wikipedia, has three codes that required TWO Unicode points
to encode. This is not something that can be supported by the
`QVector<QChar>` based system that I have developed here {at least not
without some exceptional extra code) however the encoding IS very similar
to CP1161 {a.k.a. Windows-874} and that CAN be done in this manner!
Finish off hiding the fact that we have differently named substitute
`TTextCodec`s with modified names compared to the Qt `QTextCodec`s that we
hope (but are not always rewarded with) will be present. The modified names
will not show up in the relevant Lua API functions but will be accepted if they
are explicitly asked for to set the encoding.
Revise: update original PR to fit in current development branch
Revised the Lua getMudletInfo() function to handle the encoder details
being `QByteArray`s instead of `QString`s. Also revised the text display
format better so that it can be copied-and-pasted somewhere else and
actually fit on the screen - this required including some code to wrap
the text at the comma separator. I think the code is good enough but a
second opinion in case there are any corner-cases would be welcomed.
Added a short explanation about the "m " prefix in the tool-tip for the
encoder selection widget in the preferences.
Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
Co-authored-by: Vadim Peretokin <vperetokin@gmail.com>



This encoding is often used within the font within the ROM of PC Display cards and - with a lot of box drawing characters - is popular in some MUDs for inline map displays.
Additionally I have also managed to include the following extra encodings that are derived from CP437:
Signed-off-by: Stephen Lyons slysven@virginmedia.com