Correctly decode mathml bytes buffer to unicode string from mathType #10803

Merged
michaelDCurran merged 6 commits into master from decodeMathTypeStrings on Mar 2, 2020

@michaelDCurran

Member

Link to issue number:

None.

Summary of the issue:

In NVDA 2019.3 it is no longer possible to read or interact with math in Microsoft Word. We fetch the math from Microsoft Word using MathType. However, MathType returns the math as UTF-8 encoded bytes, not a Unicode string, so when we pass it to our XML handling code, an error is raised because it is not Unicode.

Description of how this pull request fixes the issue:

Decode the byte string from MathType into Unicode first.
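
As a rough illustration of the fix described above (not the actual NVDA change), the sketch below decodes a bytes buffer before handing it to an XML parser. The helper name `decode_mathml`, the sample MathML, and the use of `xml.etree.ElementTree` are all assumptions made for this example.

```python
from xml.etree import ElementTree


def decode_mathml(raw):
    """Return MathML as str, decoding UTF-8 bytes when necessary.

    Hypothetical helper for illustration; not the function used in NVDA.
    """
    if isinstance(raw, bytes):
        # Assumption from the PR: MathType hands back UTF-8 encoded bytes.
        return raw.decode("utf-8")
    return raw


# Stand-in for the buffer MathType would return (made-up sample data).
raw_buffer = "<math><mi>α</mi><mo>+</mo><mi>β</mi></math>".encode("utf-8")
mathml = decode_mathml(raw_buffer)

# Once decoded, the text can be parsed as ordinary XML.
root = ElementTree.fromstring(mathml)
first_identifier = root[0].text
```

Decoding once at the boundary keeps the rest of the code working with `str` only, which is the usual Python 3 convention.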

Testing performed:

Tested NVDA reading all math equations in this document: Sample Expressions to Navigate.docx

Known issues with pull request:

I can't find specific documentation for MathType stating that the string is really encoded as UTF-8. But since the standard encoding for XML (and therefore MathML) should be UTF-8, I think this is a safe assumption. Of course, in 2019.2.1 and below, as we did not specify an encoding, it would have assumed mbcs.
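
To make the encoding concern concrete, here is a small sketch of how the same bytes come out under the two assumptions. The `mbcs` codec exists only on Windows and depends on the system code page, so `cp1252` is used here as a stand-in for a typical Western locale; that substitution and the sample string are assumptions for illustration.

```python
# Made-up MathML fragment containing one non-ASCII character.
raw = "<mi>π</mi>".encode("utf-8")

# Decoding with the encoding the bytes were actually written in.
as_utf8 = raw.decode("utf-8")

# Decoding with a legacy single-byte code page (stand-in for mbcs)
# turns each byte of the multi-byte sequence into a separate character.
as_cp1252 = raw.decode("cp1252")

# Pure-ASCII markup is unaffected, which is why a wrong guess can go
# unnoticed until an equation contains non-ASCII symbols.
ascii_only = b"<mi>x</mi>"
same_either_way = ascii_only.decode("utf-8") == ascii_only.decode("cp1252")
```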

Change log entry:

Bug fixes:

  • NVDA can again read and interact with math equations in Microsoft Word.

@AppVeyorBot

See test results for failed build of commit 39daeafc7a

@LeonarddeR LeonarddeR left a comment

Collaborator

Just a style thing.

Could we somehow verify the utf-8 assumption by creating equations that contain unicode characters?
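
One possible way to run such a check, sketched under assumptions: capture the raw bytes for an equation known to contain non-ASCII symbols, then test whether they decode strictly as UTF-8. The function names and sample buffers below are hypothetical; real buffers would have to come from MathType itself.

```python
def has_non_ascii(raw: bytes) -> bool:
    """True if the buffer contains any byte outside the ASCII range."""
    return any(b >= 0x80 for b in raw)


def is_valid_utf8(raw: bytes) -> bool:
    """True if the buffer decodes as UTF-8 without errors."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical captured buffers; only the first is genuinely UTF-8.
utf8_sample = "<mi>π</mi>".encode("utf-8")
legacy_sample = "<mi>é</mi>".encode("cp1252")

# A buffer is only informative if it has non-ASCII bytes at all:
# pure ASCII decodes identically under UTF-8 and most code pages.
informative = has_non_ascii(utf8_sample) and has_non_ascii(legacy_sample)
utf8_ok = is_valid_utf8(utf8_sample)
legacy_ok = is_valid_utf8(legacy_sample)
```

A strict decode succeeding on non-ASCII input is strong evidence for UTF-8 rather than proof, since some byte sequences are valid in more than one encoding.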

Comment thread source/NVDAObjects/window/winword.py Outdated
@@ -1,8 +1,8 @@
#appModules/winword.py

Collaborator

Could you revisit the full header and add appropriate spaces after the hashes?

@michaelDCurran michaelDCurran merged commit 42be101 into master Mar 2, 2020
@michaelDCurran michaelDCurran deleted the decodeMathTypeStrings branch March 2, 2020 00:38
@nvaccessAuto nvaccessAuto added this to the 2020.1 milestone Mar 2, 2020
michaelDCurran added a commit that referenced this pull request Mar 2, 2020