Correctly decode mathml bytes buffer to unicode string from mathType #10803
Merged
See test results for failed build of commit 39daeafc7a
LeonarddeR
approved these changes
Feb 21, 2020
LeonarddeR
left a comment
Collaborator
Just a style thing.
Could we somehow verify the UTF-8 assumption by creating equations that contain Unicode characters?
@@ -1,8 +1,8 @@
#appModules/winword.py
Collaborator
Could you revisit the full header and add appropriate spaces after the hashes?
Link to issue number:
None.
Summary of the issue:
In NVDA 2019.3 it is no longer possible to read or interact with math in Microsoft Word. We fetch the math from Microsoft Word using MathType. However, MathType returns the math as UTF-8 encoded bytes, not a unicode string. When we then pass this to our XML handling code, an error is raised because it is not a unicode string.
Description of how this pull request fixes the issue:
Decode the byte string from MathType into a unicode string first.
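A minimal sketch of the idea, not the actual change made in appModules/winword.py. The helper name `decode_mathml` and the sample bytes are hypothetical, and UTF-8 is the assumption discussed under the known issues:

```python
import xml.etree.ElementTree as ET


def decode_mathml(raw, encoding="utf-8"):
    """Return MathML as a str, decoding it first if it arrived as bytes.

    Hypothetical helper for illustration; UTF-8 is an assumption.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Hypothetical sample standing in for the bytes buffer MathType returns.
sample = "<math><mi>π</mi><mo>≤</mo><mn>4</mn></math>".encode("utf-8")

mathml = decode_mathml(sample)

# Once decoded, the markup can be handed to XML handling code as a str.
root = ET.fromstring(mathml)
```

Decoding once at the boundary means everything downstream only ever sees `str`, which is what the XML handling code expects.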
Testing performed:
Tested NVDA reading all math equations in this document: Sample Expressions to Navigate.docx
Known issues with pull request:
I can't find specific documentation for MathType stating that the string is really encoded as UTF-8. But since the standard encoding for XML (and therefore MathML) should be UTF-8, I think this is a safe assumption. Of course, in 2019.2.1 and below, as we did not specify an encoding, it would have assumed mbcs.
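To illustrate why the choice of encoding matters, here is a small sketch comparing the two decodings on the same bytes. The `mbcs` codec only exists on Windows, so `cp1252` is used here purely as a stand-in for the system ANSI code page; that substitution, and the sample markup, are assumptions for illustration:

```python
# "α" (U+03B1) encoded as UTF-8 is the two bytes 0xCE 0xB1.
raw = "<mi>α</mi>".encode("utf-8")

# Decoding with the encoding the bytes were written in round-trips cleanly.
as_utf8 = raw.decode("utf-8")

# cp1252 is only a stand-in for the Windows ANSI code page behind "mbcs".
# Each UTF-8 byte is read as a separate character, giving mojibake.
as_cp1252 = raw.decode("cp1252")

# Pure ASCII markup decodes identically either way, so ASCII-only
# equations cannot reveal a wrong encoding choice.
ascii_raw = b"<mi>x</mi>"
same_for_ascii = ascii_raw.decode("utf-8") == ascii_raw.decode("cp1252")
```

This is also why test equations containing non-ASCII characters are needed to check the assumption: an equation made only of ASCII symbols reads the same under both decodings.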
Change log entry:
Bug fixes: