remove use of unichr, basestring and unicode objects. by michaelDCurran · Pull Request #9724 · nvaccess/nvda

michaelDCurran · 2019-06-12T13:13:27Z

Link to issue number:

None

Summary of the issue:

In Python 3, strings are unicode by default. str is unicode, and unicode and basestring no longer exist.

Description of how this pull request fixes the issue:

Changed all unichr calls to chr calls. As chr was never used in the code base, logically it is safe enough to simply replace all of these.
Replaced occurrences of basestring with str. If we were allowing both unicode and ascii, then now it should just be unicode.
All usage of unicode() has been removed where we can assume a string should already be unicode, or changed to str() where we are converting from something that isn't a string (e.g. an int).
nvwave now initializes its buffers with bytes objects.
The eSpeak synthDriver now correctly converts voice/variant IDs/names to utf8 from unicode and back again.
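
As a rough illustration of the kinds of replacement listed above, the following sketch shows the Python 3 spelling of each idiom. The variable names and the sample voice name are hypothetical and are not taken from the NVDA code base.

```python
# Python 3 spellings of the Python 2 idioms removed in this PR.
# All names below are illustrative only, not NVDA identifiers.

# unichr(0x2800) becomes chr(0x2800); chr covers the full Unicode range.
brailleBlank = chr(0x2800)

# isinstance(x, basestring) becomes isinstance(x, str).
def isText(value):
    return isinstance(value, str)

# unicode(42) becomes str(42) when converting a non-string value.
label = str(42)

# Buffers holding raw data are built from bytes, not str.
silence = bytes(8)  # eight zero bytes

# Text passed to a byte-oriented C API is encoded to UTF-8,
# and decoded again when it comes back.
voiceName = "Français"
encoded = voiceName.encode("utf-8")
decoded = encoded.decode("utf-8")
```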

This pr was produced by grepping for the various symbols, and handling each case specifically by looking at the code.
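
The search itself could be approximated in Python along these lines. This is only a sketch of the idea, assuming a whole-word match is what is wanted; the function name and pattern are hypothetical, and each hit would still need to be reviewed by hand.

```python
import re

# Whole-word match, so that e.g. "unicodedata" is not reported as "unicode".
PY2_SYMBOLS = re.compile(r"\b(unichr|basestring|unicode)\b")

def findPy2Symbols(source):
    """Return (lineNumber, symbol) pairs for each Python 2 only symbol found."""
    hits = []
    for lineNumber, line in enumerate(source.splitlines(), start=1):
        for match in PY2_SYMBOLS.finditer(line):
            hits.append((lineNumber, match.group(1)))
    return hits
```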

Testing performed:

With one or two other runtime changes to do with hashing etc. (addressed in a separate pr), NVDA can now run, and all synthDrivers, including eSpeak, function correctly.

Known issues with pull request:

No brailleDisplay drivers have been touched yet, nor has hwIo in any significant way. These will have to be handled carefully with regard to what should be bytes and what should be strings.

Change log entry:

None.

LeonarddeR · 2019-06-12T14:17:59Z

* Changed all unichr calls to chr calls. As chr was never used in the code base, logically it is safe enough to simply replace all of these.

I found at least one occurrence of chr in handyTech.py. Having said that, you just said that you haven't touched braille display drivers in this pr, so I assume you mean that chr isn't used outside braille display drivers.

* Replaced occurrences of basestring with str. If we were allowing both unicode and ascii, then now it should just be unicode.

How about the cases where str is already used? I'm pretty sure we need a way to distinguish between cases where str has now been introduced instead of basestring, and cases where str was already defined.
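
To make the concern concrete: in Python 2, str was the byte string type, so a pre-existing isinstance(x, str) check accepted raw bytes. In Python 3 the same check accepts text only. A minimal sketch, with a hypothetical helper name:

```python
def describe(value):
    """Classify a value the way a Python 3 isinstance check sees it."""
    if isinstance(value, str):
        return "text"
    if isinstance(value, bytes):
        return "bytes"
    return "other"

# A check that was written against Python 2's str (bytes) now
# silently stops matching byte strings.
print(describe("abc"))   # text
print(describe(b"abc"))  # bytes
```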

LeonarddeR

Apart from my question about whether this is the right moment to convert basestring to str without covering all the str cases first, here are some comments, particularly about very long lines that could be fixed while at it. I don't insist on doing this as part of this pr, though.

LeonarddeR · 2019-06-12T17:15:34Z

 		This string should be included as returned. There is no need to call repr.
 		@param string: The string to format.
-		@type string: nbasestring
+		@type string: nstr


Typo, nstr > str

LeonarddeR · 2019-06-12T17:19:09Z

 				text=ctypes.cast(buf,ctypes.c_wchar_p).value
 			else:
-				text=unicode(ctypes.cast(buf,ctypes.c_char_p).value, errors="replace", encoding=locale.getlocale()[1])
+				encoding=locale.getlocale()[1]


I wonder whether we should change this into locale.getpreferredencoding() while at it.
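
A sketch of what that could look like, assuming the buffer value is a bytes object. The helper name and the UTF-8 fallback are my own assumptions and are not part of this diff.

```python
import locale

def decodeLocaleBytes(raw):
    """Decode bytes using the preferred locale encoding, replacing bad bytes."""
    # Passing False avoids calling setlocale as a side effect.
    # The "utf-8" fallback is an assumption for the case where nothing is reported.
    encoding = locale.getpreferredencoding(False) or "utf-8"
    return raw.decode(encoding, errors="replace")
```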

LeonarddeR · 2019-06-12T17:22:02Z

@@ -128,7 +128,7 @@
 }

 def bytesToInt(bytes):


This function can be removed anyway, but that will be dealt with later.

LeonarddeR · 2019-06-12T17:37:51Z

 		# Return True if the URL indicates that this is probably a web browser document.
 		# We do this check because we don't want to remember caret positions for email messages, etc.
-		return isinstance(docConstId, basestring) and docConstId.split("://", 1)[0] in ("http", "https", "ftp", "ftps", "file")
+		return isinstance(docConstId, str) and docConstId.split("://", 1)[0] in ("http", "https", "ftp", "ftps", "file")


This line is really huge. Could you split it while at it?
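
One possible way to split it, shown as a standalone sketch. The constant and function names are hypothetical; only the logic comes from the line above.

```python
# Schemes that suggest a web browser document (taken from the diff above).
WEB_SCHEMES = ("http", "https", "ftp", "ftps", "file")

def isProbablyWebDocument(docConstId):
    """Return True if docConstId looks like a URL with a browser scheme."""
    if not isinstance(docConstId, str):
        return False
    scheme = docConstId.split("://", 1)[0]
    return scheme in WEB_SCHEMES
```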

LeonarddeR · 2019-06-12T17:42:10Z

 		if scriptHandler.getLastScriptRepeatCount()>=1:
 			if curObject.TextInfo!=NVDAObjectTextInfo:
 				textList=[]
-				if curObject.name and isinstance(curObject.name, basestring) and not curObject.name.isspace():


Could you also split this one while at it?

LeonarddeR · 2019-06-12T17:45:16Z

@@ -55,7 +55,7 @@ def translate(tableList, inbuf, typeform=None, cursorPos=None, mode=0):
 	* returns a list of integers instead of an string with cells, and


Ugh, this should be a string instead of an string.

LeonarddeR · 2019-06-12T17:51:17Z


 def _escapeXml(text):
-	text = unicode(text).translate(XML_ESCAPES)
+	text = str(text).translate(XML_ESCAPES)


Do we ever expect text to be something other than str? If not, we can safely omit wrapping this in str.

Suggested change

-	text = str(text).translate(XML_ESCAPES)
+	text = text.translate(XML_ESCAPES)
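
For reference, a small sketch of how the two variants differ. The escape table here is a hypothetical stand-in and may not match the real XML_ESCAPES in speechXml.py.

```python
# Hypothetical escape table: str.translate accepts a dict that maps
# code points to replacement strings.
XML_ESCAPES = {
    ord("&"): "&amp;",
    ord("<"): "&lt;",
    ord(">"): "&gt;",
    ord('"'): "&quot;",
}

def escapeXml(text):
    """Variant without the str() wrapper; assumes text is already str."""
    return text.translate(XML_ESCAPES)

def escapeXmlAny(text):
    """Variant with the str() wrapper; also accepts e.g. an int."""
    return str(text).translate(XML_ESCAPES)
```

Without the wrapper, passing a non-string such as an int raises AttributeError, so dropping str() is only safe if every caller passes str.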

LeonarddeR · 2019-06-12T17:53:53Z

 	for fileName in os.listdir(dir):
 		if os.path.isfile("%s\\%s"%(dir,fileName)):
-			file=codecs.open("%s\\%s"%(dir,fileName))
+			file=open("%s\\%s"%(dir,fileName))


Is there something that makes this change necessary?

LeonarddeR · 2019-06-12T17:56:12Z

 	dialogArguements = automation.VARIANT( dialogString )
 	gui.mainFrame.prePopup() 
-	windll.mshtml.ShowHTMLDialogEx( gui.mainFrame.Handle , moniker , HTMLDLG_MODELESS , addressof( dialogArguements ) , unicode(DIALOG_OPTIONS ), None)
+	windll.mshtml.ShowHTMLDialogEx( gui.mainFrame.Handle , moniker , HTMLDLG_MODELESS , addressof( dialogArguements ) , DIALOG_OPTIONS, None)


Wow, this is long as well!
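
As background on why the unicode() wrapper can go: in Python 3, ctypes wide-character parameters take str directly, while narrow-character parameters take bytes. A minimal sketch, using a made-up options string rather than the real DIALOG_OPTIONS value:

```python
import ctypes

# Hypothetical value, standing in for a module-level str constant.
exampleOptions = "example options"

# A str is accepted as-is where a wchar_t* is expected.
wide = ctypes.c_wchar_p(exampleOptions)

# A char* needs bytes, so the text must be encoded first.
narrow = ctypes.c_char_p(exampleOptions.encode("utf-8"))

# Passing str where bytes is required is rejected by ctypes.
try:
    ctypes.c_char_p(exampleOptions)
    strAcceptedAsChar = True
except TypeError:
    strAcceptedAsChar = False
```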

LeonarddeR · 2019-06-12T17:57:06Z

 			if log.isEnabledFor(log.DEBUG):
 				startTime = time.time()
-			self.VBufHandle=NVDAHelper.localLib.VBuf_createBuffer(self.rootNVDAObject.appModule.helperLocalBindingHandle,self.rootDocHandle,self.rootID,unicode(self.backendName))
+			self.VBufHandle=NVDAHelper.localLib.VBuf_createBuffer(self.rootNVDAObject.appModule.helperLocalBindingHandle,self.rootDocHandle,self.rootID,self.backendName)


Another pretty long line

michaelDCurran · 2019-06-12T20:55:46Z

I realized last night that I had not pushed all my commits for this branch; I have now. I will address the feedback you have given me so far. Also, regarding basestring to str: I checked all existing uses of str, and they didn't look problematic to me.

LeonarddeR · 2019-06-12T21:01:11Z

May I ask how you checked for current str? Just a grep for str?

michaelDCurran · 2019-06-12T21:26:44Z

That is correct.

feerrenrut · 2019-06-13T13:32:57Z

This modifies a few braille drivers, and other braille related files. I expect it will conflict with the work I am doing. Please see #9736.

Can you revert the changes to the overlapping files:

 source/bdDetect.py
 source/braille.py
 source/brailleDisplayDrivers/eurobraille.py
 source/brailleInput.py
 source/brailleTables.py
 source/hwIo.py

michaelDCurran · 2019-06-14T23:32:54Z

@LeonarddeR and @feerrenrut this is ready for another review.

michaelDCurran added 3 commits June 12, 2019 22:40

Convert all usage of unichr to chr.

6c2beb8

Replace basestring with str.

cbd12ff

Remove usage of unicode() which is not available in Python3.

97f4380

michaelDCurran requested review from LeonarddeR and feerrenrut June 12, 2019 13:13

LeonarddeR reviewed Jun 12, 2019

michaelDCurran force-pushed the py3_unichr branch from cadf853 to 97f4380 Compare June 12, 2019 20:47

LeonarddeR mentioned this pull request Jun 13, 2019

Python 3: No longer encode or decode if it is not necessary #9734

Merged

michaelDCurran added 2 commits June 14, 2019 09:55

Revert changes to braille related files as these will be handled in a…

a564157

… separate pr.

Address review comments.

dac4364

LeonarddeR approved these changes Jun 15, 2019

Comment thread source/speechXml.py

michaelDCurran merged commit 590c4f9 into threshold_py3_staging Jun 15, 2019

nvaccessAuto added this to the 2019.3 milestone Jun 15, 2019

josephsl mentioned this pull request Jul 23, 2019

What's new and readme: we are moving to Python 3.7 #9942

Merged

lukaszgo1 mentioned this pull request Jul 30, 2020

Fix-up of PR 9724. Stop including empty properties when copying object to the clipboard #11441

Merged
