Update Unicode handling for Python 3 #214
     return s.decode("latin")
 else:
-    return s.decode("utf-8", "strict")
+    return s.decode("utf_8", "strict")
Should these two not be equivalent?
Sure, but utf_8 is the documented base name for the UTF-8 codec:
https://docs.python.org/3/library/codecs.html#standard-encodings
The rest are aliases. While `utf-8` is not in the explicit list of aliases, `-` is equivalent to `_`:
> Python comes with a number of codecs built-in, either implemented as C functions or with dictionaries as mapping tables. The following table lists the codecs by name, together with a few common aliases, and the languages for which the encoding is likely used. Neither the list of aliases nor the list of languages is meant to be exhaustive. Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.
>
> CPython implementation detail: Some common encodings can bypass the codecs lookup machinery to improve performance. These optimization opportunities are only recognized by CPython for a limited set of (case insensitive) aliases: utf-8, utf8, latin-1, latin1, iso-8859-1, iso8859-1, mbcs (Windows only), ascii, us-ascii, utf-16, utf16, utf-32, utf32, and the same using underscores instead of dashes. Using alternative aliases for these encodings may result in slower execution.
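For anyone who wants to check the equivalence locally, here is a minimal sketch using the standard `codecs` module. The helper name `same_codec` and the sample string are made up for illustration and are not part of this PR; the sketch only shows that the different spellings resolve to the same codec and decode identically.

```python
import codecs


def same_codec(a: str, b: str) -> bool:
    """Return True if both spellings resolve to the same registered codec.

    Hypothetical helper for illustration only; not part of the PR.
    """
    return codecs.lookup(a).name == codecs.lookup(b).name


# Sample bytes (an assumption for the demo), encoded once with the base name.
data = "naïve café".encode("utf_8")

# Both spellings decode the same bytes to the same text.
assert data.decode("utf-8", "strict") == data.decode("utf_8", "strict")

# Hyphen/underscore and case differences resolve to the same codec.
print(same_codec("utf-8", "utf_8"))
print(same_codec("UTF8", "utf_8"))
print(same_codec("latin", "latin_1"))
```

So the change from `utf-8` to `utf_8` is a naming-consistency choice and should not alter decoding behaviour.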
Thanks! Looks good :) Could you squash the commits into a single commit with a meaningful message? That helps keep the commit history a bit cleaner.
- Get rid of `unicode()`. In Python 3, `unicode` is an alias of `str`. No need to cast a `str` to a `str`.
- Consistently use the base name `utf_8` for the UTF-8 codec.
  https://docs.python.org/3/library/codecs.html#standard-encodings
- Remove a piece of code copied from
  https://cython.readthedocs.io/en/latest/src/tutorial/strings.html
  Replace it with the relevant code from the overhauled Python 3 doc:
  https://github.com/minrk/cython-docs/blob/master/src/tutorial/strings.rst
0f6dbea to 20fbc68
Done.