Update Unicode handling for Python 3 by DimitriPapadopoulos · Pull Request #214 · holgern/pyedflib

DimitriPapadopoulos · 2023-06-05T06:27:18Z

No description provided.

skjerns · 2023-06-23T20:08:55Z

pyedflib/edfreader.py

            return s.decode("latin")
        else:
-            return s.decode("utf-8", "strict")
+            return s.decode("utf_8", "strict")


Should this not be equivalent?

Sure, but utf_8 is the documented base name for the UTF-8 codec:
https://docs.python.org/3/library/codecs.html#standard-encodings

The rest are aliases. Although `utf-8` is not in the explicit list of aliases, a hyphen is equivalent to an underscore, as the documentation states:

Python comes with a number of codecs built-in, either implemented as C functions or with dictionaries as mapping tables. The following table lists the codecs by name, together with a few common aliases, and the languages for which the encoding is likely used. Neither the list of aliases nor the list of languages is meant to be exhaustive. Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.

CPython implementation detail: Some common encodings can bypass the codecs lookup machinery to improve performance. These optimization opportunities are only recognized by CPython for a limited set of (case insensitive) aliases: utf-8, utf8, latin-1, latin1, iso-8859-1, iso8859-1, mbcs (Windows only), ascii, us-ascii, utf-16, utf16, utf-32, utf32, and the same using underscores instead of dashes. Using alternative aliases for these encodings may result in slower execution.
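The equivalence described in the quoted documentation can be checked directly. The following is a minimal sketch, not code from this pull request: the sample string and the variable names are made up for illustration, and it only uses `codecs.lookup` and `bytes.decode` from the standard library.

```python
import codecs

# Hypothetical sample data; any text with non-ASCII characters works.
data = "Jürgen".encode("utf_8")

# Spellings that differ only in case or hyphen/underscore, plus the "utf8" alias.
spellings = ("utf_8", "utf-8", "UTF-8", "utf8")

# Every spelling should resolve to the same registered codec ...
codec_names = {codecs.lookup(spelling).name for spelling in spellings}

# ... and therefore decode the same bytes to the same string.
decoded = {data.decode(spelling, "strict") for spelling in spellings}

print(codec_names)
print(decoded)
```

Both sets should contain a single element, which is why the change from `"utf-8"` to `"utf_8"` in the diff is behaviorally neutral and only a matter of naming consistency.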

util/gh_lists.py

skjerns · 2023-06-23T20:10:51Z

Thanks! Looks good :) Could you squash the commits into a single commit with a meaningful message? That helps keep the commit history a bit cleaner.

- Get rid of unicode(). In Python 3, `unicode` is an alias of `str`. No need to cast a `str` to a `str`.
- Consistently use the base name `utf_8` for the UTF-8 codec. https://docs.python.org/3/library/codecs.html#standard-encodings
- Remove a piece of code copied from https://cython.readthedocs.io/en/latest/src/tutorial/strings.html Replace with the relevant code from the overhauled Python 3 doc: https://github.com/minrk/cython-docs/blob/master/src/tutorial/strings.rst

DimitriPapadopoulos · 2023-06-24T10:01:32Z

Done.

skjerns reviewed Jun 23, 2023

DimitriPapadopoulos force-pushed the unicode branch from 0f6dbea to 20fbc68 Compare June 24, 2023 10:01

skjerns merged commit fca95ce into holgern:master Jun 27, 2023

DimitriPapadopoulos deleted the unicode branch June 27, 2023 14:11

skjerns merged 1 commit into holgern:master from DimitriPapadopoulos:unicode
