Closed
Labels
bug: This issue/PR relates to a bug.
Description
Please do not use chardet to detect document encoding. For UTF-8 text it works more or less reliably only for the Latin-1 and Latin-1 Supplement Unicode blocks. For Latin Extended-A and Latin Extended-B it fails in about 50% of cases, wrongly detecting Windows encodings, e.g. for a UTF-8 document with Latin Extended-A content:
$ chardet docs/index.rst
docs/index.rst: Windows-1252 with confidence 0.594336283186
$ file docs/index.rst
docs/index.rst: UTF-8 Unicode text
In fact, chardet itself reports low confidence in this case (about 0.59).
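One possible alternative, sketched here only as an illustration and not as the project's actual fix: try a strict UTF-8 decode first and fall back to another encoding only if that fails. Strict UTF-8 decoding rejects byte sequences that are not valid UTF-8, so a document like the one above would be accepted as UTF-8 without any guessing. The function name `decode_document` and the `cp1252` fallback are assumptions made for this example.

```python
from typing import Tuple


def decode_document(data: bytes, fallback_encoding: str = "cp1252") -> Tuple[str, str]:
    """Decode bytes, preferring strict UTF-8 over statistical detection.

    Returns (text, encoding_used). Both the function name and the
    fallback encoding are hypothetical choices for this sketch.
    """
    try:
        # Strict mode raises UnicodeDecodeError on any invalid UTF-8 sequence.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Only reached when the bytes are not valid UTF-8.
        return data.decode(fallback_encoding, errors="replace"), fallback_encoding


if __name__ == "__main__":
    # Latin Extended-A content, encoded as UTF-8.
    sample = "Zażółć gęślą jaźń".encode("utf-8")
    text, encoding = decode_document(sample)
    print(encoding, text)
```

A detector such as chardet could still be used in the fallback branch, where the input is already known not to be UTF-8.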