Do not use chardet to detect encoding because of poor accuracy #18

Please do not use chardet to detect document encoding. For UTF-8 text it works more or less reliably only for the Latin-1 and Latin-1 Supplement Unicode blocks. For Latin Extended-A and Latin Extended-B it fails in about 50% of cases, wrongly detecting Windows encodings. For example, for a UTF-8 document with Latin Extended-A content:

```
$ chardet docs/index.rst 
docs/index.rst: Windows-1252 with confidence 0.594336283186
$ file docs/index.rst 
docs/index.rst: UTF-8 Unicode text
```

In fact chardet reports low confidence in this case.
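
One possible alternative, sketched here only as an illustration: attempt a strict UTF-8 decode first and fall back to something else only when that fails. The function name `decode_utf8_first` and the `latin-1` fallback are assumptions made for this example, not part of the project or of chardet.

```python
from typing import Tuple


def decode_utf8_first(data: bytes, fallback: str = "latin-1") -> Tuple[str, str]:
    """Return (text, encoding_used), preferring strict UTF-8.

    Hypothetical helper for illustration. Strict UTF-8 decoding rejects
    most non-UTF-8 input, so a successful decode is a strong signal that
    the document really is UTF-8.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: latin-1 maps every byte to a character,
        # so this branch never raises.
        return data.decode(fallback, errors="replace"), fallback


# UTF-8 bytes containing Latin Extended-A characters.
sample = "Zażółć gęślą jaźń".encode("utf-8")
text, used = decode_utf8_first(sample)
print(used)
```

With this approach a statistical guess is needed only for input that is not valid UTF-8, which avoids the misdetection shown above for valid UTF-8 files.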
