Wrong UTF-8 detection #134
Closed
Description
When the input contains too few non-ASCII characters, chardet detects UTF-8 as ISO-8859-1.
Here is an example:
>>> chardet.detect(u'foo é'.encode('utf-8'))
{'confidence': 0.73, 'language': '', 'encoding': 'ISO-8859-1'}
But with a few more non-ASCII characters, the encoding is detected correctly:
>>> chardet.detect(u'foo é foo é'.encode('utf-8'))
{'confidence': 0.7525, 'language': '', 'encoding': 'utf-8'}
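One possible workaround, sketched here only as an illustration and not as chardet's own behaviour: try a strict UTF-8 decode before falling back to statistical detection, since bytes that decode cleanly as UTF-8 are rarely anything else. The helper name `guess_encoding` and its `fallback` parameter are hypothetical; `fallback` is assumed to be any callable that takes bytes and returns an encoding name.

```python
def guess_encoding(data, fallback=None):
    """Hypothetical helper: prefer strict decoding over statistical guessing.

    `fallback` is an assumed caller-supplied callable (bytes -> encoding name)
    that is only consulted when the data is neither ASCII nor valid UTF-8.
    """
    # Pure ASCII is valid in both UTF-8 and ISO-8859-1, so report it as such.
    try:
        data.decode('ascii')
        return 'ascii'
    except UnicodeDecodeError:
        pass

    # A strict UTF-8 decode fails on malformed multi-byte sequences.
    try:
        data.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        pass

    # Not valid UTF-8: defer to the fallback detector, if one was given.
    return fallback(data) if fallback is not None else None


print(guess_encoding(u'foo é'.encode('utf-8')))
print(guess_encoding(u'foo é'.encode('iso-8859-1')))
```

To combine this with chardet, one could pass something like `lambda d: chardet.detect(d)['encoding']` as `fallback`, so that chardet is only used for input that is not valid UTF-8.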