Skip to content

iso-8859-7 encoding of non-breaking space prevents encoding detection #64

@DRMacIver

Description

@DRMacIver

The following string is detected as having a None encoding despite being a valid string in one of chardet's supported encodings:

u'<\xa0'.encode('iso-8859-7')

This remains true even if you pad it with ascii, it's not a length issue.

This behaviour is present in 9e419e9 (I just only was able to find it once #63 was fixed).

Shall I send you a pull request with the test that is finding these? It's not very complicated.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions