Two different types of Windows-1252 encodings #337

@ember91

Hi!

I'm using the following script with chardet 7.0.1:

from pathlib import Path

import chardet

for file in ('a.txt', 'b.txt'):
    enc = chardet.detect(Path(file).read_bytes())['encoding']
    print(file, enc)

Where a.txt contains:

text

and b.txt contains:

<?xml encoding="Windows-1252"?>
<tag></tag>

The script prints:

a.txt Windows-1252
b.txt windows-1252

Setting aside that both files should arguably be classified as ASCII, why does one encoding name start with a capital W and the other with a lowercase w?
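As a possible stopgap on the caller's side (not an explanation of the inconsistency), the reported names could be normalized before they are compared. The sketch below uses the standard library's `codecs.lookup`, which resolves encoding labels case-insensitively. The helper name `canonical_encoding` is made up for illustration, and the `None` check assumes `detect` may report no encoding at all.

```python
import codecs


def canonical_encoding(name):
    """Return Python's canonical codec name for an encoding label.

    Hypothetical helper, not part of chardet.
    """
    if name is None:  # assumption: the detector may report no encoding
        return None
    try:
        # codecs.lookup ignores case and maps aliases to one codec name.
        return codecs.lookup(name).name
    except LookupError:
        # Label unknown to Python: fall back to simple lowercasing.
        return name.lower()


print(canonical_encoding("Windows-1252"))  # cp1252
print(canonical_encoding("windows-1252"))  # cp1252
```

With this, both spellings compare equal, although the result is Python's codec name (`cp1252`) rather than the label chardet printed.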
