charset detection not working #554

@bittlingmayer

Description

The page at https://www.aksam.com.tr/guncel/baskan-erdogan-48inci-muhtarlar-toplantisinda-konusuyor/haber-795519 declares its charset in two meta tags:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" />
<meta http-equiv="Content-Type" content="text/html; charset=windows-1254" />

However, res.textConverted() returns text in which the Turkish characters come out as �:
<p>Cumhurba�kan� Recep Tayyip Erdo�an, Be�tepe'de 48'inci Muhtarlar Toplant�s�'nda konu�tu.</p> <p><strong><em>Erdo�an'�n a��klamalar�ndan sat�r ba�lar� ��yle:</em></strong></p>
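The � characters are U+FFFD replacement characters, which is what a decoder emits when it reads Turkish single-byte text as UTF-8. The sketch below reproduces the symptom on one word using the standard TextDecoder API. It is not node-fetch's own decoding code, and the byte values are an illustrative assumption (0xFE is "ş" in both ISO-8859-9 and windows-1254).

```javascript
// "Beştepe" encoded as windows-1254 / ISO-8859-9, where 0xFE is "ş".
const bytes = new Uint8Array([0x42, 0x65, 0xfe, 0x74, 0x65, 0x70, 0x65]);

// Decoding with a UTF-8 default: 0xFE is never valid UTF-8, so the decoder
// substitutes U+FFFD, the "�" seen in the output above.
const asUtf8 = new TextDecoder('utf-8').decode(bytes);

// Decoding with the charset the page declares gives the intended text.
// Requires a Node.js build with full ICU (the default for official builds since v13).
const asTurkish = new TextDecoder('windows-1254').decode(bytes);

console.log(asUtf8);    // Be�tepe
console.log(asTurkish); // Beştepe
```

In other words, the bytes are fine; the charset was simply never detected, so the default was used.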

I see that convertBody in body.js is supposed to detect the charset. I think the cause is that this page writes Content-Type in mixed case, while the regex in body.js only matches a lowercase content-type. The preview string (str) should be lowercased before the regex is applied.
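A minimal sketch of the proposed fix, assuming a simplified stand-in for the meta http-equiv branch of the sniffing logic. The function name sniffCharset, the lowercasePreview option, the regexes and the 1024-byte preview limit are all hypothetical and are not node-fetch's actual source; the point is only to show that a case-sensitive pattern misses this page's tags and that lowercasing the preview makes it match.

```javascript
// Hypothetical, case-sensitive pattern for <meta http-equiv="content-type" content="...">.
const META_HTTP_EQUIV = /<meta\s+http-equiv=(['"])content-type\1\s+content=(['"])(.+?)\2/;

// Hypothetical helper: sniff a charset from the start of an HTML body.
function sniffCharset(buffer, { lowercasePreview = false } = {}) {
  // Only peek at the beginning of the body (1024 bytes is an assumed limit).
  let preview = buffer.subarray(0, 1024).toString();
  if (lowercasePreview) {
    // The change proposed in this issue: normalise case before matching.
    preview = preview.toLowerCase();
  }
  const match = META_HTTP_EQUIV.exec(preview);
  if (!match) return null;
  // match[3] is the content attribute, e.g. "text/html; charset=iso-8859-9".
  const charset = /charset=([^;\s]+)/.exec(match[3]);
  return charset ? charset[1] : null;
}

// The two tags from the page above.
const html = Buffer.from(
  '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" />\n' +
  '<meta http-equiv="Content-Type" content="text/html; charset=windows-1254" />'
);

console.log(sniffCharset(html));                             // null
console.log(sniffCharset(html, { lowercasePreview: true })); // iso-8859-9
```

Adding the i flag to the regexes would have the same effect without touching the preview string. Either way, charset labels are case-insensitive, so lowercasing the extracted value does no harm.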

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions