I just found an existing Arabic test case in the chardet repository: https://github.com/chardet/chardet/blob/master/tests/windows-1256-arabic/_chromium_windows-1256_with_no_encoding_specified.html According to W3Techs' character encoding usage statistics, cp1256 (windows-1256) is also an important encoding to support: http://w3techs.com/technologies/overview/character_encoding/all