PHP's json_encode() assumes the input is UTF-8.
Additionally, RFC 7159 §8.1 says:
"JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32."
(I don't know how real-world clients parse JSON, though.)
Currently, WP-API will fail for non-UTF-8 text.
https://gist.github.com/mdawaffe/d3ea2a827b61784c7e8f has an example.
In it, the "blog"'s charset is set to Windows-1252. Two JSON requests are made.
| Request | Expected Response | Actual Response |
| --- | --- | --- |
| data.php | {"text": "ạ ṕ ỉ"} | {"text": "ạ ṕ ỉ"} |
| | {"text": "\u00e1\u00ba\u00a1 \u00e1\u00b9\u2022 \u00e1\u00bb\u2030"} | {"text":"\u1ea1 \u1e55 \u1ec9"} |
| invalid.php | {"text": "ÀÁ"} | false |
| | {"text": "\u00c0\u00c1"} | (JSON_ERROR_UTF8) |
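
For readers who don't want to open the gist, the two "Actual Response" cases can probably be reproduced with plain json_encode() calls. This is only a minimal sketch, not the gist's code: the byte strings and variable names are made up for illustration, chosen so that they are the Windows-1252 bytes for the "Expected" text in the table.

```php
<?php
// Hypothetical sample data: raw bytes as a Windows-1252 "blog" would store them.
// These bytes mean "ạ ṕ ỉ" in Windows-1252, but they also happen to be
// valid UTF-8 (for "ạ ṕ ỉ"), so json_encode() silently misreads them.
$data = "\xE1\xBA\xA1 \xE1\xB9\x95 \xE1\xBB\x89";

// These bytes mean "ÀÁ" in Windows-1252 but are not valid UTF-8 at all.
$invalid = "\xC0\xC1";

// Case 1: succeeds, but the escapes describe the wrong characters.
$first = json_encode( array( 'text' => $data ) );
echo $first, "\n";

// Case 2: fails outright; json_last_error() reports JSON_ERROR_UTF8.
$second = json_encode( array( 'text' => $invalid ) );
var_dump( $second );
var_dump( json_last_error() === JSON_ERROR_UTF8 );
```

The first call returns the same escapes as the "Actual Response" cell for data.php; the second returns false, matching invalid.php.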
Obviously, these examples are super contrived. Also, blog_charsets other than UTF-8 are deprecated in WordPress.
That said:
- other character sets do exist, and
- even for UTF-8 sites, it seems likely that invalid UTF-8 will get through somewhere, which breaks json_encode().
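
One possible direction, sketched under assumptions: convert text from the site's charset to UTF-8 before encoding, and scrub invalid bytes on UTF-8 sites. The helper name example_json_encode_text() and its parameters are hypothetical (nothing like it exists in WP-API as far as I know), and the sketch assumes the mbstring extension is available.

```php
<?php
/**
 * Hypothetical helper, not part of WP-API: make $text safe for json_encode().
 * Assumes mbstring is loaded and that $charset is a name mbstring recognizes.
 */
function example_json_encode_text( $text, $charset ) {
    if ( 'UTF-8' !== strtoupper( $charset ) ) {
        // Non-UTF-8 site: transcode from the site's charset to UTF-8.
        $text = mb_convert_encoding( $text, 'UTF-8', $charset );
    } elseif ( ! mb_check_encoding( $text, 'UTF-8' ) ) {
        // UTF-8 site with stray invalid bytes: a UTF-8 to UTF-8 conversion
        // replaces them with mbstring's substitute character.
        $text = mb_convert_encoding( $text, 'UTF-8', 'UTF-8' );
    }
    return json_encode( array( 'text' => $text ) );
}

// The two requests from the table, with the charset set to Windows-1252.
echo example_json_encode_text( "\xE1\xBA\xA1 \xE1\xB9\x95 \xE1\xBB\x89", 'Windows-1252' ), "\n";
echo example_json_encode_text( "\xC0\xC1", 'Windows-1252' ), "\n";
```

With these inputs the output should match the "Expected Response" column. Replacing invalid bytes loses information, so whether to substitute, strip, or return an error for invalid UTF-8 is a design decision this sketch does not settle.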