This repository was archived by the owner on Sep 24, 2018. It is now read-only.

blog_charset() and json_encode() #248

@mdawaffe

PHP's json_encode() assumes the input is UTF-8.

Additionally, RFC 7159 §8.1 says:

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.

(I don't know how real-world clients parse JSON, though.)

Currently, WP-API mishandles non-UTF-8 text: it either returns mis-decoded characters or fails outright.

https://gist.github.com/mdawaffe/d3ea2a827b61784c7e8f has an example.

In it, the "blog"'s charset is set to Windows-1252. Two JSON requests are made.

| Request | Expected Response | Actual Response |
| --- | --- | --- |
| data.php | {"text": "ạ ṕ ỉ"} | {"text": "ạ ṕ ỉ"} |
| | {"text": "\u00e1\u00ba\u00a1 \u00e1\u00b9\u2022 \u00e1\u00bb\u2030"} | {"text":"\u1ea1 \u1e55 \u1ec9"} |
| invalid.php | {"text": "ÀÁ"} | false |
| | {"text": "\u00c0\u00c1"} | (JSON_ERROR_UTF8) |
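
Both requests can be reproduced at the byte level. The sketch below uses Python rather than PHP, purely to illustrate the charset mechanics. `encode_as_charset` is a hypothetical helper and is not part of WP-API or the gist. The byte strings are assumptions about what the two scripts serve, inferred from the escapes shown above.

```python
import json

def encode_as_charset(raw: bytes, charset: str) -> str:
    """Decode raw bytes with the given charset, then serialize as JSON."""
    return json.dumps({"text": raw.decode(charset)})

# data.php (assumed bytes): valid UTF-8, but the blog declares Windows-1252.
data = "ạ ṕ ỉ".encode("utf-8")
expected = encode_as_charset(data, "cp1252")  # honour the declared charset
actual = encode_as_charset(data, "utf-8")     # assume UTF-8, as json_encode() does

# invalid.php (assumed bytes): 0xC0 0xC1 is "ÀÁ" in Windows-1252,
# but neither byte can ever appear in valid UTF-8.
invalid = b"\xc0\xc1"
invalid_expected = encode_as_charset(invalid, "cp1252")
try:
    invalid_actual = encode_as_charset(invalid, "utf-8")
except UnicodeDecodeError:
    invalid_actual = None  # stands in for json_encode() returning false
```

In the first case the output is well-formed JSON that carries the wrong characters, so nothing signals the error to the client. In the second case encoding fails entirely.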

Obviously, these examples are super contrived. Also, blog_charsets other than UTF-8 are deprecated in WordPress.

That said:

  • other character sets do exist, and
  • even for UTF-8 sites, it seems likely that invalid UTF-8 will get through somewhere, which breaks json_encode().
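
One possible mitigation, sketched here in Python for illustration only, is to decode with the declared charset and substitute U+FFFD for any bytes that cannot be decoded, so that encoding never fails outright. `to_json_safe` is a hypothetical name and this is not a proposed WP-API implementation. PHP 7.2 and later offer a comparable substitution through the `JSON_INVALID_UTF8_SUBSTITUTE` flag of json_encode().

```python
import json

def to_json_safe(raw: bytes, charset: str = "utf-8") -> str:
    """Decode with the declared charset, replacing undecodable bytes with U+FFFD."""
    text = raw.decode(charset, errors="replace")
    return json.dumps({"text": text})

ok = to_json_safe(b"caf\xc3\xa9")            # valid UTF-8 for "café"
damaged = to_json_safe(b"caf\xe9")           # stray Windows-1252 byte on a UTF-8 site
legacy = to_json_safe(b"caf\xe9", "cp1252")  # same byte, charset declared correctly
```

Substitution loses the original byte, so it trades a hard failure for a visible replacement character. Converting from the correct charset, as in the last call, preserves the text.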
