Bug type
Behavior bug (incorrect output/state without crash)
Beta release blocker
No
Summary
web_fetch appears to decode HTTP response bodies as UTF-8 unconditionally. Pages encoded with legacy charsets such as Shift_JIS, Big5, or GBK return mojibake / replacement characters instead of readable text.
Suspected cause
The shared response reader appears to decode as UTF-8 by default.
In src/agents/tools/web-shared.ts, readResponseText() uses:
const decoder = new TextDecoder();
and later:
const text = await res.text();
Both paths default to UTF-8. This causes non-UTF-8 pages to be decoded incorrectly before HTML extraction/readability processing.
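The effect of the default decoder can be seen in isolation with a few bytes. This is a minimal sketch, not code from the project: the byte array is an illustrative Shift_JIS encoding of "日本語", and it assumes a runtime whose TextDecoder supports the shift_jis label (for example Node.js built with full ICU).

```typescript
// Shift_JIS bytes for the string "日本語" (illustrative input, not taken from the affected page).
const shiftJisBytes = new Uint8Array([0x93, 0xfa, 0x96, 0x7b, 0x8c, 0xea]);

// No label means UTF-8, which mirrors the suspected code path.
// Byte sequences that are invalid UTF-8 are replaced with U+FFFD.
const asUtf8 = new TextDecoder().decode(shiftJisBytes);

// Passing the actual charset label decodes the same bytes correctly.
const asShiftJis = new TextDecoder("shift_jis").decode(shiftJisBytes);

console.log(JSON.stringify(asUtf8)); // contains U+FFFD replacement characters
console.log(asShiftJis);
```

Once the bytes have gone through the UTF-8 decoder, the original text cannot be recovered from the resulting string, so the charset has to be chosen before decoding.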
Suggested fix
Decode from raw bytes instead of calling res.text() directly:
- Read the response as an ArrayBuffer / raw bytes.
- Detect the charset from the Content-Type header, e.g.:
Content-Type: text/html; charset=Shift_JIS
- If missing, scan the first few KB of HTML for:
<meta charset="...">
or:
<meta http-equiv="Content-Type" content="text/html; charset=...">
- Decode the raw bytes with a TextDecoder created for the detected charset.
- Fall back to UTF-8 only if no charset can be determined.
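The steps above could look roughly like the following. This is a sketch under assumptions, not a patch against the actual file: the helper names charsetFromContentType, charsetFromHtmlPrefix, and decodeBody are hypothetical, the 4096-byte scan limit is an arbitrary choice, and the regular expressions cover only the common header and meta forms.

```typescript
// Hypothetical helpers for illustration; these names do not come from the project.

/** Extract the charset parameter from a Content-Type header value, if present. */
function charsetFromContentType(contentType: string | null): string | undefined {
  if (!contentType) return undefined;
  const match = /charset\s*=\s*"?([^\s;"]+)/i.exec(contentType);
  return match ? match[1] : undefined;
}

/** Scan the first bytes of an HTML document for a <meta> charset declaration. */
function charsetFromHtmlPrefix(bytes: Uint8Array, limit = 4096): string | undefined {
  // "latin1" maps every byte to exactly one character, so the ASCII markup
  // stays readable whatever the real encoding of the page is.
  const head = new TextDecoder("latin1").decode(bytes.subarray(0, limit));
  // Matches both <meta charset="..."> and the http-equiv form,
  // where charset=... appears inside the content attribute.
  const match = /<meta[^>]*charset\s*=\s*["']?([^\s"'>;\/]+)/i.exec(head);
  return match ? match[1] : undefined;
}

/** Decode a response body: header charset first, then <meta>, then UTF-8. */
function decodeBody(bytes: Uint8Array, contentType: string | null): string {
  const label =
    charsetFromContentType(contentType) ?? charsetFromHtmlPrefix(bytes) ?? "utf-8";
  try {
    return new TextDecoder(label).decode(bytes);
  } catch {
    // The TextDecoder constructor throws for unknown or unsupported labels;
    // fall back to UTF-8 in that case.
    return new TextDecoder("utf-8").decode(bytes);
  }
}
```

Inside readResponseText(), the call site would presumably become something like decodeBody(new Uint8Array(await res.arrayBuffer()), res.headers.get("content-type")) in place of res.text(). Which labels are accepted depends on the runtime's TextDecoder; Shift_JIS, Big5, and GBK are only available when the runtime ships the corresponding encoding tables.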
Steps to reproduce
Use a known Shift_JIS page:
http://www.aozora.gr.jp/cards/000081/files/46268_23911.html
Call:
web_fetch({
url: "http://www.aozora.gr.jp/cards/000081/files/46268_23911.html",
extractMode: "text"
})
Expected behavior
The page should be decoded according to its declared charset and return readable Japanese text.
Actual behavior
Output contains mojibake and replacement characters instead of readable Japanese text.
OpenClaw version
2026.04.24
Operating system
Ubuntu 24.04.4 LTS
Install method
No response
Model
GPT-5.4
Provider / routing chain
Telegram → OpenClaw Gateway → model router / OpenAI API → gpt-5.4
Additional provider/model setup details
No response
Logs, screenshots, and evidence
Impact and severity
No response
Additional information
No response