[Bug]: web_fetch returns mojibake for non-UTF-8 pages #72916

@nickyhk

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

web_fetch appears to decode HTTP response bodies as UTF-8 unconditionally. Pages encoded with legacy charsets such as Shift_JIS, Big5, or GBK come back as mojibake / replacement characters instead of readable text.

Suspected cause

The shared response reader appears to decode as UTF-8 by default.

In src/agents/tools/web-shared.ts, readResponseText() uses:

const decoder = new TextDecoder();

and later:

const text = await res.text();

Both paths default to UTF-8. This causes non-UTF-8 pages to be decoded incorrectly before HTML extraction/readability processing.
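To illustrate the failure mode in isolation, here is a minimal sketch that does not touch OpenClaw at all. It decodes the Shift_JIS byte sequence for "日本" twice: once with a default `TextDecoder` (UTF-8) and once with the `shift_jis` label. The variable names are illustrative only, and the second decode assumes a runtime whose `TextDecoder` supports legacy encodings (for example Node.js built with full ICU).

```typescript
// Shift_JIS encoding of the two characters "日本": 0x93 0xFA, 0x96 0x7B.
const sampleShiftJisBytes = new Uint8Array([0x93, 0xfa, 0x96, 0x7b]);

// No label given, so TextDecoder falls back to UTF-8. The bytes are not
// valid UTF-8, so invalid sequences become U+FFFD replacement characters.
const decodedAsUtf8 = new TextDecoder().decode(sampleShiftJisBytes);

// Decoding with the correct label recovers the original text.
const decodedAsShiftJis = new TextDecoder("shift_jis").decode(sampleShiftJisBytes);

console.log(decodedAsUtf8.includes("\uFFFD")); // true: output is mojibake
console.log(decodedAsShiftJis);                // 日本
```

Note that the trailing byte 0x7B is ASCII `{`, which matches the `�{` pattern visible in the actual output reported in this issue.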

Suggested fix

Decode from raw bytes instead of calling res.text() directly:

  1. Read response as ArrayBuffer / raw bytes.
  2. Detect charset from Content-Type, e.g.:
     Content-Type: text/html; charset=Shift_JIS
  3. If missing, scan the first few KB of HTML for:
     <meta charset="...">

     or:

     <meta http-equiv="Content-Type" content="text/html; charset=...">
  4. Decode with:
     new TextDecoder(charset)
  5. Fall back to UTF-8 only if no charset can be determined.
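The steps above could be sketched roughly as follows. This is not the actual code in web-shared.ts: the helper names (`charsetFromContentType`, `charsetFromMeta`, `decodeBody`) and the 4096-byte scan limit are hypothetical choices for illustration, and the `<meta>` regex is a simplification of real HTML encoding sniffing. It also assumes the runtime's `TextDecoder` supports legacy labels.

```typescript
// Hypothetical helpers for illustration; none of these names come from web-shared.ts.

/** Extract the charset parameter from a Content-Type header value, if any. */
function charsetFromContentType(contentType: string | null): string | undefined {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType ?? "");
  return match?.[1];
}

/** Scan the first few KB of the body for a <meta> charset declaration. */
function charsetFromMeta(bytes: Uint8Array, scanLimit = 4096): string | undefined {
  // windows-1252 maps every byte to a character, so the ASCII markup
  // stays readable regardless of the page's real encoding.
  const head = new TextDecoder("windows-1252").decode(bytes.subarray(0, scanLimit));
  // Covers both <meta charset="..."> and the http-equiv/content form.
  const match = /<meta[^>]+charset\s*=\s*["']?([^"'\s;>\/]+)/i.exec(head);
  return match?.[1];
}

/** Decode raw bytes using the header charset, then <meta>, then UTF-8. */
function decodeBody(bytes: Uint8Array, contentType: string | null): string {
  const label = charsetFromContentType(contentType) ?? charsetFromMeta(bytes) ?? "utf-8";
  try {
    return new TextDecoder(label).decode(bytes);
  } catch {
    // TextDecoder throws RangeError for unknown labels; fall back to UTF-8.
    return new TextDecoder("utf-8").decode(bytes);
  }
}
```

A caller would then read the body with `new Uint8Array(await res.arrayBuffer())` and pass it to `decodeBody` together with `res.headers.get("content-type")`, instead of calling `res.text()`. Giving the HTTP header priority over the `<meta>` tag is a design choice in this sketch, not something the issue prescribes.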

Steps to reproduce

Use a known Shift_JIS page:

http://www.aozora.gr.jp/cards/000081/files/46268_23911.html

Call:

web_fetch({
  url: "http://www.aozora.gr.jp/cards/000081/files/46268_23911.html",
  extractMode: "text"
})

Expected behavior

The page should be decoded according to its declared charset and return readable Japanese text.

Actual behavior

Output contains mojibake, for example:

�{�V���� ���߂���̉�
...

OpenClaw version

2026.04.24

Operating system

Ubuntu 24.04.4 LTS

Install method

No response

Model

GPT-5.4

Provider / routing chain

Telegram → OpenClaw Gateway → model router / OpenAI API → gpt-5.4

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response
