fix(web-fetch): detect response charset from Content-Type and HTML meta #8
Open
suboss87 wants to merge 58 commits into
Before this fix, if onTimer rejected unexpectedly (e.g. a Node.js internal error or GC pressure causing an exception in the finally block's armTimer call), the .catch() handler only logged the error. The scheduler chain was then permanently broken with no timer set, silently halting all cron jobs until the next gateway restart.

Fix: call armTimer(state) inside the .catch() handler so a rare unexpected rejection does not permanently stop the scheduler.

Regression test exercises the path by making nowMs() throw on the 4th call (inside the finally block's armTimer), which causes onTimer to reject; the .catch() re-arm is then verified via state.timer.

Closes openclaw#73166.
https://claude.ai/code/session_01NHHoPHTrH4F9qFJBJHqjTk
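The re-arm described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual scheduler code: the `SchedulerState` shape, the `tick` wrapper, and the `runDueJobs` and `nextRunAtMs` fields are hypothetical; only `armTimer`, `onTimer`, `nowMs`, and `state.timer` are names taken from the message.

```typescript
// Hypothetical state shape for illustration; the real scheduler state is not shown here.
interface SchedulerState {
  timer: ReturnType<typeof setTimeout> | null;
  nextRunAtMs: number;
  nowMs: () => number;
  runDueJobs: () => Promise<void>;
}

// Schedules the next tick. Throws if nowMs() throws.
function armTimer(state: SchedulerState): void {
  if (state.timer !== null) {
    clearTimeout(state.timer);
    state.timer = null;
  }
  const delayMs = Math.max(0, state.nextRunAtMs - state.nowMs());
  state.timer = setTimeout(() => {
    void tick(state);
  }, delayMs);
}

// Runs due jobs and re-arms in finally; rejects if armTimer throws there.
async function onTimer(state: SchedulerState): Promise<void> {
  try {
    await state.runDueJobs();
  } finally {
    armTimer(state);
  }
}

// Timer callback: without the armTimer call in .catch(), a rejection
// from onTimer would leave state.timer unset and stop the chain.
function tick(state: SchedulerState): Promise<void> {
  state.timer = null;
  return onTimer(state).catch((err: unknown) => {
    console.error("cron: onTimer rejected unexpectedly", err);
    armTimer(state);
  });
}
```

In this sketch a second throw from `armTimer` inside the `.catch()` handler would still propagate; whether the real code guards against that is not stated in the message.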
web_fetch decoded all HTTP response bodies as UTF-8 unconditionally. readResponseText() used `new TextDecoder()` (UTF-8) in the streaming path and `res.text()` (also UTF-8 per WHATWG Fetch spec) in the fallback path, causing mojibake for legacy-charset pages such as Shift_JIS, Big5, and ISO-8859-1. Fix: in the streaming path, collect raw bytes before decoding and resolve the charset from the Content-Type `charset=` parameter; if absent and the response is HTML, scan the first 4 KB for a `<meta charset>` or http-equiv declaration, then decode with `TextDecoder(detectedCharset)`. The non-streaming fallback uses `arrayBuffer()` + the same charset resolution for environments that expose it, and retains the old `text()` path as a last resort. Closes openclaw#72916. https://claude.ai/code/session_01NHHoPHTrH4F9qFJBJHqjTk
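The charset resolution order described above (Content-Type parameter first, then an HTML meta scan of the first 4 KB, then UTF-8) might look roughly like this. It is a sketch under assumptions: the helper names `charsetFromContentType`, `charsetFromHtmlMeta`, and `decodeBody` are hypothetical and are not the helpers added to `web-shared.ts`, and the regular expressions are a simplification rather than the full HTML prescan algorithm.

```typescript
// Hypothetical helper names; the PR's actual helpers are not shown in this text.
const META_SCAN_BYTES = 4096;

// Extracts the charset= parameter from a Content-Type header value, if present.
function charsetFromContentType(contentType: string | null): string | undefined {
  if (!contentType) return undefined;
  const match = /charset\s*=\s*["']?([^"';\s]+)/i.exec(contentType);
  return match ? match[1] : undefined;
}

// Scans the first 4 KB for <meta charset=...> or an http-equiv Content-Type meta.
function charsetFromHtmlMeta(bytes: Uint8Array): string | undefined {
  // Charset declarations are ASCII, so a single-byte decode of the prefix is enough for scanning.
  const head = new TextDecoder("latin1").decode(bytes.subarray(0, META_SCAN_BYTES));
  const match = /<meta[^>]*?charset\s*=\s*["']?([^"'\s;\/>]+)/i.exec(head);
  return match ? match[1] : undefined;
}

// Decodes raw bytes with the resolved charset, falling back to UTF-8.
function decodeBody(bytes: Uint8Array, contentType: string | null): string {
  const isHtml = /html/i.test(contentType ?? "");
  const label =
    charsetFromContentType(contentType) ??
    (isHtml ? charsetFromHtmlMeta(bytes) : undefined) ??
    "utf-8";
  let decoder: TextDecoder;
  try {
    decoder = new TextDecoder(label);
  } catch {
    // TextDecoder throws a RangeError for unknown labels.
    decoder = new TextDecoder("utf-8");
  }
  return decoder.decode(bytes);
}
```

The key design point is that decoding happens once, on the collected bytes, after the label is known; decoding chunk by chunk as UTF-8 and fixing it up afterwards would already have lost information.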
Summary
- `web_fetch` decoded all HTTP response bodies as UTF-8 unconditionally, producing mojibake for legacy-charset pages (Shift_JIS, Big5, GBK, ISO-8859-1, etc.)
- `readResponseText()` used `new TextDecoder()` (UTF-8) in the streaming path and `res.text()` (also UTF-8 per WHATWG Fetch spec) in the non-streaming fallback -- neither respects the declared charset
- Fix: resolve the charset from the Content-Type `charset=` parameter; if absent and content is HTML, scan the first 4 KB for a `<meta charset>` or `<meta http-equiv="Content-Type" content="...charset=...">` declaration; decode with `TextDecoder(detectedCharset)`, falling back to UTF-8 for unknown/missing labels

Files changed
- `src/agents/tools/web-shared.ts` -- charset helpers + reworked streaming and fallback decode paths
- `src/agents/tools/web-shared.charset.test.ts` -- 7 new regression tests (all pass)

Test plan
- `web-shared.charset.test.ts` covering: Content-Type charset, HTML meta charset, http-equiv meta, UTF-8 fallback, non-HTML content, maxBytes truncation with charset
- `pnpm check` clean

Closes openclaw#72916
Generated by Claude Code