fix(feishu): recover Chinese filenames from Latin-1 mojibake in Content-Disposition by lishuaigit · Pull Request #50435 · openclaw/openclaw

lishuaigit · 2026-03-19T13:06:27Z

Summary

Problem: When receiving files via Feishu with Chinese characters in the filename (e.g. "何不同舟渡_2.txt"), the saved filename is garbled (e.g. "æµ_è_æ_ä_2.txt"). This is a classic UTF-8 → Latin-1 mojibake.
Root cause: The Node.js HTTP parser decodes header values as ISO-8859-1 per RFC 7230. When Feishu returns Content-Disposition: attachment; filename="何不同舟渡_2.txt" using the plain filename parameter (without RFC 5987 filename*=UTF-8''...), each 3-byte UTF-8 Chinese character becomes 3 separate Latin-1 characters.
Fix: Add tryRecoverLatin1AsUtf8() which detects the mojibake pattern (all chars in U+0000–U+00FF range, contains non-ASCII) and reconstructs the original UTF-8 string. The recovery is safely skipped for pure ASCII strings and strings with genuine non-Latin-1 Unicode characters.
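
The failure mode and the recovery can be reproduced in a few lines. The following is a minimal TypeScript sketch, not the code from this PR: the body of tryRecoverLatin1AsUtf8 is reconstructed from the description above, and the header decoding is simulated by mapping each UTF-8 byte to one character.

```typescript
// Hypothetical reconstruction of the helper described above; not the PR's actual code.
function tryRecoverLatin1AsUtf8(value: string): string {
  let hasNonAscii = false;
  for (let i = 0; i < value.length; i++) {
    const code = value.charCodeAt(i);
    if (code > 0xff) return value; // genuine Unicode beyond Latin-1: leave untouched
    if (code > 0x7f) hasNonAscii = true;
  }
  if (!hasNonAscii) return value; // pure ASCII fast path

  // Every char stands for one original byte, so rebuild the byte sequence.
  const bytes = Uint8Array.from(value, (ch) => ch.charCodeAt(0));
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of emitting U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return value; // bytes are not valid UTF-8: keep the original string
  }
}

// Simulate ISO-8859-1 header decoding: one character per UTF-8 byte.
const original = "何不同舟渡_2.txt";
const garbled = Array.from(new TextEncoder().encode(original), (b) =>
  String.fromCharCode(b),
).join("");

console.log(garbled.length); // 21: five 3-byte characters plus six ASCII characters
console.log(tryRecoverLatin1AsUtf8(garbled) === original); // true
```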

Change Type

Bug fix

Scope

Feishu/Lark channel

Linked Issue

Closes [Bug]: Feishu file names with Chinese characters are garbled (UTF-8 encoding issue) #48388

User-visible / Behavior Changes

Before: File "何不同舟渡_2.txt" → saved as "æµ_è_æ_ä_2---uuid.txt"

After: File "何不同舟渡_2.txt" → saved as "何不同舟渡_2---uuid.txt"

Security Impact

None. Pure string transformation with no external I/O.

Evidence

32 tests passing (30 existing + 2 new filename recovery tests)

✓ recovers Chinese filenames from Latin-1 mojibake in Content-Disposition
✓ preserves ASCII filenames without modification

Compatibility

Backward compatible. ASCII filenames are unchanged (fast path).
The filename*=UTF-8''... path (already correct) is tried first.
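
A sketch of that ordering, under stated assumptions: decodeDispositionFileName is the existing function this change touches, but the regular expressions and the bodies shown here are illustrative guesses, not the repository's implementation.

```typescript
// Hypothetical sketch; regexes and structure are assumptions for illustration.
function tryRecoverLatin1AsUtf8(value: string): string {
  // Skip strings with chars above U+00FF and strings that are pure ASCII.
  if (!/^[\u0000-\u00ff]*$/.test(value) || /^[\u0000-\u007f]*$/.test(value)) {
    return value;
  }
  const bytes = Uint8Array.from(value, (ch) => ch.charCodeAt(0));
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return value;
  }
}

function decodeDispositionFileName(header: string): string | undefined {
  // 1. Extended parameter (RFC 5987): filename*=UTF-8''<percent-encoded value>
  const extMatch = header.match(/filename\*\s*=\s*UTF-8''([^;]+)/i);
  if (extMatch?.[1]) {
    try {
      return decodeURIComponent(extMatch[1].trim());
    } catch {
      // Malformed percent-encoding: fall through to the plain parameter.
    }
  }
  // 2. Plain parameter: filename="..." or filename=token, with mojibake recovery.
  const plainMatch = header.match(/filename\s*=\s*"?([^";]+)"?/i);
  if (plainMatch?.[1]) {
    return tryRecoverLatin1AsUtf8(plainMatch[1].trim());
  }
  return undefined;
}

console.log(decodeDispositionFileName('attachment; filename="report.pdf"'));
console.log(decodeDispositionFileName("attachment; filename*=UTF-8''%E4%BD%95.txt"));
```

Because the extended parameter is checked first, a header that carries both forms never reaches the recovery step.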

Risks

Minimal. The recovery only fires when ALL characters are in the Latin-1 range AND the bytes form valid UTF-8. If the bytes are not valid UTF-8, the original string is returned unchanged.

[AI-assisted development by OpenClaw agent 虾干 🦐]

greptile-apps · 2026-03-19T13:09:51Z

Greptile Summary

This PR fixes a well-known Latin-1 mojibake problem affecting Chinese filenames in Feishu Content-Disposition headers. Node.js decodes HTTP/1.x header values as ISO-8859-1, so a UTF-8 Chinese filename sent by Feishu (without filename*=UTF-8'' encoding) arrives as garbled Latin-1 characters. The new tryRecoverLatin1AsUtf8() helper detects this pattern and reconstructs the original string using TextDecoder("utf-8", { fatal: true }) for safe, lossless recovery.

Implementation is correct and well-guarded: the fast ASCII path skips the conversion entirely; the Latin-1 range guard prevents false attempts on strings that already contain genuine Unicode codepoints above U+00FF; and TextDecoder with fatal: true ensures the fallback to the original string whenever the bytes don't form valid UTF-8.
Known edge-case trade-off: a genuine Latin-1 filename whose bytes happen to form a valid UTF-8 sequence (e.g., Ã©file.txt, whose bytes C3 A9 decode as UTF-8 to éfile.txt) would be silently remapped. This is an inherent ambiguity of the heuristic and is acknowledged in the PR description; for a Chinese-first service like Feishu it is an acceptable risk in practice.
Test coverage simulates the exact Node.js HTTP parser behavior (String.fromCharCode over UTF-8 bytes) and verifies both the recovery and the ASCII no-op paths. Adding a test with a European Latin-1 filename (e.g., café.txt) would further document the safe-fallback boundary, though it is not blocking.
No regressions: the filename*=UTF-8'' (RFC 5987) path is unchanged and still tried first.
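
The boundary described above can be checked directly. This is a minimal sketch assuming the helper behaves as summarized (ASCII fast path, Latin-1 range guard, fatal TextDecoder); the function body is a reconstruction for illustration, and string escapes are used so the byte values are unambiguous.

```typescript
// Hypothetical reconstruction of the helper; not copied from the PR.
function tryRecoverLatin1AsUtf8(value: string): string {
  if (!/^[\u0000-\u00ff]*$/.test(value) || /^[\u0000-\u007f]*$/.test(value)) {
    return value;
  }
  const bytes = Uint8Array.from(value, (ch) => ch.charCodeAt(0));
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return value;
  }
}

const cases: Array<[string, string]> = [
  // U+00C3 U+00A9 ("Ã©"): bytes C3 A9 are also the UTF-8 encoding of U+00E9 ("é").
  ["\u00c3\u00a9file.txt", "valid UTF-8 by coincidence"],
  // U+00C2 U+00A3 ("Â£"): bytes C2 A3 are also the UTF-8 encoding of U+00A3 ("£").
  ["\u00c2\u00a3 rates.txt", "valid UTF-8 by coincidence"],
  // "café.txt" in Latin-1: byte E9 followed by "." is not valid UTF-8.
  ["caf\u00e9.txt", "invalid UTF-8, safe fallback"],
];

for (const [input, note] of cases) {
  const output = tryRecoverLatin1AsUtf8(input);
  const verdict = output === input ? "unchanged" : "rewritten";
  console.log(JSON.stringify(input), "->", JSON.stringify(output), verdict, `(${note})`);
}
```

The first two inputs are rewritten and the third is left alone, which is the false-positive boundary a dedicated test would document.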

Confidence Score: 4/5

Safe to merge — the change is a targeted, backward-compatible string transformation with no external I/O and a safe fallback path.
The core logic is correct: ASCII fast path, Latin-1 range guard, and TextDecoder with fatal: true give the right behaviour for every tested category (Chinese UTF-8 mojibake, plain ASCII, invalid UTF-8 sequences). The one point deducted is for the acknowledged false-positive edge case with Latin-1 filenames whose bytes form a coincidentally valid UTF-8 sequence, and for the absence of a test that explicitly documents that boundary.
No files require special attention.

Last reviewed commit: "fix(feishu): recover..."

WingedDragon

✅ Approved

Scope: Recovers Chinese filenames from Latin-1 mojibake in Feishu Content-Disposition headers.

Strengths:

tryRecoverLatin1AsUtf8 is textbook: fast ASCII path, Latin-1 range check, TextDecoder with fatal: true for strict UTF-8 validation, catch returns original
Handles the real-world scenario where Node.js HTTP parser decodes UTF-8 bytes as ISO-8859-1 per RFC 7230
Tests cover both Chinese filename recovery and ASCII passthrough
Integration with existing decodeDispositionFileName is clean — recovery runs on the plain filename= match, after the filename*=UTF-8'' check

No concerns. Ship it.

WingedDragon

✅ Approved

Scope: Fix Feishu Chinese filenames garbled by Latin-1 mojibake in Content-Disposition headers. Node.js HTTP parser decodes headers as ISO-8859-1 per RFC 7230, but Feishu sends UTF-8 filenames.

Strengths:

tryRecoverLatin1AsUtf8 is a well-considered implementation:
- Fast path for ASCII (no recovery needed)
- Guard: only attempts recovery if all chars are in Latin-1 range (U+0000–U+00FF)
- Uses TextDecoder with fatal: true — if the bytes aren't valid UTF-8, returns original unchanged
Two tests: Chinese filename recovery (何不同舟渡_2.txt) and ASCII preservation (report.pdf)
Applied at the decodeDispositionFileName level — covers all file downloads, not just a specific code path

No concerns. Correct fix for a real i18n encoding issue. Ship it.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b48b713cdc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-20T07:30:23Z

+  if (plainMatch?.[1]) {
+    const raw = plainMatch[1].trim();
+    return tryRecoverLatin1AsUtf8(raw);


Avoid rewriting valid Latin-1 filenames as UTF-8

Applying tryRecoverLatin1AsUtf8 to every non-ASCII filename= value corrupts some filenames that were already correct. For example, a real filename like Ã©.txt or Â£ rates.txt is valid Latin-1 text, but TextDecoder will reinterpret those bytes as UTF-8 and change them to é.txt / £ rates.txt. Since sanitizeFileNameForUpload preserves these names on upload, downloads through this path no longer round-trip those legitimate filenames.

…nt-Disposition

Node.js HTTP parser decodes header values as ISO-8859-1 (per RFC 7230). When Feishu returns a Content-Disposition header with a UTF-8 filename in the plain `filename` parameter (without `filename*=UTF-8'...`), each multibyte UTF-8 character becomes multiple Latin-1 code points, producing garbled names like "æµ_è_æ_ä_2.txt" instead of "何不同舟渡_2.txt".

Add tryRecoverLatin1AsUtf8() which detects the mojibake pattern (all chars in U+0000–U+00FF range) and attempts to reconstruct the original UTF-8 string from the byte values. The recovery is skipped for pure ASCII strings (fast path) and strings that already contain non-Latin-1 characters.

Closes openclaw#48388

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af083d4ff6

chatgpt-codex-connector · 2026-03-30T03:04:34Z

-  return plainMatch?.[1]?.trim();
+  if (plainMatch?.[1]) {
+    const raw = plainMatch[1].trim();
+    return tryRecoverLatin1AsUtf8(raw);


Gate UTF-8 recovery behind a mojibake heuristic

Calling tryRecoverLatin1AsUtf8 for every non-ASCII filename= value rewrites some valid Latin-1 names that were not mojibake (for example "Â£ rates.txt" becomes "£ rates.txt"), so legitimate filenames can be silently changed on download. I verified this with fresh evidence by running the new helper logic locally and reproducing that exact transformation, which means this can break filename round-tripping whenever users intentionally include byte patterns that are valid UTF-8 sequences.
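
One way such a gate could look, purely as an illustration: accept the decoded string only when it contains a character that Latin-1 cannot represent. The name recoverIfClearlyMojibake and the rule itself are hypothetical and are not taken from this PR.

```typescript
// Hypothetical gating heuristic; an assumption for illustration only.
function recoverIfClearlyMojibake(value: string): string {
  // Same preconditions as before: all chars in U+0000-U+00FF, at least one non-ASCII.
  if (!/^[\u0000-\u00ff]*$/.test(value) || /^[\u0000-\u007f]*$/.test(value)) {
    return value;
  }
  const bytes = Uint8Array.from(value, (ch) => ch.charCodeAt(0));
  let decoded: string;
  try {
    decoded = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return value; // not valid UTF-8
  }
  // Gate: if the decoded text still fits in Latin-1, the input may have been a
  // legitimate Latin-1 name, so keep it. Only text such as CJK passes the gate.
  return /[^\u0000-\u00ff]/.test(decoded) ? decoded : value;
}

const garbledChinese = Array.from(new TextEncoder().encode("何不同舟渡_2.txt"), (b) =>
  String.fromCharCode(b),
).join("");

console.log(recoverIfClearlyMojibake(garbledChinese)); // 何不同舟渡_2.txt
console.log(recoverIfClearlyMojibake("\u00c2\u00a3 rates.txt") === "\u00c2\u00a3 rates.txt"); // true
```

The trade-off is that UTF-8 mojibake of names that fit entirely in Latin-1 (for example a UTF-8 café.txt) is no longer repaired.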

vincentkoc · 2026-04-28T05:23:25Z

This is superseded by #72388, which has landed as the canonical ProjectClownfish fix path for this cluster.

Closing this now that the validated fix is merged. If this still reproduces on current main with a different path, reply here and we can reopen or split it back out.

openclaw-barnacle Bot added the channel: feishu and size: S labels Mar 19, 2026

WingedDragon reviewed Mar 19, 2026

lishuaigit force-pushed the fix/feishu-chinese-filename-encoding branch from e6ba829 to b48b713 on March 20, 2026 07:26

chatgpt-codex-connector Bot reviewed Mar 20, 2026

openclaw-barnacle Bot added the docs label Mar 20, 2026

lishuaigit force-pushed the fix/feishu-chinese-filename-encoding branch from d8beb21 to af083d4 on March 30, 2026 03:00

openclaw-barnacle Bot removed the docs label Mar 30, 2026

chatgpt-codex-connector Bot reviewed Mar 30, 2026

ci: retrigger checks

fcaa7b0

vincentkoc mentioned this pull request Apr 26, 2026

fix(feishu): recover mojibake filenames from Content-Disposition #72388

Merged

vincentkoc added the clawsweeper label Apr 28, 2026

vincentkoc closed this Apr 28, 2026

lishuaigit wants to merge 2 commits into openclaw:main from lishuaigit:fix/feishu-chinese-filename-encoding
