fix(feishu): correct Chinese filename encoding in attachments by DoChaoing · Pull Request #59431 · openclaw/openclaw

DoChaoing · 2026-04-02T04:53:17Z

Summary

When receiving files with Chinese filenames via Feishu webhook, the filenames were displayed as garbled text (Latin-1 decoded UTF-8 bytes).

Root Cause

HTTP headers are typically parsed as ISO-8859-1 (Latin-1)
UTF-8 encoded filenames get misinterpreted as Latin-1 characters
Example: 六大组件全解析.md shows as å_å_ç_ä_å_è_æ.md
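
The misparse can be reproduced in isolation. The sketch below is illustrative only: the helper name `readUtf8BytesAsLatin1` is hypothetical and not part of the PR, and it assumes the `TextEncoder` global available in Node.js and browsers. It encodes a filename as UTF-8 and then maps each byte to the code point of the same value, which is what an ISO-8859-1 header parser effectively does.

```typescript
// Hypothetical helper (not from the PR): simulates a Latin-1 header parser
// reading bytes that were actually UTF-8.
function readUtf8BytesAsLatin1(text: string): string {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes of the real name
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    // ISO-8859-1 maps byte value N directly to code point U+00NN.
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

const garbled = readUtf8BytesAsLatin1("六大组件全解析.md");
// Seven CJK characters at three UTF-8 bytes each, plus ".md": one character per byte.
console.log(garbled.length); // 24
```

Every CJK character becomes three characters in the U+0080 to U+00FF range, which is why the saved filename looks like a run of accented Latin letters.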

Solution

Added fixMisdecodedUtf8() function to detect and correct the encoding
Re-encodes Latin-1 chars back to bytes and decodes as UTF-8
Applied to both Content-Disposition header and API response file_name field
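
A minimal sketch of the repair described above, assuming the `TextDecoder` global and a `\uFFFD` guard like the one the reviews mention. The function name comes from the PR, but the body here is an assumption; the actual implementation in `extensions/feishu/src/media.ts` may differ.

```typescript
// Sketch only: the PR's actual fixMisdecodedUtf8 may differ in detail.
function fixMisdecodedUtf8(input: string): string {
  const bytes = new Uint8Array(input.length);
  let sawHighByte = false;
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    // A code unit above U+00FF cannot come from a Latin-1 decode: leave as-is.
    if (code > 0xff) return input;
    if (code >= 0x80) sawHighByte = true;
    bytes[i] = code; // each Latin-1 character maps back to exactly one byte
  }
  // Pure ASCII reads the same in both encodings.
  if (!sawHighByte) return input;

  // Non-fatal decode: invalid sequences become U+FFFD instead of throwing.
  const decoded = new TextDecoder("utf-8").decode(bytes);
  // U+FFFD means the bytes were not valid UTF-8, so keep the original text.
  return decoded.includes("\uFFFD") ? input : decoded;
}

// UTF-8 bytes E5 85 AD ("六") read as Latin-1, then repaired.
console.log(fixMisdecodedUtf8("\u00e5\u0085\u00ad.md")); // "六.md"
// 0xE9 followed by ASCII is not valid UTF-8, so this name is left unchanged.
console.log(fixMisdecodedUtf8("r\u00e9sum\u00e9.pdf")); // "résumé.pdf"
```

Under this sketch, the guard only protects Latin-1 names whose bytes are not valid UTF-8. A name whose bytes happen to form valid UTF-8 is still rewritten.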

Test Plan

Send a file with Chinese filename via Feishu chat
Check the saved file in ~/.openclaw/media/inbound/
Verify filename is preserved correctly (e.g., 六大组件全解析---<uuid>.md)

🤖 Generated with ClawHub

Fixes openclaw#59409

When receiving files with Chinese filenames via Feishu webhook, the filenames were displayed as garbled text (Latin-1 decoded UTF-8 bytes).

Root cause:
- HTTP headers are typically parsed as ISO-8859-1 (Latin-1)
- UTF-8 encoded filenames get misinterpreted as Latin-1 characters
- Example: "六大组件全解析.md" shows as "å_å_ç_ä_å_è_æ.md"

Solution:
- Added fixMisdecodedUtf8() function to detect and correct the encoding
- Re-encodes Latin-1 chars back to bytes and decodes as UTF-8
- Applied to both Content-Disposition header and API response file_name

The fix preserves valid Latin-1 strings and only corrects misdecoded UTF-8.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ff4263db4


chatgpt-codex-connector · 2026-04-02T04:57:03Z


+  // Apply UTF-8 fix to API response file_name field as well.
+  // The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses.
+  const fileName = rawFileName ? fixMisdecodedUtf8(rawFileName) : undefined;


Restrict UTF-8 repair to header-derived filenames

Applying fixMisdecodedUtf8 to responseWithOptionalFields.file_name is lossy for valid API-provided names, because JSON strings are already decoded text and not raw header bytes. In this path, legitimate filenames that happen to match mojibake-like byte patterns (for example "Â© budget.xlsx") are rewritten to different text ("© budget.xlsx"), so users can receive renamed files even when Feishu returned the correct name.
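
One way to act on this suggestion is to track where the name came from and repair only the header-derived value. The sketch below is hypothetical: `FileNameSources`, `resolveFileName`, and the injected `repair` callback are illustrative names, not identifiers from the PR.

```typescript
// Any mojibake repair function, e.g. the PR's fixMisdecodedUtf8.
type NameRepair = (name: string) => string;

// Hypothetical shape: keeps the two filename sources apart.
interface FileNameSources {
  headerName?: string; // parsed from Content-Disposition (may be Latin-1 mojibake)
  apiName?: string; // taken from the JSON body (already decoded text)
}

// Repairs only the header-derived name; JSON-derived names pass through untouched.
function resolveFileName(sources: FileNameSources, repair: NameRepair): string | undefined {
  if (sources.headerName !== undefined) {
    return repair(sources.headerName);
  }
  return sources.apiName;
}
```

With this split, a legitimate API-provided name such as "Â© budget.xlsx" is never passed to the repair step, so it cannot be rewritten.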


greptile-apps · 2026-04-02T04:57:15Z

Greptile Summary

This PR fixes garbled Chinese filenames in Feishu inbound attachments by introducing fixMisdecodedUtf8(), which re-encodes a Latin-1–interpreted string back to bytes and decodes it as UTF-8, applying this correction to both Content-Disposition header filenames and the fallback API response file_name fields.

The core fix is sound and well-commented:

The Latin-1 re-encode → UTF-8 decode technique correctly reverses the mis-parse.
The \uFFFD guard prevents obvious false positives (e.g., genuinely Latin-extended filenames that aren't valid UTF-8 byte sequences).
decodeDispositionFileName correctly skips the fix when a proper filename*=UTF-8''… encoded value is already present.
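
The skip behaviour for `filename*=UTF-8''…` can be sketched as follows. This is an assumption about the general shape, not the PR's code: `parseDispositionFileName` and the `extended` flag are hypothetical, and only the plain-filename regex is taken from the review comments.

```typescript
interface DispositionName {
  name: string;
  // True when the value came from filename*=UTF-8''..., which needs no mojibake repair.
  extended: boolean;
}

// Hypothetical parser: prefers the percent-encoded extended form over plain filename=.
function parseDispositionFileName(value: string): DispositionName | undefined {
  const encoded = value.match(/filename\*=UTF-8''([^;]+)/i)?.[1];
  if (encoded !== undefined) {
    try {
      return { name: decodeURIComponent(encoded.trim()), extended: true };
    } catch {
      // Malformed percent-encoding: fall through to the plain filename parameter.
    }
  }
  const plain = value.match(/filename="?([^";]+)"?/i)?.[1]?.trim();
  return plain ? { name: plain, extended: false } : undefined;
}

const parsedName = parseDispositionFileName("attachment; filename*=UTF-8''%E5%85%AD.md");
console.log(parsedName?.name, parsedName?.extended); // "六.md" true
```

A caller would run the mojibake repair only when `extended` is false, since the extended form already carries an explicit UTF-8 declaration.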

Two minor concerns worth noting:

fixMisdecodedUtf8 is applied twice to Content-Disposition-derived filenames (once inside decodeDispositionFileName, once in extractFeishuDownloadMetadata). This is benign today (post-fix Chinese characters are above \u00FF), but it creates a subtle coupling.
Applying the fix to JSON API response file_name fields is speculative — JSON responses are UTF-8 by spec and shouldn't require this treatment. If it addresses an observed SDK bug, a reference to that bug/SDK version would help future maintainers know when to remove the workaround.

Confidence Score: 4/5

Safe to merge — the fix is correct and well-guarded; the two style concerns are non-blocking.
The encoding fix logic is valid, the \uFFFD safety check limits false positives, and the change is narrowly scoped to one file. The redundant double-application and speculative JSON fix are style concerns that don't affect correctness for the targeted Chinese-filename use case.
No files require special attention; all changes are in extensions/feishu/src/media.ts.


greptile-apps · 2026-04-02T04:57:19Z

+  const rawFileName =
    (disposition ? decodeDispositionFileName(disposition) : undefined) ??
    responseWithOptionalFields.file_name ??
    responseWithOptionalFields.fileName ??
    responseWithOptionalFields.data?.file_name ??
    responseWithOptionalFields.data?.fileName;

+  // Apply UTF-8 fix to API response file_name field as well.
+  // The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses.
+  const fileName = rawFileName ? fixMisdecodedUtf8(rawFileName) : undefined;


Double application of fixMisdecodedUtf8 for Content-Disposition filenames

When the filename comes from the Content-Disposition header, fixMisdecodedUtf8 is applied twice: once inside decodeDispositionFileName (line 150) and then again here on rawFileName (line 227).

In practice this is harmless today — after a successful UTF-8 fix the resulting Chinese characters fall in the \u4E00–\u9FFF range, which is above the \u0080-\u00FF detection window, so the second call returns immediately. However, this hidden coupling means a future change to fixMisdecodedUtf8's detection heuristic could silently double-process header-derived names while leaving API-derived names single-processed.

Consider applying the fix in only one place. The cleanest option is to remove the call from decodeDispositionFileName and rely solely on the unified fix in extractFeishuDownloadMetadata:

  // In decodeDispositionFileName, return rawFileName directly:
  const plainMatch = value.match(/filename="?([^";]+)"?/i);
  return plainMatch?.[1]?.trim();

…and keep the single fixMisdecodedUtf8(rawFileName) call in extractFeishuDownloadMetadata as is. This way all filename sources are treated uniformly.
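
The "detection window" argument can be checked with a small predicate. This is a hypothetical sketch (`inDetectionWindow` is not a function from the PR) that assumes the heuristic only considers strings made entirely of code units up to U+00FF with at least one in U+0080 to U+00FF.

```typescript
// Hypothetical predicate mirroring the \u0080-\u00FF detection window.
function inDetectionWindow(name: string): boolean {
  let sawHigh = false;
  for (let i = 0; i < name.length; i++) {
    const code = name.charCodeAt(i);
    if (code > 0xff) return false; // e.g. CJK: cannot be Latin-1 mojibake
    if (code >= 0x80) sawHigh = true;
  }
  return sawHigh;
}

console.log(inDetectionWindow("六大组件全解析.md")); // false: a second pass is a no-op
console.log(inDetectionWindow("\u00c3\u00a9.txt")); // true: "Ã©.txt" would be processed again
```

The second case shows where the coupling could matter: if a first pass ever produced text that still lies within Latin-1, a second pass would process it again.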


greptile-apps · 2026-04-02T04:57:19Z

+  // Apply UTF-8 fix to API response file_name field as well.
+  // The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses.
+  const fileName = rawFileName ? fixMisdecodedUtf8(rawFileName) : undefined;


Speculative application to JSON-derived filenames

The comment says "The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses", but JSON bodies are defined to be UTF-8 (RFC 8259 §8.1) and standard SDK JSON parsers do not re-interpret strings as Latin-1. Applying fixMisdecodedUtf8 to file_name / fileName values sourced from the JSON API response is therefore speculative. While the \uFFFD guard limits false-positives, it isn't perfect — a genuine Latin-extended filename (e.g., résumé.pdf) that happens to produce valid UTF-8 when re-encoded could be silently rewritten.

If this is targeting a known, observed bug in the Feishu SDK (where it was seen to return garbled JSON field values), it would be worth adding a concrete reference (SDK version, issue link) so the workaround can be removed once the upstream bug is fixed. If it's purely preventive, consider scoping the fix only to header-derived filenames where the Latin-1 misparse is documented behaviour.
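
For contrast with the header path, here is a minimal sketch of how a JSON body is normally decoded, assuming the `TextEncoder`/`TextDecoder` globals. The function name `parseJsonBody` is hypothetical, and this says nothing about what the Feishu SDK actually does internally.

```typescript
// Hypothetical helper: decodes a JSON response body as UTF-8, then parses it.
function parseJsonBody(body: Uint8Array): unknown {
  const text = new TextDecoder("utf-8").decode(body);
  return JSON.parse(text);
}

// Simulate a UTF-8 JSON response carrying a Chinese filename.
const body = new TextEncoder().encode(JSON.stringify({ file_name: "六大组件全解析.md" }));
const parsed = parseJsonBody(body) as { file_name: string };
console.log(parsed.file_name === "六大组件全解析.md"); // true
```

Because the body is decoded as UTF-8 before parsing, the filename arrives intact and there is nothing for a Latin-1 repair step to undo.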


vincentkoc · 2026-04-28T05:23:34Z

This is superseded by #72388, which has landed as the canonical ProjectClownfish fix path for this cluster.

Closing this now that the validated fix is merged. If this still reproduces on current main with a different path, reply here and we can reopen or split it back out.

openclaw-barnacle Bot added the channel: feishu and size: S labels Apr 2, 2026

chatgpt-codex-connector Bot reviewed Apr 2, 2026

greptile-apps Bot reviewed Apr 2, 2026

clawsweeper Bot mentioned this pull request Apr 26, 2026

[Bug]: 1. Feishu channel: Chinese filenames garbled when saving inbound media (UTF-8 → Latin-1 mojibake) #66500

Closed

vincentkoc mentioned this pull request Apr 26, 2026

fix(feishu): recover mojibake filenames from Content-Disposition #72388

Merged

vincentkoc added the clawsweeper label Apr 28, 2026

vincentkoc closed this Apr 28, 2026

DoChaoing wants to merge 1 commit into openclaw:main from DoChaoing:fix/issue-59409-v3

2 participants
