
fix(feishu): correct Chinese filename encoding in attachments#59431

Closed
DoChaoing wants to merge 1 commit into openclaw:main from DoChaoing:fix/issue-59409-v3

@DoChaoing


Fixes #59409

Summary

When receiving files with Chinese filenames via Feishu webhook, the filenames were displayed as garbled text (Latin-1 decoded UTF-8 bytes).

Root Cause

  • HTTP headers are typically parsed as ISO-8859-1 (Latin-1)
  • UTF-8 encoded filenames get misinterpreted as Latin-1 characters
  • Example: 六大组件全解析.md shows as å_å_ç_ä_å_è_æ.md
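A minimal TypeScript sketch of the misdecode, assuming the header value is turned into a string one byte per character, which is what an ISO-8859-1 decoder does. `latin1Misdecode` is a hypothetical helper for illustration, not part of the PR:

```typescript
// Hypothetical helper: reproduces the misdecode described above.
// UTF-8 encode the real filename, then map each byte to the code point with
// the same value (ISO-8859-1 maps byte 0xNN to U+00NN).
function latin1Misdecode(original: string): string {
  const utf8Bytes = new TextEncoder().encode(original);
  return Array.from(utf8Bytes, (byte) => String.fromCharCode(byte)).join("");
}

const garbled = latin1Misdecode("六大组件全解析.md");
// Each CJK character is 3 UTF-8 bytes, so 7 characters + ".md" = 24 characters.
console.log(garbled.length); // 24
console.log(garbled[0]); // å (U+00E5, the lead byte 0xE5 of "六")
```

The visible letters in the example (å, ç, ä, è, æ) are the UTF-8 lead bytes 0xE4 to 0xE8; each is followed by two continuation bytes in 0x80 to 0xBF, part of which (0x80 to 0x9F) are invisible C1 control characters.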

Solution

  • Added fixMisdecodedUtf8() function to detect and correct the encoding
  • Re-encodes Latin-1 chars back to bytes and decodes as UTF-8
  • Applied to both Content-Disposition header and API response file_name field
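The steps above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the U+0080 to U+00FF detection window and the `\uFFFD` guard are assumptions taken from the review discussion on this PR.

```typescript
// Illustrative sketch only; the PR's real fixMisdecodedUtf8 may differ.
function fixMisdecodedUtf8(input: string): string {
  // Latin-1 mojibake contains only code points <= U+00FF, with at least one
  // in the high half (U+0080-U+00FF) where UTF-8 lead/continuation bytes land.
  const hasHighLatin1 = /[\u0080-\u00ff]/.test(input);
  const hasBeyondLatin1 = /[^\u0000-\u00ff]/.test(input);
  if (!hasHighLatin1 || hasBeyondLatin1) {
    return input; // plain ASCII or already-correct Unicode: leave untouched
  }

  // Re-encode: each character becomes the single byte with the same value.
  const bytes = Uint8Array.from(input, (ch) => ch.charCodeAt(0));

  // Decode as UTF-8; the default (non-fatal) decoder emits U+FFFD for bad bytes.
  const decoded = new TextDecoder("utf-8").decode(bytes);

  // Guard: any replacement character means the input was not UTF-8 mojibake.
  return decoded.includes("\uFFFD") ? input : decoded;
}

// Usage: simulate the Latin-1 misdecode, then repair it.
const mojibake = String.fromCharCode(...Array.from(new TextEncoder().encode("六大组件全解析.md")));
console.log(fixMisdecodedUtf8(mojibake)); // 六大组件全解析.md
console.log(fixMisdecodedUtf8("résumé.pdf")); // résumé.pdf (0xE9 0x73 is not valid UTF-8)
```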

Test Plan

  1. Send a file with Chinese filename via Feishu chat
  2. Check the saved file in ~/.openclaw/media/inbound/
  3. Verify filename is preserved correctly (e.g., 六大组件全解析---<uuid>.md)

🤖 Generated with ClawHub

The fix preserves valid Latin-1 strings and only corrects misdecoded UTF-8.
@openclaw-barnacle (bot) added the labels channel: feishu (Channel integration: feishu) and size: S on Apr 2, 2026

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ff4263db4



```
// Apply UTF-8 fix to API response file_name field as well.
// The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses.
const fileName = rawFileName ? fixMisdecodedUtf8(rawFileName) : undefined;
```


P2: Restrict UTF-8 repair to header-derived filenames

Applying fixMisdecodedUtf8 to responseWithOptionalFields.file_name is lossy for valid API-provided names, because JSON strings are already decoded text, not raw header bytes. On this path, legitimate filenames that happen to match mojibake-like byte patterns (for example "Â© budget.xlsx") are rewritten to different text ("© budget.xlsx"), so users can receive renamed files even when Feishu returned the correct name.
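A `\uFFFD`-style guard does not help here because `Â©` (U+00C2 U+00A9) is itself a valid UTF-8 byte sequence once each character is treated as a byte. A minimal check using a strict (`fatal: true`) decoder; `reinterpretAsUtf8` is a hypothetical helper for illustration:

```typescript
// Hypothetical helper: treat each character as a Latin-1 byte and decode as
// strict UTF-8. Returns null when the bytes are not valid UTF-8.
function reinterpretAsUtf8(text: string): string | null {
  if (/[^\u0000-\u00ff]/.test(text)) return null; // not representable as Latin-1
  const bytes = Uint8Array.from(text, (ch) => ch.charCodeAt(0));
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return null; // invalid UTF-8: a U+FFFD guard would keep the original name
  }
}

// A legitimate name ("Â© budget.xlsx") that a repair would still rewrite:
console.log(reinterpretAsUtf8("\u00c2\u00a9 budget.xlsx")); // © budget.xlsx
// An accented name such as résumé.pdf, by contrast, is rejected (0xE9 0x73 is invalid UTF-8):
console.log(reinterpretAsUtf8("résumé.pdf")); // null
```

No byte-level heuristic can tell the first case apart from genuine mojibake; only knowing where the string came from can, which is why the suggestion scopes the repair by source.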


@greptile-apps (bot, Contributor) commented Apr 2, 2026

Greptile Summary

This PR fixes garbled Chinese filenames in Feishu inbound attachments by introducing fixMisdecodedUtf8(), which re-encodes a Latin-1–interpreted string back to bytes and decodes it as UTF-8, applying this correction to both Content-Disposition header filenames and the fallback API response file_name fields.

The core fix is sound and well-commented:

  • The Latin-1 re-encode → UTF-8 decode technique correctly reverses the mis-parse.
  • The \uFFFD guard prevents obvious false positives (e.g., genuinely Latin-extended filenames that aren't valid UTF-8 byte sequences).
  • decodeDispositionFileName correctly skips the fix when a proper filename*=UTF-8''… encoded value is already present.
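For reference, RFC 6266 gives the extended `filename*` parameter (RFC 5987 `charset'lang'percent-encoded` syntax) precedence over plain `filename`, and only the plain form is exposed to the Latin-1 misdecode. A minimal sketch of that precedence; `parseDispositionFileName` is a hypothetical name, not the PR's `decodeDispositionFileName`, and it only handles the UTF-8 charset and ignores quoted-string escapes:

```typescript
// Hypothetical result shape: the name plus where it came from.
interface DispositionFileName {
  name: string;
  fromExtended: boolean; // true when taken from filename*=UTF-8''...
}

function parseDispositionFileName(header: string): DispositionFileName | undefined {
  // filename*=UTF-8''%E5%85%AD.md takes precedence (RFC 6266).
  const extended = header.match(/filename\*\s*=\s*([^']*)'[^']*'([^;]+)/i);
  const charset = extended?.[1];
  const encoded = extended?.[2];
  if (charset?.toLowerCase() === "utf-8" && encoded) {
    try {
      return { name: decodeURIComponent(encoded.trim()), fromExtended: true };
    } catch {
      // Malformed percent-encoding: fall back to the plain parameter.
    }
  }
  // Plain filename="..." is the form a Latin-1 header parse can garble,
  // so callers may still need to repair it.
  const plainName = header.match(/filename\s*=\s*"?([^";]+)"?/i)?.[1];
  return plainName ? { name: plainName.trim(), fromExtended: false } : undefined;
}

console.log(parseDispositionFileName("attachment; filename*=UTF-8''%E5%85%AD.md"));
```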

Two minor concerns worth noting:

  • fixMisdecodedUtf8 is applied twice to Content-Disposition-derived filenames (once inside decodeDispositionFileName, once in extractFeishuDownloadMetadata). This is benign today (post-fix Chinese characters are above \u00FF), but it creates a subtle coupling.
  • Applying the fix to JSON API response file_name fields is speculative — JSON responses are UTF-8 by spec and shouldn't require this treatment. If it addresses an observed SDK bug, a reference to that bug/SDK version would help future maintainers know when to remove the workaround.
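The first concern can be made concrete with a compact stand-in for `fixMisdecodedUtf8`. The `repairOnce` and `latin1` helpers are hypothetical, and this variant uses a strict decoder instead of a `\uFFFD` check, which behaves the same for inputs limited to U+0000 to U+00FF. It shows that a second pass is a no-op only because the repaired text leaves the Latin-1 range; the double-encoded input is contrived for illustration.

```typescript
// Compact, hypothetical stand-in for fixMisdecodedUtf8 (strict decoder variant).
function repairOnce(input: string): string {
  if (/[^\u0000-\u00ff]/.test(input)) return input; // already beyond Latin-1
  const bytes = Uint8Array.from(input, (ch) => ch.charCodeAt(0));
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return input; // not valid UTF-8: keep as-is
  }
}

// Simulates the Latin-1 misdecode of a UTF-8 string.
const latin1 = (s: string) => String.fromCharCode(...Array.from(new TextEncoder().encode(s)));

// CJK output lands above U+00FF, so a second pass is a no-op (today's case).
const cjk = repairOnce(latin1("六大组件全解析.md"));
console.log(repairOnce(cjk) === cjk); // true

// Output that stays inside U+0000-U+00FF is rewritten again on a second pass.
const once = repairOnce("\u00c3\u0082\u00c2\u00a9"); // "Â©"
const twice = repairOnce(once); // "©"
console.log(once === twice); // false
```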

Confidence Score: 4/5

  • Safe to merge — the fix is correct and well-guarded; the two style concerns are non-blocking.
  • The encoding fix logic is valid, the \uFFFD safety check limits false positives, and the change is narrowly scoped to one file. The redundant double-application and speculative JSON fix are style concerns that don't affect correctness for the targeted Chinese-filename use case.
  • No files require special attention; all changes are in extensions/feishu/src/media.ts.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/feishu/src/media.ts
Line: 218-227

Comment:
**Double application of `fixMisdecodedUtf8` for Content-Disposition filenames**

When the filename comes from the `Content-Disposition` header, `fixMisdecodedUtf8` is applied twice: once inside `decodeDispositionFileName` (line 150) and then again here on `rawFileName` (line 227).

In practice this is harmless today — after a successful UTF-8 fix the resulting Chinese characters fall in the `\u4E00–\u9FFF` range, which is above the `\u0080-\u00FF` detection window, so the second call returns immediately. However, this hidden coupling means a future change to `fixMisdecodedUtf8`'s detection heuristic could silently double-process header-derived names while leaving API-derived names single-processed.

Consider applying the fix in only one place. The cleanest option is to remove the call from `decodeDispositionFileName` and rely solely on the unified fix in `extractFeishuDownloadMetadata`:

```
// In decodeDispositionFileName — return rawFileName directly:
  const plainMatch = value.match(/filename="?([^";]+)"?/i);
  return plainMatch?.[1]?.trim();
```

…and keep the single `fixMisdecodedUtf8(rawFileName)` call in `extractFeishuDownloadMetadata` as is. This way all filename sources are treated uniformly.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/feishu/src/media.ts
Line: 225-227

Comment:
**Speculative application to JSON-derived filenames**

The comment says "The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses", but JSON bodies are defined to be UTF-8 (RFC 8259 §8.1) and standard SDK JSON parsers do not re-interpret strings as Latin-1. Applying `fixMisdecodedUtf8` to `file_name` / `fileName` values sourced from the JSON API response is therefore speculative. While the `\uFFFD` guard limits false-positives, it isn't perfect — a genuine Latin-extended filename (e.g., `résumé.pdf`) that happens to produce valid UTF-8 when re-encoded could be silently rewritten.

If this is targeting a known, observed bug in the Feishu SDK (where it was seen to return garbled JSON field values), it would be worth adding a concrete reference (SDK version, issue link) so the workaround can be removed once the upstream bug is fixed. If it's purely preventive, consider scoping the fix only to header-derived filenames where the Latin-1 misparse is documented behaviour.

How can I resolve this? If you propose a fix, please make it concise.
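One way to act on both suggestions, a single repair site that only touches header-derived names, is sketched below. `FileNameCandidates` and `resolveFileName` are hypothetical names, and the repair function is passed in so it can be whichever `fixMisdecodedUtf8` variant the code ends up with; the sketch is not taken from this PR or from #72388.

```typescript
// Hypothetical candidate shape; the PR reads these from Content-Disposition
// and from file_name / fileName (optionally nested under data) in the response.
interface FileNameCandidates {
  headerFileName?: string; // plain filename= value, not yet repaired
  headerIsExtended?: boolean; // true if taken from filename*=UTF-8''...
  apiFileName?: string; // JSON field, already Unicode text
}

// Single place where the Latin-1 repair runs, restricted to header-derived names.
function resolveFileName(
  candidates: FileNameCandidates,
  repair: (raw: string) => string,
): string | undefined {
  const { headerFileName, headerIsExtended, apiFileName } = candidates;
  if (headerFileName) {
    // filename* is already percent-decoded UTF-8, so only the plain form is repaired.
    return headerIsExtended ? headerFileName : repair(headerFileName);
  }
  // RFC 8259 requires UTF-8 for JSON exchanged between systems, so API names pass through.
  return apiFileName;
}
```

Injecting the repair keeps the precedence logic testable with a stub and guarantees the JSON path is never rewritten, however the detection heuristic changes later.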

Reviews (1): Last reviewed commit: "fix(feishu): correct Chinese filename en..."

Comment on lines +218 to +227
```
const rawFileName =
  (disposition ? decodeDispositionFileName(disposition) : undefined) ??
  responseWithOptionalFields.file_name ??
  responseWithOptionalFields.fileName ??
  responseWithOptionalFields.data?.file_name ??
  responseWithOptionalFields.data?.fileName;

// Apply UTF-8 fix to API response file_name field as well.
// The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses.
const fileName = rawFileName ? fixMisdecodedUtf8(rawFileName) : undefined;
```


P2: Double application of fixMisdecodedUtf8 for Content-Disposition filenames


Comment on lines +225 to +227
```
// Apply UTF-8 fix to API response file_name field as well.
// The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses.
const fileName = rawFileName ? fixMisdecodedUtf8(rawFileName) : undefined;
```


P2: Speculative application to JSON-derived filenames


@vincentkoc (Member)

This is superseded by #72388, which has landed as the canonical ProjectClownfish fix path for this cluster.

Closing this now that the validated fix is merged. If this still reproduces on current main with a different path, reply here and we can reopen or split it back out.

vincentkoc closed this on Apr 28, 2026

Labels: channel: feishu (Channel integration: feishu), clawsweeper (Tracked by ClawSweeper automation), size: S


Linked issue: [Bug]: Feishu plugin: Chinese filenames in attachments display as garbled (Latin-1 encoding issue)
