fix(feishu): recover UTF-8 filenames from Latin-1 encoded Content-Disposition #48578

Closed
alex-xuweilong wants to merge 1 commit into openclaw:main from alex-xuweilong:fix/feishu-utf8-filename-decoding

@alex-xuweilong

Summary

Fixes garbled Chinese/CJK filenames when receiving files via Feishu channel.

Root Cause

When the Feishu API returns Content-Disposition with only a plain filename="..." parameter (without the RFC 5987 filename*=UTF-8'' form), the HTTP client decodes the raw UTF-8 bytes as Latin-1. Each 3-byte CJK character then becomes three Latin-1 characters, producing mojibake.
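
To make the failure mode concrete, here is a minimal sketch that reproduces the mis-decoding in isolation. It assumes a Node.js runtime; the sample filename and variable names are illustrative and do not come from the PR.

```typescript
import { Buffer } from "node:buffer";

// Illustrative filename; any CJK character behaves the same way.
const original = "何_2.txt";

// The bytes the server actually sends for the filename (UTF-8).
const wireBytes = Buffer.from(original, "utf8");

// A client that assumes Latin-1 maps every byte to exactly one code point
// in the range U+0000..U+00FF, so the 3-byte character becomes 3 characters.
const misdecoded = wireBytes.toString("latin1");

console.log(original.length);   // 7 characters in the intended name
console.log(wireBytes.length);  // 9 bytes on the wire (3 for "何" + 6 ASCII)
console.log(misdecoded.length); // 9 characters after the wrong decode

// Because Latin-1 is a 1:1 byte mapping, the damage is reversible.
const roundTripped = Buffer.from(misdecoded, "latin1").toString("utf8");
console.log(roundTripped === original); // true
```

The last step is what makes recovery possible at all: the Latin-1 decode loses no information, so the original bytes can be reconstructed from the garbled string.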

Fix

Add a tryRecoverLatin1AsUtf8() helper that detects high-byte Latin-1 artifacts and re-decodes them as UTF-8 via Buffer. Applied to the plain filename extraction path. Falls back to the original string if recovery fails.
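
The diff itself is not reproduced in this conversation, so the following is only a sketch of what such a helper might look like, reconstructed from the description above. The function name matches the PR; the body, the exact detection regexes, and the extra guard against code points above U+00FF are assumptions.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical reconstruction of the helper described in this PR.
function tryRecoverLatin1AsUtf8(value: string): string {
  // No high-byte characters: plain ASCII, nothing to recover.
  if (!/[\u0080-\u00ff]/.test(value)) return value;
  // Assumption: a code point above U+00FF cannot come from a Latin-1
  // decode, so the string is already proper Unicode. Leave it alone.
  if (/[^\u0000-\u00ff]/.test(value)) return value;
  try {
    // Turn each character back into its original byte, then decode as UTF-8.
    const decoded = Buffer.from(value, "latin1").toString("utf8");
    // Node substitutes U+FFFD for invalid sequences instead of throwing,
    // so its presence signals that the input was not mis-decoded UTF-8.
    return decoded.includes("\uFFFD") ? value : decoded;
  } catch {
    return value;
  }
}

// Usage: simulate the garbled header value, then recover it.
const garbled = Buffer.from("何不同舟渡_2.txt", "utf8").toString("latin1");
console.log(tryRecoverLatin1AsUtf8(garbled));      // 何不同舟渡_2.txt
console.log(tryRecoverLatin1AsUtf8("report.txt")); // report.txt
console.log(tryRecoverLatin1AsUtf8("café.txt"));   // café.txt
```

In the last call, "é" is the single byte 0xE9, which opens a 3-byte UTF-8 sequence but is followed by "." rather than a continuation byte. The decode therefore yields U+FFFD and the helper falls back to the original string.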

Changes

  • extensions/feishu/src/media.ts: 16 lines added, 1 removed

Fixes #48388

…position

When the Feishu API returns Content-Disposition with filename="..."
(without the RFC 5987 filename*=UTF-8'' form), the HTTP client may
decode UTF-8 bytes as Latin-1, corrupting Chinese/CJK filenames
(e.g. "何不同舟渡_2.txt" becomes "æµ_è_æ_ä_2.txt").

Add a tryRecoverLatin1AsUtf8 helper that detects high-byte Latin-1
artifacts and re-decodes them as UTF-8, applied to the plain
filename="..." extraction path.

Fixes openclaw#48388
@openclaw-barnacle (bot) added the "channel: feishu" and "size: XS" labels on Mar 17, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Mar 17, 2026

Greptile Summary

This PR adds a small, targeted heuristic (tryRecoverLatin1AsUtf8) to fix garbled CJK filenames when the Feishu API returns a plain filename="…" Content-Disposition header instead of the RFC 5987 filename*=UTF-8''… form. The HTTP client decodes the raw UTF-8 bytes as Latin-1, and the helper re-encodes those Latin-1 chars back to bytes and re-decodes as UTF-8, falling back to the original string if UTF-8 decoding produces replacement characters.

  • The \uFFFD guard is the correct way to detect a failed UTF-8 decode in Node.js and prevents false positives for most real Latin-1 content.
  • The empty catch {} is appropriate since the function always falls back to the original string on any error.
  • The change is narrowly scoped to the plainMatch path; the existing filename*=UTF-8'' (RFC 5987) path is unaffected.
  • One inherent limitation: a Latin-1 filename whose byte sequence happens to be valid UTF-8 (e.g. two Western-European characters forming a 2-byte UTF-8 sequence) would be silently rewritten. This is an unavoidable trade-off with any header-recovery heuristic and is a very unlikely scenario in Feishu's environment.
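
The limitation in the last bullet can be illustrated with a short sketch. It assumes the recovery step is a Latin-1 to UTF-8 re-decode via Buffer guarded by a U+FFFD check, as described above; the sample filename is hypothetical.

```typescript
import { Buffer } from "node:buffer";

// A filename that genuinely consists of the Latin-1 characters "Ã" (U+00C3)
// and "©" (U+00A9). Its byte sequence C3 A9 is also valid UTF-8 for "é".
const genuineLatin1 = "\u00c3\u00a9.txt";

// The same re-decode step the heuristic relies on.
const redecoded = Buffer.from(genuineLatin1, "latin1").toString("utf8");

// No replacement character appears, so a U+FFFD guard cannot tell this
// apart from real mojibake and the name would be rewritten.
const passesGuard = !redecoded.includes("\uFFFD");

console.log(redecoded === "\u00e9.txt"); // true: the name became "é.txt"
console.log(passesGuard);                // true
```

Pairs such as "Ã©" are rare in real filenames, which is why the review treats this as an acceptable trade-off rather than a blocker.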

Confidence Score: 4/5

  • Safe to merge; the fix is minimal, well-guarded, and falls back gracefully.
  • The change is a single, self-contained helper with a correct fallback mechanism. The \uFFFD guard and try/catch block protect against incorrect recovery. The only deduction is the inherent heuristic ambiguity (valid Latin-1 that forms valid UTF-8), which is an accepted trade-off and very unlikely in this context.
  • No files require special attention.

Last reviewed commit: cd566af

@vincentkoc
Member

ProjectClownfish could not safely update this branch, so it opened a narrow replacement PR instead.

Replacement PR: #72388
Source PR: #48578
Contributor credit is preserved in the replacement PR body and changelog plan.


Development

Successfully merging this pull request may close these issues.

[Bug]: Feishu file names with Chinese characters are garbled (UTF-8 encoding issue)

2 participants