fix(feishu): correct Chinese filename encoding in attachments#59431
fix(feishu): correct Chinese filename encoding in attachments#59431DoChaoing wants to merge 1 commit into
Conversation
Fixes openclaw#59409 When receiving files with Chinese filenames via Feishu webhook, the filenames were displayed as garbled text (Latin-1 decoded UTF-8 bytes). Root cause: - HTTP headers are typically parsed as ISO-8859-1 (Latin-1) - UTF-8 encoded filenames get misinterpreted as Latin-1 characters - Example: "六大组件全解析.md" shows as "å_å_ç_ä_å_è_æ.md" Solution: - Added fixMisdecodedUtf8() function to detect and correct the encoding - Re-encodes Latin-1 chars back to bytes and decodes as UTF-8 - Applied to both Content-Disposition header and API response file_name The fix preserves valid Latin-1 strings and only corrects misdecoded UTF-8.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9ff4263db4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| // Apply UTF-8 fix to API response file_name field as well. | ||
| // The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses. | ||
| const fileName = rawFileName ? fixMisdecodedUtf8(rawFileName) : undefined; |
There was a problem hiding this comment.
Restrict UTF-8 repair to header-derived filenames
Applying fixMisdecodedUtf8 to responseWithOptionalFields.file_name is lossy for valid API-provided names, because JSON strings are already decoded text and not raw header bytes. In this path, legitimate filenames that happen to match mojibake-like byte patterns (for example "© budget.xlsx") are rewritten to different text ("© budget.xlsx"), so users can receive renamed files even when Feishu returned the correct name.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR fixes garbled Chinese filenames in Feishu inbound attachments by introducing The core fix is sound and well-commented:
Two minor concerns worth noting:
Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: extensions/feishu/src/media.ts
Line: 218-227
Comment:
**Double application of `fixMisdecodedUtf8` for Content-Disposition filenames**
When the filename comes from the `Content-Disposition` header, `fixMisdecodedUtf8` is applied twice: once inside `decodeDispositionFileName` (line 150) and then again here on `rawFileName` (line 227).
In practice this is harmless today — after a successful UTF-8 fix the resulting Chinese characters fall in the `\u4E00–\u9FFF` range, which is above the `\u0080-\u00FF` detection window, so the second call returns immediately. However, this hidden coupling means a future change to `fixMisdecodedUtf8`'s detection heuristic could silently double-process header-derived names while leaving API-derived names single-processed.
Consider applying the fix in only one place. The cleanest option is to remove the call from `decodeDispositionFileName` and rely solely on the unified fix in `extractFeishuDownloadMetadata`:
```
// In decodeDispositionFileName — return rawFileName directly:
const plainMatch = value.match(/filename="?([^";]+)"?/i);
return plainMatch?.[1]?.trim();
```
…and keep the single `fixMisdecodedUtf8(rawFileName)` call in `extractFeishuDownloadMetadata` as is. This way all filename sources are treated uniformly.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: extensions/feishu/src/media.ts
Line: 225-227
Comment:
**Speculative application to JSON-derived filenames**
The comment says "The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses", but JSON bodies are defined to be UTF-8 (RFC 8259 §8.1) and standard SDK JSON parsers do not re-interpret strings as Latin-1. Applying `fixMisdecodedUtf8` to `file_name` / `fileName` values sourced from the JSON API response is therefore speculative. While the `\uFFFD` guard limits false-positives, it isn't perfect — a genuine Latin-extended filename (e.g., `résumé.pdf`) that happens to produce valid UTF-8 when re-encoded could be silently rewritten.
If this is targeting a known, observed bug in the Feishu SDK (where it was seen to return garbled JSON field values), it would be worth adding a concrete reference (SDK version, issue link) so the workaround can be removed once the upstream bug is fixed. If it's purely preventive, consider scoping the fix only to header-derived filenames where the Latin-1 misparse is documented behaviour.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(feishu): correct Chinese filename en..." | Re-trigger Greptile |
| const rawFileName = | ||
| (disposition ? decodeDispositionFileName(disposition) : undefined) ?? | ||
| responseWithOptionalFields.file_name ?? | ||
| responseWithOptionalFields.fileName ?? | ||
| responseWithOptionalFields.data?.file_name ?? | ||
| responseWithOptionalFields.data?.fileName; | ||
|
|
||
| // Apply UTF-8 fix to API response file_name field as well. | ||
| // The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses. | ||
| const fileName = rawFileName ? fixMisdecodedUtf8(rawFileName) : undefined; |
There was a problem hiding this comment.
Double application of
fixMisdecodedUtf8 for Content-Disposition filenames
When the filename comes from the Content-Disposition header, fixMisdecodedUtf8 is applied twice: once inside decodeDispositionFileName (line 150) and then again here on rawFileName (line 227).
In practice this is harmless today — after a successful UTF-8 fix the resulting Chinese characters fall in the \u4E00–\u9FFF range, which is above the \u0080-\u00FF detection window, so the second call returns immediately. However, this hidden coupling means a future change to fixMisdecodedUtf8's detection heuristic could silently double-process header-derived names while leaving API-derived names single-processed.
Consider applying the fix in only one place. The cleanest option is to remove the call from decodeDispositionFileName and rely solely on the unified fix in extractFeishuDownloadMetadata:
// In decodeDispositionFileName — return rawFileName directly:
const plainMatch = value.match(/filename="?([^";]+)"?/i);
return plainMatch?.[1]?.trim();
…and keep the single fixMisdecodedUtf8(rawFileName) call in extractFeishuDownloadMetadata as is. This way all filename sources are treated uniformly.
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/feishu/src/media.ts
Line: 218-227
Comment:
**Double application of `fixMisdecodedUtf8` for Content-Disposition filenames**
When the filename comes from the `Content-Disposition` header, `fixMisdecodedUtf8` is applied twice: once inside `decodeDispositionFileName` (line 150) and then again here on `rawFileName` (line 227).
In practice this is harmless today — after a successful UTF-8 fix the resulting Chinese characters fall in the `\u4E00–\u9FFF` range, which is above the `\u0080-\u00FF` detection window, so the second call returns immediately. However, this hidden coupling means a future change to `fixMisdecodedUtf8`'s detection heuristic could silently double-process header-derived names while leaving API-derived names single-processed.
Consider applying the fix in only one place. The cleanest option is to remove the call from `decodeDispositionFileName` and rely solely on the unified fix in `extractFeishuDownloadMetadata`:
```
// In decodeDispositionFileName — return rawFileName directly:
const plainMatch = value.match(/filename="?([^";]+)"?/i);
return plainMatch?.[1]?.trim();
```
…and keep the single `fixMisdecodedUtf8(rawFileName)` call in `extractFeishuDownloadMetadata` as is. This way all filename sources are treated uniformly.
How can I resolve this? If you propose a fix, please make it concise.| // Apply UTF-8 fix to API response file_name field as well. | ||
| // The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses. | ||
| const fileName = rawFileName ? fixMisdecodedUtf8(rawFileName) : undefined; |
There was a problem hiding this comment.
Speculative application to JSON-derived filenames
The comment says "The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses", but JSON bodies are defined to be UTF-8 (RFC 8259 §8.1) and standard SDK JSON parsers do not re-interpret strings as Latin-1. Applying fixMisdecodedUtf8 to file_name / fileName values sourced from the JSON API response is therefore speculative. While the \uFFFD guard limits false-positives, it isn't perfect — a genuine Latin-extended filename (e.g., résumé.pdf) that happens to produce valid UTF-8 when re-encoded could be silently rewritten.
If this is targeting a known, observed bug in the Feishu SDK (where it was seen to return garbled JSON field values), it would be worth adding a concrete reference (SDK version, issue link) so the workaround can be removed once the upstream bug is fixed. If it's purely preventive, consider scoping the fix only to header-derived filenames where the Latin-1 misparse is documented behaviour.
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/feishu/src/media.ts
Line: 225-227
Comment:
**Speculative application to JSON-derived filenames**
The comment says "The Feishu SDK may also misdecode UTF-8 as Latin-1 in JSON responses", but JSON bodies are defined to be UTF-8 (RFC 8259 §8.1) and standard SDK JSON parsers do not re-interpret strings as Latin-1. Applying `fixMisdecodedUtf8` to `file_name` / `fileName` values sourced from the JSON API response is therefore speculative. While the `\uFFFD` guard limits false-positives, it isn't perfect — a genuine Latin-extended filename (e.g., `résumé.pdf`) that happens to produce valid UTF-8 when re-encoded could be silently rewritten.
If this is targeting a known, observed bug in the Feishu SDK (where it was seen to return garbled JSON field values), it would be worth adding a concrete reference (SDK version, issue link) so the workaround can be removed once the upstream bug is fixed. If it's purely preventive, consider scoping the fix only to header-derived filenames where the Latin-1 misparse is documented behaviour.
How can I resolve this? If you propose a fix, please make it concise.|
This is superseded by #72388, which has landed as the canonical ProjectClownfish fix path for this cluster. Closing this now that the validated fix is merged. If this still reproduces on current main with a different path, reply here and we can reopen or split it back out. |
Fixes #59409
Summary
When receiving files with Chinese filenames via Feishu webhook, the filenames were displayed as garbled text (Latin-1 decoded UTF-8 bytes).
Root Cause
六大组件全解析.mdshows aså_å_ç_ä_å_è_æ.mdSolution
fixMisdecodedUtf8()function to detect and correct the encodingfile_namefieldTest Plan
~/.openclaw/media/inbound/六大组件全解析---<uuid>.md)🤖 Generated with ClawHub