Skip to content

media: add shared filename decoder#71517

Open
szupzj18 wants to merge 3 commits into
openclaw:mainfrom
szupzj18:codex/fix-filename-decoding
Open

media: add shared filename decoder#71517
szupzj18 wants to merge 3 commits into
openclaw:mainfrom
szupzj18:codex/fix-filename-decoding

Conversation

@szupzj18

@szupzj18 szupzj18 commented Apr 25, 2026

Copy link
Copy Markdown

Summary

  • Problem: filename decoding is split across media/channel paths, so Content-Disposition edge cases and Feishu mojibake fixes are easy to miss or duplicate.
  • Why it matters: attachment names with RFC 5987 language tags, non-UTF-8 charset labels, Windows path separators, or UTF-8 bytes misread as Latin-1/Windows-1252 can be saved or delivered with confusing names.
  • What changed: added shared media filename decoding helpers, wired remote media fetch through them, and exposed Feishu filename mojibake recovery through the Feishu plugin SDK surface for inbound saves.
  • What did NOT change (scope boundary): this does not migrate every channel filename path yet, so it is a partial implementation of the broader centralized utility work.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Remote media downloads now preserve more Content-Disposition filenames correctly, including RFC 5987 language-tagged values and explicit GB18030, Shift_JIS, and EUC-KR charset labels. Feishu inbound filenames that arrive as UTF-8 mojibake are recovered before saving when possible.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS / local Codex worktree
  • Runtime/container: Node 22.16.0 via corepack pnpm
  • Model/provider: N/A
  • Integration/channel (if any): media fetch, Feishu filename handling
  • Relevant config (redacted): N/A

Steps

  1. Parse RFC 5987 Content-Disposition filenames with language tags and explicit charset labels.
  2. Fetch remote media with quoted Windows-style header filenames.
  3. Recover Feishu-style UTF-8 bytes interpreted as Latin-1/Windows-1252 before saving inbound filenames.

Expected

  • Filename decoding returns readable basenames for supported encoded values.
  • Remote media fetch uses the shared decoder.
  • Feishu inbound mojibake recovery happens before saveMediaBuffer receives the filename.

Actual

Matches expected in targeted tests.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Targeted verification:

corepack pnpm exec vitest run --config vitest.unit.config.ts src/media/filename.test.ts src/media/fetch.test.ts
corepack pnpm exec oxfmt --check src/media/filename.ts src/media/filename.test.ts src/media/fetch.ts src/media/fetch.test.ts src/plugin-sdk/feishu.ts extensions/feishu/src/bot.ts

corepack pnpm tsgo was also run. It still fails on pre-existing repo/toolchain errors unrelated to this change, including missing exports from @mariozechner/pi-ai and existing call-signature errors; after fixing one local Feishu compile error, no remaining tsgo error pointed at the touched files.

Human Verification (required)

  • Verified scenarios: RFC 5987 filename* priority, language tags, GB18030 / Shift_JIS / EUC-KR decoding, Windows path basename reduction, Latin-1/Windows-1252 mojibake recovery, remote media fetch integration.
  • Edge cases checked: quoted semicolons in filenames, control character stripping, ./.. rejection, normal Latin-1 text fallback.
  • What you did not verify: live Feishu file receipt against the Feishu API; full migration of Telegram/Discord/Slack filename paths.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert this PR.
  • Files/config to restore: src/media/filename.ts, src/media/fetch.ts, src/plugin-sdk/feishu.ts, extensions/feishu/src/bot.ts, and related tests.
  • Known bad symptoms reviewers should watch for: unexpected filename recovery for unusual Latin-1-only filenames, or plugin SDK export bundling issues.

Risks and Mitigations

Real behavior proof

  • Behavior or issue addressed: Remote media filename decoding now handles RFC 5987 language-tagged filenames, mojibake plain filenames, and Windows-style path basenames before exposing/saving the filename.
  • Real environment tested: macOS local OpenClaw worktree, Node v22.16.0 via corepack pnpm, PR branch rebased onto openclaw/main 5955f35.
  • Exact steps or command run after this patch: Ran a real Node/tsx command against OpenClaw's media filename decoder and response-save path: node --import tsx --input-type=module importing src/media/filename.ts and src/media/fetch.ts, constructing a real Response, and calling saveResponseMedia.
  • Evidence after fix: Terminal output from the command above:
{
  "rfc5987Decoded": "✓-report.txt",
  "mojibakeDecoded": "中国银行.pdf",
  "savedFileName": "photo.txt",
  "savedContentType": "text/plain",
  "savedBytes": 2
}
  • Observed result after fix: The RFC 5987 header decoded to ✓-report.txt, the mojibake filename recovered to 中国银行.pdf, and the response save path reduced C:\temp\photo.txt to photo.txt while preserving text/plain content.
  • What was not tested: Live Feishu API file receipt against a real Feishu tenant; this remains covered by the plugin unit test harness and should be validated separately with channel credentials if needed.

@openclaw-barnacle openclaw-barnacle Bot added channel: feishu Channel integration: feishu size: M labels Apr 25, 2026
@szupzj18 szupzj18 force-pushed the codex/fix-filename-decoding branch from cf186c2 to 4fadbb9 Compare April 25, 2026 09:55
@szupzj18 szupzj18 marked this pull request as ready for review April 25, 2026 11:21
@greptile-apps

greptile-apps Bot commented Apr 25, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds a shared src/media/filename.ts utility for robust Content-Disposition filename decoding (RFC 5987, multi-charset, mojibake recovery), wires fetchRemoteMedia through it, and applies Feishu-specific mojibake recovery to inbound filenames before saveMediaBuffer.

  • P1 – missing path-traversal sanitisation in normalizeFeishuFileName: the helper calls recoverLatin1Utf8Mojibake but never basenameFromUntrustedFilename. A Feishu message with file_name: "../../secret" passes through unchanged, while the parallel decodeContentDispositionFilename correctly strips traversal components. Adding basenameFromUntrustedFilename to normalizeFeishuFileName (or its call sites) closes the gap.

Confidence Score: 3/5

Not safe to merge without addressing the missing path-sanitisation in normalizeFeishuFileName.

A P1 security-relevant finding (path traversal not sanitised in Feishu inbound filenames) prevents a high-confidence score. The core filename.ts utility and fetch.ts refactor are clean and well-tested; the gap is isolated to the Feishu integration layer.

extensions/feishu/src/bot-content.tsnormalizeFeishuFileName needs basenameFromUntrustedFilename.

Security Review

  • Path traversal in Feishu inbound filenames (extensions/feishu/src/bot-content.ts): normalizeFeishuFileName calls only recoverLatin1Utf8Mojibake and does not apply basenameFromUntrustedFilename. A Feishu-supplied file_name containing ../../ sequences reaches saveMediaBuffer unsanitized. The sibling decodeContentDispositionFilename correctly applies basenameFromUntrustedFilename; the same guard is missing here.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/feishu/src/bot-content.ts
Line: 44-50

Comment:
**`normalizeFeishuFileName` skips path-traversal sanitization**

`recoverLatin1Utf8Mojibake` corrects encoding but does not strip path separators or `..` components. A Feishu message carrying `../../secret.txt` as `file_name` would pass through unchanged and reach `saveMediaBuffer` with the traversal intact. `basenameFromUntrustedFilename` was added in this PR for exactly this purpose and is already used inside `decodeContentDispositionFilename`, but is not called here.

```suggestion
function normalizeFeishuFileName(fileName: unknown): string | undefined {
  if (typeof fileName !== "string") {
    return undefined;
  }
  const trimmed = fileName.trim();
  if (!trimmed) {
    return undefined;
  }
  return basenameFromUntrustedFilename(recoverLatin1Utf8Mojibake(trimmed));
}
```

(Also needs `import { basenameFromUntrustedFilename } from "openclaw/plugin-sdk/feishu"` or a direct import from the media module, whichever the project convention is.)

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/feishu/src/bot-content.ts
Line: 455-458

Comment:
**Redundant double-normalization of `mediaKeys.fileName`**

`mediaKeys.fileName` is the return value of `parseMediaKeys`, which already calls `normalizeFeishuFileName` on every branch (lines 307–314). Wrapping it again in `normalizeFeishuFileName` at this call site is a no-op (idempotent because `recoverLatin1Utf8Mojibake` short-circuits when no Latin-1 supplement chars remain), but it is misleading and wastes a regex test per call.

```suggestion
      normalizeFeishuFileName(result.fileName) || mediaKeys.fileName,
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "media: add shared filename decoder" | Re-trigger Greptile

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fadbb92e7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread extensions/feishu/src/bot-content.ts Outdated
Comment thread extensions/feishu/src/bot-content.ts
Comment thread extensions/feishu/src/bot-content.ts Outdated
@szupzj18 szupzj18 force-pushed the codex/fix-filename-decoding branch from 4fadbb9 to 0be9dc7 Compare April 25, 2026 11:45
@clawsweeper

clawsweeper Bot commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Latest ClawSweeper review: 2026-05-21 18:17 UTC / May 21, 2026, 2:17 PM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds src/media/filename.ts, routes remote media Content-Disposition parsing through it, and adds Feishu filename normalization plus focused tests.

Reproducibility: yes. by source inspection. PR head still lets Feishu JSON download metadata reach storage unchanged before the fallback wrapper runs, and current-main tests pin that mojibake path.

PR rating
Overall: 🧂 unranked krab
Proof: 🦪 silver shellfish
Patch quality: 🦐 gold shrimp
Summary: The PR is useful but not merge-ready because Feishu behavior remains partially unfixed and the supplied proof does not cover that path.

Rank-up moves:

  • Move Feishu filename recovery to the metadata/save boundary and add focused JSON metadata plus fallback-name tests.
  • Add redacted terminal output, logs, or a recording showing the patched Feishu receive/save path after the fix; redact private identifiers, endpoints, phone numbers, and credentials.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs stronger real behavior proof before merge: The PR supplies terminal proof for the core decoder and synthetic response-save path, but it does not exercise the changed Feishu inbound receive/save path and the source path still has a recovery gap. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

Maintainer options:

  1. Fix The Feishu Metadata Save Boundary (recommended)
    Apply the same conservative filename recovery before saveMessageResourceWithType writes media, and add focused tests for JSON metadata plus fallback names.
  2. Narrow This PR To Core Remote Decoding
    Maintainers could remove or defer the Feishu receive behavior from this PR and keep the distinct JSON metadata bug on [Bug]: Feishu inbound JSON file_name CJK mojibake (distinct from Content-Disposition path) #81103.
  3. Pause Behind The Filename Cluster
    If the decoder ownership and interaction with fix(media): strip control and bidi characters from outbound filenames #72973 are unsettled, pause this PR until the shared filename plan is ordered under feat: centralized filename encoding utility for multi-encoding Content-Disposition handling #48788.

Next step before merge
The external PR needs a contributor code revision plus stronger real-behavior proof; ClawSweeper should not queue automated repair while the proof gate is still insufficient.

Security
Cleared: No concrete security or supply-chain regression was found; the diff adds parsing/normalization logic and tests without new dependencies, scripts, permissions, secrets handling, or network calls.

Review findings

  • [P2] Normalize before Feishu saves metadata filenames — extensions/feishu/src/bot-content.ts:430
Review details

Best possible solution:

Keep the core decoder direction, but move Feishu recovery/sanitization to the metadata/save boundary with regression tests for JSON metadata, fallback payload names, and valid Latin-1 preservation before merge.

Do we have a high-confidence way to reproduce the issue?

Yes, by source inspection. PR head still lets Feishu JSON download metadata reach storage unchanged before the fallback wrapper runs, and current-main tests pin that mojibake path.

Is this the best way to solve the issue?

No, not yet. The shared decoder is a reasonable direction, but the Feishu bug needs recovery at the metadata/save boundary rather than only around the post-download fallback wrapper.

Label justifications:

  • P2: This is a normal-priority media filename correctness fix with real Feishu and remote media impact but limited blast radius.
  • merge-risk: 🚨 compatibility: The PR changes filename decoding heuristics and can rename values that existing users currently see literally.
  • rating: 🧂 unranked krab: Current PR rating is 🧂 unranked krab because proof is 🦪 silver shellfish, patch quality is 🦐 gold shrimp, and The PR is useful but not merge-ready because Feishu behavior remains partially unfixed and the supplied proof does not cover that path.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: The PR supplies terminal proof for the core decoder and synthetic response-save path, but it does not exercise the changed Feishu inbound receive/save path and the source path still has a recovery gap. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Full review comments:

  • [P2] Normalize before Feishu saves metadata filenames — extensions/feishu/src/bot-content.ts:430
    normalizeFeishuFileName only runs in the fallback saveMediaBuffer branch here. In the real runtime path, saveMessageResourceFeishu returns { saved } after extensions/feishu/src/media.ts has already selected meta.fileName ?? originalFilename and written the file, while extractFeishuDownloadMetadata still returns JSON file_name/fileName unchanged. Feishu download metadata mojibake therefore bypasses this PR; move the recovery/sanitization to the metadata/save boundary and cover JSON metadata plus fallback names.
    Confidence: 0.91

Overall correctness: patch is incorrect
Overall confidence: 0.87

What I checked:

Likely related people:

  • steipete: Git history shows repeated recent work on Feishu media, shared media fetch, guarded fetch, and media pipeline helpers adjacent to this filename path. (role: recent area contributor; confidence: high; commits: fb40b09157d7, 65aac6494a74, 81c68f582d4a; files: extensions/feishu/src/media.ts, extensions/feishu/src/bot-content.ts, src/media/fetch.ts)
  • Takhoffman: Commit history includes Feishu media hardening and test work in the same plugin surface. (role: recent Feishu media hardening contributor; confidence: medium; commits: 3c6a49b27ea0, bc66a8fa81a8; files: extensions/feishu/src/media.ts, extensions/feishu/src/bot.test.ts)
  • luoyanglang: Current-main blame for the remote filename parser and Feishu metadata extraction/save lines points to this commit, though the repository history appears compacted around that import. (role: current implementation provenance; confidence: low; commits: 248169b646e6; files: src/media/fetch.ts, extensions/feishu/src/media.ts, extensions/feishu/src/bot-content.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 7d5afcbb3f83.

Copy link
Copy Markdown
Author

Status update after addressing automated review feedback:

This PR is intended as the first scoped fix for #48788, plus the related #48388 filename decoding behavior. It fixes the shared media download path and Feishu inbound filename handling without trying to migrate every remaining channel-specific filename path in one PR.

What was fixed in the latest pushed head:

  • src/media/filename.ts provides the shared filename decoding/sanitization helper used by fetchRemoteMedia.
  • extensions/feishu/src/bot-content.ts now applies Feishu mojibake recovery and basename sanitization before saving inbound filenames.
  • The Greptile P1 path traversal concern is addressed with basenameFromUntrustedFilename(recoverLatin1Utf8Mojibake(trimmed)).
  • The redundant mediaKeys.fileName double-normalization was removed.
  • The Codex P1 undefined media.fileName reference in the embedded post image path was fixed.

Verification run locally before push:

  • Feishu targeted Vitest coverage passed, including regressions for traversal-style filenames and embedded post media filename fallback.
  • Channel import guardrail tests passed.
  • Extension prod/test type checks passed.
  • Extension package compile boundary check passed.
  • Formatting check and git diff --check passed.

Remaining scope:

#48788 should stay open after this PR unless maintainers want to expand this change. Remaining channel filename migrations and any broader SDK/API documentation or baseline alignment can be handled as follow-up work under #48788.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. labels May 20, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 20, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 21, 2026
@szupzj18 szupzj18 force-pushed the codex/fix-filename-decoding branch from 0be9dc7 to 9c2a800 Compare May 21, 2026 17:16
@RomneyDa

Copy link
Copy Markdown
Member

Heads up: this PR needs to be updated against current main before the new required Dependency Guard check can pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: feishu Channel integration: feishu merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: M status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants