fix(media): anchor sanitizeMimeType regex and make case-insensitive per RFC 2045#61016
fix(media): anchor sanitizeMimeType regex and make case-insensitive per RFC 2045#61016shamsulalam1114 wants to merge 5 commits into
Conversation
Greptile SummaryThis PR tightens the Confidence Score: 5/5Safe to merge; the regex fix is correct and the only finding is a minor changelog placement style issue. All remaining findings are P2 (changelog entry placement). The code change itself is correct and an improvement — no logic, security, or runtime issues were found. CHANGELOG.md — entry should be appended to end of section, not inserted at the top. Prompt To Fix All With AIThis is a comment left during a code review.
Path: CHANGELOG.md
Line: 13
Comment:
**Changelog entry placed at top of section instead of bottom**
Per `CLAUDE.md`, new entries must be appended to the *end* of the target section — not inserted at the top. This entry was added as the first item in `### Changes` (line 13), but the section already has entries running through line 36. Move it to after line 36 (just before `### Fixes`).
Additionally, since this is a bug fix (the PR title uses `fix(media):`), consider placing it in `### Fixes` instead of `### Changes`, appended after the last existing fix entry.
**Context Used:** CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(media): anchor sanitizeMimeType rege..." | Re-trigger Greptile |
|
|
||
| ### Changes | ||
|
|
||
| - Media: fix `sanitizeMimeType` regex to anchor at end (`$`), handle optional MIME parameters (`; charset=...`), and apply case-insensitive matching per RFC 2045, preventing potential MIME type bypass via malicious suffix. (#9795) Thanks @shamsulalam1114. |
There was a problem hiding this comment.
Changelog entry placed at top of section instead of bottom
Per CLAUDE.md, new entries must be appended to the end of the target section — not inserted at the top. This entry was added as the first item in ### Changes (line 13), but the section already has entries running through line 36. Move it to after line 36 (just before ### Fixes).
Additionally, since this is a bug fix (the PR title uses fix(media):), consider placing it in ### Fixes instead of ### Changes, appended after the last existing fix entry.
Context Used: CLAUDE.md (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: CHANGELOG.md
Line: 13
Comment:
**Changelog entry placed at top of section instead of bottom**
Per `CLAUDE.md`, new entries must be appended to the *end* of the target section — not inserted at the top. This entry was added as the first item in `### Changes` (line 13), but the section already has entries running through line 36. Move it to after line 36 (just before `### Fixes`).
Additionally, since this is a bug fix (the PR title uses `fix(media):`), consider placing it in `### Fixes` instead of `### Changes`, appended after the last existing fix entry.
**Context Used:** CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1e7f48c93b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| ### Changes | ||
|
|
||
| - Media: fix `sanitizeMimeType` regex to anchor at end (`$`), handle optional MIME parameters (`; charset=...`), and apply case-insensitive matching per RFC 2045, preventing potential MIME type bypass via malicious suffix. (#9795) Thanks @shamsulalam1114. |
There was a problem hiding this comment.
Append changelog item at end of active Changes block
This new changelog line is inserted at the top of ## Unreleased → ### Changes, but repository guidance requires new entries to be appended to the end of that section. Keeping insertion order consistent avoids churn/conflicts in a heavily edited changelog and preserves the project’s documented release-note workflow.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR updates MIME type sanitization in src/media-understanding/apply.ts to better validate and normalize attachment MIME types (motivated by a reported security/input-validation issue), and records the change in the changelog.
Changes:
- Update
sanitizeMimeTypeto use an end-anchored, case-insensitive regex and lowercase the extracted MIME type after matching. - Allow optional MIME parameters (e.g.
; charset=utf-8) in the sanitizer’s accepted input. - Add an Unreleased changelog entry describing the fix (#9795).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/media-understanding/apply.ts | Tightens sanitizeMimeType matching (end-anchored + case-insensitive) and changes normalization order. |
| CHANGELOG.md | Documents the MIME sanitizer change and its intended security impact. |
| // Anchor regex at end ($) and use case-insensitive flag per RFC 2045; | ||
| // also allow optional parameters (e.g. "; charset=utf-8") before the anchor. | ||
| const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:;.*)?$/i); |
There was a problem hiding this comment.
The new pattern (?:;.*)?$ is extremely permissive and still accepts values like image/jpeg; malicious-suffix<script> (the example called out in the PR/issue), returning image/jpeg. If the goal is to reject non-parameter junk, consider validating parameters more strictly (token/quoted-string) or only allowing a constrained parameter grammar instead of .* (or alternatively update the security rationale/changelog to reflect that arbitrary ;... tails are accepted).
| // Anchor regex at end ($) and use case-insensitive flag per RFC 2045; | |
| // also allow optional parameters (e.g. "; charset=utf-8") before the anchor. | |
| const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:;.*)?$/i); | |
| // Anchor regex at end ($) and use case-insensitive flag per RFC 2045. | |
| // Only allow optional MIME parameters of the form `; attribute=value`, | |
| // where the value is either a token or a quoted-string. | |
| const token = "[a-z0-9!#$&^_.+-]+"; | |
| const quotedString = "\"(?:[^\"\\\\\\r\\n]|\\\\.)*\""; | |
| const parameter = `(?:\\s*;\\s*${token}\\s*=\\s*(?:${token}|${quotedString}))`; | |
| const mimeTypePattern = new RegExp(`^(${token}\\/${token})(?:${parameter})*$`, "i"); | |
| const match = trimmed.match(mimeTypePattern); |
| function sanitizeMimeType(value?: string): string | undefined { | ||
| if (!value) { | ||
| return undefined; | ||
| } | ||
| const trimmed = value.trim().toLowerCase(); | ||
| const trimmed = value.trim(); | ||
| if (!trimmed) { | ||
| return undefined; | ||
| } | ||
| const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)/); | ||
| return match?.[1]; | ||
| // Anchor regex at end ($) and use case-insensitive flag per RFC 2045; | ||
| // also allow optional parameters (e.g. "; charset=utf-8") before the anchor. | ||
| const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:;.*)?$/i); | ||
| return match?.[1]?.toLowerCase(); |
There was a problem hiding this comment.
No tests were added for the updated sanitizeMimeType behavior (end-anchor, case-insensitivity, and parameter handling). Since this function gates which attachment MIME types are accepted, it would be good to add/extend unit coverage (e.g., Image/JPEG, trailing junk without ;, and a couple of valid ; charset=utf-8 cases) to prevent regressions.
There was a problem hiding this comment.
No tests were added for the updated
sanitizeMimeTypebehavior (end-anchor, case-insensitivity, and parameter handling). Since this function gates which attachment MIME types are accepted, it would be good to add/extend unit coverage (e.g.,Image/JPEG, trailing junk without;, and a couple of valid; charset=utf-8cases) to prevent regressions.
sanitizeMimeType is a private function and can't be directly unit-tested without extracting/exporting it. Its behavior is covered through the existing integration path — the "transcribes WhatsApp audio with parameterized MIME despite casing/whitespace" test in apply.test.ts already exercises parameterized MIME normalization through the same pipeline. Happy to extract it into a dedicated unit-testable helper in a follow-up if maintainers prefer.
|
|
||
| ### Changes | ||
|
|
||
| - Media: fix `sanitizeMimeType` regex to anchor at end (`$`), handle optional MIME parameters (`; charset=...`), and apply case-insensitive matching per RFC 2045, preventing potential MIME type bypass via malicious suffix. (#9795) Thanks @shamsulalam1114. |
There was a problem hiding this comment.
This changelog entry claims the change prevents MIME type bypass via a malicious suffix, but sanitizeMimeType currently allows arbitrary content after a ; (via (?:;.*)?$). Either tighten parameter parsing so obviously-invalid parameter tails are rejected, or adjust the wording to avoid overstating the protection provided.
| - Media: fix `sanitizeMimeType` regex to anchor at end (`$`), handle optional MIME parameters (`; charset=...`), and apply case-insensitive matching per RFC 2045, preventing potential MIME type bypass via malicious suffix. (#9795) Thanks @shamsulalam1114. | |
| - Media: fix `sanitizeMimeType` regex to anchor at end (`$`), handle optional MIME parameters (`; charset=...`), and apply case-insensitive matching per RFC 2045, avoiding matches on trailing non-parameter suffixes while preserving valid parameterized MIME types. (#9795) Thanks @shamsulalam1114. |
|
The CI failures are pre-existing and unrelated to this PR. This change only modifies a single regex function in
None of these are in or near the MIME sanitizer code. Happy to rebase on a newer |
|
Re: CI failures — All failing checks in this run are pre-existing on openclaw/main and unrelated to this PR: vite_ssr_import_9 — Vite SSR hoisting bug in facade-runtime.ts (affects all Discord/Matrix/iMessage/Slack tests) |
|
This is superseded by #73229, which has landed as the canonical ProjectClownfish fix path for this cluster. Closing this now that the validated fix is merged. If this still reproduces on current main with a different path, reply here and we can reopen or split it back out. |
Summary
Fixes a bug in
sanitizeMimeType(insrc/media-understanding/apply.ts) where the regex:image/jpeg; malicious-suffix<script>to silently pass through, since the regex only matched from the start and discarded the rest without validating the full string.Image/JPEGwould previously fail to match..toLowerCase()to after the match, which is the correct order for case-insensitive matching with theiflag.Changes
src/media-understanding/apply.ts: fixsanitizeMimeTyperegex$end anchor(?:;.*)?to allow optional MIME parameters (e.g.; charset=utf-8) before the anchoriflag for case-insensitive matching per RFC 2045.toLowerCase()to return value instead of pre-processingVerification
Closes [Security] sanitizeMimeType regex should be anchored and case-insensitive #9795