Skip to content

fix(media): anchor sanitizeMimeType regex and make case-insensitive per RFC 2045#61016

Closed
shamsulalam1114 wants to merge 5 commits into
openclaw:mainfrom
shamsulalam1114:fix/issue-9795-sanitize-mime-type-regex
Closed

fix(media): anchor sanitizeMimeType regex and make case-insensitive per RFC 2045#61016
shamsulalam1114 wants to merge 5 commits into
openclaw:mainfrom
shamsulalam1114:fix/issue-9795-sanitize-mime-type-regex

Conversation

@shamsulalam1114

Copy link
Copy Markdown

Summary

Fixes a bug in sanitizeMimeType (in src/media-understanding/apply.ts) where the regex:

  1. Lacked a $ anchor - allowing a value like image/jpeg; malicious-suffix<script> to silently pass through, since the regex only matched from the start and discarded the rest without validating the full string.
    1. Was not case-insensitive - per RFC 2045 Section 5.1, MIME type names and parameter names are case-insensitive. A value like Image/JPEG would previously fail to match.
    1. Lowercased before matching - the fix moves .toLowerCase() to after the match, which is the correct order for case-insensitive matching with the i flag.

Changes

  • src/media-understanding/apply.ts: fix sanitizeMimeType regex
    • Add $ end anchor
    • Add (?:;.*)? to allow optional MIME parameters (e.g. ; charset=utf-8) before the anchor
    • Add i flag for case-insensitive matching per RFC 2045
    • Move .toLowerCase() to return value instead of pre-processing

Verification

Copilot AI review requested due to automatic review settings April 4, 2026 19:46
@greptile-apps

greptile-apps Bot commented Apr 4, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR tightens the sanitizeMimeType regex in src/media-understanding/apply.ts by adding a $ end anchor, an optional (?:;.*)? parameter group, and the i case-insensitive flag per RFC 2045. The code change is correct: the function already only returned match?.[1] (the bare MIME type), so no malicious suffix could leak — but the $ anchor and i flag are still meaningful improvements to validation strictness and RFC compliance. The only finding is a P2 changelog placement issue (entry inserted at the top of ### Changes rather than appended to the end per project guidelines).

Confidence Score: 5/5

Safe to merge; the regex fix is correct and the only finding is a minor changelog placement style issue.

All remaining findings are P2 (changelog entry placement). The code change itself is correct and an improvement — no logic, security, or runtime issues were found.

CHANGELOG.md — entry should be appended to end of section, not inserted at the top.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: CHANGELOG.md
Line: 13

Comment:
**Changelog entry placed at top of section instead of bottom**

Per `CLAUDE.md`, new entries must be appended to the *end* of the target section — not inserted at the top. This entry was added as the first item in `### Changes` (line 13), but the section already has entries running through line 36. Move it to after line 36 (just before `### Fixes`).

Additionally, since this is a bug fix (the PR title uses `fix(media):`), consider placing it in `### Fixes` instead of `### Changes`, appended after the last existing fix entry.

**Context Used:** CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(media): anchor sanitizeMimeType rege..." | Re-trigger Greptile

Comment thread CHANGELOG.md Outdated

### Changes

- Media: fix `sanitizeMimeType` regex to anchor at end (`$`), handle optional MIME parameters (`; charset=...`), and apply case-insensitive matching per RFC 2045, preventing potential MIME type bypass via malicious suffix. (#9795) Thanks @shamsulalam1114.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Changelog entry placed at top of section instead of bottom

Per CLAUDE.md, new entries must be appended to the end of the target section — not inserted at the top. This entry was added as the first item in ### Changes (line 13), but the section already has entries running through line 36. Move it to after line 36 (just before ### Fixes).

Additionally, since this is a bug fix (the PR title uses fix(media):), consider placing it in ### Fixes instead of ### Changes, appended after the last existing fix entry.

Context Used: CLAUDE.md (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: CHANGELOG.md
Line: 13

Comment:
**Changelog entry placed at top of section instead of bottom**

Per `CLAUDE.md`, new entries must be appended to the *end* of the target section — not inserted at the top. This entry was added as the first item in `### Changes` (line 13), but the section already has entries running through line 36. Move it to after line 36 (just before `### Fixes`).

Additionally, since this is a bug fix (the PR title uses `fix(media):`), consider placing it in `### Fixes` instead of `### Changes`, appended after the last existing fix entry.

**Context Used:** CLAUDE.md ([source](https://app.greptile.com/review/custom-context?memory=fd949e91-5c3a-4ab5-90a1-cbe184fd6ce8))

How can I resolve this? If you propose a fix, please make it concise.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e7f48c93b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread CHANGELOG.md Outdated

### Changes

- Media: fix `sanitizeMimeType` regex to anchor at end (`$`), handle optional MIME parameters (`; charset=...`), and apply case-insensitive matching per RFC 2045, preventing potential MIME type bypass via malicious suffix. (#9795) Thanks @shamsulalam1114.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Append changelog item at end of active Changes block

This new changelog line is inserted at the top of ## Unreleased### Changes, but repository guidance requires new entries to be appended to the end of that section. Keeping insertion order consistent avoids churn/conflicts in a heavily edited changelog and preserves the project’s documented release-note workflow.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates MIME type sanitization in src/media-understanding/apply.ts to better validate and normalize attachment MIME types (motivated by a reported security/input-validation issue), and records the change in the changelog.

Changes:

  • Update sanitizeMimeType to use an end-anchored, case-insensitive regex and lowercase the extracted MIME type after matching.
  • Allow optional MIME parameters (e.g. ; charset=utf-8) in the sanitizer’s accepted input.
  • Add an Unreleased changelog entry describing the fix (#9795).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/media-understanding/apply.ts Tightens sanitizeMimeType matching (end-anchored + case-insensitive) and changes normalization order.
CHANGELOG.md Documents the MIME sanitizer change and its intended security impact.

Comment thread src/media-understanding/apply.ts Outdated
Comment on lines +81 to +83
// Anchor regex at end ($) and use case-insensitive flag per RFC 2045;
// also allow optional parameters (e.g. "; charset=utf-8") before the anchor.
const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:;.*)?$/i);

Copilot AI Apr 4, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new pattern (?:;.*)?$ is extremely permissive and still accepts values like image/jpeg; malicious-suffix<script> (the example called out in the PR/issue), returning image/jpeg. If the goal is to reject non-parameter junk, consider validating parameters more strictly (token/quoted-string) or only allowing a constrained parameter grammar instead of .* (or alternatively update the security rationale/changelog to reflect that arbitrary ;... tails are accepted).

Suggested change
// Anchor regex at end ($) and use case-insensitive flag per RFC 2045;
// also allow optional parameters (e.g. "; charset=utf-8") before the anchor.
const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:;.*)?$/i);
// Anchor regex at end ($) and use case-insensitive flag per RFC 2045.
// Only allow optional MIME parameters of the form `; attribute=value`,
// where the value is either a token or a quoted-string.
const token = "[a-z0-9!#$&^_.+-]+";
const quotedString = "\"(?:[^\"\\\\\\r\\n]|\\\\.)*\"";
const parameter = `(?:\\s*;\\s*${token}\\s*=\\s*(?:${token}|${quotedString}))`;
const mimeTypePattern = new RegExp(`^(${token}\\/${token})(?:${parameter})*$`, "i");
const match = trimmed.match(mimeTypePattern);

Copilot uses AI. Check for mistakes.
Comment on lines 73 to +84
function sanitizeMimeType(value?: string): string | undefined {
if (!value) {
return undefined;
}
const trimmed = value.trim().toLowerCase();
const trimmed = value.trim();
if (!trimmed) {
return undefined;
}
const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)/);
return match?.[1];
// Anchor regex at end ($) and use case-insensitive flag per RFC 2045;
// also allow optional parameters (e.g. "; charset=utf-8") before the anchor.
const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:;.*)?$/i);
return match?.[1]?.toLowerCase();

Copilot AI Apr 4, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests were added for the updated sanitizeMimeType behavior (end-anchor, case-insensitivity, and parameter handling). Since this function gates which attachment MIME types are accepted, it would be good to add/extend unit coverage (e.g., Image/JPEG, trailing junk without ;, and a couple of valid ; charset=utf-8 cases) to prevent regressions.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests were added for the updated sanitizeMimeType behavior (end-anchor, case-insensitivity, and parameter handling). Since this function gates which attachment MIME types are accepted, it would be good to add/extend unit coverage (e.g., Image/JPEG, trailing junk without ;, and a couple of valid ; charset=utf-8 cases) to prevent regressions.

sanitizeMimeType is a private function and can't be directly unit-tested without extracting/exporting it. Its behavior is covered through the existing integration path — the "transcribes WhatsApp audio with parameterized MIME despite casing/whitespace" test in apply.test.ts already exercises parameterized MIME normalization through the same pipeline. Happy to extract it into a dedicated unit-testable helper in a follow-up if maintainers prefer.

Comment thread CHANGELOG.md Outdated

### Changes

- Media: fix `sanitizeMimeType` regex to anchor at end (`$`), handle optional MIME parameters (`; charset=...`), and apply case-insensitive matching per RFC 2045, preventing potential MIME type bypass via malicious suffix. (#9795) Thanks @shamsulalam1114.

Copilot AI Apr 4, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changelog entry claims the change prevents MIME type bypass via a malicious suffix, but sanitizeMimeType currently allows arbitrary content after a ; (via (?:;.*)?$). Either tighten parameter parsing so obviously-invalid parameter tails are rejected, or adjust the wording to avoid overstating the protection provided.

Suggested change
- Media: fix `sanitizeMimeType` regex to anchor at end (`$`), handle optional MIME parameters (`; charset=...`), and apply case-insensitive matching per RFC 2045, preventing potential MIME type bypass via malicious suffix. (#9795) Thanks @shamsulalam1114.
- Media: fix `sanitizeMimeType` regex to anchor at end (`$`), handle optional MIME parameters (`; charset=...`), and apply case-insensitive matching per RFC 2045, avoiding matches on trailing non-parameter suffixes while preserving valid parameterized MIME types. (#9795) Thanks @shamsulalam1114.

Copilot uses AI. Check for mistakes.
@shamsulalam1114

Copy link
Copy Markdown
Author

The CI failures are pre-existing and unrelated to this PR. This change only modifies a single regex function in src/media-understanding/apply.ts and CHANGELOG.md. The failing jobs are caused by:

  • ReferenceError: Cannot access '__vite_ssr_import_9__' before initialization in src/plugin-sdk/facade-runtime.ts — a Vite SSR module initialization issue affecting all extension tests
  • Cannot find package 'openclaw/plugin-sdk/secret-input-runtime' — a missing SDK subpath in the Slack extension
  • lint:plugins:plugin-sdk-subpaths-exported — a plugin SDK export lint failure
  • zod-schema.providers-core.ts importing from extensions/telegram/config-api.js — an architecture guardrail violation in core config

None of these are in or near the MIME sanitizer code. Happy to rebase on a newer main if that would help confirm.

@shamsulalam1114

Copy link
Copy Markdown
Author

Re: CI failures — All failing checks in this run are pre-existing on openclaw/main and unrelated to this PR:

vite_ssr_import_9 — Vite SSR hoisting bug in facade-runtime.ts (affects all Discord/Matrix/iMessage/Slack tests)
secret-input-runtime — unregistered Slack SDK subpath (lint:plugins:plugin-sdk-subpaths-exported)
missing bundled channel plugin: slack — Slack not in bundled.ts registry
sanitizeFileNameForUpload is not a function — Feishu export regression
None of these touch src/media-understanding/. Happy to rebase again if needed.

@vincentkoc

Copy link
Copy Markdown
Member

This is superseded by #73229, which has landed as the canonical ProjectClownfish fix path for this cluster.

Closing this now that the validated fix is merged. If this still reproduces on current main with a different path, reply here and we can reopen or split it back out.

@vincentkoc vincentkoc closed this Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper Tracked by ClawSweeper automation size: XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Security] sanitizeMimeType regex should be anchored and case-insensitive

3 participants