Skip to content

security(media): anchor sanitizeMimeType regex to reject malformed input#68456

Closed
bluesky6868 wants to merge 1 commit into
openclaw:mainfrom
security-for-ai-agent:fix/sanitize-mime-type-anchor
Closed

security(media): anchor sanitizeMimeType regex to reject malformed input#68456
bluesky6868 wants to merge 1 commit into
openclaw:mainfrom
security-for-ai-agent:fix/sanitize-mime-type-anchor

Conversation

@bluesky6868

Copy link
Copy Markdown

Summary

  • Problem: sanitizeMimeType in src/media-understanding/apply.ts matches MIME types with an unanchored regex, so malformed input silently truncates to a valid-looking prefix. Example: "image/png junk""image/png"; "image/png..etc/passwd""image/png..etc" (the char class includes ., -, _).
  • Why it matters: the returned prefix is then checked against allowedMimes at apply.ts:427. Malformed input can coerce to a known-good MIME and pass the allowlist, defeating the intent of strict validation at the media-ingest boundary. Maintainers labeled this security.
  • What changed: anchor the regex at end-of-string and accept standard MIME parameters ((?:;.*)?$). Export the helper for focused unit tests.
  • What did NOT change: callers, callsite, allowlist semantics, or any other MIME normalizer.

Change Type

  • Security hardening

Scope

  • Skills / tool execution (media-understanding inbound path)

Linked Issue/PR

Root Cause

  • Root cause: the regex /^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)/ had no end anchor. On malformed input, the first-match-wins behavior captured the leading valid-looking substring and discarded the rest, producing a MIME string that the downstream allowlist could accept.
  • Missing detection / guardrail: strict end-anchor on the MIME regex at an input boundary.
  • Contributing context: the helper is called on values already normalized by normalizeMimeType (src/media/mime.ts strips ; and lowercases), so valid inputs are unaffected. Only malformed inputs change behavior.

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
  • Target test or file: src/media-understanding/apply.test.ts (new describe("sanitizeMimeType") block)
  • Scenario the test should lock in: anchor rejects "image/png junk", "image/png\nextra", "image/png/../etc/passwd", "image/png/evil"; still accepts plain MIMEs and standard ; param parameters; still lowercases mixed case; still returns undefined on empty/whitespace/missing input.
  • Why this is the smallest reliable guardrail: the function is pure and self-contained; unit tests pin its exact behavior without needing the full media pipeline.

User-visible / Behavior Changes

Malformed MIME hints that previously silently truncated now produce undefined, which the existing code path at apply.ts:412-416 already handles by skipping the attachment with a "file attachment skipped (unknown mime)" verbose-log line. No config change required; no user-facing surface touched.

Security Impact

  • New permissions/capabilities? No
  • Change in trust boundary? No (tightens an existing input-validation check)
  • Secrets/credentials touched? No

Verification

Gates run locally on macOS Darwin 24.6.0:

  • pnpm test src/media-understanding/apply.test.ts -t sanitizeMimeType — 7 passed
  • pnpm test src/media-understanding/apply.test.ts — 48 passed
  • pnpm test 'src/media-understanding/**/*.test.ts' — 138 passed across 20 files
  • pnpm check — formatter, oxlint (0 warnings / 0 errors across 11710 files), tsgo, import-cycle + madge checks all green
  • pnpm build — clean (touches an exported surface, so build is run as a hard gate per CLAUDE.md)

Human Verification

  • Verified scenarios: round-tripped representative inputs against both the old and new regex to confirm the anchor is the only behavioral delta; confirmed current callsite at apply.ts:405 already handles undefined return.
  • Edge cases checked: trailing whitespace/newlines, path-like suffixes sharing the allowed char class (., -, _), vendor-suffix MIMEs (application/vnd.api+json), ; charset= parameters.
  • What I did not verify: no changes to src/media/mime.ts::normalizeMimeType or src/media/input-files.ts::normalizeMimeType; those are out of scope for [Security] sanitizeMimeType regex should be anchored and case-insensitive #9795.

Review Conversations

  • I will reply to or resolve every bot review conversation I address in this PR.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: a benign input the existing allowlist accepted (via prefix truncation) now returns undefined and the attachment is skipped.
    • Mitigation: the truncation-to-allowed path only matters when input does not already match the strict shape; valid MIMEs (including ; param) and internal hint constants all still match. Existing tests covering real MIMEs still pass (138 tests green).

@greptile-apps

greptile-apps Bot commented Apr 18, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Adds an end anchor ($) and an optional MIME-parameter clause ((?:;.*)?) to the sanitizeMimeType regex in src/media-understanding/apply.ts, preventing malformed input from silently truncating to an allowlist-passing prefix. The function is exported and covered by a new focused describe("sanitizeMimeType") block in apply.test.ts.

Confidence Score: 5/5

This PR is safe to merge; it tightens an input-validation check without altering any callsite or allowlist semantics.

The regex change is minimal and correct: end-anchoring stops prefix truncation, (?:;.*) preserves valid MIME-with-parameter inputs, and normalizeOptionalLowercaseString trims whitespace before the regex runs so no valid MIME type that was previously accepted is newly rejected. All seven new unit tests accurately pin the behavioral contract, and the PR confirms build + full suite green.

No files require special attention.

Reviews (1): Last reviewed commit: "security(media): anchor sanitizeMimeType..." | Re-trigger Greptile

@prtags

prtags Bot commented Apr 23, 2026

Copy link
Copy Markdown

Related work from PRtags group charming-marten-atlb

Title: Open PR duplicate: sanitizeMimeType regex anchoring (#9795)

Number Title
#68225 fix(media): anchor sanitizeMimeType regex and reject trailing garbage (#9795)
#68456* security(media): anchor sanitizeMimeType regex to reject malformed input

* This PR

@clawsweeper

clawsweeper Bot commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

Closing this as duplicate or superseded after Codex automated review.

PR #68456 should close as duplicate/superseded. Current main still has the unanchored sanitizer, so the #9795 hardening is not implemented yet, but this PR duplicates the same remaining work already tracked by open PR #68225 and also overlaps older open PR #61016.

Best possible solution:

Close #68456 as duplicate and consolidate the #9795 sanitizer hardening in one existing open implementation path, preferably the directly PRtags-linked #68225 because it carries focused regression tests, while keeping #9795 open until a selected fix lands.

What I checked:

So I’m closing this here and keeping the remaining discussion on the canonical linked item.

Codex Review notes: model gpt-5.5, reasoning high; reviewed against d54d2d6b9b8a.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Security] sanitizeMimeType regex should be anchored and case-insensitive

1 participant