security(media): anchor sanitizeMimeType regex to reject malformed input by bluesky6868 · Pull Request #68456 · openclaw/openclaw

bluesky6868 · 2026-04-18T07:35:10Z

Summary

Problem: sanitizeMimeType in src/media-understanding/apply.ts matches MIME types with an unanchored regex, so malformed input silently truncates to a valid-looking prefix. Example: "image/png junk" → "image/png"; "image/png..etc/passwd" → "image/png..etc" (the char class includes ., -, _).
Why it matters: the returned prefix is then checked against allowedMimes at apply.ts:427. Malformed input can coerce to a known-good MIME and pass the allowlist, defeating the intent of strict validation at the media-ingest boundary. Maintainers labeled this security.
What changed: anchor the regex at end-of-string and accept standard MIME parameters ((?:;.*)?$). Export the helper for focused unit tests.
What did NOT change: callers, callsite, allowlist semantics, or any other MIME normalizer.

Change Type

Security hardening

Scope

Skills / tool execution (media-understanding inbound path)

Linked Issue/PR

Closes [Security] sanitizeMimeType regex should be anchored and case-insensitive #9795
Related fix(security): anchor MIME sanitization regex and block fullwidth bypass (#9791, #9795) #10257 (prior combined fix was closed; this PR narrows scope to [Security] sanitizeMimeType regex should be anchored and case-insensitive #9795 only)
This PR fixes a bug or regression

Root Cause

Root cause: the regex /^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)/ had no end anchor. On malformed input, the first-match-wins behavior captured the leading valid-looking substring and discarded the rest, producing a MIME string that the downstream allowlist could accept.
Missing detection / guardrail: strict end-anchor on the MIME regex at an input boundary.
Contributing context: the helper is called on values already normalized by normalizeMimeType (src/media/mime.ts strips ; and lowercases), so valid inputs are unaffected. Only malformed inputs change behavior.

Regression Test Plan

Coverage level that should have caught this:
- Unit test
Target test or file: src/media-understanding/apply.test.ts (new describe("sanitizeMimeType") block)
Scenario the test should lock in: anchor rejects "image/png junk", "image/png\nextra", "image/png/../etc/passwd", "image/png/evil"; still accepts plain MIMEs and standard ; param parameters; still lowercases mixed case; still returns undefined on empty/whitespace/missing input.
Why this is the smallest reliable guardrail: the function is pure and self-contained; unit tests pin its exact behavior without needing the full media pipeline.

User-visible / Behavior Changes

Malformed MIME hints that previously silently truncated now produce undefined, which the existing code path at apply.ts:412-416 already handles by skipping the attachment with a "file attachment skipped (unknown mime)" verbose-log line. No config change required; no user-facing surface touched.

Security Impact

New permissions/capabilities? No
Change in trust boundary? No (tightens an existing input-validation check)
Secrets/credentials touched? No

Verification

Gates run locally on macOS Darwin 24.6.0:

pnpm test src/media-understanding/apply.test.ts -t sanitizeMimeType — 7 passed
pnpm test src/media-understanding/apply.test.ts — 48 passed
pnpm test 'src/media-understanding/**/*.test.ts' — 138 passed across 20 files
pnpm check — formatter, oxlint (0 warnings / 0 errors across 11710 files), tsgo, import-cycle + madge checks all green
pnpm build — clean (touches an exported surface, so build is run as a hard gate per CLAUDE.md)

Human Verification

Verified scenarios: round-tripped representative inputs against both the old and new regex to confirm the anchor is the only behavioral delta; confirmed current callsite at apply.ts:405 already handles undefined return.
Edge cases checked: trailing whitespace/newlines, path-like suffixes sharing the allowed char class (., -, _), vendor-suffix MIMEs (application/vnd.api+json), ; charset= parameters.
What I did not verify: no changes to src/media/mime.ts::normalizeMimeType or src/media/input-files.ts::normalizeMimeType; those are out of scope for [Security] sanitizeMimeType regex should be anchored and case-insensitive #9795.

Review Conversations

I will reply to or resolve every bot review conversation I address in this PR.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Risks and Mitigations

Risk: a benign input the existing allowlist accepted (via prefix truncation) now returns undefined and the attachment is skipped.
- Mitigation: the truncation-to-allowed path only matters when input does not already match the strict shape; valid MIMEs (including ; param) and internal hint constants all still match. Existing tests covering real MIMEs still pass (138 tests green).

greptile-apps · 2026-04-18T07:38:01Z

Greptile Summary

Adds an end anchor ($) and an optional MIME-parameter clause ((?:;.*)?) to the sanitizeMimeType regex in src/media-understanding/apply.ts, preventing malformed input from silently truncating to an allowlist-passing prefix. The function is exported and covered by a new focused describe("sanitizeMimeType") block in apply.test.ts.

Confidence Score: 5/5

This PR is safe to merge; it tightens an input-validation check without altering any callsite or allowlist semantics.

The regex change is minimal and correct: end-anchoring stops prefix truncation, (?:;.*) preserves valid MIME-with-parameter inputs, and normalizeOptionalLowercaseString trims whitespace before the regex runs so no valid MIME type that was previously accepted is newly rejected. All seven new unit tests accurately pin the behavioral contract, and the PR confirms build + full suite green.

No files require special attention.

_{Reviews (1): Last reviewed commit: "security(media): anchor sanitizeMimeType..." | Re-trigger Greptile}

prtags · 2026-04-23T09:51:26Z

Related work from PRtags group charming-marten-atlb

Title: Open PR duplicate: sanitizeMimeType regex anchoring (#9795)

Number	Title
#68225	fix(media): anchor sanitizeMimeType regex and reject trailing garbage (#9795)
#68456*	security(media): anchor sanitizeMimeType regex to reject malformed input

* This PR

clawsweeper · 2026-04-26T05:32:23Z

Closing this as duplicate or superseded after Codex automated review.

PR #68456 should close as duplicate/superseded. Current main still has the unanchored sanitizer, so the #9795 hardening is not implemented yet, but this PR duplicates the same remaining work already tracked by open PR #68225 and also overlaps older open PR #61016.

Best possible solution:

Close #68456 as duplicate and consolidate the #9795 sanitizer hardening in one existing open implementation path, preferably the directly PRtags-linked #68225 because it carries focused regression tests, while keeping #9795 open until a selected fix lands.

What I checked:

Current main still has unanchored sanitizer: sanitizeMimeType remains private and matches only a valid-looking MIME prefix with no end anchor, so inputs with trailing garbage can still be truncated instead of rejected. (src/media-understanding/apply.ts:79, d54d2d6b9b8a)
Allowlist path is the impacted behavior: The sanitized MIME is computed at the media-understanding attachment boundary and then checked with allowedMimes.has(mimeType), matching the security concern described by security(media): anchor sanitizeMimeType regex to reject malformed input #68456/[Security] sanitizeMimeType regex should be anchored and case-insensitive #9795. (src/media-understanding/apply.ts:434, d54d2d6b9b8a)
Current tests do not make security(media): anchor sanitizeMimeType regex to reject malformed input #68456 implemented: Current main has parameterized-MIME coverage but local search found no direct sanitizeMimeType regression tests for trailing junk/path-like malformed input, so this is not a shipped fix. (src/media-understanding/apply.test.ts:428, d54d2d6b9b8a)
Direct duplicate PR fix(media): anchor sanitizeMimeType regex and reject trailing garbage (#9795) #68225 is open: GitHub API data shows fix(media): anchor sanitizeMimeType regex and reject trailing garbage (#9795) #68225 is open, was created before security(media): anchor sanitizeMimeType regex to reject malformed input #68456, targets the same [Security] sanitizeMimeType regex should be anchored and case-insensitive #9795 sanitizer anchoring problem, and changes src/media-understanding/apply.ts plus a focused apply.sanitize-mime.test.ts regression test. Its head SHA is 7abcecdb1ff61fcfa0ae74278a89cbfaeae84fd6. (7abcecdb1ff6)
PRtags grouped fix(media): anchor sanitizeMimeType regex and reject trailing garbage (#9795) #68225 and security(media): anchor sanitizeMimeType regex to reject malformed input #68456 as duplicates: The provided PR discussion includes a PRtags group titled Open PR duplicate: sanitizeMimeType regex anchoring (#9795) listing fix(media): anchor sanitizeMimeType regex and reject trailing garbage (#9795) #68225 and security(media): anchor sanitizeMimeType regex to reject malformed input #68456 together.
Additional overlapping open PR exists: GitHub search shows fix(media): anchor sanitizeMimeType regex and make case-insensitive per RFC 2045 #61016 is also open and older, with the same sanitizer anchoring / case-insensitive MIME validation theme for [Security] sanitizeMimeType regex should be anchored and case-insensitive #9795. That reinforces that security(media): anchor sanitizeMimeType regex to reject malformed input #68456 is duplicate work, not the only remaining implementation path. (3d16b7bdca4c)

So I’m closing this here and keeping the remaining discussion on the canonical linked item.

Codex Review notes: model gpt-5.5, reasoning high; reviewed against d54d2d6b9b8a.

security(media): anchor sanitizeMimeType regex to reject malformed input

a6cba75

openclaw-barnacle Bot added the size: XS label Apr 18, 2026

prtags Bot mentioned this pull request Apr 23, 2026

fix(media): anchor sanitizeMimeType regex and reject trailing garbage (#9795) #68225

Closed

5 tasks

clawsweeper Bot closed this Apr 26, 2026

vincentkoc mentioned this pull request Apr 28, 2026

fix(media): tighten sanitizeMimeType anchoring #73229

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

security(media): anchor sanitizeMimeType regex to reject malformed input#68456

security(media): anchor sanitizeMimeType regex to reject malformed input#68456
bluesky6868 wants to merge 1 commit into
openclaw:mainfrom
security-for-ai-agent:fix/sanitize-mime-type-anchor

bluesky6868 commented Apr 18, 2026

Uh oh!

greptile-apps Bot commented Apr 18, 2026

Uh oh!

prtags Bot commented Apr 23, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented Apr 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

bluesky6868 commented Apr 18, 2026

Summary

Change Type

Scope

Linked Issue/PR

Root Cause

Regression Test Plan

User-visible / Behavior Changes

Security Impact

Verification

Human Verification

Review Conversations

Compatibility / Migration

Risks and Mitigations

Uh oh!

greptile-apps Bot commented Apr 18, 2026

Greptile Summary

Confidence Score: 5/5

Uh oh!

prtags Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clawsweeper Bot commented Apr 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

prtags Bot commented Apr 23, 2026 •

edited

Loading

clawsweeper Bot commented Apr 26, 2026 •

edited

Loading