Skip to content

fix(media): tighten sanitizeMimeType anchoring#73229

Merged
vincentkoc merged 3 commits into
mainfrom
clownfish/ghcrawl-156640-autonomous-smoke
Apr 28, 2026
Merged

fix(media): tighten sanitizeMimeType anchoring#73229
vincentkoc merged 3 commits into
mainfrom
clownfish/ghcrawl-156640-autonomous-smoke

Conversation

@vincentkoc

Copy link
Copy Markdown
Member

Summary

Credit

Based on source PR #68225 by @ymaxgit / bluesky6868, with related review context from #61016 and #68456.

Validation

  • pnpm check:changed
  • Codex /review clean before merge

Closes #9795 when landed.

ProjectClownfish replacement details:

@greptile-apps

greptile-apps Bot commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR replaces the permissive \S.* tail in the sanitizeMimeType regex with a structured name=value parameter grammar that requires valid token syntax for both parameter names and values, anchors the end of input strictly with $, and excludes newlines/CR from quoted strings. It also adds targeted regression tests for quoted values, embedded newlines mid-chain, and malformed parameter tails (key without =, junk suffix, stray space-separated tokens). The fix is correct: normalizeOptionalLowercaseString calls .trim() so trailing newlines are stripped before the $ anchor is reached, and embedded newlines are rejected because neither [ ] (separator whitespace) nor [a-z0-9!#$&^_.+-] (token chars) matches .

Confidence Score: 5/5

Safe to merge — the regex tightening is logically correct and well-tested.

No P0 or P1 issues found. The new regex correctly structures parameter matching, the $ anchor behavior is properly handled by upstream .trim(), quoted-string CR/LF exclusion is explicit, and the test additions cover the new branches. No regressions are apparent in the CHANGELOG entry or test file.

No files require special attention.

Reviews (1): Last reviewed commit: "fix(media): tighten sanitizeMimeType anc..." | Re-trigger Greptile

@vincentkoc vincentkoc added the clawsweeper Tracked by ClawSweeper automation label Apr 28, 2026
@vincentkoc

Copy link
Copy Markdown
Member Author

ProjectClownfish pushed a narrow repair to this branch so the original contributor path can stay canonical.

Source PR: #73229
Validation: pnpm check:changed
Contributor credit is preserved in the branch history and PR context.

@vincentkoc vincentkoc force-pushed the clownfish/ghcrawl-156640-autonomous-smoke branch from 0bab8c0 to f3db162 Compare April 28, 2026 04:20
@vincentkoc

Copy link
Copy Markdown
Member Author

ProjectClownfish pushed a narrow repair to this branch so the original contributor path can stay canonical.

Source PR: #73229
Validation: pnpm check:changed
Contributor credit is preserved in the branch history and PR context.

@vincentkoc vincentkoc force-pushed the clownfish/ghcrawl-156640-autonomous-smoke branch from f3db162 to a64ebe0 Compare April 28, 2026 04:43
@aisle-research-bot

aisle-research-bot Bot commented Apr 28, 2026

Copy link
Copy Markdown

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

# Severity Title
1 🟡 Medium CR/LF allowed in MIME quoted-string parameters via regex escape range
2 🟡 Medium Unicode case-folding bypass in MIME type sanitizer due to /i regex
1. 🟡 CR/LF allowed in MIME quoted-string parameters via regex escape range
Property Value
Severity Medium
CWE CWE-113
Location src/media-understanding/apply.ts:82-86

Description

The MIME type validation regex in sanitizeMimeType allows literal CR/LF characters inside quoted-string parameter values.

  • HTTP_QUOTED_STRING permits escaped characters via \\[\t -~].
  • The character range \t -~ spans ASCII 0x09–0x7E, which includes \n (LF, 0x0A) and \r (CR, 0x0D).
  • As a result, an input like text/plain; title="a\\\nInjected: x" can match MIME_TYPE_WITH_OPTIONAL_PARAMS (the newline is part of the quoted-string), causing sanitizeMimeType() to accept a newline-containing header value.

Even though sanitizeMimeType() returns only the type/subtype, callers may incorrectly treat a non-undefined return as a signal that the original MIME string is safe to use in an HTTP header or log line, enabling CRLF/header-injection style issues.

Vulnerable code:

const HTTP_QUOTED_STRING = String.raw`"(?:[\t !#-\[\]-~]|\\[\t -~])*"`;

Recommendation

Reject CR/LF anywhere in the input, including within quoted strings.

Option A (tighten the escape character class): do not use a range starting at \t through space.

For example, only allow HTAB/space or visible ASCII after a backslash:

const QDTEXT = String.raw`[\t !#-\[\]-~]`;        // no CR/LF
const QUOTED_PAIR = String.raw`\\(?:[\t ]|[!-~])`; // no CR/LF
const HTTP_QUOTED_STRING = String.raw`"(?:${QDTEXT}|${QUOTED_PAIR})*"`;

Option B (defense-in-depth): explicitly reject any \r or \n in value before regex matching:

if (/[\r\n]/.test(trimmed)) return undefined;

Add tests to ensure inputs containing \r or \n inside quoted strings are rejected.

2. 🟡 Unicode case-folding bypass in MIME type sanitizer due to `/i` regex
Property Value
Severity Medium
CWE CWE-176
Location src/media-understanding/apply.ts:82-98

Description

sanitizeMimeType attempts to validate and normalize MIME types to an ASCII type/subtype form. However, the validation regex is created with the case-insensitive i flag:

  • Character classes like [a-z] under /i perform Unicode case folding in JavaScript.
  • This can allow certain non-ASCII code points that case-fold to ASCII letters (e.g., Kelvin sign U+212Ak) to match the ASCII-only pattern.
  • The function then returns match[1].toLowerCase(), which can transform the non-ASCII input into an ASCII string, defeating the intent to reject non-ASCII MIME values.

Vulnerable code:

const MIME_TYPE_WITH_OPTIONAL_PARAMS = new RegExp(
  String.raw`^${MIME_TYPE}(?:${MIME_PARAMETER})*$`,
  "i",
);
...
const match = trimmed.match(MIME_TYPE_WITH_OPTIONAL_PARAMS);
return match?.[1]?.toLowerCase();

Impact depends on downstream usage (e.g., allowlists/branching logic based on the sanitized MIME type): a crafted non-ASCII value can pass sanitization and be normalized to an ASCII token that may be treated as trusted.

Recommendation

Avoid Unicode case-folding during validation and explicitly enforce ASCII:

  1. Remove the i flag and validate against a lowercased ASCII-only input.
  2. Reject any non-ASCII characters before lowercasing.

Example fix:

const ASCII_ONLY = /^[\x00-\x7F]+$/;

export function sanitizeMimeType(value?: string): string | undefined {
  const trimmed = normalizeOptionalString(value);
  if (!trimmed || !ASCII_ONLY.test(trimmed)) return undefined;

  const lower = trimmed.toLowerCase();
  const match = lower.match(MIME_TYPE_WITH_OPTIONAL_PARAMS_CASE_SENSITIVE);
  return match?.[1];
}

const MIME_TYPE_WITH_OPTIONAL_PARAMS_CASE_SENSITIVE = new RegExp(
  String.raw`^${MIME_TYPE}(?:${MIME_PARAMETER})*$`
);

This ensures only ASCII input is accepted and prevents Unicode confusable/case-folding bypasses.


Analyzed PR: #73229 at commit a64ebe0

Last updated on: 2026-04-28T04:45:40Z

@vincentkoc vincentkoc merged commit 6fadc56 into main Apr 28, 2026
62 of 66 checks passed
@vincentkoc vincentkoc deleted the clownfish/ghcrawl-156640-autonomous-smoke branch April 28, 2026 04:48
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring
globalcaos pushed a commit to globalcaos/tinkerclaw that referenced this pull request May 13, 2026
* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring

* fix(media): tighten sanitizeMimeType anchoring
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper Tracked by ClawSweeper automation maintainer Maintainer-authored PR size: XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Security] sanitizeMimeType regex should be anchored and case-insensitive

1 participant