fix(media): anchor sanitizeMimeType regex and reject trailing garbage (#9795)#68225
fix(media): anchor sanitizeMimeType regex and reject trailing garbage (#9795)#68225ymaxgit wants to merge 1 commit into
Conversation
Greptile SummaryAnchors the Confidence Score: 5/5Safe to merge — fix is correct, tests are thorough, and the only remaining finding is a minor hardening suggestion with no exploitable impact. No P0/P1 issues. The single P2 comment (using No files require special attention. Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/media-understanding/apply.ts
Line: 88
Comment:
**`\s*` before `;` allows newlines in the parameters prefix**
`\s` includes `\n` and `\r\n`, so `text/plain\n;charset=utf-8` passes the regex and returns `text/plain`. The returned value is always safe, but the acceptance of a newline-before-semicolon input is technically non-RFC-7231-compliant. Replacing `\s*` with `[ \t]*` (horizontal whitespace only) would reject those edge-case inputs without affecting any real-world MIME strings.
```suggestion
const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:[ \t]*;.*)?$/);
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(media): anchor sanitizeMimeType rege..." | Re-trigger Greptile |
| // Trailing garbage (e.g. `text/plain<script>` or `text/plain\ngarbage`) is | ||
| // rejected outright instead of silently truncated. Input is already | ||
| // lowercased by normalizeOptionalLowercaseString. | ||
| const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:\s*;.*)?$/); |
There was a problem hiding this comment.
\s* before ; allows newlines in the parameters prefix
\s includes \n and \r\n, so text/plain\n;charset=utf-8 passes the regex and returns text/plain. The returned value is always safe, but the acceptance of a newline-before-semicolon input is technically non-RFC-7231-compliant. Replacing \s* with [ \t]* (horizontal whitespace only) would reject those edge-case inputs without affecting any real-world MIME strings.
| const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:\s*;.*)?$/); | |
| const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:[ \t]*;.*)?$/); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/media-understanding/apply.ts
Line: 88
Comment:
**`\s*` before `;` allows newlines in the parameters prefix**
`\s` includes `\n` and `\r\n`, so `text/plain\n;charset=utf-8` passes the regex and returns `text/plain`. The returned value is always safe, but the acceptance of a newline-before-semicolon input is technically non-RFC-7231-compliant. Replacing `\s*` with `[ \t]*` (horizontal whitespace only) would reject those edge-case inputs without affecting any real-world MIME strings.
```suggestion
const match = trimmed.match(/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)(?:[ \t]*;.*)?$/);
```
How can I resolve this? If you propose a fix, please make it concise.|
Related work from PRtags group Title: Open PR duplicate: sanitizeMimeType regex anchoring (#9795)
|
|
Closing this as duplicate or superseded after Codex automated review. Close PR #68225 as duplicate/superseded. Current main still has the unanchored Best possible solution: Consolidate the #9795 sanitizer hardening on the open canonical duplicate PR #68456, carrying over any useful review detail from #68225 as needed, and keep #9795 open until the selected fix is merged and shipped. What I checked:
So I’m closing this here and keeping the remaining discussion on the canonical linked item. Codex Review notes: model gpt-5.5, reasoning high; reviewed against d251932fcfd6. |
Summary
sanitizeMimeTypeinsrc/media-understanding/apply.tsused an unanchored regex (/^(type/subtype)/). A string liketext/plain<script>...ortext/plain garbagesilently matched and returnedtext/plain, accepting the overall value as a valid MIME even though the input was not a well-formed RFC 7231 media type.textHintand mime strings coming intoapplyMediaUnderstanding. Accepting malformed values risks surprising callers that rely on it to reject garbage (e.g. hints propagated from untrusted text). Reported as a security input-validation bug in [Security] sanitizeMimeType regex should be anchored and case-insensitive #9795.(?:\s*;.*)?$). Parameters are still dropped from the returned value, but anything else after the subtype (whitespace + garbage, control characters, junk) causes the function to returnundefined.normalizeMimeTypeinsrc/media/mime.ts(which already splits on;), no change to any caller ofsanitizeMimeType, no change to the allowed character class for type/subtype. The function was exported purely so the new regression test can exercise it directly.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause
/^([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)/with no end anchor, so any valid-looking prefix was captured and the rest was ignored.applyMediaUnderstandingsuite).Regression Test Plan
src/media-understanding/apply.sanitize-mime.test.ts(new)type/subtypeaccepted.; charset=utf-8,; boundary=...) stripped to justtype/subtype.<script>, spaces + junk, embedded newlines) rejected (returnsundefined).undefined.User-visible / Behavior Changes
Very narrow: values that previously coerced to a type/subtype despite invalid trailing bytes now cause the caller to see
mimeTypeasundefined, which then falls back to the existing detection paths. No config changes.Security Impact (required)
Verification
pnpm test src/media-understanding/apply.sanitize-mime.test.ts— 6 passedpnpm test 'src/media-understanding/**/*.test.ts'— 131 passed (full media-understanding suite, on the pre-cherry-pick branch)pnpm check— 0 warnings, 0 errors (pre-cherry-pick branch)Human Verification
type/subtypeoptionally followed by;....text/plain\nContent-Type: text/html) now rejected; upper-case input still lowercased before matching vianormalizeOptionalLowercaseString.Compatibility / Migration
Risks and Mitigations
textHintvalues will now getundefinedand fall through to detection.applyMediaUnderstandingsuite continues to pass (131/131) on the scoped run, so no regressions were surfaced.