
Media: reject spoofed input_image MIME payloads #38289

Merged
vincentkoc merged 3 commits into main from vincentkoc-code/input-image-mime-spoofing-fix
Mar 6, 2026

Conversation

@vincentkoc
Member

Summary

  • Problem: normalizeInputImage trusted caller-declared non-HEIC image MIME types before the allowlist check, so a request could claim image/png while supplying concrete non-image bytes.
  • Why it matters: that let spoofed input_image payloads bypass the intended image MIME validation boundary on Gateway HTTP APIs.
  • What changed: image inputs now sniff bytes again before allowlist enforcement, reject concrete non-image detections for declared image/* payloads, and still keep HEIC/HEIF normalization scoped to actual HEIC inputs.
  • What did NOT change (scope boundary): this does not expand accepted MIME types, relax URL fetching policy, or change the existing HEIC -> JPEG normalization path.
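A minimal TypeScript sketch of the ordering described above: sniff the bytes first, reject a declared `image/*` whose bytes sniff as a concrete non-image type, and only then apply the allowlist. The names `sniffMime`, `ALLOWED_INPUT_IMAGE_MIMES`, and `validateInputImageMime` are hypothetical, the allowlist contents are assumed, and the magic-byte table is a simplified stand-in for whatever the repository's `detectMime` does; this is not the code in src/media/input-files.ts.

```typescript
// Hypothetical sketch of "sniff, reject spoofed image/*, then allowlist".
// Names, allowlist, and magic-byte table are illustrative assumptions.

const HEIC_BRANDS = new Set(["heic", "heix", "heim", "heis", "hevc", "hevx"]);
const HEIF_BRANDS = new Set(["mif1", "msf1"]);

// Assumed allowlist for this sketch only.
const ALLOWED_INPUT_IMAGE_MIMES = new Set([
  "image/png",
  "image/jpeg",
  "image/gif",
  "image/webp",
  "image/heic",
  "image/heif",
]);

/** Return a MIME type from leading magic bytes, or undefined if unrecognized. */
function sniffMime(bytes: Uint8Array): string | undefined {
  const has = (sig: number[], offset = 0): boolean =>
    bytes.length >= offset + sig.length &&
    sig.every((b, i) => bytes[offset + i] === b);

  if (has([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (has([0xff, 0xd8, 0xff])) return "image/jpeg";
  if (has([0x47, 0x49, 0x46, 0x38])) return "image/gif"; // "GIF8"
  if (has([0x52, 0x49, 0x46, 0x46]) && has([0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp"; // "RIFF....WEBP"
  }
  if (has([0x25, 0x50, 0x44, 0x46, 0x2d])) return "application/pdf"; // "%PDF-"
  if (has([0x66, 0x74, 0x79, 0x70], 4)) {
    // ISO BMFF "ftyp" box: the major brand at bytes 8..11 distinguishes HEIC/HEIF.
    let brand = "";
    for (let i = 8; i < 12 && i < bytes.length; i++) brand += String.fromCharCode(bytes[i]);
    if (HEIC_BRANDS.has(brand)) return "image/heic";
    if (HEIF_BRANDS.has(brand)) return "image/heif";
  }
  return undefined;
}

/** Validate a declared input_image MIME against its bytes; returns the MIME to keep. */
function validateInputImageMime(declaredMime: string, bytes: Uint8Array): string {
  const detectedMime = sniffMime(bytes);
  // A declared image/* whose bytes are concretely something else is treated as a spoof.
  if (
    declaredMime.startsWith("image/") &&
    detectedMime !== undefined &&
    !detectedMime.startsWith("image/")
  ) {
    throw new Error(`input_image declared ${declaredMime} but bytes sniff as ${detectedMime}`);
  }
  if (!ALLOWED_INPUT_IMAGE_MIMES.has(declaredMime)) {
    throw new Error(`unsupported input_image MIME: ${declaredMime}`);
  }
  // Non-HEIC images keep their declared MIME; HEIC routing is not modeled here.
  return declaredMime;
}
```

In this sketch, unrecognized bytes (`detectedMime` is `undefined`) fall through to the allowlist, mirroring the PR's wording that only concrete non-image detections are rejected.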

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Gateway input_image requests now reject spoofed non-image payloads even when the request declares an allowed image MIME type.

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22 / pnpm workspace
  • Model/provider: n/a
  • Integration/channel (if any): Gateway HTTP APIs
  • Relevant config (redacted): defaults

Steps

  1. Send an input_image base64 or URL request that declares image/png.
  2. Provide bytes that detect as application/pdf instead of an image.
  3. Observe Gateway validation.

Expected

  • Spoofed payloads are rejected before provider delivery.

Actual

  • The previous merged follow-up trusted declared non-HEIC MIME types and could accept spoofed payloads.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: HEIC base64 normalization, HEIC URL normalization, spoofed base64 image rejection, spoofed URL image rejection, OpenAI chat completions gateway path, OpenResponses parity validation.
  • Edge cases checked: non-HEIC images keep declared MIME after validation; HEIC image budget tests remain green.
  • What you did not verify: install smoke locally in Docker.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert this PR.
  • Files/config to restore: src/media/input-files.ts, src/media/input-files.fetch-guard.test.ts, CHANGELOG.md
  • Known bad symptoms reviewers should watch for: valid image inputs getting rejected due to incorrect MIME detection.

Risks and Mitigations

  • Risk: MIME sniffing could reintroduce non-HEIC behavior drift.
    • Mitigation: tests assert non-HEIC images keep their declared MIME after validation while concrete non-image detections are rejected.

AI Assistance

  • AI-assisted: yes
  • Testing: fully tested on the focused Gateway/media suites listed below

Verification:

  • pnpm vitest run src/media/input-files.fetch-guard.test.ts src/gateway/openai-http.test.ts src/gateway/openresponses-parity.test.ts src/gateway/openai-http.image-budget.test.ts
  • pnpm exec oxfmt --check src/media/input-files.ts src/media/input-files.fetch-guard.test.ts CHANGELOG.md

vincentkoc self-assigned this Mar 6, 2026
openclaw-barnacle (bot) added the size: S and maintainer (Maintainer-authored PR) labels Mar 6, 2026
vincentkoc marked this pull request as ready for review March 6, 2026 19:34
vincentkoc merged commit 084dfd2 into main Mar 6, 2026
29 of 30 checks passed
vincentkoc deleted the vincentkoc-code/input-image-mime-spoofing-fix branch March 6, 2026 19:34

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ffeeba6938

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/media/input-files.ts
Comment on lines +245 to +247
```ts
    (detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
    (HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
      ? (detectedMime ?? declaredMime)
```


P2: Honor detected non-HEIC image types before conversion

The new sourceMime selection ignores detectedMime whenever it is an image type other than HEIC/HEIF, so a payload declared as image/heic but sniffed as image/png or image/jpeg is still treated as HEIC and routed through convertHeicToJpeg. In practice (for both base64 and URL inputs with an incorrect mediaType or Content-Type), this can recompress or even fail valid non-HEIC images, which regresses the prior behavior and contradicts the intent to scope HEIC normalization to actual HEIC bytes.

Useful? React with 👍 / 👎.
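One way to express the selection this comment asks for, as a hedged sketch: HEIC/HEIF conversion is driven by the sniffed type whenever one exists, and a HEIC declaration only decides the outcome when sniffing finds nothing. `selectSourceMime` and `needsHeicConversion` are hypothetical helper names; `HEIC_INPUT_IMAGE_MIMES` mirrors the constant in the reviewed diff, but its contents here (`image/heic`, `image/heif`) are an assumption. The sketch also assumes the spoofing guard has already rejected concrete non-image detections.

```typescript
// Illustrative sketch; set contents are assumed, helper names are hypothetical.
const HEIC_INPUT_IMAGE_MIMES = new Set(["image/heic", "image/heif"]);

/** Pick the MIME that decides whether HEIC -> JPEG conversion runs. */
function selectSourceMime(declaredMime: string, detectedMime: string | undefined): string {
  // Bytes sniff as HEIC/HEIF: convert regardless of the declaration.
  if (detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) return detectedMime;
  // Declared HEIC: prefer the sniffed type (e.g. a JPEG labeled HEIC),
  // falling back to the declaration only when nothing was detected.
  if (HEIC_INPUT_IMAGE_MIMES.has(declaredMime)) return detectedMime ?? declaredMime;
  // Non-HEIC declarations keep their declared MIME after validation.
  return declaredMime;
}

function needsHeicConversion(sourceMime: string): boolean {
  return HEIC_INPUT_IMAGE_MIMES.has(sourceMime);
}
```

With this shape, a JPEG declared as `image/heic` yields `image/jpeg`, so `needsHeicConversion` returns `false` and the bytes skip `convertHeicToJpeg`.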

@greptile-apps
Contributor

greptile-apps Bot commented Mar 6, 2026

Greptile Summary

This PR hardens normalizeInputImage in src/media/input-files.ts by unconditionally calling detectMime on all image inputs and rejecting payloads where the declared MIME is image/* but the detected bytes are a concrete non-image type. The security hardening goal is achieved by the guard on line 241, which correctly prevents non-image bytes from bypassing validation.

However, there is a logic regression in the sourceMime selection (lines 244–248). When declaredMime is "image/heic"/"image/heif" but detectedMime is a different image type (e.g., "image/jpeg"), neither ternary condition is satisfied and the code falls back to sourceMime = declaredMime. This causes convertHeicToJpeg to be called on non-HEIC image bytes, which will fail at runtime. The previous code used detectedMime ?? declaredMime for all HEIC-declared inputs, gracefully handling this case. A test case for this "declared HEIC + detected non-HEIC image" scenario is also missing.

The spoofing-rejection guard is correct and new tests verify the intended behaviour for spoofed base64 and URL payloads. The CHANGELOG.md update is clear and appropriately scoped.

Confidence Score: 2/5

  • The security hardening goal for spoofed MIME payloads is achieved, but a logic regression will cause runtime errors when HEIC is declared but non-HEIC image bytes are detected.
  • The spoofing-rejection guard (line 241) correctly prevents non-image bytes from slipping through, and new tests verify this behaviour. However, the sourceMime ternary (lines 244–248) has a regression: when declaredMime is HEIC but detectedMime is a different image type, the code incorrectly sets sourceMime = declaredMime and proceeds to call convertHeicToJpeg on non-HEIC bytes. This will cause runtime failures for valid images with mismatched type declarations. The fix is surgical but critical before merging.
  • src/media/input-files.ts (lines 244–248) requires the sourceMime ternary fix to handle the "declared HEIC + detected non-HEIC image" case. A test case for this scenario should also be added.

Last reviewed commit: ffeeba6

Comment thread src/media/input-files.ts
Comment on lines +244 to +248
```ts
  const sourceMime =
    (detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
    (HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
      ? (detectedMime ?? declaredMime)
      : declaredMime;
```
Contributor


HEIC conversion triggered on non-HEIC image bytes when declared MIME is HEIC

When declaredMime is "image/heic" (or "image/heif") but detectedMime is a non-HEIC image type like "image/jpeg", the current logic sets sourceMime = declaredMime = "image/heic" and proceeds to call convertHeicToJpeg on non-HEIC bytes. This is a regression from the previous behavior.

Trace for declaredMime = "image/heic", detectedMime = "image/jpeg":

  • Guard check (line 241): both are image/* → no rejection (correct)
  • Part 1: "image/jpeg" && HEIC_INPUT_IMAGE_MIMES.has("image/jpeg") → false
  • Part 2: HEIC_INPUT_IMAGE_MIMES.has("image/heic") && !"image/jpeg" → false
  • Overall condition → false, takes the else branch
  • sourceMime = declaredMime = "image/heic"
  • Line 253: tries to convert JPEG bytes as HEIC

The old code handled this gracefully by using detectedMime ?? declaredMime for all HEIC-declared inputs, so a JPEG claimed as HEIC would be returned directly without conversion.

Fix: also use detectedMime when declaredMime is HEIC but a non-HEIC image was actually detected:

Suggested change:

```diff
   const sourceMime =
     (detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
     (HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
       ? (detectedMime ?? declaredMime)
-      : declaredMime;
+      : HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && detectedMime
+        ? detectedMime
+        : declaredMime;
```

This restores the old behaviour of using the detected MIME for HEIC-declared inputs while preserving all the new spoofing-rejection logic. A corresponding test case for "declared HEIC + detected non-HEIC image" is also missing.
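The trace above can be replayed mechanically. This sketch copies the reviewed ternary and the suggested ternary into two standalone functions (`reviewedSourceMime` and `suggestedSourceMime` are hypothetical names) and evaluates them on the traced input plus a few neighboring cases. The contents of `HEIC_INPUT_IMAGE_MIMES` are an assumption.

```typescript
// Assumed contents; the real constant lives in src/media/input-files.ts.
const HEIC_INPUT_IMAGE_MIMES = new Set(["image/heic", "image/heif"]);

// Ternary as reviewed (commit ffeeba6).
function reviewedSourceMime(declaredMime: string, detectedMime?: string): string {
  return (detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
    (HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
    ? (detectedMime ?? declaredMime)
    : declaredMime;
}

// Ternary from the suggested change: adds a branch for "declared HEIC, detected other image".
function suggestedSourceMime(declaredMime: string, detectedMime?: string): string {
  return (detectedMime && HEIC_INPUT_IMAGE_MIMES.has(detectedMime)) ||
    (HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && !detectedMime)
    ? (detectedMime ?? declaredMime)
    : HEIC_INPUT_IMAGE_MIMES.has(declaredMime) && detectedMime
      ? detectedMime
      : declaredMime;
}

// [declaredMime, detectedMime] pairs: the traced case first, then controls.
const cases: Array<[string, string | undefined]> = [
  ["image/heic", "image/jpeg"],
  ["image/heic", undefined],
  ["image/png", "image/heic"],
  ["image/png", "image/png"],
];

for (const [declared, detected] of cases) {
  console.log(
    declared,
    detected,
    reviewedSourceMime(declared, detected),
    suggestedSourceMime(declared, detected),
  );
}
```

Only the traced case differs: the reviewed version returns `image/heic` (so JPEG bytes would be sent to `convertHeicToJpeg`), while the suggested version returns `image/jpeg`. The other three cases produce the same result in both versions.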


mrosmarin added a commit to mrosmarin/openclaw that referenced this pull request Mar 6, 2026
* main: (37 commits)
  feat(gateway): add channel-backed readiness probes (openclaw#38285)
  CI: enable report-only Knip deadcode job
  Tooling: wire deadcode scripts to Knip
  Tooling: add Knip workspace config
  CI: skip detect-secrets on main temporarily
  Install Smoke: fetch docs base on demand
  CI: fetch base history on demand
  CI: add base-commit fetch helper
  Docs: clarify main secret scan behavior
  CI: keep full secret scans on main
  Docs: update secret scan reproduction steps
  CI: scope secret scans to changed files
  Media: reject spoofed input_image MIME payloads (openclaw#38289)
  chore: code/dead tests cleanup (openclaw#38286)
  Install Smoke: cache docker smoke builds
  Install Smoke: allow reusing prebuilt test images
  Install Smoke: shallow docs-scope checkout
  CI: shallow scope checkouts
  feat(onboarding): add web search to onboarding flow (openclaw#34009)
  chore: disable contributor labels
  ...
vincentkoc added a commit to BryanTegomoh/openclaw-upstream that referenced this pull request Mar 8, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening
jenawant pushed a commit to jenawant/openclaw that referenced this pull request Mar 10, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening
V-Gutierrez pushed a commit to V-Gutierrez/openclaw-vendor that referenced this pull request Mar 17, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 20, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening

(cherry picked from commit 084dfd2)
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 20, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening

(cherry picked from commit 084dfd2)
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
* Media: reject spoofed input image MIME types

* Media: cover spoofed input image MIME regressions

* Changelog: note input image MIME hardening

Labels

maintainer (Maintainer-authored PR), size: S
