Skip to content

feat(daemon): clamp oversized inline media on the prompt path#4646

Merged
wenshao merged 4 commits into
daemon_mode_b_mainfrom
feat/daemon-inline-media-clamp
May 31, 2026
Merged

feat(daemon): clamp oversized inline media on the prompt path#4646
wenshao merged 4 commits into
daemon_mode_b_mainfrom
feat/daemon-inline-media-clamp

Conversation

@doudouOUC

Copy link
Copy Markdown
Collaborator

Summary

  • Add clampInlineMediaPart utility that replaces inline image/audio/blob payloads exceeding a configurable byte ceiling (QWEN_CODE_MAX_INLINE_MEDIA_BYTES, default 10 MB) with a sanitized text placeholder, preventing oversized daemon media from blowing up request size or token budget.
  • Wire the clamp into Session.#resolvePrompt and advertise audio: true in HTTP daemon promptCapabilities.
  • Fix a type narrowing issue where fileData.fileUri became optional after clamping.

Test plan

  • Unit tests pass for inlineMediaLimit.ts (covers base64/buffer thresholds, audio/image/blob, env-var override)
  • Session integration test verifies oversized inline parts are replaced with placeholder text
  • Existing daemon E2E tests still pass

🤖 Generated with Qwen Code

doudouOUC added 2 commits May 31, 2026 00:52
Replace inline image/audio/blob payloads exceeding a configurable byte
ceiling (QWEN_CODE_MAX_INLINE_MEDIA_BYTES, default 10MB) with a sanitized
text placeholder via clampInlineMediaPart, wired into
Session.#resolvePrompt so oversized daemon media cannot blow up request
size or token budget. Also advertise audio:true in the HTTP daemon
promptCapabilities to match acpAgent and the actual #resolvePrompt
handling.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
clampInlineMediaPart returns the genai Part type, widening the resolved
parts union so fileData.fileUri is optional; assert it where resource_link
file paths are collected (that branch always sets fileUri).

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Copilot AI review requested due to automatic review settings May 30, 2026 16:53
@github-actions

Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR introduces a clamp mechanism for oversized inline media (image/audio/blob) payloads on the daemon prompt path, replacing them with a sanitized text placeholder when they exceed a configurable byte ceiling (default 10 MB). The implementation is well-tested, follows existing patterns, and addresses a real production risk of oversized requests blowing up token budgets.

🔍 General Feedback

  • Good architectural choice: Centralizing the clamp logic in packages/core as a pure utility function makes it reusable and easy to test independently.
  • Defense-in-depth: The clamp is applied at the #resolvePrompt boundary, catching both user-provided inline media and embedded context blobs from MCP servers.
  • Security-conscious: The mime type sanitization via sanitizeMimeForPlaceholder is correctly applied before embedding into the placeholder text.
  • Type safety improvement: The change from part.fileData!.fileUri to part.fileData!.fileUri! acknowledges that clamping can remove fileData from the part, making the type narrowing explicit.
  • Test coverage: Unit tests cover edge cases (base64 padding, data URL prefixes, env var validation, injection attempts), and the integration test verifies end-to-end behavior.

🎯 Specific Feedback

🟡 High

  • File: packages/cli/src/acp-integration/session/Session.ts:2752 - The non-null assertion part.fileData!.fileUri! is now required because clamping can convert a fileData part into a text part. However, this creates a potential runtime risk if clampInlineMediaPart is ever called on a fileData part (which it currently isn't).

    Recommendation: Add a type guard or filter to ensure atCommandParts only includes parts that still have fileData after clamping. For example:

    const pathSpecsToRead: string[] = atPathCommandParts
      .filter((part): part is { fileData: { fileUri: string } } => 'fileData' in part && part.fileData?.fileUri != null)
      .map((part) => part.fileData.fileUri);

    This protects against future refactoring where clampInlineMediaPart might be applied to fileData parts.

🟢 Medium

  • File: packages/core/src/core/inlineMediaLimit.ts:58 - The approxBase64Bytes function uses Math.floor((length * 3) / 4) - padding. While this is correct for standard base64, the comment about "padding chars are always trailing" could be strengthened by stripping whitespace first, as some base64 strings may contain newlines.

    Suggestion: Add .replace(/\s/g, '') before calculating length if there's any chance of whitespace in the input, or document that the function assumes no internal whitespace.

  • File: packages/cli/src/serve/acpHttp/dispatch.ts:269 - Changing audio: false to audio: true in promptCapabilities is correct given that #resolvePrompt now handles audio identically to image. However, this capability advertisement should be verified against the actual ACP client expectations to ensure no client sends audio data assuming the old limit.

    Suggestion: Consider adding a comment or documentation in the ACP protocol spec (if it exists) noting that audio support was added in this PR.

🔵 Low

  • File: packages/core/src/core/inlineMediaLimit.ts:69 - The formatMb function always uses .toFixed(1), which means 10 MB displays as "10.0 MB" and 10.5 MB as "10.5 MB". This is fine, but for very large files (e.g., 100+ MB), showing one decimal may be unnecessary.

    Suggestion: Consider using .toFixed(bytes >= 100 * 1024 * 1024 ? 0 : 1) to show "100 MB" instead of "100.0 MB" for cleaner output.

  • File: packages/core/src/core/inlineMediaLimit.ts:82 - The placeholder message says "Ask the user to resize/compress it, or reference it via an @file path". This is good guidance, but the message could be slightly more actionable by mentioning specific tools or common size limits.

    Suggestion: Consider adding an example like "common image formats under 10 MB work best" or linking to documentation if available.

  • File: packages/core/src/core/inlineMediaLimit.test.ts:17-23 - The base64 test cases use inline comments (// "ABC") which is fine, but adding a couple more test cases for real-world image sizes (e.g., a small PNG header) would strengthen confidence in the estimation formula.

    Suggestion: Add a test case like expect(approxBase64Bytes('iVBORw0KGgoAAAANS...')).toBeGreaterThan(1000) with a known-size payload.

✅ Highlights

  • Excellent security practice: The sanitizeMimeForPlaceholder function is correctly used to prevent injection attacks when embedding untrusted mime types into the placeholder text. The test at line 87-95 specifically validates this against newline and bracket injection.
  • Clean separation of concerns: The core logic (approxBase64Bytes, clampInlineMediaPart, oversizedMediaPlaceholder) is in packages/core with no side effects, making it easy to reason about and test.
  • Environment variable override: The QWEN_CODE_MAX_INLINE_MEDIA_BYTES env var provides operational flexibility without code changes, with proper validation (rejects non-numeric and non-positive values).
  • Performance-conscious: The byte estimation avoids decoding the base64, which is critical for multi-MB payloads on the hot path. The comment at line 37-38 clearly explains this design decision.
  • Comprehensive test coverage: The unit tests cover all edge cases including empty input, padding variations, data URL prefixes, and the integration test verifies the full prompt resolution path.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a prompt-path safeguard to prevent oversized inline media (image/audio/blob) from inflating daemon request sizes by clamping oversized inlineData parts to a sanitized text placeholder, and updates advertised ACP prompt capabilities accordingly.

Changes:

  • Introduce inlineMediaLimit utilities (approxBase64Bytes, env-configurable ceiling, and clampInlineMediaPart) and export them from @qwen-code/qwen-code-core.
  • Apply clamping during ACP Session prompt resolution for image, audio, and embedded blob resources.
  • Advertise audio: true in HTTP daemon promptCapabilities and add unit/integration tests covering the clamp behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
packages/core/src/index.ts Re-exports the new inline media limiting utilities from the core public API.
packages/core/src/core/inlineMediaLimit.ts Implements env-configurable inline media size ceiling and clamping to a text placeholder.
packages/core/src/core/inlineMediaLimit.test.ts Adds unit tests for base64 sizing, env override, clamping, and mime sanitization.
packages/cli/src/serve/acpHttp/dispatch.ts Updates ACP initialize capabilities to advertise audio prompt support.
packages/cli/src/acp-integration/session/Session.ts Wires clamping into #resolvePrompt for inline image/audio and embedded blob parts.
packages/cli/src/acp-integration/session/Session.test.ts Adds an integration-style test ensuring oversized inline images are degraded before model send.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/core/src/core/inlineMediaLimit.ts
Keep both our inline-media-clamp test and upstream conversation_finished
/ tool-outcome telemetry tests.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Comment thread packages/cli/src/acp-integration/session/Session.ts
@wenshao

wenshao commented May 31, 2026

Copy link
Copy Markdown
Collaborator

Verification Report

Branch: feat/daemon-inline-media-clamp
Base: daemon_mode_b_main
Environment: macOS Darwin 25.4.0, Node.js v22.17.0

Test Results

Check Command Result
Unit Tests vitest run packages/core/src/core/inlineMediaLimit.test.ts ✅ 12 tests passed (3ms)
Session Integration vitest run packages/cli/src/acp-integration/session/Session.test.ts ✅ 85 tests passed (392ms)
Core Type Check npm run typecheck --workspace=packages/core ✅ Clean
CLI Type Check npm run typecheck --workspace=packages/cli ✅ Clean (0 errors)
Core Build npm run build --workspace=packages/core ✅ Success
ESLint eslint on all changed files ✅ No errors
Whitespace git diff --check ✅ Clean

Code Review Notes

  • approxBase64Bytes() correctly estimates decoded size from base64 length without allocating a decode buffer — efficient for the hot prompt path
  • clampInlineMediaPart() properly handles all Part variants (inlineData, text, fileData) and passes non-media parts through untouched
  • Mime type sanitization via sanitizeMimeForPlaceholder() prevents injection through untrusted resource/MCP server-provided mime types (test confirms no newlines or bracket sequences leak)
  • QWEN_CODE_MAX_INLINE_MEDIA_BYTES env var override has proper validation (positive, finite, floor'd)
  • The placeholder message is user-actionable: suggests resize/compress or @file path as alternatives
  • audio: true in HTTP dispatch is consistent with acpAgent.ts since #resolvePrompt handles audio identically to image

Note

Initial test run hit a workspace symlink staleness issue (not PR-related). After npm install --ignore-scripts to refresh workspace links, all tests passed cleanly.

Verdict

✅ Ready to merge — all test plan items pass, implementation is well-tested with 12 dedicated unit tests and 1 integration test covering the clamping path.


Verified by wenshao

Comment thread packages/cli/src/acp-integration/session/Session.ts
The readManyFiles result path pushed non-string contentParts (binary
files from @file references) directly into processedQueryParts without
clamping, bypassing the inline media size guard this PR introduces.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review round 2 summary

Thread Author Action Commit
readManyFiles @file path unguarded (Session.ts:2857) wenshao Fixed — wrapped non-string contentParts with clampInlineMediaPart 3077ab6

Resolved 1/1 threads.

const mime = sanitizeMimeForPlaceholder(mimeType);
return (
`[Media omitted: ${mime} is ~${formatMb(bytes)}MB, exceeding the ` +
`${formatMb(limitBytes)}MB inline limit. Ask the user to resize/compress ` +

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The placeholder text advises "reference it via an @file path so it can be read from disk" but this is misleading for 2 of the 3 call sites:

  • Session.ts:2939 (readManyFiles): the user already used @file — the advice is redundant.
  • Session.ts:2961 (MCP blob): the data comes from an MCP resource with no local file — the advice is nonsensical.

Only the direct user-input path (image/audio) benefits from this suggestion. Consider removing the @file advice from the generic placeholder — "Ask the user to resize/compress it" is universally correct.

Suggested change
`${formatMb(limitBytes)}MB inline limit. Ask the user to resize/compress ` +
return (
`[Media omitted: ${mime} is ~${formatMb(bytes)}MB, exceeding the ` +
`${formatMb(limitBytes)}MB inline limit. Ask the user to resize or ` +
`compress the file before attaching.]`
);

— claude-opus-4-6 via Qwen Code /review

/**
* Build the placeholder text substituted for an oversized inline media part.
*/
export function oversizedMediaPlaceholder(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] oversizedMediaPlaceholder is exported and re-exported through the barrel (export * from './core/inlineMediaLimit.js') but has zero external consumers — it is only called by clampInlineMediaPart in this same file. This exposes an implementation detail (including the exact string format) as public API surface, constraining future refactoring of placeholder wording.

Suggested change
export function oversizedMediaPlaceholder(
function oversizedMediaPlaceholder(

— claude-opus-4-6 via Qwen Code /review

if (bytes <= limitBytes) {
return part;
}
const mimeType = part.inlineData?.mimeType ?? 'application/octet-stream';

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The mimeType ?? 'application/octet-stream' fallback is untested — every test case provides an explicit mimeType. Since the Part type makes mimeType optional, real callers (e.g., the blob path with a missing contextPart.mimeType) can hit this branch.

it('falls back to application/octet-stream when mimeType is missing', () => {
  const part = { inlineData: { data: 'A'.repeat(2000) } };
  const result = clampInlineMediaPart(part as Part, 1000);
  expect(result.text).toContain('application/octet-stream');
  expect(result.text?.toLowerCase()).toContain('omitted');
});

— claude-opus-4-6 via Qwen Code /review

@chiga0 chiga0 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QoderWork Code Review — PR #4646

Decision: APPROVED

Summary

feat(daemon): clamp oversized inline media on the prompt path — prevents oversized inline image/audio/blob payloads from inflating daemon request size and token budget.

Review Notes

  • clampInlineMediaPart: Clean pure-function guard on individual Part objects. Returns text placeholder for oversized media, passes through unchanged otherwise.
  • approxBase64Bytes: O(1) estimation via string length arithmetic — no decode/copy, safe on multi-MB payloads on the prompt hot path. Correctly handles empty input, padding (==, =), and data: URL prefix.
  • Wiring completeness: All 3 inline media entry points in #resolvePrompt covered — direct image/audio parts, @file readManyFiles binary parts, and blob embedded context.
  • Security: Mime type in the placeholder text is sanitized via sanitizeMimeForPlaceholder against prompt injection (tested with [SYSTEM: hijack payload).
  • dispatch.ts: audio: true correctly mirrors actual #resolvePrompt handling (both image and audio become inlineData Parts).
  • fileUri! assertion: Correct — only applies on resource_link path where clampInlineMediaPart returns the original part unchanged.
  • Env-configurable ceiling: QWEN_CODE_MAX_INLINE_MEDIA_BYTES with 10MB default. Lazy-evaluated at each call, so runtime env changes take effect immediately.
  • Test coverage: 8 unit tests + 1 integration test covering base64 sizing, env override edge cases, clamping behavior, mime sanitization, and end-to-end Session prompt degradation.

No findings. LGTM.


This review was generated by QoderWork AI

@wenshao wenshao merged commit 8e3e983 into daemon_mode_b_main May 31, 2026
8 checks passed
@doudouOUC doudouOUC deleted the feat/daemon-inline-media-clamp branch May 31, 2026 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants