fix(core): truncate model-facing tool output#4520
Conversation
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] prompt_id not passed to truncateToolOutput — ToolOutputTruncatedEvent is constructed with prompt_id: '' (in truncation.ts:141), but scheduledCall.request.prompt_id is available here. Truncation telemetry events cannot be correlated with specific prompts or turns. Fix: add a promptId parameter to truncateToolOutput and pass it from the scheduler.
— qwen3.7-max via Qwen Code /review
| } | ||
| } | ||
|
|
||
| if (typeof content === 'string') { |
There was a problem hiding this comment.
[Critical] truncateToolOutput failure converts successful tool calls into errors
The await truncateToolOutput(...) call has no local try/catch. It sits inside an outer try whose catch treats any exception as a tool execution failure. If config.getTruncateToolOutputThreshold(), crypto.randomBytes, or logToolOutputTruncated throws, a tool that executed successfully gets reported as failed to the model — with a cryptic error that doesn't mention truncation.
Truncation is a presentation-layer concern and must never cause a successful tool call to appear as an error.
| if (typeof content === 'string') { | |
| if (typeof content === 'string') { | |
| try { | |
| const truncated = await truncateToolOutput( | |
| this.config, | |
| toolName, | |
| content, | |
| ); | |
| content = truncated.content; | |
| contentLength = content.length; | |
| } catch (truncErr) { | |
| // Truncation is best-effort. If it fails, pass the original | |
| // content through unchanged rather than failing the tool call. | |
| console.warn( | |
| `[CoreToolScheduler] truncation failed for ${toolName}:`, | |
| truncErr instanceof Error ? truncErr.message : truncErr, | |
| ); | |
| } | |
| } |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in e3075df. The scheduler now wraps truncation in a local try/catch, logs a warning on truncation failure, and preserves the original successful tool output. Added a regression test that forces threshold lookup to throw and asserts the tool call still completes as success with the original output.
| if (toolResult.error === undefined) { | ||
| let content = toolResult.llmContent; | ||
| const contentLength = | ||
| let contentLength = |
There was a problem hiding this comment.
[Suggestion] contentLength stale after hook/reminder appends when truncation doesn't fire
The const → let change only updates contentLength inside the truncation block. But content is also mutated at two earlier points (PostToolUse hook additionalContext and conditional rules/skill reminders) that do NOT update contentLength. When hooks/reminders append text but the total stays under threshold, contentLength underreports the actual content size.
Fix: add a single contentLength recalculation after ALL content mutations, just before convertToFunctionResponse:
contentLength = typeof content === 'string' ? content.length : undefined;— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in e3075df. contentLength is now recomputed after post-hook context, rule/skill reminders, and truncation handling. Added a regression test where PostToolUse appends context without truncation and asserts contentLength matches the final model-facing content.
| toolName, | ||
| content, | ||
| ); | ||
| content = truncated.content; |
There was a problem hiding this comment.
[Suggestion] truncated.outputFile discarded — user-facing returnDisplay not updated
The scheduler only reads truncated.content and ignores truncated.outputFile. For tools that don't self-truncate (the exact target audience of this safety net), the user's TUI display (returnDisplay) never mentions that truncation occurred or where the full output was saved. Compare with shell.ts which appends "Output too long and was saved to: <path>" to the display message.
Consider surfacing the truncation in returnDisplay:
if (truncated.outputFile) {
toolResult.returnDisplay =
(typeof toolResult.returnDisplay === 'string' ? toolResult.returnDisplay + '\n' : '') +
`Output too long and was saved to: ${truncated.outputFile}`;
}— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in e3075df. When truncation successfully saves the full output and returnDisplay is a string, the scheduler appends Output too long and was saved to: <path>. Added regression coverage for the user-facing display path.
| config: mockConfig, | ||
| onAllToolCallsComplete, | ||
| onToolCallsUpdate, | ||
| getPreferredEditor: () => 'vscode', |
There was a problem hiding this comment.
[Suggestion] No negative test for within-threshold passthrough
The single test only verifies that oversized output IS truncated. A regression that always truncates (e.g., an off-by-one in the threshold comparison, or truncateToolOutput being called with wrong arguments) would pass the existing test undetected.
Add a second test with output under the threshold (e.g., 50 chars with threshold=100) asserting the output equals the original content exactly and does not contain [CONTENT TRUNCATED].
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added in e3075df. The new passthrough regression test uses output below the threshold and asserts the model-facing output equals the original string exactly and does not contain [CONTENT TRUNCATED].
pomelo-nwu
left a comment
There was a problem hiding this comment.
Hi @Jerry2003826, thank you for your continued contributions — 9 PRs in a short time is impressive! 🎉
As we review your changes, we'd like to ask you to update each PR to follow the latest PR template on the main branch. The most important section is the Reviewer Test Plan, which significantly accelerates the review and merge process.
Specifically, for each PR please include:
- How to verify — clear reproduction steps so a reviewer can confirm the fix/feature
- Evidence (Before & After) — use the
tmux-real-user-testingskill (or manual tmux capture) to show before/after screenshots of the TUI behavior. Side-by-side evidence makes it much faster for maintainers to validate and merge - Tested on — fill in the OS table (macOS / Windows / Linux)
PRs with a complete Reviewer Test Plan are prioritized for review — without it, review may be delayed.
You can see the full template at: .github/pull_request_template.md
Thanks again for your effort — looking forward to getting these merged! 🚀
中文说明
你好 @Jerry2003826,感谢你的持续贡献——短时间内提交了 9 个 PR,非常高效!🎉
在 review 过程中,我们希望你能按照 main 分支上最新的 PR 模版更新每个 PR 的描述。其中最关键的部分是 Reviewer Test Plan,它能显著加速审核和合并流程。
具体来说,请为每个 PR 补充:
- How to verify — 清晰的复现步骤,让 reviewer 能确认修复/功能的效果
- Evidence (Before & After) — 使用
tmux-real-user-testingskill(或手动 tmux 截取)展示修改前后的 TUI 截图对比,前后对比能让维护者更快地验证和合并 - Tested on — 填写操作系统测试表格(macOS / Windows / Linux)
有完整 Reviewer Test Plan 的 PR 会被优先审核——缺少该部分可能会导致审核延迟。
完整模版见:.github/pull_request_template.md
再次感谢你的付出,期待尽快把这些 PR 合并!🚀
— Qwen Code
61c7865 to
e3075df
Compare
|
Updated in e3075df. This addresses the remaining review items: scheduler now passes Validation run locally on Windows: npm run test --workspace=@qwen-code/qwen-code-core -- src/utils/truncation.test.ts -t "caller prompt id"
npm run test --workspace=@qwen-code/qwen-code-core -- src/core/coreToolScheduler.test.ts -t "tool output truncation"
npm run test --workspace=@qwen-code/qwen-code-core -- src/utils/truncation.test.ts
npm run test --workspace=@qwen-code/qwen-code-core -- src/core/coreToolScheduler.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
npm run lint --workspace=@qwen-code/qwen-code-core
npm run typecheck --workspace=@qwen-code/qwen-code-core
git diff --check |
|
The Current code passes Current PR checks are green across lint, CodeQL, macOS, Ubuntu, and Windows. |
|
Updated the PR description to match the latest reviewer template and removed the stale/garbled details block. The code is unchanged from e3075df; current CI is green across lint, CodeQL, macOS, Ubuntu, and Windows. Ready for re-review when you have time. |
|
|
||
| if (typeof content === 'string') { | ||
| try { | ||
| const truncated = await truncateToolOutput( |
There was a problem hiding this comment.
[Suggestion] Truncation runs AFTER hook context and <system-reminder> blocks are appended
At this point, content includes the raw tool output PLUS PostToolUse hook additionalContext, conditional-rule reminders, and skill-activation notices. The head/tail truncation algorithm is structure-blind — if the combined payload exceeds the threshold, it will bisect <system-reminder> XML envelopes, producing malformed markup the model then tries to interpret. Additionally, skill activation side effects (refreshSkills, notifyChangeListeners) have already fired, but the reminder telling the model about the new skill may get truncated away.
Consider moving the truncation block to run immediately after content = toolResult.llmContent (before PostToolUse hooks, conditional rules, and skill activation), so only raw tool output is measured and truncated. Then let hooks and reminders append onto the already-truncated content — they always reach the model intact.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Updated in ad9ee15. I moved the truncation block to run before PostToolUse additionalContext, conditional-rule reminders, and skill-activation reminders are appended. That keeps the head/tail truncator focused on the raw tool output and lets hook/reminder metadata reach the model intact after truncation.
| completedCalls[0].response.responseParts, | ||
| ); | ||
|
|
||
| expect(output).toContain('[CONTENT TRUNCATED]'); |
There was a problem hiding this comment.
[Suggestion] contentLength not asserted when truncation fires
This test triggers truncation and checks the model-facing output, but never reads completedCalls[0].response.contentLength. The new post-truncation contentLength reassignment (line ~2983 in coreToolScheduler.ts — the reason const was changed to let) has zero behavioral verification in the truncation path. Test 4 checks contentLength but only in the no-truncation case.
Consider adding:
expect(completedCalls[0].response.contentLength).toBe(output.length);
expect(completedCalls[0].response.contentLength).toBeLessThan(5000);— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added in ad9ee15. The large-output truncation regression now asserts completedCalls[0].response.contentLength === output.length, so the post-truncation recomputation is covered when truncation actually fires.
| expect(completedCalls[0].response.resultDisplay).toEqual( | ||
| expect.stringContaining('Output too long and was saved to:'), | ||
| ); | ||
| }); |
There was a problem hiding this comment.
[Suggestion] No test for hook context + truncation interaction
No test exercises both PostToolUse hook additionalContext AND truncation firing simultaneously. Test 4 uses hooks with threshold: 1000 and a 12-char output (no truncation fires). Test 1 triggers truncation without hooks. The integration boundary — where hook context pushes combined content over the threshold — is untested.
This matters especially if the truncation-ordering suggestion above is not adopted: hook-injected content (e.g., policy reminders) could silently disappear when the combined payload triggers truncation.
Consider adding a test with hooks enabled, output + hook context exceeding the threshold, and asserting the truncation marker is present, contentLength reflects the truncated length, and the full-output file contains the hook context.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added a regression test in ad9ee15: preserves hook context outside truncated tool output. It enables PostToolUse hooks, forces truncation, asserts the truncation marker/contentLength, and verifies the full hook context is appended after the truncated tool output instead of being bisected. Since the implementation now truncates raw tool output before appending hook/reminder metadata, the saved full-output file intentionally captures the raw tool output rather than the later hook context.
| config: Config, | ||
| toolName: string, | ||
| content: string, | ||
| promptId = '', |
There was a problem hiding this comment.
[Suggestion] promptId = '' default masks telemetry gaps in existing callers
shell.ts:2062 and mcp-tool.ts:438 call truncateToolOutput with 3 arguments (no promptId), so their telemetry events carry prompt_id: ''. This defeats the purpose of adding promptId for telemetry correlation — two of the three callers emit truncation events that cannot be traced to a specific prompt.
Consider either making promptId required (so the compiler enforces it), or — if the plan is to consolidate truncation at the scheduler level — removing the tool-level truncateToolOutput calls from shell.ts and mcp-tool.ts entirely. That would also eliminate the double-truncation edge case where scheduler-level truncation runs on already-truncated content after hooks push it back over the threshold.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Documented the fallback in ad9ee15. The scheduler-level path now passes scheduledCall.request.prompt_id, which covers the model-facing history truncation added by this PR. I kept promptId optional for the existing direct tool-level callers because shell/MCP execute() do not receive scheduler request context today, and removing those direct truncation guards would widen this PR by changing behavior for direct tool invocations outside the scheduler. The JSDoc now calls out that scheduler callers should pass the prompt id while direct tool implementations may omit it when they do not have that context.
e3075df to
ad9ee15
Compare
|
Updated in ad9ee15 for the latest review pass. Changes made:
Validation run locally on Windows: npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts -t "preserves hook context outside truncated tool output|truncates large model-facing tool output|reports contentLength after hook context"
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts
npm run typecheck --workspace=packages/core
npm run lint --workspace=packages/core
git diff --check |
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] fs.writeFile in truncation.ts:89 lacks { mode: 0o600 } — the saved tool output files inherit the process umask (typically 0o644 on Linux), making them world-readable on shared systems. Tool output frequently contains secrets (env vars, API keys). This PR extends truncation from shell/MCP-only to ALL tools, widening the exposure. Other sensitive-file writers in this codebase use { mode: 0o600 } (see file-token-storage.ts, oauth-token-storage.ts, sessionService.ts). Consider adding { mode: 0o600 } to the writeFile call and { mode: 0o700 } to the mkdir call. Pre-existing issue, but the wider surface area makes it worth addressing in a follow-up.
— qwen3.7-max via Qwen Code /review
| } | ||
| } | ||
|
|
||
| if (typeof content === 'string') { |
There was a problem hiding this comment.
[Critical] Double truncation for shell tool outputs
shell.ts:2062 already calls truncateToolOutput on llmContent before returning to the scheduler. The scheduler then calls it a second time here. truncateAndSaveToFile produces output of approximately threshold + ~330 chars (the header text) and truncateLines + ~8 lines (header + separator), which exceeds both the char threshold and line limit on the second pass — triggering actual re-truncation with the default config (25K chars, 1000 lines).
Consequences:
- Nested truncation headers: the model sees two
[CONTENT TRUNCATED]envelopes - Two temp files: file B (from the second pass) contains the already-truncated content, not the original. File A (from the first pass, containing the real full output) is orphaned — the model-facing message points to B, not A
- Duplicate
returnDisplaymutation: shell.ts:2072 already appends"Output too long and was saved to: <A>", and the scheduler appends a second"Output too long and was saved to: <B>" - Duplicate telemetry:
logToolOutputTruncatedfires twice per tool call
Suggested fix: remove the tool-level truncateToolOutput calls from shell.ts (lines 2061-2076) and let the scheduler be the single truncation point. This aligns with the PR's stated goal of centralizing truncation and eliminates the dual-responsibility. mcp-tool.ts is safe because it returns Part[] (non-string), which the typeof guard skips.
| if (typeof content === 'string') { | |
| if (typeof content === 'string' && !content.startsWith('Tool output was too large')) { | |
| try { | |
| const truncated = await truncateToolOutput( |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 8ef3cb3. Shell no longer calls truncateToolOutput directly, so the scheduler is the single model-facing string-output truncation point. The scheduler also skips content that already starts with TOOL_OUTPUT_TRUNCATED_PREFIX, which protects existing direct truncation callers from being wrapped again.
| if (toolResult.error === undefined) { | ||
| let content = toolResult.llmContent; | ||
| const contentLength = | ||
| let contentLength = |
There was a problem hiding this comment.
[Suggestion] Dead initial contentLength assignment
This let contentLength is always overwritten by the identical expression at line ~2984 before it is ever read. The only early-return path between the two assignments (shouldStop) never reads contentLength.
The landmine: if a future refactor removes the recalculation at line ~2984 (thinking "contentLength is already set at the top"), contentLength silently regresses to the raw tool output size, and downstream consumers see a value that excludes truncation, hook context, and reminders.
| let contentLength = | |
| let contentLength: number | undefined; |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 8ef3cb3. Removed the top-of-block assignment and now declares contentLength only at the final post-mutation calculation before convertToFunctionResponse.
| }); | ||
| }); | ||
|
|
||
| describe('CoreToolScheduler tool output truncation', () => { |
There was a problem hiding this comment.
[Suggestion] No test for the double-truncation scenario
All tests here use LargeOutputTool, which returns raw un-truncated content. The most dangerous production path — shell tool output exceeding the threshold — is precisely the scenario where truncateToolOutput runs twice (once in shell.ts, once in the scheduler), and it has zero coverage.
Consider adding a test where the tool's execute() returns content that already includes the "Tool output was too large" preamble (simulating what shell.ts produces after its own truncation), and verify the scheduler does not produce a second truncation header or a second file write.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added in 8ef3cb3. The new regression returns already-truncated content, asserts the model-facing output is unchanged, keeps the existing returnDisplay, and verifies fs.writeFile is not called a second time.
ad9ee15 to
8ef3cb3
Compare
|
Updated in 8ef3cb3 for the latest review pass. Changes made:
Validation run locally on Windows: npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts src/utils/truncation.test.ts -t "already-truncated|line limit exceeded|binding constraint|very long lines|correct file path|path traversal"
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts
npm run test --workspace=packages/core -- src/utils/truncation.test.ts
npm run test --workspace=packages/core -- src/tools/shell.test.ts -t "appends the hint after command output is assembled"
npm run test --workspace=packages/core -- src/tools/shell.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/tools/shell.ts packages/core/src/tools/shell.test.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/tools/shell.ts packages/core/src/tools/shell.test.ts
npm run typecheck --workspace=packages/core
npm run lint --workspace=packages/core
git diff --check |
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] Stale comment in shell.ts still references truncation I/O (~line 1980)
The comment block near shell.ts:~1980 reads "intentionally BEFORE the post-processing block below (truncation I/O, output-file write)" and "Truncation time is bounded by the temp-dir backend and isn't representative of the command's actual wait." Both references describe behavior removed by this PR — truncation I/O and output-file writes no longer happen in shell.ts. The PR already updated surrounding comments (returnDisplayMessage build order, long-run advisory) but missed this one.
Future readers will waste time looking for a truncation block that no longer exists in shell.ts, or may be misled about what the timing measurement accounts for.
// - Wall-clock duration >= threshold. Measured spawn -> resultPromise
// settle, intentionally BEFORE post-processing (attribution,
// returnDisplay build, hint append). The hint reports how long
// the COMMAND blocked the agent, not how long the tool call
// spent including post-processing.
(This finding is outside the PR diff, so it's posted in the review body rather than as an inline comment.)
— qwen3.7-max via Qwen Code /review
8ef3cb3 to
a3bdc99
Compare
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] fs.writeFile / fs.mkdir failure swallows the underlying error (truncation.ts:104-108)
The catch (_error) block inside truncateAndSaveToFile returns a generic "[Note: Could not save full output to file]" message but discards the actual error. At 3 AM investigating why truncation silently fell back on every tool call (disk full, temp dir deleted, permissions change), there would be no log, no telemetry, and no error details. Since this is a scheduler-level code path every tool hits, a systemic temp-dir problem would cascade silently.
Consider adding debugLogger.warn(\Failed to save truncated tool output to ${outputFile}: ` + (error instanceof Error ? error.message : String(error)))` inside the catch block before returning.
— qwen3.7-max via Qwen Code /review
| content = truncated.content; | ||
| if ( | ||
| truncated.outputFile && | ||
| typeof toolResult.returnDisplay === 'string' |
There was a problem hiding this comment.
[Suggestion] typeof toolResult.returnDisplay === 'string' guard not covered by tests
When truncation fires and returnDisplay is a structured type (FileDiff, TodoResultDisplay, AnsiOutputDisplay, etc.), this typeof guard prevents appending the "Output too long and was saved to: <path>" notice. No scheduler test currently exercises this branch with a non-string returnDisplay — all truncation tests use LargeOutputTool which returns a string returnDisplay.
If a future refactor were to remove this guard and unconditionally concatenate, structured display objects would be corrupted (string concat on an object produces [object Object]). No test would fail.
Consider adding a regression test where LargeOutputTool returns a structured returnDisplay (e.g., { type: 'ansi', content: '...' }) alongside oversized string content, then assert content is truncated but resultDisplay is the original structured object unchanged.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 378c2ac. Added a scheduler regression test with a structured returnDisplay; the model-facing content is truncated while resultDisplay remains the original structured object unchanged.
| await fs.mkdir(projectTempDir, { recursive: true }); | ||
| await fs.writeFile(outputFile, content); | ||
| await fs.mkdir(projectTempDir, { recursive: true, mode: 0o700 }); | ||
| await fs.writeFile(outputFile, content, { mode: 0o600 }); |
There was a problem hiding this comment.
[Suggestion] ToolOutputTruncatedEvent does not carry the saved-file path
result.outputFile is available at the call site below (around line 148) but is not forwarded into ToolOutputTruncatedEvent. When investigating a truncation event in production (e.g., "why did the model behave oddly after truncation?"), there is no way to correlate the event with the actual file that holds the full output.
Consider adding an output_file: string field to ToolOutputTruncatedEvent (in packages/core/src/telemetry/types.ts) and forwarding result.outputFile here, so post-mortem analysis can jump straight to the saved file.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 378c2ac. ToolOutputTruncatedEvent now carries output_file, truncateToolOutput forwards the saved path, and the telemetry logger/QwenLogger coverage was updated.
| } | ||
| } catch (truncationError) { | ||
| debugLogger.warn( | ||
| `Tool output truncation failed for ${toolName}: ` + |
There was a problem hiding this comment.
[Suggestion] Truncation-failure log lacks callId / prompt_id
The warning only includes toolName. When the same tool runs many times in a session (common for Bash or read_file), a truncation-failure warning can't be correlated to the specific invocation that failed.
scheduledCall.callId and scheduledCall.request.prompt_id are both in scope here — including them makes the warning actionable when paging through logs.
| `Tool output truncation failed for ${toolName}: ` + | |
| debugLogger.warn( | |
| `Tool output truncation failed for ${toolName} ` + | |
| `(callId=${scheduledCall.callId}, prompt_id=${scheduledCall.request.prompt_id}): ` + | |
| (truncationError instanceof Error | |
| ? truncationError.message | |
| : String(truncationError)), | |
| ); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in 378c2ac. The truncation-failure warning now includes the scheduled request callId and prompt_id; current types store the call id on scheduledCall.request.callId.
a3bdc99 to
97531e2
Compare
|
Updated in 97531e2 for the latest review-body suggestion. Changes made:
Validation run locally on Windows: npm run test --workspace=packages/core -- src/utils/truncation.test.ts -t "file write errors"
npm run test --workspace=packages/core -- src/utils/truncation.test.ts
npx prettier --check packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
npx eslint packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts
git diff --check |
Local verification report (Linux, tmux)Verified PR head Environment
Results
Behavior coveredThe new scheduler-level
Diff-level sanity checks (read alongside the tests)
Notes
ConclusionLGTM from a verification standpoint. All commands listed in the PR's reviewer test plan pass locally on Linux; the additional Verification artifacts (logs) retained locally at |
| try { | ||
| await fs.mkdir(projectTempDir, { recursive: true }); | ||
| await fs.writeFile(outputFile, content); | ||
| await fs.mkdir(projectTempDir, { recursive: true, mode: 0o700 }); |
There was a problem hiding this comment.
[Suggestion] mode: 0o700 on fs.mkdir is silently defeated by prior directory creation
logger.ts (line ~181) calls await fs.mkdir(this.qwenDir, { recursive: true }) on the same getProjectTempDir() path early in session initialization — without a restrictive mode. Since fs.mkdir with recursive: true does not modify permissions of existing directories, by the time truncateAndSaveToFile runs, the directory already exists with default permissions (typically 0o755). The 0o700 mode here is effectively dead code.
The file-level 0o600 on writeFile still works correctly for newly-created files, so the immediate data protection is intact. But directory listing by other local users remains possible on shared systems.
Consider one of:
- (a) Add
mode: 0o700where the directory is first created (logger.tsorstorage.ts) - (b) Add
await fs.chmod(projectTempDir, 0o700)after themkdircall here to enforce the mode regardless of who created it - (c) Remove the misleading
modeparameter and add a comment explaining that directory permission hardening is handled elsewhere
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in a90adf9. After mkdir, truncateAndSaveToFile now best-effort chmods the project temp dir to 0o700 so an existing dir with looser permissions is tightened before writing the saved output. chmod failure is logged but does not prevent saving the 0o600 file; added regression coverage for that fallback.
| } | ||
|
|
||
| if ( | ||
| typeof content === 'string' && |
There was a problem hiding this comment.
[Suggestion] Shell-level metadata (long-run hint, attribution warning) is subject to head/tail truncation without protection
Shell.ts appends the long-run advisory hint (~530 chars) and attributionWarning to llmContent before returning to the scheduler. The scheduler's truncation then treats this combined string as raw tool output. Unlike PostToolUse hook context — which is explicitly deferred to after truncation (lines 2901-2905) — tool-level metadata appended inside the tool implementation receives no equivalent protection.
With the default threshold (25,000 chars), the tail budget is generous enough that the hint reliably survives. The practical risk is low. However, a future reduction in default thresholds or a user-configured low threshold could bisect the hint or attribution warning mid-sentence. The old shell test that pinned the hint's survival through truncation was removed and replaced with one that doesn't exercise the truncation interaction.
Consider either:
- (a) Documenting in
shell.tsthat the hint and attribution warning are intentionally subject to scheduler truncation - (b) Having tools expose metadata separately (e.g., on
toolResult) so the scheduler can append it after truncation, similar topostToolUseAdditionalContext
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Handled in a90adf9 with the lower-risk option. I documented in shell.ts that the model-facing copies of the long-run hint and attribution warning are intentionally part of shell llmContent and therefore subject to scheduler truncation, while returnDisplay still keeps the full user-facing metadata. I did not add a separate tool metadata API in this PR to avoid widening the change beyond the truncation fix.
97531e2 to
378c2ac
Compare
|
Updated PR description to the latest template with Reviewer Test Plan, Evidence, Tested on, and the Chinese explanation section. Also pushed 378c2ac for the latest inline feedback:
Validation run locally on Windows: npm run test --workspace=packages/core -- src/utils/truncation.test.ts
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts
npm run test --workspace=packages/core -- src/telemetry/loggers.test.ts
npx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/telemetry/types.ts packages/core/src/telemetry/loggers.test.ts packages/core/src/telemetry/qwen-logger/qwen-logger.ts
npx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts packages/core/src/utils/truncation.ts packages/core/src/utils/truncation.test.ts packages/core/src/telemetry/types.ts packages/core/src/telemetry/loggers.test.ts packages/core/src/telemetry/qwen-logger/qwen-logger.ts
npm run typecheck --workspace=packages/core
git diff --check |
|
CI note: the latest Qwen Code CI run did not reach lint/test execution. The Lint job failed during |
|
Updated in 3b375ba to scope this PR down to the minimal scheduler-level fix. What changed:
Validation:
|
✅ Re-verification (updated scope) — PR #4520
结论 / Verdict: 新方案(在 CoreToolScheduler 集中截断)构建/测试/运行时/静态检查全部通过,建议合并。 The new scheduler-centric approach passes build, tests, runtime and static checks — recommend merge. What changed since the last verification
The new design: the scheduler calls the existing Method
Build & static ✅
Tests ✅
🎯 Runtime check (built
|
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI failing: Post Coverage Comment, Test (${{ matrix.os }}, Node ${{ matrix.node-version }}), Lint, CodeQL, Classify PR. — qwen3.7-max via Qwen Code /review
✅ 本地 tmux 真实构建 + 测试 + 运行时复验 — PR #4520
结论 / Verdict: 构建、类型检查、lint/格式、全部受影响测试、以及真实运行时行为全部通过,重构在构建产物中得到验证 —— 建议合并。 Build, typecheck, lint/format, all affected tests, and real runtime behaviour all pass; the refactor is verified in the built artifacts — recommend merge. 复验前提 / Setup
当前改动范围(4 文件,+291/-104) / Current scope — verified against the diff, not the body把"超大模型可见工具输出"的截断从 shell 工具上移到
1) 构建 & 类型检查 / Build & typecheck
构建产物结构性证据 / structural evidence in
2) 单元测试 / Tests
3 个新增用例均通过: 3) Lint / 格式 / Lint & format
4) 真实运行时复验(核心 PR,TUI 截图 N/A) / Real runtime exercise直接驱动构建产物。除了跑作者用 mock 验证 scheduler 接线的单测外,这里用未 mock 的真实 即:超大输出在进入模型历史前被有界截断,完整内容无损落盘,小输出原样透传 —— 与 #4049 的目标一致。 备注 / Notes(均不阻塞合并 / non-blocking)
在隔离 worktree( |
wenshao
left a comment
There was a problem hiding this comment.
R18: Rescoped PR (4 files, +291/-104 vs 14 files, +1652/-146). No review findings on the minimal scheduler-level truncation fix. Downgraded from Approve to Comment: CI failing: Post Coverage Comment, Test, Lint, CodeQL, Classify PR. 3 low-confidence observations for human review only (no try-catch around truncateToolOutput, potential double-truncation for self-truncating tools, non-string resultDisplay drops TUI notice). 205 tests pass. — qwen3.7-max via Qwen Code /review
|
Updated the PR description to match the rescoped 4-file diff. Also checked the latest CI after R18: the current run is green now. Current status:
I left the three R18 low-confidence observations as non-blocking for this minimal PR:
|
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI failing: Post Coverage Comment, Test (${{ matrix.os }}, Node ${{ matrix.node-version }}), Lint, CodeQL, Classify PR. — qwen-latest-series-invite-beta-v38 via Qwen Code /review
…-history-truncation
3b375ba to
2ea9955
Compare
BZ-D
left a comment
There was a problem hiding this comment.
Reviewed the diff and ran the targeted scheduler, shell, and truncation tests; no blocking issues found.
DragonnZhang
left a comment
There was a problem hiding this comment.
Review — PR #4520 (commit 2ea9955)
Direction: Sound. Scheduler-level truncation is the correct place to bound model-facing string output, and the rescope to 5 files keeps the change focused.
CI: all_pass (17/17 checks).
Deterministic checks: tsc=0, eslint=0.
Findings: 1 new Suggestion (combined guard re-truncation edge case). All other observations are already covered by the extensive prior review thread (40+ existing comments).
One inline comment posted.
— qwen-code via Qwen Code /review
| const shouldTruncateCombinedContent = | ||
| content.length > combinedThreshold || | ||
| content.split('\n').length > combinedLines; | ||
| if (shouldTruncateCombinedContent) { |
There was a problem hiding this comment.
[Suggestion] Combined guard can re-truncate already-truncated content
When raw tool output exceeds the default threshold, the first truncateToolOutput call (line ~2916) wraps it in a truncation envelope (starting with "Tool output was too large and has been truncated.") and saves the full output to a temp file. After PostToolUse hook context is appended, this combined guard checks whether the total exceeds combinedThreshold (2x default) and calls truncateToolOutput a second time.
In the edge case where both raw output AND hook context independently exceed the threshold, the combined guard re-truncates content that already contains a truncation envelope. The result:
- Nested truncation envelopes — the model sees two layers of "Tool output was too large..." markers
- Two temp files saved to disk (with two different random names)
- Two "Output too long and was saved to: ..." lines accumulated in
resultDisplay - The second head/tail split may bisect the first envelope's marker text
The test "guards the final combined output after PostToolUse context is appended" only exercises the path where the first truncation does NOT fire (rawOutput = 'A'.repeat(90) < threshold = 100), so this interaction is untested.
Consider tracking whether the first pass already truncated (the first call's outputFile is set) and skipping the combined guard in that case, or add a test that exercises both passes firing simultaneously.
— qwen-code via Qwen Code /review
There was a problem hiding this comment.
Fixed in a811f425d. The final combined guard now skips the second truncation pass when the raw tool output was already truncated and persisted by the first pass. That avoids nested truncation envelopes, duplicate temp files, and duplicate resultDisplay save notices while still appending the PostToolUse context.
Added coverage:
does not re-truncate already-truncated output after PostToolUse context is appended
Validated:
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.tsnpx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.tsnpx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts
| content, | ||
| ); | ||
| content = truncated.content; | ||
| if (truncated.outputFile && typeof resultDisplay === 'string') { |
There was a problem hiding this comment.
[Suggestion] toolOutputAlreadyTruncated flag is gated on typeof resultDisplay === 'string', but content = truncated.content on line 3174 runs unconditionally. When a tool returns string llmContent that gets truncated with an outputFile, but a non-string resultDisplay (e.g., FileDiff from edit tool, AgentResultDisplay from agent tool), the flag stays false even though content has already been truncated. The combined-content guard at line ~3272 will then re-truncate already-truncated content when PostToolUse context or conditional-rule reminders are appended — the exact double-truncation scenario this commit aims to prevent.
Decouple the flag from the display type check:
| if (truncated.outputFile && typeof resultDisplay === 'string') { | |
| if (truncated.outputFile) { | |
| toolOutputAlreadyTruncated = true; | |
| if (typeof resultDisplay === 'string') { | |
| resultDisplay += | |
| (resultDisplay ? '\n' : '') + | |
| `Output too long and was saved to: ${truncated.outputFile}`; | |
| } | |
| } |
The regression test covers the string-display path but does not exercise a structured returnDisplay with large string llmContent + PostToolUse hook context. Consider adding a test case for that edge case as well.
— qwen3.7-plus via Qwen Code /review
There was a problem hiding this comment.
Handled in the current branch. toolOutputAlreadyTruncated is now set whenever the first truncation pass persists raw string output to an outputFile, independent of whether resultDisplay is a string. The display save notice remains string-only.
Added a regression with structured FileDiff result display plus PostToolUse additional context. It asserts the raw output truncation runs once, the appended context is preserved, and the structured display object is left unchanged.
Validated:
npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.ts -t "structured result display"npm run test --workspace=packages/core -- src/core/coreToolScheduler.test.tsnpx prettier --check packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.tsnpx eslint packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts
…-history-truncation # Conflicts: # packages/core/src/core/coreToolScheduler.ts
37a65fe to
efa179a
Compare
DragonnZhang
left a comment
There was a problem hiding this comment.
Review — PR #4520 (commit efa179a)
Direction: Sound. The toolOutputAlreadyTruncated guard added in a811f425d correctly prevents the combined guard from re-truncating content that was already persisted by the first truncation pass. This closes the nested-truncation-envelope concern from the previous review.
CI: all_pass (24/24 checks).
Detected issues: None beyond the extensively-discussed backlog (39 stale inline comments, all addressed by subsequent commits). The rescoped 4-file core change is focused and well-tested.
Incremental since R19 (commit 2ea9955): +8 lines in coreToolScheduler.ts (the toolOutputAlreadyTruncated flag + combined-guard skip), +54 lines of regression tests. No regressions in the diff.
— qwen3-coder via Qwen Code /review
|
Addressed the latest truncation review batch in What changed:
Validated locally:
|
|
Pushed additional review follow-ups in Changes added:
Validation:
One intentional behavior note: I kept structured |
What this PR does
Moves model-facing string tool-output truncation from the shell tool into
CoreToolScheduler, so any tool that returns stringllmContentcan be bounded before the result is recorded into conversation history.The PR intentionally keeps the scope narrow:
truncateToolOutput()helper and temp-file behavior;Part[]outputs on the existing path.Why it's needed
Unbounded tool results can overflow context tokens and make the session unable to continue. Shell output already had a truncation helper, but other string-returning tools could still place large outputs directly into model history.
Reviewer Test Plan
How to verify
Evidence (Before & After)
Before: string outputs from non-shell tools could enter model history without passing through the existing tool-output truncation helper.
After:
CoreToolSchedulerinvokestruncateToolOutput()for stringllmContentbefore converting the result into a function response. Tests cover large string output truncation, PostToolUse context placement after raw output truncation, shell long-run hint behavior after removing shell-local truncation, and non-stringPart[]passthrough.This is model-facing history behavior, so TUI screenshots are N/A.
Tested on
Environment (optional)
Local Windows/PowerShell checkout with repository npm workspaces. No tmux/TUI capture is included because the behavior is core scheduler/history logic rather than a visible TUI state.
Risk & Scope
Part[]output remains unchanged.ToolResultAPI changes, split-budget hook-context truncation, UI changes, and unrelated refactors.Linked Issues
Fixes #4049