refactor(core): route side-query LLM calls through runSideQuery chokepoint#3775
Conversation
…point Folds every one-shot side-query call site through a single `runSideQuery` entry point with `thinkingConfig.includeThoughts: false` and `fastModel` (falling back to main) as the default policy. Adds a text-mode sibling to the existing JSON-mode helper, plus a `BaseLlmClient.generateText` primitive that calls `ContentGenerator.generateContent` directly so side queries get neither user-memory wrapping nor the main-prompt fallback that `geminiClient.generateContent` applies. Migrated call sites: session title, recap, tool-use summary, /rename, follow-up suggestion (direct path), ACP rewrite, project /summary, arena approach summary, chat compression, web-fetch, insight analysis, subagent spec generation. Six call sites override the helper defaults explicitly (subagent gen, suggestion, ACP rewrite, /summary, compression, insight) where main-model quality or caller-supplied model matters. The /summary path additionally fixes a latent bug: text extraction previously did not strip thought parts, so on thinking models the saved `.qwen/PROJECT_SUMMARY.md` could leak `reasoning_content` into the file. The chokepoint now strips thought parts and the request itself goes out with thinking off. Best-effort cosmetic callers (recap, tool-use summary, kebab rename, suggestion) opt into `maxAttempts: 1` so transient outages don't burn seven retries on output the user will likely never see. `isInternalPromptId` recognises the `side-query:` prefix automatically so new call sites are filtered without per-site allowlist updates. Removes the `getAgentContentGenerator` workaround in `InProcessBackend` and the `getAgentSummaryGenerator` indirection in `ArenaManager` — arena approach summaries now run through the chokepoint against `fastModel`, giving every agent a neutral arbiter rather than a self-summary on its own model.
E2E test reportRun against fresh build at Per-test verdict
Web-fetch (C1) — headlineThe web-fetch summarization request landed on
|
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
[Critical] Typecheck fails in packages/cli/src/ui/hooks/useGeminiStream.test.tsx because two test events use ServerGeminiEventType.Text, but the server event enum defines Content, not Text. The failing locations are lines 2927 and 3233; both should use ServerGeminiEventType.Content.
— gpt-5.5 via Qwen Code /review
Reconciled main's per-agent runtimeView/registry-cleanup work in InProcessBackend with the side-query chokepoint refactor on this branch. Migrated two new chatCompressionService tests added on main to use the BaseLlmClient.generateText mock pattern that runSideQuery now goes through.
logToolCall calls isInternalPromptId(event.prompt_id), and tool-call events from useToolScheduler can carry an undefined prompt_id. The side-query refactor added promptId.startsWith(SIDE_QUERY_PROMPT_PREFIX) without a falsy guard, so the missing id crashed the logger and broke six useToolScheduler tests across all OS / Node matrix entries on CI.
|
@wenshao Fixed — all |
wenshao
left a comment
There was a problem hiding this comment.
测试覆盖缺失(无法映射到特定代码行):
-
[Critical]
maxAttempts选项在sideQuery.test.ts中完全未测试——此属性同时在SideQueryJsonOptions和SideQueryTextOptions上定义,并在generateJson/generateText调用中传递,但从未验证是否正确转发。 -
[Critical] 来自
generateJson/generateText的错误传播未测试——runSideQuery最关键的故障模式(API 失败、网络错误、中止)完全没有测试覆盖。 -
[Critical]
side-query:前缀在isInternalPromptId测试中缺失——测试覆盖了prompt_suggestion、forked_query等,但未覆盖side-query:next-speaker或side-query:chat-compression,而这些才是runSideQuery实际生成的 ID。
— DeepSeek/deepseek-v4-pro via Qwen Code /review
- Cap web-fetch, chat-compression, and ACP rewrite at maxAttempts: 1. These paths degrade gracefully on failure (tool error, NOOP fallback, null return), so 7 retries only delays the user-visible outcome. - /summary now carries the main session's system instruction so the summarizer keeps the coding-assistant role, project context, and user memory instead of summarizing the chat in isolation. - Add isInternalPromptId tests for the side-query: prefix so future callers minted via runSideQuery stay filtered out of recordings.
…n errors
- Add JSDoc on the model and config fields of SideQueryJsonOptions and
SideQueryTextOptions so the fastModel-first defaulting and the
thinkingConfig.includeThoughts: false default are visible at the API
surface, not buried in resolveDefaultModel / applyThinkingDefault.
- BaseLlmClient.generate{Json,Text} error wraps now include promptId
in the message and pass { cause: error }, so a side-query failure
identifies which call site failed and preserves the original stack.
- Add tests covering maxAttempts forwarding (present + omitted) and
rejection propagation for both JSON and text modes — the conditional
spread is non-trivial and was previously unverified.
|
BaseLlmClient was bound to the main session's ContentGenerator and only swapped the request `model` field, so side queries targeting a fast or alternate model inherited the main provider's baseUrl, credentials, and sampling settings — breaking cross-provider configurations. Move per-model generator/authType resolution out of GeminiClient and into BaseLlmClient as `resolveForModel`. Both generateJson and generateText now build a per-model ContentGenerator (with cache) when the request targets a non-main model and pass the resolved retry authType through to retryWithBackoff. GeminiClient.generateContent delegates to the same resolver so there is a single source of truth. Also pin the /forget destructive selector to the main model — the runSideQuery default moved to fast model in this branch, but /forget acts on the selection without confirmation, so a weaker fast model could silently delete the wrong managed-memory entries.
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] Test coverage gaps (cannot be mapped to specific diff lines)
Three test files are missing assertions for new behavior introduced by this refactor:
baseLlmClient.test.ts:generateTexthas no standalone tests (abort signal, error formatting with promptId, empty response, maxAttempts forwarding)suggestionGenerator.test.ts:generateViaBaseLlm(rewritten to userunSideQuerytext mode) has zero test coveragechatCompressionService.test.ts: tests mockgenerateTextbut never assert thatthinkingConfig,maxAttempts, ormodelare forwarded — thethinkingConfig: { includeThoughts: true }override is especially important to verify since it directly affects compression quality
— deepseek-v4-pro via Qwen Code /review
…-query-chokepoint # Conflicts: # packages/core/src/core/client.ts # packages/core/src/services/chatCompressionService.ts
…pression The compression caller of runSideQuery sets thinkingConfig.includeThoughts=true and maxAttempts=1. A future refactor that silently drops either would degrade compression quality without test failure; this assertion locks the contract.
Addressed the highest-value gap in 462d6c7: Deferring the other two:
|
…tor/unified-side-query-chokepoint # Conflicts: # packages/core/src/tools/syntheticOutput.test.ts
…point (#3775) * refactor(core): route side-query LLM calls through runSideQuery chokepoint Folds every one-shot side-query call site through a single `runSideQuery` entry point with `thinkingConfig.includeThoughts: false` and `fastModel` (falling back to main) as the default policy. Adds a text-mode sibling to the existing JSON-mode helper, plus a `BaseLlmClient.generateText` primitive that calls `ContentGenerator.generateContent` directly so side queries get neither user-memory wrapping nor the main-prompt fallback that `geminiClient.generateContent` applies. Migrated call sites: session title, recap, tool-use summary, /rename, follow-up suggestion (direct path), ACP rewrite, project /summary, arena approach summary, chat compression, web-fetch, insight analysis, subagent spec generation. Six call sites override the helper defaults explicitly (subagent gen, suggestion, ACP rewrite, /summary, compression, insight) where main-model quality or caller-supplied model matters. The /summary path additionally fixes a latent bug: text extraction previously did not strip thought parts, so on thinking models the saved `.qwen/PROJECT_SUMMARY.md` could leak `reasoning_content` into the file. The chokepoint now strips thought parts and the request itself goes out with thinking off. Best-effort cosmetic callers (recap, tool-use summary, kebab rename, suggestion) opt into `maxAttempts: 1` so transient outages don't burn seven retries on output the user will likely never see. `isInternalPromptId` recognises the `side-query:` prefix automatically so new call sites are filtered without per-site allowlist updates. Removes the `getAgentContentGenerator` workaround in `InProcessBackend` and the `getAgentSummaryGenerator` indirection in `ArenaManager` — arena approach summaries now run through the chokepoint against `fastModel`, giving every agent a neutral arbiter rather than a self-summary on its own model. * fix(core): guard isInternalPromptId against undefined prompt_id logToolCall calls isInternalPromptId(event.prompt_id), and tool-call events from useToolScheduler can carry an undefined prompt_id. The side-query refactor added promptId.startsWith(SIDE_QUERY_PROMPT_PREFIX) without a falsy guard, so the missing id crashed the logger and broke six useToolScheduler tests across all OS / Node matrix entries on CI. * fix(cli,core): polish runSideQuery callers from review feedback - Cap web-fetch, chat-compression, and ACP rewrite at maxAttempts: 1. These paths degrade gracefully on failure (tool error, NOOP fallback, null return), so 7 retries only delays the user-visible outcome. - /summary now carries the main session's system instruction so the summarizer keeps the coding-assistant role, project context, and user memory instead of summarizing the chat in isolation. - Add isInternalPromptId tests for the side-query: prefix so future callers minted via runSideQuery stay filtered out of recordings. * refactor(core): document runSideQuery defaults and surface promptId in errors - Add JSDoc on the model and config fields of SideQueryJsonOptions and SideQueryTextOptions so the fastModel-first defaulting and the thinkingConfig.includeThoughts: false default are visible at the API surface, not buried in resolveDefaultModel / applyThinkingDefault. - BaseLlmClient.generate{Json,Text} error wraps now include promptId in the message and pass { cause: error }, so a side-query failure identifies which call site failed and preserves the original stack. - Add tests covering maxAttempts forwarding (present + omitted) and rejection propagation for both JSON and text modes — the conditional spread is non-trivial and was previously unverified. * fix(core): preserve per-model provider routing in side queries BaseLlmClient was bound to the main session's ContentGenerator and only swapped the request `model` field, so side queries targeting a fast or alternate model inherited the main provider's baseUrl, credentials, and sampling settings — breaking cross-provider configurations. Move per-model generator/authType resolution out of GeminiClient and into BaseLlmClient as `resolveForModel`. Both generateJson and generateText now build a per-model ContentGenerator (with cache) when the request targets a non-main model and pass the resolved retry authType through to retryWithBackoff. GeminiClient.generateContent delegates to the same resolver so there is a single source of truth. Also pin the /forget destructive selector to the main model — the runSideQuery default moved to fast model in this branch, but /forget acts on the selection without confirmation, so a weaker fast model could silently delete the wrong managed-memory entries. * test(core): assert thinkingConfig/maxAttempts/model forwarding in compression The compression caller of runSideQuery sets thinkingConfig.includeThoughts=true and maxAttempts=1. A future refactor that silently drops either would degrade compression quality without test failure; this assertion locks the contract. * fix(cli): route dynamic localization through side query * refactor(core): remove unused memory governance review
Summary
This PR routes every stateless one-shot side-query LLM call through
runSideQuery, extending the helper with a text mode backed byBaseLlmClient.generateText. The chokepoint now owns the common side-query policy: fast model by default, hidden thinking off by default, shared prompt-id/model/abort plumbing, and no accidental main-turn system prompt or memory wrapping. Cache-aware forked queries stay onrunForkedAgentbecause they need cloned conversation history and prompt-cache reuse, not stateless side-query plumbing.The key review surface is the policy matrix below. It documents every
runSideQuerycaller's model choice, response shape, and thinking decision, including the deliberate exceptions that stay on the main/caller-selected model or opt into thinking.Side-query Policy Matrix
Grouped by model policy. Within each group, JSON-shaped side queries come first, then Text-shaped side queries. Shape is
JSONfor schema-validatedgenerateJsonresponses andTextfor plain textgenerateTextresponses. Thinking means hidden model thoughts requested viathinkingConfig.includeThoughts.Fast/default model callers
Unless a row says otherwise, Default means
config.getFastModel() ?? config.getModel()with thinking off. Fast-only means the caller skips instead of falling back to the main model.{ title }metadata.<recap>extraction discards anything else.Main/caller-pinned model callers
These pass an explicit model because fast-model quality, provider choice, or cache behavior matters more than the default.
systemPromptquality matters.runSideQuerypath when cache sharing is unavailable.rewriteModelor main/summaryAdjacent:
runForkedAgentNot part of this PR's
runSideQuerymigration, but worth keeping in view:runForkedAgentcovers side work that needs cloned conversation history, prompt-cache reuse, or multi-turn agent execution. Cache-path callers are single-turn and tool-free; agent-path callers are multi-turnAgentHeadlessruns with scoped tools.runSideQuery/btwValidation
npm run typecheck, then run the impacted unit tests, then exercise the E2E smoke flow (single tmux session: "Create a file hello.sh that prints 'hello world', then list the files" +/summary) and confirm.qwen/PROJECT_SUMMARY.mdcontains clean markdown (no reasoning preamble) and that web-fetch summarization lands on the main model in headless stats.fastModel,/summaryand web-fetch on main, no reasoning leaks into saved files.Scope / Risk
fastModelby default. If a future caller forgets to override and inheritsfastModelwhen they need main, output quality silently drops; the matrix above calls out every current main/caller-pinned exception and every fast-only local guard.InProcessBackend.getAgentContentGenerator). After this PR, all summaries run through the chokepoint againstfastModel. Trade-off discussed in the design doc — neutral arbiter and consistent style across agents, at the cost of model-voice summaries ("claude saying what claude did").InProcessBackend.getAgentContentGeneratormethod and theArenaManager.getAgentSummaryGeneratorindirection. No external consumers.Testing Matrix
Testing matrix notes:
Linked Issues / Bugs
Refs #3760.