Manual test results (tmux, dev mode against real API)
All three test-plan items verified end-to-end. Used
Test A: deferred tool discovery
Prompt: "Please use the ask_user_question tool to ask me which color I like (red/blue/green)"
Model trajectory:
Confirms: the model sees Test B —
|
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
…hemas

Large MCP deployments push the function-declaration list past 15K tokens per request. This change lets tools opt out of the initial declaration list via `shouldDefer`, and adds a new `ToolSearch` tool the model calls to fetch schemas on demand, either by exact name (`select:Name1,Name2`) or keyword search with name/description/searchHint scoring.

- `DeclarativeTool` gains `shouldDefer`, `alwaysLoad`, `searchHint` opts.
- MCP tools default to `shouldDefer=true`; lsp, cron_*, ask_user_question, and exit_plan_mode are flagged too.
- `ToolRegistry.getFunctionDeclarations()` filters deferred tools by default; `revealDeferredTool()` re-includes them after ToolSearch loads their schemas.
- `getCoreSystemPrompt` appends a "Deferred Tools" list (names + first line of description) so the model knows what's reachable.
- Subagent wildcard inheritance keeps including deferred tools so existing `tools: ['*']` configs still see MCP schemas.
- Resume-session support: `startChat` scans history for prior calls to deferred tools and re-reveals them so the API doesn't reject follow-up calls. `resetChat` clears the revealed set for a clean slate.
- Skipped when ToolSearch is filtered out by the permission manager.
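The filtering rule in this commit message can be pictured with a minimal sketch. `MiniRegistry` and `ToolLike` are hypothetical stand-ins, not the real `ToolRegistry` or `DeclarativeTool`; only the flag names and method names come from the text above.

```typescript
// Hypothetical, simplified tool shape; the real DeclarativeTool carries far more state.
interface ToolLike {
  name: string;
  description: string;
  shouldDefer?: boolean;
  alwaysLoad?: boolean;
}

// Hypothetical stand-in for ToolRegistry, reduced to the deferral logic.
class MiniRegistry {
  private readonly tools: ToolLike[] = [];
  private readonly revealed = new Set<string>();

  register(tool: ToolLike): void {
    this.tools.push(tool);
  }

  revealDeferredTool(name: string): void {
    this.revealed.add(name);
  }

  clearRevealedDeferredTools(): void {
    this.revealed.clear();
  }

  // A tool is hidden only when it is deferred, not alwaysLoad, and not yet revealed.
  getFunctionDeclarations(opts: { includeDeferred?: boolean } = {}): string[] {
    return this.tools
      .filter((tool) => {
        const hidden =
          tool.shouldDefer === true &&
          tool.alwaysLoad !== true &&
          !this.revealed.has(tool.name);
        return opts.includeDeferred === true || !hidden;
      })
      .map((tool) => tool.name);
  }
}
```

Returning names instead of full declarations keeps the sketch short; the real method presumably returns schema objects.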
Registers a synthetic `structured_output` tool whose parameter schema IS the user-supplied JSON Schema. In headless mode (`qwen -p`), the first successful call terminates the session and exposes the validated payload via the result message's `structured_result` field. Invalid schemas are rejected at CLI parse time via a new strict Ajv compile helper so they can't silently no-op at runtime.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR introduces on-demand loading for deferred tool schemas via a new ToolSearch tool to reduce prompt/tool-declaration token usage, and adds a synthetic structured_output tool for --json-schema structured results in non-interactive CLI mode.
Changes:
- Add tool deferral controls (`shouldDefer`, `alwaysLoad`, `searchHint`) and session "reveal" tracking to selectively include tool schemas in function declarations.
- Implement `ToolSearch` plus prompt updates to advertise deferred tools only when discovery is available.
- Add `--json-schema` support: strict schema compilation, synthetic `structured_output` tool registration, and structured result emission for headless runs.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/core/src/utils/schemaValidator.ts | Adds compileStrict() to fail fast on invalid JSON Schemas. |
| packages/core/src/utils/schemaValidator.test.ts | Adds test cases for compileStrict(). |
| packages/core/src/tools/tools.ts | Extends DeclarativeTool with deferral/search metadata flags. |
| packages/core/src/tools/tool-search.ts | Adds ToolSearch tool, scoring/tokenization helpers. |
| packages/core/src/tools/tool-search.test.ts | Comprehensive tests for ToolSearch modes/scoring/reveal behavior. |
| packages/core/src/tools/tool-registry.ts | Filters deferred tools by default; adds reveal tracking + deferred summary. |
| packages/core/src/tools/tool-registry.test.ts | Tests deferred filtering, includeDeferred, alwaysLoad, reveal, summary. |
| packages/core/src/tools/tool-names.ts | Adds tool_search and structured_output tool names/display names. |
| packages/core/src/tools/syntheticOutput.ts | Adds synthetic structured_output tool for JSON-schema final output. |
| packages/core/src/tools/syntheticOutput.test.ts | Tests schema passthrough + validation behavior for synthetic tool. |
| packages/core/src/tools/mcp-tool.ts | Marks MCP tools as deferred and adds server-name search hint. |
| packages/core/src/tools/lsp.ts | Marks LSP tool as deferred and adds search hints. |
| packages/core/src/tools/exitPlanMode.ts | Marks ExitPlanMode tool as deferred and adds search hints. |
| packages/core/src/tools/cron-list.ts | Marks cron-list tool as deferred and adds search hints. |
| packages/core/src/tools/cron-delete.ts | Marks cron-delete tool as deferred and adds search hints. |
| packages/core/src/tools/cron-create.ts | Marks cron-create tool as deferred and adds search hints. |
| packages/core/src/tools/askUserQuestion.ts | Marks ask-user-question tool as deferred and adds search hints. |
| packages/core/src/test-utils/mock-tool.ts | Extends MockTool to support new deferral/search flags in tests. |
| packages/core/src/index.ts | Exports ToolSearch + SyntheticOutput tool types. |
| packages/core/src/core/prompts.ts | Appends a “Deferred Tools” section to system prompts when applicable. |
| packages/core/src/core/client.ts | Warms tools before prompt build, re-reveals deferred tools on resume, clears on /clear. |
| packages/core/src/core/client.test.ts | Updates test stubs/expectations for new prompt argument + registry methods. |
| packages/core/src/config/config.ts | Adds jsonSchema to config + registers ToolSearch and synthetic output tool. |
| packages/core/src/agents/runtime/agent-core.ts | Ensures wildcard/all-tool inheritance includes deferred tools. |
| packages/cli/src/ui/commands/contextCommand.ts | Ensures token estimates include deferred tools for consistency. |
| packages/cli/src/nonInteractiveCli.ts | Implements structured-output termination behavior in headless mode. |
| packages/cli/src/nonInteractiveCli.test.ts | Updates config mock to include getJsonSchema(). |
| packages/cli/src/nonInteractive/io/BaseJsonOutputAdapter.ts | Adds structured_result and forces result to JSON string when structured. |
| packages/cli/src/nonInteractive/io/BaseJsonOutputAdapter.test.ts | Tests structured_result emission and back-compat omission when undefined. |
| packages/cli/src/config/jsonSchemaArg.test.ts | New tests for parsing/reading/validating --json-schema. |
| packages/cli/src/config/config.ts | Adds --json-schema arg parsing + strict Ajv compile-time validation. |
Comments suppressed due to low confidence (1)
packages/core/src/utils/schemaValidator.ts:1
`compileStrict()` claims the schema must be a JSON object, but the current guard allows arrays (`typeof [] === 'object'`). This yields less clear errors (Ajv compile message) instead of the intended descriptive message; update the guard to explicitly reject `Array.isArray(schema)`.
Resolves 2 #3589 review threads:

- `max_results` schema: declared as unconstrained `number` but the implementation clamps to the integer range [1, 20]. Updated to `type: 'integer'` with `minimum: 1`, `maximum: HARD_MAX_RESULTS`, `default: DEFAULT_MAX_RESULTS` so the model gets accurate contract guidance and out-of-range values fail validation early instead of silently being clamped after a wasted call.
- `execute()` signature now takes `_signal: AbortSignal` to match the base `ToolInvocation.execute` contract. The signal is unused today (the search is sync), but matching the shared signature avoids accidental divergence and makes future cancellation wiring trivial.

Test: existing `enforces max_results cap` split into:

- schema-rejection (`max_results: 100` → throws at build time)
- clamp-on-in-range (`max_results: 20` capped on the candidate side)

21/21 tool-search.test.ts pass; tsc + ESLint clean.
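As a rough illustration of the `max_results` contract described above, the sketch below pairs the schema fragment with a hypothetical `resolveMaxResults` helper. The constant names come from the commit message and the values 20 and 5 from the PR description; the helper itself is an assumption and stands in for validation that the real code delegates to the schema validator.

```typescript
// Values as stated in the PR text: hard cap 20, default 5.
const HARD_MAX_RESULTS = 20;
const DEFAULT_MAX_RESULTS = 5;

// Schema fragment in the shape the commit message describes.
const maxResultsSchema = {
  type: 'integer',
  minimum: 1,
  maximum: HARD_MAX_RESULTS,
  default: DEFAULT_MAX_RESULTS,
} as const;

// Hypothetical helper: rejects out-of-range input up front, then clamps defensively.
function resolveMaxResults(value?: number): number {
  if (value === undefined) return maxResultsSchema.default;
  if (
    !Number.isInteger(value) ||
    value < maxResultsSchema.minimum ||
    value > maxResultsSchema.maximum
  ) {
    throw new RangeError(
      `max_results must be an integer in [1, ${HARD_MAX_RESULTS}]`,
    );
  }
  // The clamp is redundant after the check above; it mirrors the "defense" clamp in the text.
  return Math.min(Math.max(value, 1), HARD_MAX_RESULTS);
}
```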
wenshao left a comment
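Test coverage gaps (cannot be mapped to specific lines)
The following critical paths lack tests; consider adding them:
- `packages/cli/src/nonInteractiveCli.ts`: the `structured_output` success path (model calls structured_output → emitResult with structuredResult → early return) has no test coverage.
- `packages/cli/src/nonInteractiveCli.ts`: the `--json-schema` plain-text error path (model produces plain text instead of calling structured_output → `process.exitCode=1`) has no test coverage.
- `packages/core/src/core/client.ts`: the deferred-tool resume scan in `startChat` (walks history to re-reveal deferred tools that were already used) has no test coverage.
- `packages/core/src/core/client.ts`: the conditional logic that passes `deferredTools` to the system prompt has no test coverage.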
…tools

Closes 3 #3589 review threads:

- Critical: `setTools()` failure during reveal was silently swallowed via `debugLogger.warn` (off in production). Schemas appeared in `llmContent` so the model thought the tools were callable, but the chat's declaration list never updated; the next call surfaced as an "unknown tool" API error, leaving the session in an unrecoverable degraded state. Now returns a proper `ToolResult.error` with the concrete failure reason and instructions to retry; schemas are withheld from `llmContent` so the model doesn't act on a non-ready tool.
- Critical: `collectCandidates` returned every deferred tool that matched `shouldDefer && !alwaysLoad` regardless of whether ToolSearch had already revealed it earlier in the session. Already-revealed tools are in the model's declaration list, so re-surfacing them in later keyword searches wasted tokens and risked the model retrying a load it already had. Filter now also skips tools where `registry.isDeferredToolRevealed(name) === true`. `select:<name>` mode is unaffected (the model may legitimately want to re-inspect the schema of a loaded tool).
- Suggestion: `--json-schema` plain-text terminal path set `process.exitCode = 1` and emitted `isError: true` to the JSON adapter, but TEXT-mode users only saw a silent exit-code-1 with no visible context (`emitResult` is a no-op for the TEXT-mode error case). Echo the full `'Model produced plain text instead of calling the structured_output tool as required by --json-schema.'` line to stderr so headless runs are debuggable without scraping `--output-format json`.

Tests: 2 new in `tool-search.test.ts`:

- `keyword search excludes already-revealed deferred tools`: pins the dedupe behavior across two consecutive searches.
- `returns an error result when setTools() throws`: pins that failures don't expose schemas as "ready" and the agent gets the underlying message in `error.message`.

23/23 tool-search.test.ts pass; tsc + ESLint clean.

DEFERRED to follow-up PRs (replied on threads):

- Critical: structured_output + side-effect-tool race in same turn. Needs a pre-scan + synthesized "skipped" tool_results; design overlaps with #3598 PR-2's existing skippedOutput pattern.
- Suggestion: `+` prefix parsing edge cases (C++, `+ slack`).
- Suggestion: `instanceof DiscoveredMCPTool` hard couple. Needs a type tag on AnyDeclarativeTool, a broader API surface change.
- Suggestion: SyntheticOutputTool registered in interactive mode.
- Suggestion: resume scan O(history × parts) early-exit.
- Suggestion: deferredToolsSection cap.
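The `collectCandidates` change in this commit reduces to one extra predicate. The sketch below is a simplified guess at its shape: `CandidateTool` and the `isRevealed` callback are hypothetical, and the real function presumably works against the registry directly.

```typescript
// Hypothetical minimal tool shape for the filter.
interface CandidateTool {
  name: string;
  shouldDefer?: boolean;
  alwaysLoad?: boolean;
}

// Keyword-search candidates: deferred, not alwaysLoad, and not already revealed this session.
function collectCandidates(
  tools: CandidateTool[],
  isRevealed: (name: string) => boolean,
): CandidateTool[] {
  return tools.filter(
    (tool) =>
      tool.shouldDefer === true &&
      tool.alwaysLoad !== true &&
      !isRevealed(tool.name),
  );
}
```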
wenshao left a comment
[Critical] --json-schema cannot work in bare mode because createToolRegistry() returns from the bare-mode branch before registering the synthetic structured_output tool. The CLI accepts the flag in headless mode, but with bare mode the model never receives the only tool that can satisfy the structured-output contract. Register structured_output before the bare-mode early return, or reject --json-schema with bare mode during argument/config validation.
— gpt-5.5 via Qwen Code /review
wenshao left a comment
Reviewed the full diff (+1572/-13, 31 files). The deferred-tool mechanism and ToolSearch are well-designed — scoring, select/keyword modes, reveal lifecycle, resume support, and the structured_output flow all check out. No Critical issues found. A few suggestions below.
The two non-interactive exit paths in `main()` hardcoded `process.exit(0)` after `runNonInteractive` / `runNonInteractiveStreamJson` returned. This silently overwrote any `process.exitCode = 1` set inside the run — most visibly the `--json-schema` plain-text contract: the JSON adapter emits `isError: true` and stderr gets the explanation, but the shell saw exit code 0 and assumed success. Replace the hardcoded 0 with `process.exit(process.exitCode ?? 0)` on both paths so non-zero exits propagate. The success case is unchanged (exitCode is undefined → exits 0).
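The difference between the old and new exit behaviour fits in two small pure functions. They are illustrative stand-ins only: the real code calls `process.exit(...)` directly, and the function names here are hypothetical.

```typescript
// Before the fix: whatever the run recorded, the shell always saw 0.
function exitCodeBefore(_recordedExitCode: number | undefined): number {
  return 0;
}

// After the fix: equivalent to `process.exit(process.exitCode ?? 0)`.
function exitCodeAfter(recordedExitCode: number | undefined): number {
  return recordedExitCode ?? 0;
}
```

With a recorded exit code of 1 (the `--json-schema` plain-text case), the first function still yields 0 while the second yields 1; with nothing recorded, both yield 0.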
Closes review-flagged coverage gaps for #3589:

`json-schema.test.ts` (6 cases) covers the headless structured-output contract end-to-end:

- structured_result emits when the model fills the schema (success path)
- @path/to/schema.json file-load works
- parse-time validation rejects invalid JSON, invalid JSON Schema, and missing files (no LLM, fast)
- plain-text path: when structured_output is not callable (`--exclude-tools structured_output`), the run exits 1 with `is_error: true` and the contract error message. This locks in the exit-code fix from the prior commit.

`tool-search.test.ts` (3 cases) covers the deferred-tool flow:

- select:<name> reveals a tool and the model can invoke it in the same turn (asserts call order so a missed reveal would surface as an unknown-tool API error instead of a silent pass)
- keyword query (no select: prefix) hits the tool_search tool
- feature-flag-off: with experimental.cron disabled, cron tools are never registered and never appear in tool calls

LLM-dependent tests use the cron tools as a deterministic deferred target (gated by experimental.cron, no MCP server required).
Closes 3 #3589 review threads:

- Schemas like `{"type":"string"}` and `{"type":"array"}` compiled fine (they're valid JSON Schemas in isolation), but the `--json-schema` value becomes the synthetic structured_output tool's parameter schema and tool-call arguments are object-shaped. Reject any non-undefined top-level type that is not "object" so the user sees the contract violation at parse time, not as an unrecoverable runtime mismatch.
- `SchemaValidator.compileStrict` accepted arrays since `typeof [] === 'object'`, and Ajv would later emit a confusing error. Add an explicit `Array.isArray` guard so the contract stated by the function name is honored at the boundary.
- `compileStrict` shared the project-wide Ajv instances configured with `strictSchema: false` (intentionally lenient so MCP servers can ship custom keywords without breaking runtime validation). That leniency is wrong for the `--json-schema` surface: typos like `propertees` were silently ignored. Compile inside a dedicated `strict: true` Ajv so user-supplied schemas surface mistakes immediately.

Tests:

- jsonSchemaArg: rejects non-object top-level type ("string"/"array").
- schemaValidator.compileStrict: rejects arrays; flags unknown keywords (typos) under strict mode.
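The two root-level guards from this commit can be sketched without Ajv. `checkSchemaRoot` is a hypothetical helper, not the project's `compileStrict` or `resolveJsonSchemaArg`; it also accepts type-arrays containing `"object"`, which the PR description lists as a later refinement rather than part of this commit.

```typescript
// Hypothetical guard: returns an error message, or null when the root is acceptable.
function checkSchemaRoot(schema: unknown): string | null {
  if (typeof schema !== 'object' || schema === null) {
    return 'schema must be a JSON object';
  }
  // typeof [] === 'object', so arrays need their own explicit rejection.
  if (Array.isArray(schema)) {
    return 'schema must be a JSON object, not an array';
  }
  const rootType = (schema as Record<string, unknown>)['type'];
  // An absent type is allowed, e.g. {} or composition-only schemas.
  if (rootType === undefined || rootType === 'object') return null;
  // Later refinement per the PR description: a type-array containing "object" is allowed.
  if (Array.isArray(rootType) && rootType.indexOf('object') !== -1) return null;
  return `top-level type must be "object", got ${JSON.stringify(rootType)}`;
}
```

Unknown-keyword detection such as the `propertees` typo is not covered here; that part depends on the validator's strict mode.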
Closes 1 #3589 review thread. `loadAndReturnSchemas` revealed each requested tool BEFORE calling `setTools()` because `getFunctionDeclarations()` filters by the revealedDeferred set — the reveal has to be in place when setTools() rebuilds the chat's declaration list. But if setTools() throws (e.g. chat not yet initialised), the registry was left holding orphaned reveals: the tool was marked "revealed" while the API never received its schema. Subsequent keyword searches would then exclude that tool from candidates (per `collectCandidates`'s isDeferredToolRevealed filter), making it unreachable until `/clear`. Track the names this call NEWLY revealed (skipping tools that were already revealed by an earlier ToolSearch in the same session) and unreveal them on setTools() failure. Added `unrevealDeferredTool` to the registry as the one-tool inverse of `revealDeferredTool`; `clearRevealedDeferredTools` is unchanged and still wipes the whole set on `/clear`. Test: extends the existing `setTools() throws` test to also assert that (a) the failed call's reveal is rolled back and (b) a tool revealed by an earlier call stays revealed (not whole-set wiped).
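A minimal sketch of the rollback described above, under two assumptions: `setTools` is modelled as a synchronous callback for brevity (the real call is likely asynchronous), and `RevealRegistry` is a hypothetical slice of the registry that only borrows the method names from the commit message.

```typescript
// Hypothetical minimal registry surface; method names follow the commit message.
interface RevealRegistry {
  isDeferredToolRevealed(name: string): boolean;
  revealDeferredTool(name: string): void;
  unrevealDeferredTool(name: string): void;
}

type RevealOutcome = { ok: true } | { ok: false; error: string };

function revealWithRollback(
  registry: RevealRegistry,
  names: string[],
  setTools: () => void,
): RevealOutcome {
  // Track only the names this call newly revealed, so earlier reveals survive a failure.
  const newlyRevealed: string[] = [];
  for (const name of names) {
    if (!registry.isDeferredToolRevealed(name)) {
      registry.revealDeferredTool(name);
      newlyRevealed.push(name);
    }
  }
  try {
    // The reveal must already be in place when the declaration list is rebuilt.
    setTools();
    return { ok: true };
  } catch (err) {
    // Undo this call's reveals so the registry matches what the API actually received.
    for (const name of newlyRevealed) {
      registry.unrevealDeferredTool(name);
    }
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```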
Closes one of the test-coverage gaps in #3589 reviews (gpt-5.5 review S8). Adds two deterministic L1 unit tests in nonInteractiveCli.test.ts that mock the LLM at sendMessageStream: no model API hit, no flake, ~10ms total.

1. structured_output success path: model fires the synthetic tool once, runtime sets structuredSubmission, aborts background tasks, and emitResult fires exactly once with `structuredResult` matching the model's args. No follow-up turn is issued (single-shot contract).
2. plain-text error path under --json-schema: model emits text only; runtime sets process.exitCode=1, writes the contract-violation line to stderr, and emits an isError result with the canonical "Model produced plain text" message.

Both tests inject a stub adapter via runNonInteractive's `options.adapter` hook, so they assert against direct emitResult calls instead of parsing JSON stdout. process.exitCode is snapshot/restored to keep the test hermetic. The L2 integration tests in integration-tests/cli/json-schema.test.ts remain as smoke coverage against a real model.
…esized Closes 1 #3589 review thread (Copilot). The pre-scan comment claimed siblings receive a "synthesized 'skipped' tool_result" after structured_output succeeds. The implementation actually breaks out of the loop without emitting any tool_result for the skipped calls. The transcript is missing the function_response entries for them, but the session terminates via emitResult immediately so no follow-up API call ever sees the mismatch — the missing entries are harmless in the single-shot contract. Update the comment to describe what the code actually does. The existing tests already pin the contract (no executeToolCall for the skipped tool, no emitToolResult for its callId).
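The behaviour this commit message documents (run `structured_output`, then break without emitting results for its siblings) might look roughly like the sketch below. The hoist step is inferred from the later review remark about a `structIdx > 0` hoist; `ToolCall`, `hoistStructuredOutput`, and `runBatch` are hypothetical names, and the real loop does considerably more.

```typescript
// Hypothetical minimal shape of one tool call in a model turn.
interface ToolCall {
  callId: string;
  name: string;
}

// Move structured_output to the front of the batch when it is not already first.
function hoistStructuredOutput(calls: ToolCall[]): ToolCall[] {
  const structIdx = calls.findIndex((c) => c.name === 'structured_output');
  if (structIdx <= 0) return calls.slice();
  return calls
    .slice(structIdx, structIdx + 1)
    .concat(calls.slice(0, structIdx), calls.slice(structIdx + 1));
}

// Executes calls in order and returns the callIds that actually ran.
// After a successful structured_output call the loop breaks: the remaining
// siblings are neither executed nor given a tool_result.
function runBatch(
  calls: ToolCall[],
  execute: (call: ToolCall) => boolean,
): string[] {
  const executed: string[] = [];
  for (const call of hoistStructuredOutput(calls)) {
    const succeeded = execute(call);
    executed.push(call.callId);
    if (call.name === 'structured_output' && succeeded) break;
  }
  return executed;
}
```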
wenshao left a comment
No review findings. Downgraded from Approve to Comment: self-PR. — gpt-5.5 via Qwen Code /review
wenshao left a comment
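[Critical] `--json-schema` + `--bare` is silently broken: in bare mode `createToolRegistry()` registers only ReadFile/Edit/Shell and then returns early (packages/core/src/config/config.ts:2884), while the structured_output tool is registered later. The user sees the misleading error "Model produced plain text instead of calling structured_output" even though the model never saw the tool. Suggestion: register structured_output before the bare-mode early return, or reject the `--json-schema` + `--bare` combination in the CLI `.check()`.
[Critical] 5 TypeScript type errors: config.test.ts:1693,1698,1711,1725 (4 × possibly 'undefined') + nonInteractiveCli.test.ts:2775 (type conversion error TS2352). `tsc --noEmit` fails. Suggestion: add explicit assertions or type guards.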
— deepseek-v4-pro via Qwen Code /review
…uplicate stderr

Closes 3 #3589 review threads (Copilot + deepseek-v4-pro):

1. ToolSearch was calling `revealDeferredTool` AND triggering `setTools()` for every tool that `select:` resolved, including non-deferred / `alwaysLoad` tools (the model is allowed to use `select:` to re-inspect any tool's schema, including core ones). That polluted `revealedDeferred` with names that aren't deferred AND could fail with `GeminiClient not initialised` for what is purely a schema-inspection call. Gate both reveal and the setTools() trigger on `tool.shouldDefer && !tool.alwaysLoad`, and only call setTools() when this call newly revealed at least one deferred tool.
2. The `--json-schema` plain-text fallback wrote the error message to stderr via `writeStderrLine(...)` AFTER calling `adapter.emitResult({isError:true,...})`. The JsonOutputAdapter already writes `errorMessage` to stderr in TEXT-mode isError responses (see JsonOutputAdapter.ts:68-73), so the extra line produced two copies of the same message in headless TEXT runs. The comment claiming `emitResult` was a no-op in TEXT mode was wrong. Remove the duplicate write and the unused `writeStderrLine` import; let the adapter own per-format surfacing.
3. agent-core's wildcard-subagent path uses `getFunctionDeclarations({ includeDeferred: true })` so subagents inherit MCP / lsp / cron_* tools, but no test exercised it: the existing mocks returned `getFunctionDeclarations: () => []` and `tools: ['*']` was never asserted. A refactor that silently dropped `includeDeferred` would break existing wildcard subagent configs without warning. Add three cases:
   - tools:["*"] inherits deferred tools (asserts the call args passed to getFunctionDeclarations).
   - absent toolConfig also takes the wildcard path.
   - explicit tools list does NOT use the wildcard branch (uses getFunctionDeclarationsFiltered instead).

Tests:

- tool-search: select: a non-deferred tool does not reveal + does not call setTools. Same for alwaysLoad tools.
- nonInteractiveCli: existing plain-text test no longer asserts on a stderr `qwen --json-schema:` line; the adapter is responsible for that surfacing per format.
- agent-core: 3 new prepareTools cases as described above.
…clarations Closes 1 #3589 review thread (deepseek-v4-pro): the `{ includeDeferred: true }` arg in `collectContextData` is what keeps the "all tools" token estimate aligned with the per-tool breakdown (which iterates `getAllTools()` unfiltered). If a refactor silently dropped the option, `displayBuiltinTools` (clamped via `Math.max(0, …)`) would collapse to 0 — visible in `/context detail` but not caught by anything. New focused test stands up minimal Config / ToolRegistry mocks, calls the exported `collectContextData(...)`, and asserts the spy on `getFunctionDeclarations` was invoked exactly once with `{ includeDeferred: true }`. The token-math itself is not a target of this test (it's covered by the visible UI); the contract being pinned is the call argument.
Closes 1 #3589 review thread (deepseek-v4-pro): previously the `ensureTool()` and `setTools()` failure paths only logged via `debugLogger.warn`, which is a no-op when DEBUG is unset (the production default). Operators running headless against a freshly- initialised session would see opaque "missing" entries or `setTools failed` ToolResult errors with no upstream diagnosis. Mirror each `debugLogger.warn` with a `process.stderr.write` so the underlying cause (factory throw, chat-not-initialised, network) is visible in the run's stderr stream regardless of DEBUG. Used `process.stderr.write` directly rather than `console.warn` because the core package's eslint config bans `console.*` in src and there is no shared cross-package "operator-visible logger" yet (filing that as a separate follow-up — `core` and `cli` would both benefit). The `[ToolSearch]` prefix tags the source so multi-source headless logs can grep cleanly. The existing tests don't spy on stderr so no test changes were required; the new writes show up only on real failure paths.
tanzhenxin left a comment
Review
Wave-4 closes prior minors cleanly: the pre-scan reorder branch now has its missing test (a [write_file, structured_output] batch that genuinely exercises the structIdx > 0 hoist), the misleading "synthesized 'skipped' tool_result" comment is fixed, and scoping reveal/setTools to genuinely-deferred tools is a tight cleanup. The ToolSearch description rewrite — "callable next turn" — also avoids a class of model confusion.
Preference: split this PR
ToolSearch is ready to land on its own. The four open criticals all sit on the --json-schema side and share a shape — every one is a place where the structured_output single-shot terminal contract isn't honored end-to-end. They're tractable in isolation and a much cleaner follow-up than gating ToolSearch on them. Strongly preferred path: ship ToolSearch now (this PR, scope reduced) and address --json-schema criticals in a focused follow-up. Approving as-is since none of the four block ToolSearch and the carryover minors are cosmetic, but a split would be the better outcome.
For the record:
1. Drain-path structured_output discarded (severity: medium · confidence: very high)
If the model returns structured_output(...) from inside the drain loop (e.g. a cron firing in the middle of a `--json-schema` run), the synthetic tool's result is treated as an ordinary tool result, the drain continues, and the post-drain branch fires the plain-text fallback: exit code 1, structured payload lost. The terminal contract only exists in the main turn's loop.
2. Early return drops monitor finalize + queued notifications (severity: medium · confidence: very high)
The successful-structured_output early return only aborts the registry and emits the result — it skips the queued-notification flush and one-shot monitor finalization that the success and cancel paths run. The finally then aborts monitors with notify: false, so any monitor active earlier in the session emits task_started without its paired task_notification.
3. --bare plus --json-schema makes the contract unfulfillable (severity: medium · confidence: very high)
The bare-mode branch returns early before the structured_output registration runs, so qwen -p "..." --json-schema '<schema>' --bare leaves the synthetic tool unregistered. Plain-text fallback fires every time, exit 1, payload discarded. Either move the registration ahead of the bare-mode early return, or reject the flag combination at parse time.
4. Subagents call structured_output and silently consume the payload (severity: medium · confidence: high)
structured_output is alwaysLoad: true and not in the subagent exclusion list, so dispatched subagents see it and — following its description — will call it. The subagent's tool scheduler runs it, consumes the "session will end" result as an ordinary tool result, and the parent never receives the structured payload. Add structured_output to the subagent exclusion list.
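Two of the remedies suggested in items 3 and 4 are small enough to sketch. Both functions are hypothetical illustrations of the reviewer's proposals, not code from the PR: the first models a parse-time rejection of the flag combination, the second the subagent exclusion.

```typescript
// Hypothetical parsed-argument shape; only the two flags under discussion are modelled.
interface HeadlessArgs {
  bare?: boolean;
  jsonSchema?: string;
}

// Item 3, second option: reject the unfulfillable combination at parse time.
// Returns an error message, or null when the flags are compatible.
function checkJsonSchemaFlags(args: HeadlessArgs): string | null {
  if (args.bare === true && args.jsonSchema !== undefined) {
    return '--json-schema cannot be combined with --bare: structured_output would not be registered';
  }
  return null;
}

// Item 4: keep structured_output out of the tool list a subagent inherits.
function toolsForSubagent(toolNames: string[]): string[] {
  return toolNames.filter((name) => name !== 'structured_output');
}
```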
Verdict
APPROVE — ToolSearch is ready. Splitting --json-schema into a follow-up PR is the preferred path; flagging the four criticals here for that follow-up's scope.
Summary
Two independent features sharing the same branch:
- `ToolSearch` for on-demand deferred tool schemas: shrinks the default tool-declaration list by hiding MCP tools and a few low-frequency built-ins behind a discovery tool. A typical 39-tool setup previously spent ~15K tokens per request on declarations.
- `--json-schema` headless structured output: registers a synthetic `structured_output` tool whose parameter schema IS the user-supplied JSON Schema. In `qwen -p` mode the first valid call terminates the session and exposes the validated payload via the result message's `structured_result` field.

Feature 1: ToolSearch / deferred tools
Mechanism
- `DeclarativeTool` base class gains three optional flags:
  - `shouldDefer`: hide from initial function-declaration list
  - `alwaysLoad`: always include even when deferred is the default
  - `searchHint`: optional keywords for fuzzy-match scoring
- `ToolRegistry.getFunctionDeclarations()` filters out deferred tools by default. New `{ includeDeferred: true }` opts back in. `revealDeferredTool(name)` / `unrevealDeferredTool(name)` / `isDeferredToolRevealed(name)` / `clearRevealedDeferredTools()` track per-session "revealed" state. `getDeferredToolSummary()` returns a compact `{name, description}` list for the system-prompt section.
- `getCoreSystemPrompt` / `getCustomSystemPrompt` accept an optional `deferredTools` argument and append a "Deferred Tools" section.
- `ToolSearch` supports two query modes:
  - `select:Name1,Name2`: exact lookup (case-insensitive, deduped, capped by `max_results`)
  - keyword search, with an optional `+must-word` required-term prefix
- `max_results`: integer, `[1, HARD_MAX_RESULTS=20]`, default 5; out-of-range values fail at `tool.build()` time, in-range values are clamped internally as a defense.
- Scoring: `10/5/4/2` on exact-name/substring/searchHint/description; MCP `12/6` on exact-name/substring to bias toward surfacing MCP.
- Each matched tool goes through `ensureTool`, is marked revealed, and `client.setTools()` is called so the next API request includes the schema. If `setTools()` throws, the call's reveals are rolled back (keeps the registry consistent with the chat's declaration list) and a `ToolResult.error` is returned so the agent can choose to retry, with no silent debug-only swallow.
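To make the scoring weights concrete, here is a minimal sketch. The weights (10/5/4/2, and 12/6 for MCP) are taken from the description above; everything else is an assumption: the `SearchableTool` shape, whitespace tokenization, taking only the best-matching tier per term, summing per-term scores, and treating an unmatched `+term` as disqualifying.

```typescript
// Hypothetical minimal tool shape for scoring.
interface SearchableTool {
  name: string;
  description: string;
  searchHint?: string;
  isMcp?: boolean;
}

// Score one lowercase term against one tool, using the first tier that matches.
function scoreTerm(tool: SearchableTool, term: string): number {
  const name = tool.name.toLowerCase();
  if (name === term) return tool.isMcp ? 12 : 10;
  if (name.indexOf(term) !== -1) return tool.isMcp ? 6 : 5;
  if ((tool.searchHint ?? '').toLowerCase().indexOf(term) !== -1) return 4;
  if (tool.description.toLowerCase().indexOf(term) !== -1) return 2;
  return 0;
}

// Assumption: per-term scores are summed, and a "+term" that scores 0 disqualifies the tool.
function scoreTool(tool: SearchableTool, query: string): number {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter((t) => t.length > 0);
  let total = 0;
  for (const raw of terms) {
    const required = raw.charAt(0) === '+' && raw.length > 1;
    const term = required ? raw.slice(1) : raw;
    const score = scoreTerm(tool, term);
    if (required && score === 0) return 0;
    total += score;
  }
  return total;
}
```

The `+` handling here is deliberately naive; the PR lists `+` prefix parsing edge cases (`C++`, `+ slack`) as follow-up work.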
Default deferred tools
- MCP tools (… them, hence the scoring boost above)
- `lsp`, `cron_create`, `cron_list`, `cron_delete`, `ask_user_question`, `exit_plan_mode`

Edge cases covered
- Subagent wildcard (`tools: ['*']`) inheritance keeps `includeDeferred: true` so existing configs that relied on MCP via wildcard still work.
- `resetChat` (`/clear`) clears revealed state for a clean slate; the compression path (`startChat(newHistory)`) preserves it so cross-turn revealed tools survive compaction.
- On resume, `startChat` scans history for prior function calls to deferred tools and re-reveals them before `setTools()` runs, so the API doesn't reject follow-up calls when resuming a transcript.
- When `ToolSearch` is filtered out by the permission manager, the system prompt omits the "Deferred Tools" section to avoid advertising an unreachable tool.
- Keyword search excludes already-revealed tools (`collectCandidates` filter) so the model doesn't waste a lookup call on schemas it already has. `select:` mode is unaffected so the model can still re-inspect a loaded tool's schema.
- `contextCommand` and agent wildcard inheritance use `{ includeDeferred: true }` so token accounting and inheritance semantics stay consistent with pre-change behavior.
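The resume-time scan listed among the edge cases can be sketched as a pure function over the transcript. `HistoryEntry` and `HistoryPart` are hypothetical, trimmed-down shapes, and `findDeferredToolsToReveal` is an illustrative name; the real `startChat` works on the client's own history types and reveals tools as it goes.

```typescript
// Hypothetical, trimmed-down history shapes; real transcripts carry more part kinds.
interface HistoryPart {
  text?: string;
  functionCall?: { name: string };
}

interface HistoryEntry {
  role: 'user' | 'model';
  parts: HistoryPart[];
}

// Collect, in first-seen order, the deferred tools the transcript has already called.
function findDeferredToolsToReveal(
  history: HistoryEntry[],
  isDeferred: (name: string) => boolean,
): string[] {
  const found: string[] = [];
  for (const entry of history) {
    for (const part of entry.parts) {
      const name = part.functionCall?.name;
      if (name !== undefined && isDeferred(name) && found.indexOf(name) === -1) {
        found.push(name);
      }
    }
  }
  return found;
}
```

Each returned name would then be passed to `revealDeferredTool` before the declaration list is rebuilt.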
Feature 2: `--json-schema` headless structured output

CLI

Mechanism
- `resolveJsonSchemaArg` (CLI parse time): reads inline JSON or `@path`, parses it, runs `SchemaValidator.compileStrict` (a dedicated Ajv configured with `strictSchema: true` to catch unknown-keyword typos like `propertees`, plus `allowUnionTypes: true` for spec-valid `type: ["a","b"]` unions, and explicit opt-outs of `strictRequired` / `strictTypes` / `validateFormats` so spec-valid shapes such as `{type:'object', required:['x']}` and custom `format` values aren't rejected as lint failures), and rejects any non-object root (top-level `type` must be `"object"` or a type-array containing `"object"`; absent `type` is allowed for `{}` / composition cases). Bad schemas fail at parse time so they cannot silently no-op at runtime.
- `SyntheticOutputTool` (registered only when `jsonSchema` is set): `alwaysLoad: true`, `shouldDefer: false`. Its parameter schema IS the user-supplied schema, so `BaseDeclarativeTool.validateToolParams` (Ajv) gates the model's args before `execute()` runs. Validation errors flow back to the model, which can retry with corrected fields.
- `runNonInteractive`: when the synthetic tool's first successful call is detected, the loop aborts background tasks and emits a result message with `structured_result: <model-supplied payload>` and `result: JSON.stringify(payload)`.
- If the model produces plain text instead of calling `structured_output`, the runtime sets `process.exitCode = 1`, writes `qwen --json-schema: Model produced plain text…` to stderr, and emits an error result. The exit code now actually propagates to the shell (the headless `main()` was hardcoding `process.exit(0)` regardless).

Review-driven hardening
- `a17a357`: `max_results` schema declared as integer w/ min/max/default; `execute()` matches the AbortSignal contract
- `1fa1d75`: `setTools()` failure surfaces as a tool error; keyword search excludes already-revealed tools
- `e8712eb`: `main()` honors `process.exitCode` instead of hardcoding `process.exit(0)`
- `38726567`: `--json-schema` rejects non-object roots and arrays; `compileStrict` switched from the project-wide lenient Ajv to a dedicated instance with `strictSchema: true` so unknown-keyword typos surface (later refined in `9c031da0`, see below)
- `9c031da0`: `compileStrict` swaps the broad `strict: true` (which also enabled `strictRequired` / `strictTypes` / `validateFormats`) for the targeted `strictSchema: true` plus explicit opt-outs, so spec-valid schemas like `{type:'object', required:['x']}` and custom-format fields aren't rejected as lint failures
- `6153efb`: `setTools()` failure rolls back this call's reveals so the registry stays consistent with the chat's declaration list
- `210ec2b`: `compileStrict` accepts spec-valid type unions (`["string","number"]`, `["object","null"]`) via `allowUnionTypes: true`; CLI guard treats type-arrays containing `"object"` as object-allowed
- `7f235c2`: `select:` mode now also honors `max_results` so `select:a,b,c,…` cannot return unbounded schemas

Tests
L1 unit (deterministic, mocked LLM, ~ms each)
- `tool-search.test.ts` (25 cases): tokenize, scoreTool weights, both query modes, `max_results` schema rejection / cap (keyword AND select), `+must-word` filter, reveal propagation, empty query, `setTools()` throws → error result + reveal rollback.
- `tool-registry.test.ts` (5 new cases): filter, `includeDeferred`, `alwaysLoad`, reveal, summary sorting + `clearRevealedDeferredTools`.
- `schemaValidator.test.ts` (38 cases including 6 new): `compileStrict` rejects null/undefined/string/array; flags unknown keywords (typos); accepts type-union arrays.
- `jsonSchemaArg.test.ts` (13 cases): inline + `@path` + parse errors + array root rejection + non-object type rejection + type-array with `"object"` accepted.
- `syntheticOutput.test.ts`, `nonInteractiveCli.test.ts` (2 new): registration; structured_output success path emits `structuredResult` and stops in one turn; plain-text fallback sets `process.exitCode=1` and writes the contract violation to stderr.

L2 integration (real CLI binary, real LLM, smoke)
- `integration-tests/cli/json-schema.test.ts` (6 cases): inline schema → `structured_result.answer`; `@path` schema; parse-time failures (invalid JSON / invalid Schema / missing file); `--exclude-tools structured_output` forces the plain-text path and asserts `is_error: true` + `exitCode: 1`.
- `integration-tests/cli/tool-search.test.ts` (3 cases): `select:` → invoke (asserts call order so a missed reveal would surface as an unknown-tool API error); keyword search; feature-flag-off negative.
Test plan
- `tsc --noEmit` on `packages/core` and `packages/cli`
- `vitest run` on `packages/core` and `packages/cli`
- `ToolSearch` is invoked when the model needs an MCP tool, and the schema appears in the next request
- `/clear`, then verify the deferred list is re-announced without stale revealed state
- `qwen --resume` a session that used a deferred tool, verify the tool is callable without calling `ToolSearch` again

Deferred to follow-up PRs (replied on threads)
- … (needs pre-scan + synthesized "skipped" tool_results; design overlaps with feat(cli): add --json-schema for structured output in headless mode #3598)
- `+` prefix parsing edge cases (`C++`, `+ slack`)
- `instanceof DiscoveredMCPTool` hard couple (replace with type tag)
- … exclusivity but interactive mode doesn't honor it)
- `revealedDeferred` not pruned on MCP server disconnect
- … `--json-schema` (`anyOf` / `oneOf` / `not` combinators that forbid object root)
- … `--json-schema @path`
- `structured_result` field on canonical `CLIResultMessageSuccess` type
- `DeclarativeTool` 10-positional-param constructor → options object
- … re-revealing