feat(cli): add structured JSON schema output#4001
Conversation
Add non-interactive structured output support driven by --json-schema, including schema validation, terminal structured_output tool handling, retry behavior, JSON/stream output metadata, and SDK protocol types. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces a non-interactive structured output mode where callers supply a JSON Schema and the run terminates by returning a validated JSON object (rather than free-form final text), enabling automation-friendly outputs across CLI + SDK protocols.
Changes:
- Added a reserved internal
structured_outputtool that validates its arguments against a caller-provided JSON Schema and returns a terminal payload. - Wired structured output configuration through CLI argument parsing, core config/tool registration, permission flow, and non-interactive retry/termination behavior.
- Extended JSON/stream-json adapters and SDK protocol types (TS/Python/Java) to include
structured_outputand a new error subtype for retry exhaustion.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| packages/sdk-typescript/src/types/protocol.ts | Adds structured_output to success payload and new error subtype to the TS SDK protocol. |
| packages/sdk-python/src/qwen_code_sdk/protocol.py | Adds structured_output and new error subtype to the Python SDK protocol TypedDicts. |
| packages/sdk-java/qwencode/src/main/java/com/alibaba/qwen/code/cli/protocol/protocol.ts | Mirrors TS SDK protocol updates in the Java SDK’s TS protocol definition. |
| packages/sdk-java/qwencode/src/main/java/com/alibaba/qwen/code/cli/protocol/message/SDKResultMessage.java | Adds structured_output field + accessors for Java SDK result message JSON serialization. |
| packages/core/src/utils/schemaValidator.ts | Adds strict schema compilation + strict validation helpers for structured output validation paths. |
| packages/core/src/tools/tools.ts | Introduces terminalResult on ToolResult and improves $ref cycle detection (JSON Pointer escaping + root # ref). |
| packages/core/src/tools/tools.test.ts | Adds coverage for root # cycles and escaped JSON Pointer cycle detection. |
| packages/core/src/tools/tool-names.ts | Adds STRUCTURED_OUTPUT to tool name/display name constants. |
| packages/core/src/tools/structured-output.ts | New terminal tool implementation that validates params against the provided JSON Schema. |
| packages/core/src/tools/structured-output.test.ts | Tests structured output tool terminal payload emission and strict validation behavior. |
| packages/core/src/index.ts | Exposes structured-output tool types from core package entrypoint. |
| packages/core/src/core/turn.ts | Propagates terminalResult through tool call response info. |
| packages/core/src/core/permissionFlow.ts | Auto-allows the reserved structured output tool when structured output mode is enabled. |
| packages/core/src/core/permissionFlow.test.ts | Adds tests for permission bypass behavior when structured output is enabled/disabled. |
| packages/core/src/core/coreToolScheduler.ts | Passes terminalResult through tool call success responses. |
| packages/core/src/config/config.ts | Adds structured output config, appends system prompt instructions, and registers the reserved tool. |
| packages/cli/src/nonInteractiveCli.ts | Adds structured-output retry/termination handling and emits structured results via adapters. |
| packages/cli/src/nonInteractiveCli.test.ts | Adds tests for structured output success, mixed tool-call rejection, and slash command rejection. |
| packages/cli/src/nonInteractive/types.ts | Extends CLI result message types with structured_output and new error subtype. |
| packages/cli/src/nonInteractive/io/StreamJsonOutputAdapter.ts | Adds emitStructuredResult for stream-json mode result emission. |
| packages/cli/src/nonInteractive/io/JsonOutputAdapter.ts | Adds emitStructuredResult and raw JSON stdout behavior in text mode for structured output. |
| packages/cli/src/nonInteractive/io/JsonOutputAdapter.test.ts | Adds tests validating structured output emission in text and JSON modes. |
| packages/cli/src/nonInteractive/io/BaseJsonOutputAdapter.ts | Adds StructuredResultOptions and includes structured_output in success result messages. |
| packages/cli/src/config/config.ts | Adds --json-schema / retry CLI args plus schema loading/validation (object-root only, no remote refs/cycles). |
| packages/cli/src/config/config.test.ts | Adds tests for CLI arg parsing + schema acceptance/rejection rules. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Move the nonInteractiveHelpers import above runtime statements so static imports remain grouped together. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
Heads up: there's an overlapping implementation of the same feature in #3598 (open since 2026-04-24). Same shape — a Notable differences I can spot from the file lists:
Looks like the load-bearing cli/core overlap is ~70-80% the same; #4001's SDK-protocol coverage is the biggest piece #3598 doesn't have, and #3598's redaction + parse-time edge-case gating + MCP-shadow guard + multi-round review history are pieces #4001 doesn't appear to have yet (happy to be wrong — pointer-comparison is shallow without reading the diff side-by-side). A few options for the maintainers:
No urgency from my side — just flagging before the parallel work compounds. cc maintainers. |
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
Thanks for putting this together — the IO adapter refactor (BaseJsonOutputAdapter / JsonOutputAdapter / StreamJsonOutputAdapter) and the matching SDK protocol additions across Java / Python / TypeScript are well-structured work.
Heads-up though: there's significant overlap with #3598 (feat(cli): add --json-schema for structured output in headless mode), which has been approved and is awaiting merge. Both PRs add a startup JSON-schema flag to non-interactive mode, a schema validator, and a reserved structured-output tool — the headless feature surface is essentially the same. The implementations diverge in approach: #3598 wires the synthetic tool through geminiChat / agent / permission-manager / telemetry, while this PR refactors the non-interactive IO layer and extends the SDK protocols.
Given #3598 is further along in review, the cleanest path is probably to let it land first and then open a follow-up PR on top of it covering the parts of this work that aren't subsumed — the IO adapter refactor and the SDK protocol types in particular are independently valuable and would slot in cleanly on top of #3598's tool plumbing. Up to you how you'd like to handle it — happy to look at a rebased version once #3598 merges.
Verdict
COMMENT.
|
Heads up: #3598 was merged at 84ecb5b (the What's NOT in #3598 that this PR has:
Suggested paths:
Sorry for the parallel work — flagging this earlier (when I posted the cross-link on 2026-05-09) would have saved both sides some iteration time. cc @doudouOUC + maintainers. |
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] VALIDATION_RETRY_LOOP_THRESHOLD = 3 in coreToolScheduler.ts:543 silently caps user-configured --structured-output-max-retries to 3 effective retries. When the model produces the same validation error 3 times, buildInvocation() injects RETRY_LOOP_STOP_DIRECTIVE before the structured output retry logic exhausts its budget. Consider skipping loop detection for ToolNames.STRUCTURED_OUTPUT, or using config.getStructuredOutput()?.maxRetries as the threshold for this tool.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| return Object.values(value).some(hasRemoteRef); | ||
| } | ||
|
|
||
| function decodeJsonPointerSegment(segment: string): string { |
There was a problem hiding this comment.
[Suggestion] decodeJsonPointerSegment is duplicated verbatim here and in packages/core/src/tools/tools.ts:428 (hasCycleInSchema). Same for the JSON Pointer resolution logic (resolveLocalRef vs resolveRef). Two parallel implementations risk drift — a fix for escape sequence handling or edge cases in one location won't propagate to the other.
Consider extracting a shared resolveJsonPointer(root, ref) helper into packages/core/src/utils/jsonPointer.ts.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| return validator.errorsText(validate.errors, { dataVar }); | ||
| } | ||
|
|
||
| static validateStrict( |
There was a problem hiding this comment.
[Suggestion] SchemaValidator.validateStrict was added as a public static method but is never called anywhere in this PR. All callers use compileStrict + errorsText directly. Dead code causes maintenance confusion — readers can't tell if it's safe to delete.
Consider either removing it or wiring it into StructuredOutputTool.build() as the validation entry point.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| let modelOverride: string | undefined; | ||
| const structuredOutputConfig = config.getStructuredOutput(); | ||
| let structuredOutputRetryCount = 0; | ||
| const noteStructuredOutputRetry = () => { |
There was a problem hiding this comment.
[Suggestion] noteStructuredOutputRetry() has zero observability. It increments a counter and may throw StructuredOutputMaxRetriesError, but never logs: the current retry count, what the model returned, or why validation failed. This is the feature's most likely production failure mode — debugging a retry exhaustion error at 3 AM would require raw API traces with no intermediate diagnostics.
Consider adding debugLogger.warn logging the retry count and trigger reason (e.g., "model returned text instead of tool call", "validation error: ", "tool called alongside other tools").
— DeepSeek/deepseek-v4-pro via Qwen Code /review
| } else { | ||
| debugLogger.debug('Parsing JSON schema as inline JSON'); | ||
| parsed = JSON.parse(jsonSchemaArg); | ||
| } |
There was a problem hiding this comment.
[Suggestion] stripJsonComments is applied to file-sourced schemas (line 1103) but NOT to inline JSON (line 1107). --json-schema '{"type":"object" /*comment*/}' fails, while the same content in a file succeeds. Apply stripJsonComments to both sources for consistent behavior.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
User doc:
- explicit stdout-vs-stderr contract and `{}`-schema behavior.
- 500 ms shutdown-holdback latency note.
- ReDoS warning for user-supplied `pattern` keywords.
- root `$ref` rejection + `allOf` workaround.
- per-retry token cost note.
- sibling-suppression success vs retry paths split out.
- numeric exit codes (1 / 53 / 130) for every failure mode.
- new "Session resumption" section for --continue / --resume.
Design doc:
- gloss the ToolSearch on-demand-loading reference.
- `not` row: drop the array-indexing-lookalike `[…]`.
- 500 ms holdback is best-effort, not guaranteed.
- redaction rationale extends to validation-failure retries.
- `CORE_TOOLS` phrasing: structured_output is excluded FROM the set;
skill is in a separate dynamically-discovered category.
- subagent suppression maintainer note (single brittle call path).
- `--bare` parenthetical lists the three retained core tools.
- PR #4001 status (closed 2026-05-11, superseded).
User doc:
- explicit stdout-vs-stderr contract and `{}`-schema behavior.
- 500 ms shutdown-holdback latency note.
- ReDoS warning for user-supplied `pattern` keywords.
- root `$ref` rejection + `allOf` workaround.
- per-retry token cost note.
- sibling-suppression success vs retry paths split out.
- numeric exit codes (1 / 53 / 130) for every failure mode.
- new "Session resumption" section for --continue / --resume.
Design doc:
- gloss the ToolSearch on-demand-loading reference.
- `not` row: drop the array-indexing-lookalike `[…]`.
- 500 ms holdback is best-effort, not guaranteed.
- redaction rationale extends to validation-failure retries.
- `CORE_TOOLS` phrasing: structured_output is excluded FROM the set;
skill is in a separate dynamically-discovered category.
- subagent suppression maintainer note (single brittle call path).
- `--bare` parenthetical lists the three retained core tools.
- PR #4001 status (closed 2026-05-11, superseded).
* docs: add user + design docs for --json-schema structured output Follows up #3598 (cli/core feature shipped to main, no docs). **User doc** `docs/users/features/structured-output.md` — covers quick-start, schema input forms (inline + `@path`), output shapes per `--output-format`, parse-time restrictions, retry/failure modes, privacy redaction, permission gating, MCP shadow-tool handling, and a worked `jq`-piped pipeline example. Registered under the existing `features/_meta.ts` so it shows up in the docs sidebar between "Headless Mode" and "Dual Output". **Design doc** `docs/design/structured-output/structured-output.md` — why the synthetic-tool-whose-param-schema-is-the-user-schema approach, the four-stage parse-time validation pipeline, `schemaRootAcceptsObject`'s decided-vs-deferred boundaries, main-turn vs drain-turn parity via `processToolCallBatch`, the structured- success terminal block, the cross-surface privacy redaction sharing `STRUCTURED_OUTPUT_REDACTED_ARGS`, subagent context handling (`forSubAgent`), MCP shadow-tool guard, the compatibility surface, alternatives considered (and why rejected), and a file-by-file index. Both docs are English-only — repo convention is English-only for both `docs/users/features/` (zero zh-CN siblings) and `docs/design/` (only `customize-banner-area/` has a zh-CN twin). Open to adding zh-CN translations as a separate PR if there's demand. * docs(structured-output): address PR review feedback User doc: - explicit stdout-vs-stderr contract and `{}`-schema behavior. - 500 ms shutdown-holdback latency note. - ReDoS warning for user-supplied `pattern` keywords. - root `$ref` rejection + `allOf` workaround. - per-retry token cost note. - sibling-suppression success vs retry paths split out. - numeric exit codes (1 / 53 / 130) for every failure mode. - new "Session resumption" section for --continue / --resume. Design doc: - gloss the ToolSearch on-demand-loading reference. - `not` row: drop the array-indexing-lookalike `[…]`. - 500 ms holdback is best-effort, not guaranteed. - redaction rationale extends to validation-failure retries. - `CORE_TOOLS` phrasing: structured_output is excluded FROM the set; skill is in a separate dynamically-discovered category. - subagent suppression maintainer note (single brittle call path). - `--bare` parenthetical lists the three retained core tools. - PR #4001 status (closed 2026-05-11, superseded). * docs(structured-output): correct empty-schema / holdback / SIGINT claims Three doc claims were stronger than the actual code behaviour: - **Empty schema produces `{}`, not `null`.** `turn.ts` normalises the tool args via `(fnCall.args || {})` before they land in `structuredSubmission`, so a zero-arg call against `{}` is emitted as `{}` on stdout. The `?? null` in the adapter is defence-in-depth for the strictly-undefined case, which the upstream path doesn't produce. - **Holdback is a cap, not a fixed wait.** The loop guard is `Date.now() < deadline && registry.hasUnfinalizedTasks()`, so it exits immediately when nothing is in flight. Reword as "capped at ~500 ms" with an early-exit note. - **SIGINT can still flush a captured result.** The holdback loop does not poll the abort signal, so a SIGINT after the structured call is captured but before `adapter.emitResult` finishes may still land on stdout. Treat exit code 130 as the source of truth. Also addresses the new auto-review summary suggestion about per-turn schema cost: pull the cost callout up out of the bullet list (so it covers both retry cost and schema-embedded-every-turn cost), since the schema-embedding cost isn't retry-specific. * docs(structured-output): correct stdout/stderr + json-mode envelope claims Two doc claims didn't match `JsonOutputAdapter.emitResult`: - **Model prose doesn't go to stderr in text mode.** Only error messages and log lines do. Successful runs emit just the JSON-stringified payload on stdout; accumulated assistant prose is discarded entirely (not mirrored to stderr). Point users at `--output-format json` / `stream-json` when they need the prose. - **`--output-format json` emits a JSON array, not a single document with top-level fields.** The adapter calls `JSON.stringify(this.messages)` where `messages` is an array of message objects. `structured_result` lives on the final `type: "result"` element of that array, not at the document root, so consumers must read `.[-1].structured_result` rather than `.structured_result`. * docs(structured-output): note schema-itself reaches the provider The Privacy section so far only described `structured_output` *args* being redacted from local on-device surfaces (telemetry + chat recording). The schema body is a separate exposure surface — it ships as the function declaration's `parameters` block on every model request, so `enum`, `const`, `default`, `examples`, `description`, `$comment`, etc. travel to the provider in cleartext. Users defaulting to "redaction covers everything" could legitimately leak secrets via schema-literal fields. Add a callout in the user doc, plus a parallel paragraph in the design doc explaining why the redaction stops at on-device surfaces (the model needs the schema to satisfy the tool-call contract, so provider-side redaction isn't possible). * docs(structured-output): correct stdout-on-failure / ReDoS example / hooks / --bare deny / typo Five issues from the latest /qreview pass: - **stdout-vs-stderr is text-mode only.** In `--output-format json` and `stream-json`, the failure result message is emitted on stdout (final element of the JSON array, or the terminating `result` line on the JSONL stream). Wrappers in those modes must switch on `is_error`, not on whether stdout is empty. - **ReDoS example didn't actually demonstrate the threat.** JSON Schema `pattern` only fires on string instances, and tool args are always objects, so the bare `{"pattern": "(a+)+b"}` schema doesn't constrain anything the model can supply. Move the pattern inside a string-typed property. - **Hooks see raw `tool_input`.** `PreToolUse` / `PostToolUse` / `PostToolUseFailure` receive the unredacted args — including HTTP hooks that can forward off-device. Call this out explicitly so users with audit-style catch-all hooks know to filter or add hook-side redaction. - **`--bare` drops settings-level deny.** Bare mode builds `mergedDeny` as `[...(bareMode ? [] : settings.permissions.deny), …]` — settings-level denies are skipped while the synthetic tool stays registered. Argv-level `--exclude-tools` still applies. Document this exception in the user doc and the design doc. - **`maxSessionTurns` hint typo.** The hint points at "schema is unsatisfiable" — the original text inverted the polarity. * feat(core): PR-2.5 — post-promote stream redirect + natural-exit registry settle Closes the two limitations PR-2 (#3894) deferred for the Phase D part (b) Ctrl+B promote flow (#3831): 1. **Post-promote stream redirect**: today the `bg_xxx.output` file is frozen at promote time because `ShellExecutionService` detaches its data listener as part of PR-1's ownership-transfer contract. PR-2.5 wires a caller-side `onPostPromoteData` callback so bytes from the still-running child append to the file via an `fs.createWriteStream` opened in `handlePromotedForeground`. 2. **Natural-exit registry settle**: today the registry entry stays `'running'` until `task_stop` / session-end `abortAll` fires its abort listener. PR-2.5 wires `onPostPromoteSettle` so natural child exit transitions the entry to `'completed'` / `'failed'` with the right exitCode / signal / error message. ## Service (`shellExecutionService.ts`) - New exported types: `ShellExecuteOptions`, `ShellPostPromoteHandlers`, `ShellPostPromoteSettleInfo`. - `execute()` options bag now accepts `postPromote?: { onData, onSettle }`. Threaded through to both `executeWithPty` and `childProcessFallback`. - PTY's `performBackgroundPromote` (line ~1159): after disposing the foreground data + exit + error listeners, RE-ATTACH minimal forwarders that call `postPromote.onData` / `postPromote.onSettle` when the caller opted in. Backwards compat: when `postPromote` is unset the PR-2 detach-everything contract is preserved (the re-attach is gated on each callback being defined). - `childProcessFallback`'s `performBackgroundPromote` (line ~706): same pattern — re-attach `stdout.on('data', ...)`, `stderr.on('data', ...)`, `child.once('exit', ...)`, `child.once('error', ...)` when the caller opted in. `error` listener routes through `onSettle` with `error` populated, so spawn-side errors after the foreground errorHandler detached don't crash the daemon via the default unhandled `'error'` event. - Both paths wrap caller callbacks in try/catch so a thrown handler doesn't crash the child's data loop / unhandled-rejection the service. ## Shell tool (`shell.ts`) - New `PromoteArtifacts` type — slots shared between the foreground `execute()` postPromote handlers (which fire on the service side as soon as promote happens) and the post-resolve `handlePromotedForeground` finalizer (which runs after `await resultPromise` returns). The two race; the buffer + settle-queue absorb that race so neither chunks nor the eventual exit info are lost. - `executeForeground` wires `postPromote` handlers that route data to either `promoteArtifacts.stream` (if open) or `promoteArtifacts.buffer` (drained when the stream opens), and queue settle info if the wired handler isn't yet installed. - `handlePromotedForeground` opens `fs.createWriteStream(outputPath, { flags: 'w' })`, writes the initial snapshot first, drains the buffer, then registers the entry and wires `onSettleWired` with the full registry decision table: - `error` set → `registry.fail(shellId, error.message, endTime)` - `exitCode === 0` → `registry.complete(shellId, 0, endTime)` - non-zero exitCode → `registry.fail(shellId, "Exited with code N", endTime)` - signal !== null → `registry.fail(shellId, "Terminated by signal N", endTime)` - all-null fallback → `registry.fail(shellId, "Exited with unknown status", endTime)` - Fires queued settle synchronously after wiring so a fast command that exits between promote and finalizer doesn't get lost. - Self-audit catch: closes the output stream on the `registry.register` throw path so the FD doesn't leak past the orphan-child kill. ## Tests - 3 new in `shellExecutionService.test.ts`: - `post-promote bytes route to postPromote.onData when callback provided` - `postPromote.onSettle fires on natural child exit after promote` - `backwards compat: without postPromote, listeners stay fully detached` - 3 new in `shell.test.ts` under a `foreground → background promote PR-2.5` describe block: - `post-promote bytes APPEND to bg_xxx.output via write stream` - `natural child exit transitions registry entry to "completed"` - `non-zero exit / signal / error → "failed" with descriptive message` - Bulk-replaced 50 prior `{},` (empty 6th-arg shellExecutionConfig) with `expect.objectContaining({}),` + added `expect.objectContaining({ postPromote: expect.any(Object) }),` as the 7th-arg expectation for the foreground execute call. - Updated the existing `registers a bg_xxx entry on result.promoted` test to assert on `fs.createWriteStream` + `stream.write` instead of the now-removed `fs.writeFileSync` snapshot path. 182/182 shell.test.ts pass + 73/73 shellExecutionService.test.ts pass + 111/111 coreToolScheduler.test.ts pass + 60/60 AppContainer.test.tsx pass; tsc + ESLint clean. Self-audit: 3 rounds (positive / reverse / cross-file) found one issue — output stream FD leak on `registry.register` throw — and fixed it before flagging complete. All flagged edge cases (stream errors, child-exits-before-wire-up race, task_stop during natural- exit window, promote-never-happens cleanup, backwards compat without callbacks) have explicit handling and / or test pinning. * fix(core): #4102 review wave — 3 Critical + UTF-8 + tests 3 Critical race/correctness issues + 1 multibyte-corruption suggestion + 3 test coverage gaps addressed: **Critical 1 — child_process late-chunk drop (service)** Settle was fired on 'exit', but stdout/stderr can emit buffered data between 'exit' and 'close'. Late chunks landed in `promoteArtifacts.buffer` after shell.ts had already closed the stream + transitioned the registry → silently dropped → truncated `bg_xxx.output`. Switched to listening on 'close' which guarantees all stdio is fully drained. (code, signal) payload is identical to 'exit', just with proper ordering. **Critical 2 — stream-flush wait before registry transition (shell)** `stream.end()` is asynchronous; pending writes can still be in the libuv queue when it returns. The old code transitioned the registry immediately after `.end()`, so a /tasks consumer could observe a `completed` entry and read the output file BEFORE the trailing bytes were on disk. Fixed: wired settle now `stream.once('finish', ...)` BEFORE calling `registry.complete/fail`. `error` event also short-circuits to the transition so a late ENOSPC doesn't hang the settle path forever. **Critical 3 — stream-open-fail buffer leak (shell)** If `fs.createWriteStream` threw, the catch path set `stream = null` but the foreground `onData` handler would still take the `stream === null` branch and push chunks into `promoteArtifacts.buffer` — unbounded growth under a sustained child whose output file couldn't be opened. Added a `streamFailed: boolean` latch on `PromoteArtifacts`. When set, `onData` drops chunks (with a debug log) instead of buffering. The catch branch sets the latch. **Suggestion — shared TextDecoder corrupts multibyte UTF-8 (service)** child_process post-promote used ONE TextDecoder for both stdout AND stderr. The decoder's continuation-byte state machine assumes one byte source; interleaved multibyte chunks corrupted. Now uses separate decoders + flushes both with `decode()` (no `stream: true`) on settle so trailing bytes surface as their final characters. **Suggestion — llmContent reflects already-settled status (shell)** When the queued-settle drain transitions the registry synchronously (fast-exit race), the model-facing copy was still saying "Status: running. … task_stop({...})". Updated to branch on `postPromoteAlreadySettled` / `postPromoteFinalStatus` — when the process is already gone, the copy says "Status: completed/failed" and replaces the `task_stop` suggestion with "Process has already exited; no `task_stop` needed". **Suggestion — test coverage gaps** Added: (a) `queued-settle race: onSettle BEFORE handlePromotedForeground completes` — custom service impl fires onSettle synchronously before resolving the promote promise, pins the drain path. (b) child_process post-promote tests for stdout/stderr forwarding + 'close'-not-'exit' settle + spawn-error settle. **Self-audit**: Round 1 + reverse audit. Stream.once mock added to fire 'finish' synchronously so existing tests don't hang on the new flush wait. 76/76 shellExecutionService.test.ts (+3) + 183/183 shell.test.ts (+1) pass; tsc + ESLint clean. * fix(core): #4102 review wave-2 — 3 more from gpt-5.5 C1 (shell.ts:2227): the WriteStream `'error'` event handler only logged. `fs.createWriteStream` reports common open failures (ENOENT / EACCES / ENOSPC) asynchronously via that event rather than throwing. Result: `promoteArtifacts.stream` kept pointing at the failed stream; `onSettleWired` attached a `.once('finish')` listener that would never fire → registry stuck on `running` forever. Latch the failure (null the shared `stream` slot, set `streamFailed`); `onSettleWired`'s existing `if (!stream)` branch then transitions the registry immediately. C2 (shellExecutionService.ts:1468): the promote handoff removes the foreground `ptyErrorHandler` and only re-attaches data + exit listeners. A subsequent PTY `error` event had no listener — Node treats an unhandled `error` from an EventEmitter as a fatal exception that takes the whole CLI down. Attach a post-promote forwarder that ignores expected PTY read-exit codes (EIO / EAGAIN, same filter the foreground handler uses) and routes unexpected errors through `postPromote.onSettle` with `error` populated. Single-fire latch shared with `onExit` so settle never fires twice. C3 (shell.ts:2503): `onSettleWired` waits for the stream's asynchronous `'finish'` event before flipping `postPromoteAlreadySettled`, but the model-facing `statusLine` was built immediately after invoking `onSettleWired` on the queued settle. A fast-exited promoted command could therefore land "Status: running" + a `task_stop` instruction in production even though settle was already observed. Split into two flags: `postPromoteSettleObserved` (set synchronously when settle is classified) drives the model copy; the registry transition stays behind the stream flush. Tests: +1 PR-2.5 wave-2 PTY error-routing test; +2 shell.ts tests (stream open async error → registry still transitions; async `'finish'` after queued-settle drain → llmContent says 'completed' before registry transition fires). * fix(core): #4102 review wave-3 — 4 actionable from deepseek-v4-pro T2 (shell.ts:2456) — Critical buffer-leak race `onSettleWired` previously set `promoteArtifacts.stream = null` BEFORE calling `stream.end()`. Any `postPromote.onData` chunk that landed between that null assignment and the actual flush completing saw `stream === null && streamFailed === false` and pushed into `promoteArtifacts.buffer` — a buffer that has no further drain path (the foreground finalizer has already returned). Result: chunks stranded indefinitely; PTY mode in particular hits this because `onExit` can fire while kernel buffers still hold data. Fix drains the pre-settle buffer to the stream BEFORE nulling AND latches `streamFailed = true` so any subsequent chunk drops via the existing `else if (streamFailed)` arm in `onData` instead of leaking. Updates the `streamFailed` doc to cover both setters (open-fail and settle-done) so the dual semantic is explicit. T3 (shell.ts:2262) — silent chunk-drop in catch path When `fs.createWriteStream` throws synchronously (rare: ENOENT on a vanished tmpdir), chunks already in `promoteArtifacts.buffer` were silently lost with no observability — oncall reading a truncated `bg_xxx.output` had no way to distinguish "stream open failed" from "child produced nothing." Logs the dropped chunk count and empties the buffer. T5 (shell.ts:2443) — opaque all-null fallback The "Exited with unknown status" fallback fired the registry to 'failed' without any context about which fields were null. This branch is meant to be unreachable; hitting it indicates the service emitted a defective settle info object. Includes the field values in both the fail message and a warn log so the oncall engineer can tell this path apart from the other "failed" branches. T6 (shellExecutionService.ts:1452) — leaked PTY post-promote listeners `ptyProcess.onData(...)` returns an `IDisposable` that was being discarded; same for `onExit`. The `'error'` listener function was also not captured (no way to `removeListener` it). EventEmitter holds refs to listener closures, which transitively hold refs to `onPostData` / `onPostSettle` / the caller's `promoteArtifacts`. While bounded by the PTY's lifetime, the closures keep the caller's state pinned for the post-settle delay window. Captures all three handles into `postPromoteDataDisposable` / `postPromoteExitDisposable` / `postPromoteErrorListener`, then releases them via a shared `disposePostPromoteListeners()` call from `firePostSettle` (idempotent — each slot null-checked and nulled after disposal). Tests: +1 service test for IDisposable + error-listener cleanup; +2 shell.ts tests for buffer drain race and catch-path snapshot fallback. Existing tests stay green (262 → 265 in the touched suites; 7819 → 7822 across the core package). * fix(core/test): drop unused 'registry' in wave-3 T2 test (TS6133) CI build failed across all platforms with src/tools/shell.test.ts(4395,15): error TS6133. The variable was a leftover from copying the queued-settle test pattern; the wave-3 T2 test inspects writeStreamMock.write call history directly and never reads the registry, so the assignment is dead code. Drop it. * fix(core): #4102 review wave-4 — 6 actionable from gpt-5.5 + deepseek-v4-pro T1 (Critical, shellExecutionService.ts:860 child_process onSettle exactly-once) The PTY path used a `firePostSettle` latch but child_process wired `close` and `error` independently to `onPostSettle`. A spawn-side error followed by Node's auto-emitted `'close'` would call the caller's settle TWICE, racing the registry transition. Added the same single-fire latch on the child_process path. T2 (Critical, shell.ts:2264 handoff race reorder) Original order was `write(snapshot) -> drain buffer -> assign stream`. Synchronous today (no race in current code), but assign-after-drain leaves a hazard for any future refactor that adds an `await` inside the drain loop — a chunk arriving in that window would land in `promoteArtifacts.buffer`, then post-assign chunks would write to the stream first, producing out-of-order bytes until the settle drain. Reordered to `write(snapshot) -> assign stream -> drain buffer`, which closes the hazard regardless of future async additions. T3 (Suggestion, shellExecutionService.ts:816 decoder flush gated on onSettle) The trailing-multibyte flush ran inside the `child.once('close', ...)` handler, which was only installed when `onSettle` was set. An `onData`-only caller (no onSettle) lost trailing continuation bytes silently. Hoisted flush into `flushPostPromoteDecoders` called from `firePostSettle`, and made `firePostSettle` available on the `'close'` path independent of onSettle (T6 install). T4 (Suggestion, shell.ts:1700 promoted ANSI passthrough) The regular `executeBackground` path strips ANSI before writing to `bg_xxx.output`; the promoted-foreground onData path appended raw chunks. Reading `bg_xxx.output` after Ctrl+B showed plain text up to the snapshot then raw `\x1b[31m` / cursor-move / clear-screen sequences for the post-promote tail — unreadable. Apply `stripAnsi(rawChunk)` before write/buffer, matching the executeBackground contract. T5 (Suggestion, shellExecutionService.ts:786 UTF-8 hardcoded) The post-promote child_process decoders were hard-coded to `new TextDecoder('utf-8')`, but the foreground decoder runs encoding detection via `getCachedEncodingForBuffer`. On a non-UTF-8 child (e.g. GBK on a Chinese Windows shell), the snapshot decoded correctly but the post-promote tail was mojibake. Capture the foreground decoder's `.encoding` property and reuse it for post-promote (with utf-8 fallback if foreground hadn't seen any bytes yet, and a try/catch around `new TextDecoder` for the rare unsupported-encoding case). T6 (Suggestion, shellExecutionService.ts:1540 `error` listener gated on onSettle) The post-promote `error` listener was attached only when `onSettle` was set. An `onData`-only caller still had the foreground errorHandler detached; a post-promote spawn error would then crash the CLI via Node's unhandled-error default. Hoisted the close + error listeners into `if (postPromote)` so any caller opting into post-promote gets crash protection; if `onSettle` is absent the listeners log + drop instead of routing. T7 (Suggestion, shellExecutionService.ts:791 onSettle-only pipe-block deadlock) Same root cause as T6: when only `onSettle` is set, the foreground `stdout`/`stderr` 'data' listeners are detached and no post-promote listener replaces them. The Readables stay paused, the OS pipe buffer fills (~64KB on Linux), the child blocks on `stdout.write`, 'close' never fires, onSettle never fires. Added `child.stdout?.resume()` and `child.stderr?.resume()` in the no-onData branch so the child can drain its pipes and reach exit. T8 (Suggestion, shell.ts:2614 dead inspectLine ternary) `inspectLine`'s ternary returned the same string on both sides — copy-paste leftover from when the other two adjacent ternaries (statusLine / stopLine) were correctly varied. Collapsed to a single string assignment. Tests: +5 regression tests (4 child_process: T1 double-fire latch, T3 onData-only flush, T6 onData-only error survives, T7 onSettle- only resume; +1 shell.ts: T4 ANSI strip). 265 -> 270 in the touched suites; 7822 -> 7827 across the core package; full suite green. * fix(core/test): use ShellOutputEvent type in wave-4 onData callbacks (TS2345) CI lint failed on the wave-4 (T3 / T6) tests with TS2345: pushing ShellOutputEvent into Array<{type:string;chunk:unknown}> narrows incompatibly. Switch to ShellOutputEvent[] (matches earlier helpers at lines 758/966) and discriminate the union via .type === 'data' when reading .chunk so the narrowed multibyte assertion still type-checks. * docs(structured-output): address doudouOUC's four review findings - Tighten JSON/stream-json paragraph: not all failures emit a result to stdout (exit 53 / exit 130 are stderr-only); check exit code first - Fix suppressed-sibling retry guidance: re-issue in a separate turn that does not include structured_output (avoids re-suppression) - Distinguish settings-deny (exit 53) from --exclude-tools (exit 1) in Permission gating section - Replace <projectDir> placeholder with actual path ~/.qwen/projects/<sanitized-cwd>/chats/<sessionId>.jsonl in both docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(structured-output): fix Permission gating — both deny paths strip registration Forward audit against source code found that the Permission gating section incorrectly distinguished settings.permissions.deny (claiming tool stays visible, exit 53) from --exclude-tools (claiming declaration stripped, exit 1). Both go through the same mergedDeny → isToolEnabled path and both prevent registration — the model never sees the tool. Corrected both docs to reflect the actual mechanism: typical outcome is plain text (exit 1), with maxSessionTurns (exit 53) as the fallback if the model loops through other tools. * docs(structured-output): address doudouOUC's May 17 review (5 items) - Clarify validation is client-side Ajv, not provider-side - Qualify "same way" with DeclarativeTool abstraction parenthetical - Match symptom→cause structure for maxSessionTurns hint - Expand $ref workaround with concrete $defs example - Clarify Dual Output See Also doesn't require --json-schema * docs(structured-output): address 2 unresolved design-doc suggestions 1. Privacy/redaction section: note hooks as intentionally non-redacted surface (matches user-doc "Hooks see raw args" callout). 2. Dual call-site section: clarify differing post-helper termination flow between main-turn (direct return) and drain-turn (sentinel hop). * docs(structured-output): address doudouOUC's May 17 review (2 nits) 1. Failure-paths table: align "three common causes" cell with the symptom→cause framing already used at parse-time validation pipeline section ("common stuck-run symptom and its two likely causes"). 2. Dual call-site section: fix factual inaccuracy from prior commit — `drainOneItem` is `async (): Promise<void>` and returns nothing. The two-hop termination is via closure-mutated `structuredSubmission` (set by `processToolCallBatch`, checked by `drainLocalQueue` and the holdback loop), not a return-value sentinel. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Validation
git diff --checkpassed;npm run build && npm run typecheckpassed. Build emitted only the existing Browserslist stale-data warning.Scope / Risk
Testing Matrix
Testing matrix notes:
Linked Issues / Bugs