docs: user + design docs for --json-schema structured output#4051
Conversation
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
Overview
Docs-only diff. I cross-checked every technical claim against the PR base — file paths, function names, constants (4 MiB cap, STRUCTURED_SHUTDOWN_HOLDBACK_MS = 500, exit codes 1/53/130), the schemaRootAcceptsObject decision matrix (including the Object.keys(notRecord).length === 1 rule for not-narrowing), the 'structuredResult' in options contract, redaction-constant reuse between telemetry and chat-recording, the factories-map MCP-collision check, and the four parse-time mutex rules. All accurate.
Leaving a handful of polish suggestions inline. None blocking — happy with this landing as-is or after a revision pass.
Strengths worth calling out
- Alternatives section captures decision rationales (synthetic tool vs. response-prompting vs. provider-specific
response_format) — exactly what's normally lost in commit history. schemaRootAcceptsObjectdecided-vs-deferred split makes the best-effort scope explicit so future contributors know what's intentionally not handled.- Cross-surface redaction story (shared
STRUCTURED_OUTPUT_REDACTED_ARGS, two surfaces, preserved metrics) is documented coherently. jq-piped pipeline example shows realistic downstream consumption, not just feature mechanics.
Risks
None — no code paths touched, _meta.ts is a single key addition in a semantically sensible slot.
Suggested extra test step
Render the design doc through the site's markdown pipeline before merging — the not: row uses {type: "object"[…]} and pipe-heavy tables sometimes trip renderers. The matrix tables are the part most likely to break.
wenshao
left a comment
There was a problem hiding this comment.
Review Summary
Pure documentation PR (3 files, +518 lines). Docs are well-structured and consistent with the implementation, but the user-facing doc has two critical information gaps and several improvement opportunities.
Critical
- Error output stream (stdout vs stderr) is never explicitly stated — shell scripts that capture stdout via
$(qwen ...)need to know where errors land. --continue/--resumeinteraction with--json-schemais undocumented — users can't determine how to correctly resume a structured-output session.
Suggestions (inline below)
- 10 additional suggestions covering exit codes, performance caveats, ReDoS warnings, edge-case behavior, and doc clarity.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] Per-turn token cost of schema not documented: the schema is serialized as the structured_output function declaration's parameters and sent in every model API call, not just once. Large schemas (up to 4 MiB) over multi-turn runs multiply this silently. Consider adding a note near the existing "Cost note" in the Retry section: "The schema is embedded in every model request as a function declaration; large schemas proportionally increase per-turn token costs."
— DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
wenshao
left a comment
There was a problem hiding this comment.
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
Docs-only follow-up to #3598. Full dual review against current HEAD. The follow-up commits closed two of the three prior findings — the empty-schema "always null" overclaim and the "fixed ~500 ms" holdback wording now match the code. The core contract (parse-time validation, schema input, redaction, holdback bound, subagent suppression, MCP rename, sidebar placement) is accurate. Four gaps remain; three mislead the automation audience the JSON modes serve.
1. "Read is_error on the stdout result" doesn't hold for every failure (severity: medium · confidence: high)
The doc tells json/stream-json wrappers to detect failure via is_error on a stdout result instead of checking for empty stdout. But max-session-turns, Ctrl-C, and config-error exits write to stderr with no result object, so such a wrapper silently misses them — and the doc's own max-turns section confirms that's the stderr exit path.
2. Suppressed-sibling retry guidance describes a path that gets re-suppressed (severity: medium · confidence: high)
The doc says the model can re-issue a skipped side-effecting call "alongside the retried structured call." Any turn containing structured_output is filtered to structured calls only, so a same-turn re-issue is suppressed exactly as before. The call only survives in a turn without structured_output.
3. "Deny the tool → run hits max-session-turns" is overstated (severity: medium · confidence: medium)
Exit 53 only holds when the tool stays registered and the model keeps retrying. When the declaration is stripped (--exclude-tools or a settings deny), the model answers in plain text and the run exits 1 — not 53. The doc collapses both deny shapes into the single exit-53 outcome.
4. Chat-recording path uses an undefined, misleading placeholder (severity: medium · confidence: very high — prior finding, still open)
Both docs still point at <projectDir>/chats/<sessionId>.jsonl; <projectDir> is undefined and reads as the working directory, but the file is under ~/.qwen/projects/<sanitized-cwd>/chats/. The "See also" sibling doc already uses the explicit form. Raised last round; the feedback commit didn't touch it.
Two smaller points: the empty-schema note treats "model calls with no arguments" as a given when an unconstrained schema also emits any supplied args verbatim; and the design doc says structured_output is "excluded from" the core-tools allowlist when it's simply not a member — net behavior is correct in both.
Verdict
COMMENT — Core contract accurate and two prior findings closed; none of the gaps block merge, but items 1–3 misinform automation authors about exit codes and failure detection and should be tightened first.
Response to @tanzhenxin's reviewThanks for the thorough dual-review — addressing each finding: 1. "Read
|
Re: @tanzhenxin's reviewAll four findings addressed in 855f023:
Re: @doudouOUC's action itemsAll four actions landed in 855f023. Thanks for the clear breakdown. |
Follows up #3598 (cli/core feature shipped to main, no docs). **User doc** `docs/users/features/structured-output.md` — covers quick-start, schema input forms (inline + `@path`), output shapes per `--output-format`, parse-time restrictions, retry/failure modes, privacy redaction, permission gating, MCP shadow-tool handling, and a worked `jq`-piped pipeline example. Registered under the existing `features/_meta.ts` so it shows up in the docs sidebar between "Headless Mode" and "Dual Output". **Design doc** `docs/design/structured-output/structured-output.md` — why the synthetic-tool-whose-param-schema-is-the-user-schema approach, the four-stage parse-time validation pipeline, `schemaRootAcceptsObject`'s decided-vs-deferred boundaries, main-turn vs drain-turn parity via `processToolCallBatch`, the structured- success terminal block, the cross-surface privacy redaction sharing `STRUCTURED_OUTPUT_REDACTED_ARGS`, subagent context handling (`forSubAgent`), MCP shadow-tool guard, the compatibility surface, alternatives considered (and why rejected), and a file-by-file index. Both docs are English-only — repo convention is English-only for both `docs/users/features/` (zero zh-CN siblings) and `docs/design/` (only `customize-banner-area/` has a zh-CN twin). Open to adding zh-CN translations as a separate PR if there's demand.
User doc:
- explicit stdout-vs-stderr contract and `{}`-schema behavior.
- 500 ms shutdown-holdback latency note.
- ReDoS warning for user-supplied `pattern` keywords.
- root `$ref` rejection + `allOf` workaround.
- per-retry token cost note.
- sibling-suppression success vs retry paths split out.
- numeric exit codes (1 / 53 / 130) for every failure mode.
- new "Session resumption" section for --continue / --resume.
Design doc:
- gloss the ToolSearch on-demand-loading reference.
- `not` row: drop the array-indexing-lookalike `[…]`.
- 500 ms holdback is best-effort, not guaranteed.
- redaction rationale extends to validation-failure retries.
- `CORE_TOOLS` phrasing: structured_output is excluded FROM the set;
skill is in a separate dynamically-discovered category.
- subagent suppression maintainer note (single brittle call path).
- `--bare` parenthetical lists the three retained core tools.
- PR #4001 status (closed 2026-05-11, superseded).
Three doc claims were stronger than the actual code behaviour:
- **Empty schema produces `{}`, not `null`.** `turn.ts` normalises
the tool args via `(fnCall.args || {})` before they land in
`structuredSubmission`, so a zero-arg call against `{}` is emitted
as `{}` on stdout. The `?? null` in the adapter is defence-in-depth
for the strictly-undefined case, which the upstream path doesn't
produce.
- **Holdback is a cap, not a fixed wait.** The loop guard is
`Date.now() < deadline && registry.hasUnfinalizedTasks()`, so it
exits immediately when nothing is in flight. Reword as "capped at
~500 ms" with an early-exit note.
- **SIGINT can still flush a captured result.** The holdback loop
does not poll the abort signal, so a SIGINT after the structured
call is captured but before `adapter.emitResult` finishes may
still land on stdout. Treat exit code 130 as the source of truth.
Also addresses the new auto-review summary suggestion about per-turn
schema cost: pull the cost callout up out of the bullet list (so it
covers both retry cost and schema-embedded-every-turn cost), since
the schema-embedding cost isn't retry-specific.
…laims Two doc claims didn't match `JsonOutputAdapter.emitResult`: - **Model prose doesn't go to stderr in text mode.** Only error messages and log lines do. Successful runs emit just the JSON-stringified payload on stdout; accumulated assistant prose is discarded entirely (not mirrored to stderr). Point users at `--output-format json` / `stream-json` when they need the prose. - **`--output-format json` emits a JSON array, not a single document with top-level fields.** The adapter calls `JSON.stringify(this.messages)` where `messages` is an array of message objects. `structured_result` lives on the final `type: "result"` element of that array, not at the document root, so consumers must read `.[-1].structured_result` rather than `.structured_result`.
The Privacy section so far only described `structured_output` *args* being redacted from local on-device surfaces (telemetry + chat recording). The schema body is a separate exposure surface — it ships as the function declaration's `parameters` block on every model request, so `enum`, `const`, `default`, `examples`, `description`, `$comment`, etc. travel to the provider in cleartext. Users defaulting to "redaction covers everything" could legitimately leak secrets via schema-literal fields. Add a callout in the user doc, plus a parallel paragraph in the design doc explaining why the redaction stops at on-device surfaces (the model needs the schema to satisfy the tool-call contract, so provider-side redaction isn't possible).
…hooks / --bare deny / typo
Five issues from the latest /qreview pass:
- **stdout-vs-stderr is text-mode only.** In `--output-format json`
and `stream-json`, the failure result message is emitted on
stdout (final element of the JSON array, or the terminating
`result` line on the JSONL stream). Wrappers in those modes must
switch on `is_error`, not on whether stdout is empty.
- **ReDoS example didn't actually demonstrate the threat.** JSON
Schema `pattern` only fires on string instances, and tool args
are always objects, so the bare `{"pattern": "(a+)+b"}` schema
doesn't constrain anything the model can supply. Move the
pattern inside a string-typed property.
- **Hooks see raw `tool_input`.** `PreToolUse` / `PostToolUse` /
`PostToolUseFailure` receive the unredacted args — including
HTTP hooks that can forward off-device. Call this out
explicitly so users with audit-style catch-all hooks know to
filter or add hook-side redaction.
- **`--bare` drops settings-level deny.** Bare mode builds
`mergedDeny` as `[...(bareMode ? [] : settings.permissions.deny), …]`
— settings-level denies are skipped while the synthetic tool
stays registered. Argv-level `--exclude-tools` still applies.
Document this exception in the user doc and the design doc.
- **`maxSessionTurns` hint typo.** The hint points at "schema is
unsatisfiable" — the original text inverted the polarity.
…stry settle Closes the two limitations PR-2 (#3894) deferred for the Phase D part (b) Ctrl+B promote flow (#3831): 1. **Post-promote stream redirect**: today the `bg_xxx.output` file is frozen at promote time because `ShellExecutionService` detaches its data listener as part of PR-1's ownership-transfer contract. PR-2.5 wires a caller-side `onPostPromoteData` callback so bytes from the still-running child append to the file via an `fs.createWriteStream` opened in `handlePromotedForeground`. 2. **Natural-exit registry settle**: today the registry entry stays `'running'` until `task_stop` / session-end `abortAll` fires its abort listener. PR-2.5 wires `onPostPromoteSettle` so natural child exit transitions the entry to `'completed'` / `'failed'` with the right exitCode / signal / error message. ## Service (`shellExecutionService.ts`) - New exported types: `ShellExecuteOptions`, `ShellPostPromoteHandlers`, `ShellPostPromoteSettleInfo`. - `execute()` options bag now accepts `postPromote?: { onData, onSettle }`. Threaded through to both `executeWithPty` and `childProcessFallback`. - PTY's `performBackgroundPromote` (line ~1159): after disposing the foreground data + exit + error listeners, RE-ATTACH minimal forwarders that call `postPromote.onData` / `postPromote.onSettle` when the caller opted in. Backwards compat: when `postPromote` is unset the PR-2 detach-everything contract is preserved (the re-attach is gated on each callback being defined). - `childProcessFallback`'s `performBackgroundPromote` (line ~706): same pattern — re-attach `stdout.on('data', ...)`, `stderr.on('data', ...)`, `child.once('exit', ...)`, `child.once('error', ...)` when the caller opted in. `error` listener routes through `onSettle` with `error` populated, so spawn-side errors after the foreground errorHandler detached don't crash the daemon via the default unhandled `'error'` event. - Both paths wrap caller callbacks in try/catch so a thrown handler doesn't crash the child's data loop / unhandled-rejection the service. ## Shell tool (`shell.ts`) - New `PromoteArtifacts` type — slots shared between the foreground `execute()` postPromote handlers (which fire on the service side as soon as promote happens) and the post-resolve `handlePromotedForeground` finalizer (which runs after `await resultPromise` returns). The two race; the buffer + settle-queue absorb that race so neither chunks nor the eventual exit info are lost. - `executeForeground` wires `postPromote` handlers that route data to either `promoteArtifacts.stream` (if open) or `promoteArtifacts.buffer` (drained when the stream opens), and queue settle info if the wired handler isn't yet installed. - `handlePromotedForeground` opens `fs.createWriteStream(outputPath, { flags: 'w' })`, writes the initial snapshot first, drains the buffer, then registers the entry and wires `onSettleWired` with the full registry decision table: - `error` set → `registry.fail(shellId, error.message, endTime)` - `exitCode === 0` → `registry.complete(shellId, 0, endTime)` - non-zero exitCode → `registry.fail(shellId, "Exited with code N", endTime)` - signal !== null → `registry.fail(shellId, "Terminated by signal N", endTime)` - all-null fallback → `registry.fail(shellId, "Exited with unknown status", endTime)` - Fires queued settle synchronously after wiring so a fast command that exits between promote and finalizer doesn't get lost. - Self-audit catch: closes the output stream on the `registry.register` throw path so the FD doesn't leak past the orphan-child kill. ## Tests - 3 new in `shellExecutionService.test.ts`: - `post-promote bytes route to postPromote.onData when callback provided` - `postPromote.onSettle fires on natural child exit after promote` - `backwards compat: without postPromote, listeners stay fully detached` - 3 new in `shell.test.ts` under a `foreground → background promote PR-2.5` describe block: - `post-promote bytes APPEND to bg_xxx.output via write stream` - `natural child exit transitions registry entry to "completed"` - `non-zero exit / signal / error → "failed" with descriptive message` - Bulk-replaced 50 prior `{},` (empty 6th-arg shellExecutionConfig) with `expect.objectContaining({}),` + added `expect.objectContaining({ postPromote: expect.any(Object) }),` as the 7th-arg expectation for the foreground execute call. - Updated the existing `registers a bg_xxx entry on result.promoted` test to assert on `fs.createWriteStream` + `stream.write` instead of the now-removed `fs.writeFileSync` snapshot path. 182/182 shell.test.ts pass + 73/73 shellExecutionService.test.ts pass + 111/111 coreToolScheduler.test.ts pass + 60/60 AppContainer.test.tsx pass; tsc + ESLint clean. Self-audit: 3 rounds (positive / reverse / cross-file) found one issue — output stream FD leak on `registry.register` throw — and fixed it before flagging complete. All flagged edge cases (stream errors, child-exits-before-wire-up race, task_stop during natural- exit window, promote-never-happens cleanup, backwards compat without callbacks) have explicit handling and / or test pinning.
3 Critical race/correctness issues + 1 multibyte-corruption suggestion
+ 3 test coverage gaps addressed:
**Critical 1 — child_process late-chunk drop (service)**
Settle was fired on 'exit', but stdout/stderr can emit buffered data
between 'exit' and 'close'. Late chunks landed in
`promoteArtifacts.buffer` after shell.ts had already closed the
stream + transitioned the registry → silently dropped → truncated
`bg_xxx.output`. Switched to listening on 'close' which guarantees
all stdio is fully drained. (code, signal) payload is identical to
'exit', just with proper ordering.
**Critical 2 — stream-flush wait before registry transition (shell)**
`stream.end()` is asynchronous; pending writes can still be in the
libuv queue when it returns. The old code transitioned the registry
immediately after `.end()`, so a /tasks consumer could observe a
`completed` entry and read the output file BEFORE the trailing
bytes were on disk. Fixed: wired settle now `stream.once('finish',
...)` BEFORE calling `registry.complete/fail`. `error` event also
short-circuits to the transition so a late ENOSPC doesn't hang the
settle path forever.
**Critical 3 — stream-open-fail buffer leak (shell)**
If `fs.createWriteStream` threw, the catch path set `stream = null`
but the foreground `onData` handler would still take the
`stream === null` branch and push chunks into `promoteArtifacts.buffer`
— unbounded growth under a sustained child whose output file
couldn't be opened. Added a `streamFailed: boolean` latch on
`PromoteArtifacts`. When set, `onData` drops chunks (with a debug
log) instead of buffering. The catch branch sets the latch.
**Suggestion — shared TextDecoder corrupts multibyte UTF-8 (service)**
child_process post-promote used ONE TextDecoder for both stdout AND
stderr. The decoder's continuation-byte state machine assumes one
byte source; interleaved multibyte chunks corrupted. Now uses
separate decoders + flushes both with `decode()` (no `stream: true`)
on settle so trailing bytes surface as their final characters.
**Suggestion — llmContent reflects already-settled status (shell)**
When the queued-settle drain transitions the registry synchronously
(fast-exit race), the model-facing copy was still saying "Status:
running. … task_stop({...})". Updated to branch on
`postPromoteAlreadySettled` / `postPromoteFinalStatus` — when the
process is already gone, the copy says "Status: completed/failed"
and replaces the `task_stop` suggestion with "Process has already
exited; no `task_stop` needed".
**Suggestion — test coverage gaps**
Added: (a) `queued-settle race: onSettle BEFORE
handlePromotedForeground completes` — custom service impl fires
onSettle synchronously before resolving the promote promise, pins
the drain path. (b) child_process post-promote tests for stdout/stderr
forwarding + 'close'-not-'exit' settle + spawn-error settle.
**Self-audit**: Round 1 + reverse audit. Stream.once mock added to
fire 'finish' synchronously so existing tests don't hang on the new
flush wait. 76/76 shellExecutionService.test.ts (+3) + 183/183
shell.test.ts (+1) pass; tsc + ESLint clean.
C1 (shell.ts:2227): the WriteStream `'error'` event handler only
logged. `fs.createWriteStream` reports common open failures
(ENOENT / EACCES / ENOSPC) asynchronously via that event rather
than throwing. Result: `promoteArtifacts.stream` kept pointing at
the failed stream; `onSettleWired` attached a `.once('finish')`
listener that would never fire → registry stuck on `running`
forever. Latch the failure (null the shared `stream` slot,
set `streamFailed`); `onSettleWired`'s existing `if (!stream)`
branch then transitions the registry immediately.
C2 (shellExecutionService.ts:1468): the promote handoff removes the
foreground `ptyErrorHandler` and only re-attaches data + exit
listeners. A subsequent PTY `error` event had no listener — Node
treats an unhandled `error` from an EventEmitter as a fatal
exception that takes the whole CLI down. Attach a post-promote
forwarder that ignores expected PTY read-exit codes (EIO / EAGAIN,
same filter the foreground handler uses) and routes unexpected
errors through `postPromote.onSettle` with `error` populated.
Single-fire latch shared with `onExit` so settle never fires twice.
C3 (shell.ts:2503): `onSettleWired` waits for the stream's
asynchronous `'finish'` event before flipping
`postPromoteAlreadySettled`, but the model-facing `statusLine` was
built immediately after invoking `onSettleWired` on the queued
settle. A fast-exited promoted command could therefore land
"Status: running" + a `task_stop` instruction in production even
though settle was already observed. Split into two flags:
`postPromoteSettleObserved` (set synchronously when settle is
classified) drives the model copy; the registry transition stays
behind the stream flush.
Tests: +1 PR-2.5 wave-2 PTY error-routing test; +2 shell.ts tests
(stream open async error → registry still transitions; async
`'finish'` after queued-settle drain → llmContent says 'completed'
before registry transition fires).
T2 (shell.ts:2456) — Critical buffer-leak race `onSettleWired` previously set `promoteArtifacts.stream = null` BEFORE calling `stream.end()`. Any `postPromote.onData` chunk that landed between that null assignment and the actual flush completing saw `stream === null && streamFailed === false` and pushed into `promoteArtifacts.buffer` — a buffer that has no further drain path (the foreground finalizer has already returned). Result: chunks stranded indefinitely; PTY mode in particular hits this because `onExit` can fire while kernel buffers still hold data. Fix drains the pre-settle buffer to the stream BEFORE nulling AND latches `streamFailed = true` so any subsequent chunk drops via the existing `else if (streamFailed)` arm in `onData` instead of leaking. Updates the `streamFailed` doc to cover both setters (open-fail and settle-done) so the dual semantic is explicit. T3 (shell.ts:2262) — silent chunk-drop in catch path When `fs.createWriteStream` throws synchronously (rare: ENOENT on a vanished tmpdir), chunks already in `promoteArtifacts.buffer` were silently lost with no observability — oncall reading a truncated `bg_xxx.output` had no way to distinguish "stream open failed" from "child produced nothing." Logs the dropped chunk count and empties the buffer. T5 (shell.ts:2443) — opaque all-null fallback The "Exited with unknown status" fallback fired the registry to 'failed' without any context about which fields were null. This branch is meant to be unreachable; hitting it indicates the service emitted a defective settle info object. Includes the field values in both the fail message and a warn log so the oncall engineer can tell this path apart from the other "failed" branches. T6 (shellExecutionService.ts:1452) — leaked PTY post-promote listeners `ptyProcess.onData(...)` returns an `IDisposable` that was being discarded; same for `onExit`. The `'error'` listener function was also not captured (no way to `removeListener` it). EventEmitter holds refs to listener closures, which transitively hold refs to `onPostData` / `onPostSettle` / the caller's `promoteArtifacts`. While bounded by the PTY's lifetime, the closures keep the caller's state pinned for the post-settle delay window. Captures all three handles into `postPromoteDataDisposable` / `postPromoteExitDisposable` / `postPromoteErrorListener`, then releases them via a shared `disposePostPromoteListeners()` call from `firePostSettle` (idempotent — each slot null-checked and nulled after disposal). Tests: +1 service test for IDisposable + error-listener cleanup; +2 shell.ts tests for buffer drain race and catch-path snapshot fallback. Existing tests stay green (262 → 265 in the touched suites; 7819 → 7822 across the core package).
CI build failed across all platforms with src/tools/shell.test.ts(4395,15): error TS6133. The variable was a leftover from copying the queued-settle test pattern; the wave-3 T2 test inspects writeStreamMock.write call history directly and never reads the registry, so the assignment is dead code. Drop it.
…-v4-pro
T1 (Critical, shellExecutionService.ts:860 child_process onSettle
exactly-once)
The PTY path used a `firePostSettle` latch but child_process wired
`close` and `error` independently to `onPostSettle`. A spawn-side
error followed by Node's auto-emitted `'close'` would call the
caller's settle TWICE, racing the registry transition. Added the
same single-fire latch on the child_process path.
T2 (Critical, shell.ts:2264 handoff race reorder)
Original order was `write(snapshot) -> drain buffer -> assign stream`.
Synchronous today (no race in current code), but assign-after-drain
leaves a hazard for any future refactor that adds an `await` inside
the drain loop — a chunk arriving in that window would land in
`promoteArtifacts.buffer`, then post-assign chunks would write to
the stream first, producing out-of-order bytes until the settle
drain. Reordered to `write(snapshot) -> assign stream -> drain
buffer`, which closes the hazard regardless of future async
additions.
T3 (Suggestion, shellExecutionService.ts:816 decoder flush gated
on onSettle)
The trailing-multibyte flush ran inside the `child.once('close', ...)`
handler, which was only installed when `onSettle` was set. An
`onData`-only caller (no onSettle) lost trailing continuation
bytes silently. Hoisted flush into `flushPostPromoteDecoders`
called from `firePostSettle`, and made `firePostSettle` available
on the `'close'` path independent of onSettle (T6 install).
T4 (Suggestion, shell.ts:1700 promoted ANSI passthrough)
The regular `executeBackground` path strips ANSI before writing to
`bg_xxx.output`; the promoted-foreground onData path appended raw
chunks. Reading `bg_xxx.output` after Ctrl+B showed plain text up
to the snapshot then raw `\x1b[31m` / cursor-move / clear-screen
sequences for the post-promote tail — unreadable. Apply
`stripAnsi(rawChunk)` before write/buffer, matching the
executeBackground contract.
T5 (Suggestion, shellExecutionService.ts:786 UTF-8 hardcoded)
The post-promote child_process decoders were hard-coded to
`new TextDecoder('utf-8')`, but the foreground decoder runs
encoding detection via `getCachedEncodingForBuffer`. On a non-UTF-8
child (e.g. GBK on a Chinese Windows shell), the snapshot decoded
correctly but the post-promote tail was mojibake. Capture the
foreground decoder's `.encoding` property and reuse it for
post-promote (with utf-8 fallback if foreground hadn't seen any
bytes yet, and a try/catch around `new TextDecoder` for the rare
unsupported-encoding case).
T6 (Suggestion, shellExecutionService.ts:1540 `error` listener
gated on onSettle)
The post-promote `error` listener was attached only when `onSettle`
was set. An `onData`-only caller still had the foreground
errorHandler detached; a post-promote spawn error would then crash
the CLI via Node's unhandled-error default. Hoisted the close +
error listeners into `if (postPromote)` so any caller opting into
post-promote gets crash protection; if `onSettle` is absent the
listeners log + drop instead of routing.
T7 (Suggestion, shellExecutionService.ts:791 onSettle-only
pipe-block deadlock)
Same root cause as T6: when only `onSettle` is set, the foreground
`stdout`/`stderr` 'data' listeners are detached and no post-promote
listener replaces them. The Readables stay paused, the OS pipe
buffer fills (~64KB on Linux), the child blocks on `stdout.write`,
'close' never fires, onSettle never fires. Added `child.stdout?.resume()`
and `child.stderr?.resume()` in the no-onData branch so the child
can drain its pipes and reach exit.
T8 (Suggestion, shell.ts:2614 dead inspectLine ternary)
`inspectLine`'s ternary returned the same string on both sides —
copy-paste leftover from when the other two adjacent ternaries
(statusLine / stopLine) were correctly varied. Collapsed to a
single string assignment.
Tests: +5 regression tests (4 child_process: T1 double-fire latch,
T3 onData-only flush, T6 onData-only error survives, T7 onSettle-
only resume; +1 shell.ts: T4 ANSI strip).
265 -> 270 in the touched suites; 7822 -> 7827 across the core
package; full suite green.
…(TS2345)
CI lint failed on the wave-4 (T3 / T6) tests with TS2345: pushing
ShellOutputEvent into Array<{type:string;chunk:unknown}> narrows
incompatibly. Switch to ShellOutputEvent[] (matches earlier helpers
at lines 758/966) and discriminate the union via .type === 'data'
when reading .chunk so the narrowed multibyte assertion still
type-checks.
- Tighten JSON/stream-json paragraph: not all failures emit a result to stdout (exit 53 / exit 130 are stderr-only); check exit code first - Fix suppressed-sibling retry guidance: re-issue in a separate turn that does not include structured_output (avoids re-suppression) - Distinguish settings-deny (exit 53) from --exclude-tools (exit 1) in Permission gating section - Replace <projectDir> placeholder with actual path ~/.qwen/projects/<sanitized-cwd>/chats/<sessionId>.jsonl in both docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…p registration Forward audit against source code found that the Permission gating section incorrectly distinguished settings.permissions.deny (claiming tool stays visible, exit 53) from --exclude-tools (claiming declaration stripped, exit 1). Both go through the same mergedDeny → isToolEnabled path and both prevent registration — the model never sees the tool. Corrected both docs to reflect the actual mechanism: typical outcome is plain text (exit 1), with maxSessionTurns (exit 53) as the fallback if the model loops through other tools.
- Clarify validation is client-side Ajv, not provider-side - Qualify "same way" with DeclarativeTool abstraction parenthetical - Match symptom→cause structure for maxSessionTurns hint - Expand $ref workaround with concrete $defs example - Clarify Dual Output See Also doesn't require --json-schema
5cb27cf to
99212aa
Compare
tanzhenxin
left a comment
There was a problem hiding this comment.
Review
This re-review covers the documentation commits pushed since the last round, through the "address doudouOUC's May 17 review" commit. All three accuracy issues from the prior review are now fixed and verified against the shipped source, and the newly added content is factually sound.
The chat-recording path is corrected in both documents to the canonical ~/.qwen/projects/<sanitized-cwd>/chats/<sessionId>.jsonl. The empty-schema callout no longer claims an empty schema "always" yields null — it now correctly explains that {} passed by the model produces {}, with the argument-normalization behavior spelled out. The shutdown holdback is now described as "capped at ~500 ms" with the early-exit condition, matching the bounded loop guard. Spot-checks of the new text (client-side Ajv validation framing, the $defs workaround example, exit codes, redaction placeholder, and envelope shape) all match the code.
One non-blocking nit: the reviewer-driven restructure of the maxSessionTurns hint (from "three causes" to "one symptom + two causes") was applied to the design-doc prose but not to the user doc or the design doc's own failure-paths table, so the two now use slightly different framing for the same hint. This is cosmetic — all three items are genuinely present in the hint — and does not block merge.
Verdict
APPROVE — All prior findings fixed and verified against source; the new content is accurate. The remaining wording inconsistency is cosmetic.
doudouOUC
left a comment
There was a problem hiding this comment.
All five comments from the initial review have been addressed in the latest push:
- "function-calling API" phrasing — now correctly says "tool-call infrastructure…client-side (via Ajv in
BaseDeclarativeTool.build())" - "See also" Dual Output — clarified as "a different approach to machine-readable output; does not require
--json-schema" - "three common causes" framing — rewritten to symptom → cause structure
- Provider-agnostic claim — grounded with "(via the
DeclarativeToolabstraction)" - Root
$refworkaround —{ … }replaced with concrete$defsexample
Additional improvements in the latest push also look good: table formatting, corrected chat-recording path (~/.qwen/projects/…), improved is_error / failure-mode docs, and expanded permission-deny behavior description.
LGTM — solid documentation for a non-trivial feature.
1. Privacy/redaction section: note hooks as intentionally non-redacted surface (matches user-doc "Hooks see raw args" callout). 2. Dual call-site section: clarify differing post-helper termination flow between main-turn (direct return) and drain-turn (sentinel hop).
1. Failure-paths table: align "three common causes" cell with the
symptom→cause framing already used at parse-time validation pipeline
section ("common stuck-run symptom and its two likely causes").
2. Dual call-site section: fix factual inaccuracy from prior commit —
`drainOneItem` is `async (): Promise<void>` and returns nothing.
The two-hop termination is via closure-mutated `structuredSubmission`
(set by `processToolCallBatch`, checked by `drainLocalQueue` and the
holdback loop), not a return-value sentinel.
doudouOUC
left a comment
There was a problem hiding this comment.
Both follow-up comments from the second round have been addressed:
- Failure paths table wording — "three common causes" aligned to the symptom → cause framing used elsewhere in the doc.
- Drain-turn "returns a sentinel" inaccuracy — rewritten to correctly describe the closure-variable (
structuredSubmission) mechanism with the two-hop check path.
No remaining issues. User doc and design doc are internally consistent and accurate against source. LGTM.
Co-authored-by: Bryan Morgan <bryanmorgan@google.com>
Summary
Follow-up to #3598 (cli/core feature shipped to main without docs).
docs/users/features/structured-output.md— quick-start, schema input (inline +@path), output shapes per--output-format, parse-time restrictions, retry/failure modes, privacy redaction, permission gating, MCP shadow-tool handling, workedjq-piped pipeline example. Registered infeatures/_meta.ts.docs/design/structured-output/structured-output.md— synthetic-tool approach, four-stage parse-time validation pipeline,schemaRootAcceptsObject's decided-vs-deferred boundaries, main-turn vs drain-turn parity viaprocessToolCallBatch, structured-success terminal block, cross-surface privacy redaction viaSTRUCTURED_OUTPUT_REDACTED_ARGS, subagent context handling (forSubAgent), MCP shadow-tool guard, compatibility surface, alternatives considered (and why rejected), file-by-file index.Both docs are English-only — repo convention is English-only for both
docs/users/features/(zero zh-CN siblings) anddocs/design/(onlycustomize-banner-area/has a zh-CN twin). Happy to add zh-CN as a follow-up if there's demand.No code changes. No new tests needed.
Test plan
#3598as merged