feat(telemetry): Phase 2 — tool.blocked_on_user + hook spans (#3731)#4321
Conversation
Adds two OTel span types under the existing hierarchical session-tracing infrastructure (#3731 Phase 2; depends on Phase 1 #4126 and Phase 1.5 #4302): 1. `qwen-code.tool.blocked_on_user` — brackets the time a tool spends in awaiting_approval waiting for the user. Child of the tool span. Records decision (proceed_once / proceed_always / cancel / aborted / auto_approved) and source (cli / ide / hook / auto / system). Status stays UNSET — waiting is neither OK nor ERROR. 2. `qwen-code.hook` — wraps each pre/post-hook fire site so a slow hook can be told from a slow tool. Records hook_event (PreToolUse / PostToolUse / PostToolUseFailure), tool_name, shouldProceed, shouldStop, blockType, hasAdditionalContext. Status stays UNSET on intentional blocking decisions; ERROR only when the hook itself throws. To make blocked_on_user a child of the tool span, the tool span lifecycle moved from `executeSingleToolCall` to `_schedule`'s validating-loop — covering validating → awaiting_approval → executing in one span. Two new private Maps on CoreToolScheduler hold span refs across method boundaries (callId-keyed). Centralized cleanup via `finalizeToolSpan` / `finalizeBlockedSpan` private helpers ensures every terminal status path also ends the corresponding span. Eight terminal sites now finalize the tool span: signal.aborted at loop entry, hard deny, plan-mode block, non-interactive deny, permission-hook deny, background-agent deny, _schedule catch, executeSingleToolCall finally. Five blocked_on_user end sites: handleConfirmationResponse cancel and proceed branches, autoApproveCompatiblePendingTools, _schedule catch under signal.aborted, and the global-error catch. ModifyWithEditor stays inside one blocked_on_user span until the final proceed/cancel — the duration_ms reflects total user think-time including editor side trips. Six hook fire sites are wrapped: firePreToolUseHook, firePostToolUseHook, and four safelyFirePostToolUseFailureHook variants (success-path interrupt, toolResult.error path, catch-path interrupt, catch-path real exception). fireNotificationHook is intentionally NOT wrapped — it's fire-and-forget and the duration is meaningless. Mirrors claude-code's session-tracing pattern but deliberately diverges on one point: every end-helper takes the span object explicitly via `getSpanId(span)` lookup instead of `findLast`-by-type. Under concurrent tool calls, claude-code's findLast can end the wrong blocked span; passing the ref directly is concurrency-safe. Tests: - session-tracing.test.ts: 11 new tests covering parent resolution (explicit parent for blocked_on_user, ALS-based for hook), idempotent end, NOOP behavior, error-status mapping, and a concurrency regression test (two parallel blocked spans ended in reverse order). - coreToolScheduler.test.ts: mock extended with the four new helpers and two new metadata fields. New tests cover the tool span outliving a pre-hook deny path, blocked_on_user ending with cancel via the awaiting_approval flow, hook span recording shouldProceed=false / blockType='denied' on pre-hook block and shouldStop=true / blockType='stop' on post-hook stop, and a leak guard that asserts every recorded lifecycle span is ended after a successful tool call. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
📋 Review SummaryThis PR implements Phase 2 of the hierarchical session-tracing plan (#3731), adding two new OTel span types ( 🔍 General Feedback
🎯 Specific Feedback🟢 Medium
🔵 Low
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
Adds Phase 2 hierarchical session-tracing spans to improve observability of (1) time spent awaiting user approval for tool calls and (2) hook execution latency/behavior around tool calls.
Changes:
- Introduces
qwen-code.tool.blocked_on_userspan helpers and wires them into the tool confirmation state machine. - Introduces
qwen-code.hookspan helpers and wraps pre/post hook fire sites inCoreToolScheduler. - Updates telemetry exports/constants and expands unit tests for new span types and scheduler lifecycle behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/telemetry/session-tracing.ts | Adds span helpers for tool.blocked_on_user and hook, including end semantics and attributes. |
| packages/core/src/telemetry/session-tracing.test.ts | Adds unit tests covering parenting, attributes, idempotency, and concurrency behavior for new spans. |
| packages/core/src/telemetry/index.ts | Re-exports new helpers and related types. |
| packages/core/src/telemetry/constants.ts | Adds span name constants for the two new span types. |
| packages/core/src/core/coreToolScheduler.ts | Moves tool span lifecycle to cover validate→approval→execute; creates/ends blocked-on-user + hook spans at relevant sites. |
| packages/core/src/core/coreToolScheduler.test.ts | Extends scheduler telemetry tests to assert new spans are created/ended and metadata is propagated. |
Comments suppressed due to low confidence (2)
packages/core/src/core/coreToolScheduler.ts:2026
- Same issue as earlier: the fallback tool-span creation passes
{ tool_name, call_id }intostartToolSpan, which will record nonstandard attributes and omit'tool.call_id'. Use namespaced attribute keys (e.g.{ 'tool.call_id': callId }) and avoid duplicating tool name (already set by the first argument).
let toolSpan = this.toolSpans.get(callId);
if (!toolSpan) {
toolSpan = startToolSpan(toolName, {
tool_name: toolName,
call_id: callId,
});
this.toolSpans.set(callId, toolSpan);
packages/core/src/core/coreToolScheduler.ts:1829
- Proceed path always finalizes the blocked span with
source: 'cli', buthandleConfirmationResponse()can be invoked by IDE diff resolution as well (seeopenIdeDiffIfEnabled). Suggest: choosesource: 'ide'when in IDE mode / when the confirmation came from the IDE integration, otherwisecli.
// Proceed: end the blocked span before execution begins. ProceedOnce
// and the three ProceedAlways* variants all close the awaiting phase.
// The tool span itself stays open and is finalized in
// executeSingleToolCall.
const decision: ToolBlockedDecision =
outcome === ToolConfirmationOutcome.ProceedOnce
? 'proceed_once'
: 'proceed_always';
this.finalizeBlockedSpan(callId, decision, 'cli');
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
… + silent-failure-hunter Eight discrete fixes plus two new tests, all surfaced in the Phase 2 review rounds. Grouped here because they touch the same handful of code paths. Copilot inline (#4321 PR): 1. startToolSpan attrs naming: drop redundant `tool_name` (helper already sets `'tool.name'` from the first arg) and rename `call_id` to the namespaced `'tool.call_id'`. Two sites: `_schedule` validating-loop start, and the defensive fallback in executeSingleToolCall. Without this, traces emit non-namespaced `tool_name` / `call_id` attributes that consumers grepping for `tool.call_id` miss. 2. PreToolUse hook span: propagate the actual `preHookResult.blockType` ('denied' / 'ask' / 'stop') instead of collapsing every block to 'denied'. Also record `hasAdditionalContext` for parity with the PostToolUse / failure-hook spans. 3. blocked_on_user `source` detection: use `config.getIdeMode()` (best- effort) so IDE-driven decisions don't all show up as `'cli'`. Centralized in a new `getBlockedSource()` helper. silent-failure-hunter / code-reviewer: 4. Hook span error-tracking is dead code. firePreToolUseHook / firePostToolUseHook / safelyFirePostToolUseFailureHook all swallow throws internally — every `catch (e) { endMeta = { error, ... }; throw e }` block in the scheduler was unreachable. Simplify all 6 sites to `try { ... } finally { endHookSpan(...) }`. The default `endMeta = { success: false }` keeps the span sensible if a future hook impl decides to throw. 5. handleConfirmationResponse had no error handling. modifyWithEditor / _applyInlineModify / attemptExecutionOfScheduledCalls can throw and would otherwise leak both the tool span and the blocked_on_user span until the 30-min TTL fires. Wrap the body in a try/catch that finalizes both spans on rethrow. Extracted the body to `_handleConfirmationResponseInner` for clarity. 6. Add `'error'` to the `ToolBlockedDecision` union for system-error closes, so dashboards counting `decision: 'cancel'` don't get polluted by thrown exceptions. 7. _schedule's outer catch was labelling its non-aborted close as `'cancel'`. Switch to `'error'` (uses #6). 8. signal.aborted vs explicit user Cancel: when both are true, the old code reported `'aborted'/'system'` even though the user actually clicked Cancel. Reverse the precedence so `outcome === Cancel` wins, with `getBlockedSource()` for the source. Tests: - T1: extend the existing ProceedAlways auto-approve test to assert the two siblings' blocked spans end with `decision: 'auto_approved'`, `source: 'auto'`, while the first tool ends as `'proceed_always'`/cli. - T2: existing cancel-during-confirmation test now also asserts exactly one blocked span is recorded for the lifecycle — the same invariant ModifyWithEditor's intentional preservation across editor side trips must not break. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
… behaviors Two follow-ups from the post-#6767469b2 review pass on PR #4321: 1. autoApproveCompatiblePendingTools error path was logging-only and leaving the sibling tool's blocked_on_user span open until the 30-min TTL fires. Symmetric with the success branch's finalizeBlockedSpan('auto_approved', 'auto'), the catch now finalizes with ('error', 'system') so the trace deterministically explains why the sibling didn't auto-approve. 2. Three behaviors introduced by 6767469 had no test coverage: - decision='error' from _schedule's outer catch when getConfirmationDetails throws (asserts tool span ends, no blocked span ever opens since the throw happens pre-awaiting_approval). - source='ide' when getBlockedSource() honors getIdeMode (Cancel path with getIdeMode: () => true). - Explicit Cancel takes precedence over a concurrent signal.aborted in the decision label — the bug the precedence flip was meant to fix is now regression-tested. Extracted a small `buildApprovalScheduler` helper for the two awaiting_approval-flow tests; the throw-on-confirmation test reuses StructuredErrorOnConfirmationTool. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
…odex P3) The previous commit 32f94d3 added a `finalizeBlockedSpan(callId, 'error', 'system')` to the autoApproveCompatiblePendingTools catch in the name of "symmetry with the success branch". Codex review pointed out the bug: that catch fires when evaluatePermissionFlow throws for a SIBLING tool, but the sibling itself is still in `awaiting_approval` — the user can still respond. By closing the blocked span at the catch, the eventual handleConfirmationResponse → finalizeBlockedSpan call becomes a no-op (Map.delete already cleared it), and the user's actual decision / source attributes are lost from the trace. Revert that line. The previous behavior was correct: log the error, leave the span open, let the user's eventual decision close it correctly. If the user never responds, the 30-min TTL in session-tracing.ts cleans up the orphan span — same fallback that already covered every other "user walks away" scenario. The "leak" the original change was trying to fix was a phantom: the span IS finalized once the user (or the abort signal) drives the tool to a terminal state. The TTL is just the safety net. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Codex review disposition (
|
wenshao
left a comment
There was a problem hiding this comment.
[Critical] coreToolScheduler.ts:1835 — ModifyWithEditor silent failure leaks blocked + tool spans
When getPreferredEditor() returns undefined, the bare return in _handleConfirmationResponseInner leaks both the blocked span (created in _schedule) and the tool span. No log, no user feedback — the tool silently stays in awaiting_approval until the 30-min TTL fires.
if (!editorType) {
debugLogger.warn(
`ModifyWithEditor requested for ${callId} but no editor available`,
);
this.finalizeBlockedSpan(callId, 'error', 'system');
return;
}
…ecision Two #4321 review comments from wenshao, both Critical: 1. `TOOL_FAILURE_KIND_PRE_HOOK_BLOCKED` was being emitted for FIVE distinct non-PreToolUse-hook deny paths in `_schedule`: - finalPermission === 'deny' (hard deny) - plan-mode block - non-interactive deny - permission_request hook deny - background-agent deny Dashboards filtering by `failure_kind = 'pre_hook_blocked'` were silently picking up all of these, undermining the attribute. Add distinct constants + status messages for each path. The original PRE_HOOK_BLOCKED label is now used at exactly one site — the actual PreToolUse hook deny in `_executeToolCallBody`. 2. `decision: 'proceed_once'` was untested. Existing tests covered 'cancel' and 'proceed_always' (auto-approve) but not the most common user interaction. Add a test that schedules an approval-required tool, confirms with ProceedOnce, and asserts the blocked span ends with `decision: 'proceed_once'`, `source: 'cli'`. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Three review items folded into one follow-up: 1. wenshao Critical (`coreToolScheduler.ts:1851`) — `ModifyWithEditor` path silently returned when `getPreferredEditor()` was undefined, leaking blocked + tool spans on user-walks-away. Add a `debugLogger.warn` so the silent failure is at least visible in debug telemetry. Deliberately do NOT finalize spans here, matching the Codex P3 / autoApprove decision: ModifyWithEditor stays inside one awaiting period, the user can still recover via Cancel/Proceed which closes the spans correctly, and the 30-min TTL is the safety net for give-up scenarios. Finalizing prematurely would make the user's eventual decision a no-op (Map already cleared) and lose the actual decision/source attributes. 2. Bot summary Medium (`session-tracing.ts:557-562`) — add a `debugLogger.debug` when `startToolBlockedOnUserSpan` falls back to `resolveParentContext` because the tool span isn't in `activeSpans` anymore. Helps diagnose unexpected ordering during development. 3. Bot summary Low (`constants.ts`) — JSDoc the two new span name constants. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
| # | Item | Disposition |
|---|---|---|
| Medium 1 | session-tracing.ts:557-562 debug log on fallback |
✅ Adopted |
| Medium 2 | finalizeToolSpan / finalizeBlockedSpan not in telemetry exports |
❌ Push back — they're explicitly private scheduler-state-machine helpers, not telemetry public API |
| Low 1 | debugLogger.error instead of warn on span end failure |
❌ Push back — span end failures are extremely rare, recoverable, and don't break the service. warn is the existing convention |
| Low 2 | Extract safelyEndSpan helper |
❌ Push back — DRY across 3 sites with ~5 lines each isn't worth the indirection. Existing code structure is consistent |
| Low 3 | JSDoc on new SPAN constants | ✅ Adopted |
| Low 4 | Be more explicit about ProceedAlways* outcomes |
❌ Push back — comment is accurate; listing ProceedAlways / ProceedAlwaysProject / ProceedAlwaysUser would be noise |
Verification
- 191 / 191 tests pass
tsc --noEmitclean- pre-commit clean
🤖 Generated with Qwen Code
…ToolSpan param Two #4321 review Suggestions from wenshao: 1. The 6 hook fire sites (PreToolUse, PostToolUse, 4× PostToolUseFailure) each repeated the same try/finally + endMeta init + endHookSpan pattern. Future hook span protocol changes had to be made in lockstep. Extract a private generic helper: withHookSpan<T>(opts, fn, toEndMeta): Promise<T> Each fire site collapses from ~12 lines of try/finally scaffolding to ~3 lines passing in the fire callback + endMeta builder. The `let postHookResult!:` definite-assignment hack at the PostToolUse site is gone because the helper returns the awaited result directly. 2. `finalizeToolSpan(callId, metadata?)` had a dead `metadata` parameter — every caller pre-sets the span status via `setToolSpan{Failure,Cancelled}` and called `finalizeToolSpan` with no argument. Removed the parameter. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…id back-compat Three #4321 review threads from wenshao (#4321 codex P3-equivalent + two structural concerns): 1. **[Critical] Hook spans reported success on swallowed hook failures.** firePreToolUseHook / firePostToolUseHook / firePostToolUseFailureHook (and the safelyFire wrapper in coreToolScheduler) all catch transport / dispatch errors internally and return safe defaults. Before this fix, withHookSpan's `toEndMeta` ran on the safe default and recorded `success: true` — a crashing hook was indistinguishable from one that allowed execution. Add a `hookError?: string` field to the three result types, populate it in each catch, and have all 6 toEndMeta callbacks return `{ success: false, error: hookError }` when present. Existing "graceful error" tests updated to expect the new field. 2. **[Suggestion] ensureCleanupInterval not kicked from new helpers.** The 30-min TTL cleanup safety net for leaked spans only starts when `startInteractionSpan` is first called. Sub-agent or side-query code paths that call `startToolBlockedOnUserSpan` / `startHookSpan` without an interaction span first never trigger cleanup. Both helpers now call the (idempotent) `ensureCleanupInterval()` early. 3. **[Suggestion] `call_id` → `'tool.call_id'` rename is breaking for downstream consumers.** Phase 1's `startToolSpan(name, { tool_name, call_id })` shipped non-namespaced attribute keys. My Phase 2 #4321 review-fix dropped both. Dual-emit `call_id` (legacy alias) + `'tool.call_id'` for one release cycle so existing dashboards / alerts don't silently return zero. Comment notes the legacy key is removed in the next release. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
wenshao
left a comment
There was a problem hiding this comment.
Additional observations (not on diff lines):
-
firePermissionRequestHooknot wrapped withwithHookSpan(coreToolScheduler.ts:1734): The PR wraps 6 hook fire sites with span instrumentation, butfirePermissionRequestHook— a decision-making hook that can return allow/deny — is not wrapped. TheHookEventtype also lacks a'PermissionRequest'variant. This makes the permission hook's latency and decision invisible in traces. If this exclusion is intentional, a comment at the call site noting the rationale would help future maintainers. -
endToolBlockedOnUserSpannever sets ERROR status (session-tracing.ts:706): When a blocked span is closed withdecision: 'error'(e.g., system exception during approval flow), the span status stays UNSET. Operators filtering bySpanStatusCode.ERRORin trace backends won't surface these. Consider setting ERROR status whenmetadata?.decision === 'error'. -
Test coverage gaps (pattern): Three specific new code paths lack test coverage: (1)
truncateSpanError— no test passes an oversized string to verify truncation; (2) TTL sweeptool.blocked_on_userbranch — the type-specificdecision: 'aborted'stamping is untested; (3)endHookSpanidempotency — the blocked_on_user span has an idempotency test but the hook span does not.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
Adopted ([Suggestion]):
- coreToolScheduler.ts: `setToolSpanFailure` now applies
`truncateSpanError` to the status message at this single ingress
point. Many of its 10+ call sites pass raw `error.message` which can
be unbounded — the same backend-drop risk that drove
`truncateSpanError` for the endXSpan attribute writes. Static-
constant callers see no change since their messages are well under
the 1024-char cap. Required exporting `truncateSpanError` from
`session-tracing.ts` and re-exporting from `telemetry/index.ts`.
- coreToolScheduler.ts: in `_schedule`, after the for-loop runs to
completion, drop the abort listener if `batchState.callIds.size === 0`.
Closes the all-error-batch leak path: if every newToolCall had
`status !== 'validating'` (e.g., invalid params, tool not registered,
queue full), no `finalizeToolSpan` ever fires for the batch and
`releaseBatchListenerIfDrained` is never invoked. Without this drop,
one dead listener accumulates per all-error batch.
- coreToolScheduler.ts: `handleConfirmationResponse` outer catch now
emits a `debugLogger.warn` before rethrowing. Without it, if the
caller (CLI confirmation UI layer) doesn't log the rejection, the
error disappears from application logs entirely — operators
grepping by callId would see nothing despite the trace backend
showing `failure_kind: tool_exception`.
- session-tracing.test.ts: 4 new tests
* `truncateSpanError` returns short strings unchanged
* `truncateSpanError` truncates over 1024 chars + appends sentinel
* `truncateSpanError` boundary at exactly 1024 chars
* TTL sweep stamps `decision: 'aborted'` + `source: 'system'` on
blocked_on_user spans (covers the branch added in review-3 round)
Pushed back ([Suggestion]):
- "TTL sweep can't reach scheduler-local Maps" — accurate but the fix
is non-trivial: a parallel scheduler-side TTL sweep duplicates the
session-tracing sweep's bookkeeping, and the practical impact is
bounded (Maps die with the scheduler instance, which is per-session
in CLI mode). The bigger leak (listener accumulation on shared
signals) is already covered by `releaseBatchListenerIfDrained`.
Marking as out-of-scope architectural follow-up.
Tests: 259/259 across affected files (coreToolScheduler 159 +
session-tracing 53 + toolHookTriggers 47). `tsc --noEmit` clean.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- coreToolScheduler.test.ts: convert the `truncateSpanError` mock from an inline identity function to `vi.fn(identity)` so individual tests can substitute a sentinel return. Added regression test `setToolSpanFailure forwards the truncateSpanError result to the span status (#4321)` that overrides the spy with `<<TRUNCATED-SENTINEL>>`, drives the scheduler through the pre-hook deny path, and asserts the span's ERROR status message equals the sentinel — locks the integration so a regression dropping the `truncateSpanError(message)` call inside `setToolSpanFailure` is caught at the scheduler boundary rather than only at the utility's unit test. Tests: 213/213 across `coreToolScheduler.test.ts` (160) + `session-tracing.test.ts` (53). `tsc --noEmit` clean. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
… review on PR #4321 Comprehensive self-review (code-reviewer + silent-failure-hunter + type-design-analyzer + pr-test-analyzer agents) after 6 rounds of bot feedback turned up 4 remaining actionable items. Addressed: [silent-failure-hunter HIGH-1] toolHookTriggers.ts: when the hook runner returns `{ success: false }` (or missing output) with no `error.message`, the 3 fire helpers used to silently return the safe default — `{ shouldProceed: true }` / `{ shouldStop: false }` / `{}` — producing a hook span that reads `success: true` and looked like a clean allow in dashboards. Now synthesizes a sentinel hookError describing the contract violation so the span records the failure. Three existing test cases updated to assert the new sentinel-bearing shape. [silent-failure-hunter HIGH-2] coreToolScheduler.ts: synchronous throws in `_executeToolCallBody`'s prelude (addToolInputAttributes, getMessageBus, startToolExecutionSpan, etc.) propagated up to `executeSingleToolCall`'s `finally` without ever hitting setToolSpan*, so the tool span ended UNSET with no failure_kind AND the tool call stayed in 'executing' forever (checkAndNotifyCompletion never sees terminal state, scheduler hangs). Added a catch in executeSingleToolCall that pre-sets failure status + an error response before the finally finalizes — guards every prelude path the body's own try/catch doesn't cover. [silent-failure-hunter MEDIUM-3] session-tracing.ts: the empty catch on `sweepStaleSpans` `setAttributes` lost the `ttl_expired` + `decision: 'aborted'` sentinel attrs silently if setAttributes ever threw. Now matches the sibling `span.end()` catch and surfaces via `debugLogger.warn` — TTL-leaked blocked spans stay distinguishable from deliberately-UNSET ones in dashboards. [pr-test-analyzer Gap1, severity 7] coreToolScheduler.test.ts: the `signal.aborted` re-check at `_schedule:1834` (round-3 fix that prevents opening a blocked span on an already-aborted signal between the for-loop's await points and the awaiting_approval transition) had no regression test. Added one that uses a tool whose `getConfirmationDetails` aborts the signal before returning — top of loop check passes, getConfirmationDetails resolves and aborts, re-check fires the cancel path. Asserts `tool.failure_kind === 'cancelled'` AND that NO blocked_on_user span was ever started. Tests: 261/261 across affected files (coreToolScheduler 161 + session-tracing 53 + toolHookTriggers 47). `tsc --noEmit` clean. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Final self-review pass — commit
|
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] Missing test for the pre-body throw catch path in _executeToolCallBody (coreToolScheduler.ts:2386–2420). A test injecting a config where getMessageBus() or addToolInputAttributes throws before the scheduled → executing transition would have caught the Critical finding above and prevents future regressions.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
All three from the same /review run; all valid (the Critical is a real
bug in the SF-H2 fix from review-7 that this commit fixes).
[Critical] coreToolScheduler.ts:2407 — the `c.status === 'executing'`
guard on the prelude-throw catch was wrong. Prelude throws happen
BEFORE the `scheduled → executing` transition in `_executeToolCallBody`
(getMessageBus is called at line 2460, scheduled→executing flips at
line 2522). The `find(... 'executing')` skipped the setStatusInternal,
so the toolCall stayed in `scheduled` forever and
checkAndNotifyCompletion never fired — exactly the stall the SF-H2 fix
was supposed to prevent. Drop the guard; setStatusInternal already
no-ops on terminal states (success/error/cancelled) so the
unconditional call covers both scheduled-prelude and executing-body
paths. Added regression test that makes getMessageBus throw and
asserts onAllToolCallsComplete fires with status='error'.
[Suggestion] session-tracing.ts:222 — truncateSpanError used
`slice(0, 1024)` on UTF-16 code units, which splits surrogate pairs
when an emoji (e.g. 🚀) or rare CJK character sits at the boundary.
The result was a lone high surrogate followed by `'…[truncated]'` —
strict OTLP/gRPC collectors reject batches with invalid UTF-8 (a lone
high surrogate encodes to an invalid byte sequence). Back up one code
unit when the cut lands on a high surrogate. Added regression test
that constructs the boundary case (1023 'a' + 🚀 + padding) and
asserts the truncated string is valid UTF-16.
[Suggestion] toolHookTriggers.ts:133/240/319 — switched `||` to `??`
in the 3 hookError sentinel sites. `||` treats empty string as falsy
so a runner returning `{ error: { message: "" } }` triggered the
sentinel instead of preserving the (unhelpful but distinct) empty
message — a runner contract violation that's worth distinguishing
from a missing-message case. `??` synthesizes only when the message
is truly absent (undefined / null).
Tests: 263/263 across affected files (coreToolScheduler 162 +
session-tracing 54 + toolHookTriggers 47). `tsc --noEmit` clean.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
[Critical] coreToolScheduler.ts — `handleConfirmationResponse`'s catch
was misattributing sister-tool prelude throws to the confirmed tool's
span. The catch wrapped `_handleConfirmationResponseInner`, which
called `attemptExecutionOfScheduledCalls` at its tail. If the user
proceeds tool A with ProceedAlways, `autoApproveCompatiblePendingTools`
transitions sister tools B/C to `scheduled`, and B has a prelude
throw, the SF-H2 catch in `executeSingleToolCall` re-throws → the
throw propagates up through `attemptExecutionOfScheduledCalls` → into
the outer catch keyed on A.callId, where `setToolSpanFailure(A.span,
TOOL_EXCEPTION, B.error.message)` corrupts A's span and
`finalizeToolSpan(A.callId)` ends A's span prematurely. A's actual
result later disappears from telemetry. Fix: move
`attemptExecutionOfScheduledCalls` out of
`_handleConfirmationResponseInner` and into
`handleConfirmationResponse` after the try/catch. The catch now
covers only confirmation logic; each tool's
`executeSingleToolCall` already handles its own span lifecycle via
its own catch.
[Suggestion] toolHookTriggers.ts — reverted the round-8 `??` change
back to `||`. Downstream consumers in coreToolScheduler.ts gate on
`r.hookError ? ...`, so an empty-string `hookError` preserved by
`??` was silently dropped — the change defeated its own stated
intent. Empty-string runner error messages carry no operator value;
the sentinel ("hook runner returned ... without error detail") is
more actionable, and `||` matches existing downstream truthiness
semantics.
[Suggestion] session-tracing.test.ts — replaced the vacuous
`Buffer.from(truncated, 'utf16le')` assertion (which never throws
because Node's Buffer copies raw 16-bit code units without validating
surrogate pairs) with the suggested regex
`/[\uD800-\uDBFF](?![\uDC00-\uDFFF])/` that actually checks for
orphan high surrogates anywhere in the string.
Tests: 263/263 across affected files (coreToolScheduler 162 +
session-tracing 54 + toolHookTriggers 47). `tsc --noEmit` clean.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Follow-up commit
|
…#4321 [Suggestion] gpt-5.5 review-10: the round-9 `??` → `||` revert was correct, but the existing tests only covered the missing-error case (`success: false` with no `error` field). A future regression back to `??` would still pass those tests while reintroducing the silent-drop behavior the revert was guarding against. Add 3 explicit tests — one per fire helper (PreToolUse, PostToolUse, PostToolUseFailure) — that pass `{ error: { message: '' } }` and assert the sentinel hookError is synthesized (not the empty string). Pins the `||` semantics so any future `??` change fails the suite. Tests: 50/50 in toolHookTriggers.test.ts (47 → 50). `tsc --noEmit` clean. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
pomelo-nwu
left a comment
There was a problem hiding this comment.
Reviewed the full diff and cross-checked against the existing review threads (Copilot + wenshao). No unresolved critical issues from my side:
- Both prior
[Critical]findings are addressed — the TTL sweep now stamps sentinel attrs (ttl_expired/duration_ms/decision/source) and logs on stale spans; the ModifyWithEditor no-editor path is no worse than pre-PR behavior (kept inawaiting_approval+ log + span attr + 30-min TTL safety net). - Spot-checked the highest-risk areas — the tool-span lifecycle move (
validating → awaiting_approval → executing), the prelude-throw path (covered by the new injected-throw test), abort-listener cleanup, and thehandleConfirmationResponserefactor — none change tool-execution control flow or leak spans.
Remaining items are non-blocking suggestions only. LGTM 👍
中文
已通读完整 diff,并与现有 review(Copilot + wenshao)做了交叉核对。从我这边看没有未解决的 critical 问题:
- 此前两个
[Critical]都已落地 —— TTL 清扫现在会为 stale span 打上 sentinel 属性(ttl_expired/duration_ms/decision/source)并记日志;ModifyWithEditor 无编辑器路径不比 PR 之前更差(保持awaiting_approval+ 日志 + span 属性 + 30 分钟 TTL 兜底)。 - 抽查了最高风险处 —— tool span 生命周期上移(
validating → awaiting_approval → executing)、prelude 抛错路径(已被新增的注入抛错测试覆盖)、abort listener 清理、handleConfirmationResponse重构 —— 均不改变工具执行控制流,也不泄漏 span。
剩余仅为非阻塞的 suggestion。认可合并 👍
wenshao
left a comment
There was a problem hiding this comment.
Test coverage suggestions (cannot map to specific diff lines):
-
Sister-tool prelude throw — The design comment at
coreToolScheduler.ts:2045explicitly statesattemptExecutionOfScheduledCallsruns outside the catch to prevent a sister tool's prelude throw from corrupting the confirmed tool's span. No test validates this invariant. Suggested: schedule 2 tools, approve A, make B's prelude throw, assert A's span keeps OK status. -
All-invalid batch listener cleanup —
batchState.callIds.size === 0atcoreToolScheduler.ts:1963strips the abort listener when every tool fails pre-validation. No test exercises this path. In daemon sessions with many all-invalid batches, untested listener cleanup could causeMaxListenersExceededWarning.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
| let end = SPAN_ERROR_MAX_CHARS; | ||
| const code = s.charCodeAt(end - 1); | ||
| if (code >= 0xd800 && code <= 0xdbff) end--; | ||
| return s.slice(0, end) + '…[truncated]'; |
There was a problem hiding this comment.
[Suggestion] truncateSpanError output exceeds its own SPAN_ERROR_MAX_CHARS (1024) cap. The sentinel '…[truncated]' is 12 code units, so a 2000-char input produces 1036-char output (1024 + 12). Operators reading SPAN_ERROR_MAX_CHARS = 1024 expect output <= 1024 chars; if a backend enforces a 1024-byte attribute budget, this "safety" truncation still drops the span.
| return s.slice(0, end) + '…[truncated]'; | |
| const TRUNCATE_SENTINEL = '…[truncated]'; | |
| let end = SPAN_ERROR_MAX_CHARS - TRUNCATE_SENTINEL.length; | |
| const code = s.charCodeAt(end - 1); | |
| if (code >= 0xd800 && code <= 0xdbff) end--; | |
| return s.slice(0, end) + TRUNCATE_SENTINEL; |
Also add expect(truncated.length).toBeLessThanOrEqual(1024) to the existing test.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
| `PreToolUse hook error for ${toolName}: ${error instanceof Error ? error.message : String(error)}`, | ||
| ); | ||
| return { shouldProceed: true }; | ||
| const message = error instanceof Error ? error.message : String(error); |
There was a problem hiding this comment.
[Suggestion] Catch blocks construct hookError as error instanceof Error ? error.message : String(error) without the empty-string guard that the runner-contract-violation paths already apply. If a hook transport throws new Error(''), hookError is ''; downstream r.hookError ? ... (truthiness) silently drops it, and the span records success: true — the exact allow-without-telemetry pathology the || sentinel on the !response.success paths was designed to close.
Same issue at lines 277 (firePostToolUseHook catch) and 356 (firePostToolUseFailureHook catch).
| const message = error instanceof Error ? error.message : String(error); | |
| const message = | |
| (error instanceof Error ? error.message : String(error)) || | |
| 'hook threw with empty message'; |
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
yiliang114
left a comment
There was a problem hiding this comment.
LGTM — all critical findings from prior rounds are addressed with regression tests. The remaining two suggestions (sentinel-length overshoot in truncateSpanError, catch-block hookError empty-string edge case) are valid but non-blocking — can land as a follow-up.
Solid work on the span lifecycle move and the exhaustive terminal-path coverage. Ship it.
All concerns raised in this review have been addressed in subsequent commits. Dismissing as the feedback is no longer applicable to the current code.
Summary
Phase 2 of the hierarchical session-tracing plan in #3731. Adds two OTel span types on top of the infrastructure landed by Phase 1 (#4126) and Phase 1.5 (#4302).
qwen-code.tool.blocked_on_user— brackets the time a tool spends inawaiting_approvalwaiting for the user. Child of the tool span. Recordsdecision(proceed_once/proceed_always/cancel/aborted/auto_approved) andsource(cli/ide/hook/auto/system). Status stays UNSET — waiting is neither OK nor ERROR.qwen-code.hook— wraps each pre/post-hook fire site so a slow hook can be told from a slow tool. Recordshook_event(PreToolUse/PostToolUse/PostToolUseFailure),tool_name,shouldProceed,shouldStop,blockType,hasAdditionalContext. Status stays UNSET on intentional blocking; ERROR only when the hook throws.Architectural change
To make
blocked_on_usera child of the tool span, the tool span lifecycle moved fromexecuteSingleToolCallto_schedule's validating-loop — coveringvalidating → awaiting_approval → executingin one span. Two new private Maps onCoreToolSchedulerhold span refs across method boundaries (callId-keyed). Centralized cleanup viafinalizeToolSpan/finalizeBlockedSpanensures every terminal status path also ends the corresponding span._schedulecatch,executeSingleToolCallfinally.blocked_on_userend sites:handleConfirmationResponsecancel + proceed,autoApproveCompatiblePendingTools,_schedulecatch under signal.aborted, global-error catch. ModifyWithEditor stays inside one span until final proceed/cancel —duration_msreflects total user think-time including editor side trips.firePreToolUseHook,firePostToolUseHook, foursafelyFirePostToolUseFailureHookvariants.fireNotificationHook(permission_prompt) intentionally NOT wrapped — fire-and-forget, duration is meaningless.Diverges from claude-code on one point
claude-code's
endToolBlockedOnUserSpanusesfindLast-by-type to locate the span — under concurrent tool calls this can end the wrong span. Our helpers take the span object explicitly viagetSpanId(span)lookup. Theconcurrent blocked spanstest asserts the regression that bug would otherwise produce.Closes Phase 2 in #3731
Ticks off these checklist items in the parent issue under
### Deeper observability (P3) → Phase 2:tool.blocked_on_userspan type + helper; wire into approval state machine incoreToolScheduler._schedulehookspan type + helper; wire into pre/post hook execution incoreToolScheduler.executeSingleToolCallTest plan
pnpm typecheck— cleannpx vitest run src/telemetry/session-tracing.test.ts src/core/coreToolScheduler.test.ts— 187 / 187 pass (47 telemetry helper tests + 140 scheduler tests)npx vitest run(full core package) — only the preexistinganthropicContentGeneratorUser-Agent test + a fewcrawler.test.ts/gitDiff.test.tsfilesystem-timing failures remain. Confirmed unrelated to telemetry — none touch the changed files.npx eslint— clean人工验证
调用工具的prompt,需要人工选择

预期有

必须是这个,不是 tool_call
├─ qwen-code.tool.blocked_on_user ← 用户思考期 ├─ qwen-code.hook (PreToolUse)
├─ qwen-code.tool.execution
└─ qwen-code.hook (PostToolUse)
符合预期
Rollback
This PR moves the tool span lifecycle. The full set of changes is contained in one commit (
38ba22d9c); revert is a single git revert. The new helpers are additive — disabling them would only require dropping the calls incoreToolScheduler.tsand removing the new exports fromsession-tracing.ts.🤖 Generated with Qwen Code
Local verification (tmux + vitest)
Run the script below in a tmux pane to verify the PR locally.
Captured output and expected results are inlined.
Script
Run via tmux
Expected
TYPECHECK_OKline printed.Test Files 2 passed (2)andTests 187 passed (187).✓.Captured output (this PR, branch
feat/telemetry-phase-2-spans@6767469b2)Coverage map
parents the blocked span under the explicitly-passed tool spanrecords decision/source attributes on end and leaves status UNSEThandles concurrent blocked spans without findLast confusiontool span is started in _schedule and ended even when pre-hook deniesblocked_on_user span ends with cancel when the user rejectshandleConfirmationResponseevery span recorded in a successful tool call is endedauto-approve remaining tool calls ... ProceedAlwaysdecision: 'auto_approved',source: 'auto'hook span records shouldProceed=false / blockType=denied when pre-hook blockshook span records shouldStop=true when post-hook stops executionmarks status ERROR only when the hook itself threw