feat(telemetry): Phase 3 — qwen-code.subagent span with concurrent isolation (#3731)#4410
Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements Phase 3 of hierarchical session tracing by introducing a dedicated qwen-code.subagent span around each subagent invocation, so LLM/tool/hook spans emitted by concurrent subagents become isolated subtrees rather than interleaving under the parent interaction/tool spans. It also updates TTL sweeping behavior to support long-lived fork/background subagents, and avoids duplicate span creation by skipping the existing qwen-code.subagent_execution LogRecord → span bridge.
Changes:
- Add
qwen-code.subagentspan lifecycle helpers (start/end + context binding) and wire them into foreground/fork/background subagent execution paths. - Make TTL sweeping type-aware so fork/background subagent spans use a longer TTL window and are stamped with subagent-specific abort metadata on sweep.
- Skip bridging
qwen-code.subagent_executionLogRecords into spans to prevent duplicate/competing span representations.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| docs/design/telemetry-subagent-spans-design.md | Design doc describing subagent-span hierarchy, trace/link strategy, and TTL rationale. |
| packages/core/src/telemetry/constants.ts | Adds SPAN_SUBAGENT constant for the new span name. |
| packages/core/src/telemetry/session-tracing.ts | Implements subagent span helpers and introduces type-aware TTL sweeping. |
| packages/core/src/telemetry/session-tracing.test.ts | Adds unit tests for subagent span behavior (child vs linked-root, links, TTL, status mapping). |
| packages/core/src/telemetry/log-to-span-processor.ts | Adds a skip-list so subagent_execution events are not bridged to spans. |
| packages/core/src/telemetry/log-to-span-processor.test.ts | Tests that the skip-list suppresses bridging only for subagent_execution. |
| packages/core/src/telemetry/index.ts | Re-exports new subagent telemetry helpers and types. |
| packages/core/src/agents/runtime/agent-context.ts | Adds ALS-derived depth tracking and getCurrentAgentDepth() for telemetry. |
| packages/core/src/agents/runtime/agent-context.test.ts | Tests depth behavior across nesting and sibling frames. |
| packages/core/src/tools/agent/agent.ts | Wraps subagent invocation bodies in the new subagent span context across all invocation kinds. |
Comments suppressed due to low confidence (2)
packages/core/src/telemetry/log-to-span-processor.ts:41
- There is an
importstatement after theBRIDGE_SKIP_EVENT_NAMESconstant declaration. ES modules/TypeScript require all imports to come before any executable statements, so this will fail to parse/compile. MoveBRIDGE_SKIP_EVENT_NAMES(and its comment) below the import section, or convert the later import into a dynamicawait import()if it must remain after code (prefer moving the const).
const BRIDGE_SKIP_EVENT_NAMES = new Set<string>([EVENT_SUBAGENT_EXECUTION]);
import {
deriveTraceId,
randomHexString,
randomSpanId,
docs/design/telemetry-subagent-spans-design.md:431
- This table row points to
packages/core/src/subagents/agent-context.ts, but the implementation in this PR modifiespackages/core/src/agents/runtime/agent-context.ts. Updating the file path here will keep the design doc consistent with the actual code layout.
| `packages/core/src/telemetry/log-to-span-processor.ts` | Skip-list to bypass bridging `qwen-code.subagent_execution` | +6 |
| `packages/core/src/telemetry/index.ts` | Re-export new helpers + types | +6 |
| `packages/core/src/subagents/agent-context.ts` | Add `depth?: number` to `AgentContext` + `getCurrentAgentDepth()` accessor | +12 |
| `packages/core/src/tools/agent/agent.ts` | Wrap 3 execution paths (foreground/fork/background) in `runInSubagentSpanContext` with try/catch/finally | +60 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
📋 Review SummaryThis PR implements Phase 3 of hierarchical session tracing (#3731), adding 🔍 General FeedbackPositive aspects:
Architectural decisions:
Potential concerns:
🎯 Specific Feedback🟢 Medium
try {
ctx.span.setAttributes({ ... });
} catch (error) {
debugLogger.warn(`Failed to stamp TTL attrs...`);
}
// Consider: should span.end() be skipped if attribute stamping fails?
// Currently continues to end() regardless, which is correct but worth documentingSuggestion: Add a comment clarifying that span.end() proceeds even if attribute stamping fails, since ending the span is more important than perfect attribution.
parentAgentId: getCurrentAgentId() ?? undefined,
depth: getCurrentAgentDepth(),
// ...
() => runInSubagentSpanContext(span, body)Suggestion: Add an inline comment explicitly stating "Captured BEFORE child's runWithAgentContext frame to record caller's depth, not callee's" to prevent future refactors from accidentally moving these captures.
🔵 Low
* (`https://opentelemetry.io/docs/specs/otel/overview/#links-between-spans`).Suggestion: Complete the URL (missing
✅ Highlights
🔧 Pre-Merge Checklist
|
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
C1 — qwen-code.subagent.status was always 'completed' regardless of actual outcome because the body of runWithSubagentSpan (runSubagentWithHooks / bgBody) catches all errors internally. The wrapper's catch block was dead code in production, so cancelled / failed subagents were indistinguishable from successful ones in telemetry. Fix: body now opts into reporting its terminal state via a recordSpanOutcome sink, mapping AgentTerminateMode + signal state to the span's status taxonomy (GOAL → completed, CANCELLED / signal aborted → cancelled, ERROR / MAX_TURNS / TIMEOUT / SHUTDOWN → failed). Background path also switched to bgAbortController.signal (was hookOpts.signal — silent-failure-hunter follow-up: task_stop on a bg agent was being classified as 'failed' instead of 'aborted'). C2 — qwen-code.subagent.depth was off-by-one for any nested subagent because getCurrentAgentDepth() returns the parent's depth and the wrapper did not add +1. Top-level and first-nested both recorded depth=0, defeating the recursion-detector use case. Fix: depth = parentAgentId !== null ? getCurrentAgentDepth() + 1 : 0. Tests: nine new tests in agent.test.ts cover the recordSpanOutcome contract across the success/failure/cancel/throw paths plus the depth fix. Mocks the subagent-span telemetry layer so the test exercises the real runWithSubagentSpan wiring without OTel context-with. Design doc: corrected two stale paths to packages/core/src/agents/runtime/ agent-context.ts (Copilot NIT — old packages/core/src/subagents/agent-context.ts location pre-dates the runtime move). Closes wenshao + Copilot review threads on #4410.
wenshao /review round 2 landed 10 inline comments; 8 fixable trivially, 3 require design decisions deferred for follow-up. Landed: - runWithSubagentSpan catch uses `??=` so a body that already recorded its outcome is not clobbered by a downstream cleanup throw (#3287621709). Dedupe against deriveSubagentExceptionMetadata. - runWithSubagentSpan finally now warns when the body forgot to call recordOutcome (defaulting to 'completed' silently was hiding wiring bugs) (#3287621748). - endSubagentSpan warns when the span is missing from activeSpans or already ended — exposes the TTL-sweep race where a fork finishing past 4h would lose its real outcome silently (#3287621722). - runInSubagentSpanContext skips the AsyncLocalStorage wrap when the span isn't tracked (NOOP_SPAN / telemetry disabled), mirroring runInToolSpanContext (#3287621756). - ttlFor uses a typed ReadonlySet<SubagentInvocationKind> so adding a new invocation kind causes a TS error instead of silent fall-through to the 30-min default (#3287621755 part a). - runSubagentWithHooks success+catch paths now publish recordSpanOutcome BEFORE updateDisplay/registry side-effects, so a downstream throw can't overwrite the real outcome with the catch's exception derivation (#3287621764). Same fix applied to bgBody. - Fork path now passes `undefined` instead of hookOpts.signal to runWithSubagentSpan — fire-and-forget forks shouldn't classify parent-turn cancellation as their own abort cause (#3287621745). - Added 2 tests in agent.test.ts for the signalAborted=true branches of deriveSubagentOutcomeMetadata and deriveSubagentExceptionMetadata (#3287621740). Deferred (asked user to triage): - #3287621731 fork/bg integration test coverage — needs new test scaffolding; existing infra only exercises foreground path. - #3287621737 background-agent-resume.ts:834 calls runWithAgentContext but NOT runWithSubagentSpan, so the new skip-list now leaves resumed bg agents with zero span coverage. Architectural fix needed: wrap the resume path in runWithSubagentSpan vs. making the skip-list opt-in. - #3287621755 part b — child LLM/tool/hook spans inside a long-running fork/background subagent still use 30-min TTL while the parent gets 4h, so a 2-hour bg agent will have early child spans force-ended at 30 min. Fix needs TTL inheritance propagation.
DeepSeek /review round 3 landed 7 inline comments; 6 fixable here, 1 needs a design call. Landed (CRITICAL): - session-tracing.ts: add `subagentContext` ALS so `startLLMRequestSpan`/ `startToolSpan`/`startHookSpan` inside a foreground subagent parent under the subagent span instead of escaping back to the outer interaction. Without this fix every foreground subagent span is an empty shell — `resolveParentContext` unconditionally picks interactionContext over the active OTel span, defeating the entire Phase 3 hierarchy claim for the most common invocation. Fork/bg were unaffected (their interactionContext is undefined), which masked the bug from earlier review rounds (#3290820352). - agent.ts: recordOutcome uses `??=` (first-write-wins) so the inner-catch's exception record can't overwrite the success record the body already published (#3290820357). The previous fix put `??=` on the OUTER wrapper's catch only, which doesn't protect against the INNER body's catch calling recordSpanOutcome with plain assignment. Landed (SUGGESTION): - agent.ts: `deriveSubagentExceptionMetadata` now sets `resultSummaryPresent: false` so attribute shape stays symmetric with success-path metadata — dashboards filtering on `qwen-code.subagent.result_summary_present` no longer silently drop failed spans (#3290820374). - agent.ts: `AgentTerminateMode.SHUTDOWN` now maps to `status: 'cancelled', terminateReason: 'subagent_shutdown'` instead of falling through to the failure branch — SHUTDOWN is graceful arena/team-session end, not a real failure (#3290820384). - agent-context.ts: `getCurrentAgentDepth` JSDoc now documents the no-frame vs depth-0-frame ambiguity and points callers at `getCurrentAgentId() !== null` as the discriminator (#3290820381). - session-tracing.ts: `endSubagentSpan` warn for "span not found in activeSpans" now gated on `isTelemetrySdkInitialized()` so it doesn't fire on every subagent invocation when telemetry is OFF (NOOP_SPAN is intentionally never registered in `activeSpans`) (#3290820392). Tests: - 4 new tests in session-tracing.test.ts cover the subagentContext parenting fix: foreground LLM/tool/hook spans inside runInSubagentSpanContext parent under the subagent (regression guards against re-introducing the empty-shell behavior); nested subagent shadowing; ALS frame teardown. - Existing tests adjusted to use the corrected `startInteractionSpan` options shape and `endInteractionSpan` signature. Deferred: - #3290820388 `gen_ai.provider.name` is currently set to SERVICE_NAME (`qwen-code`) but the OTel GenAI spec wants the actual GenAI service provider (`anthropic` / `openai` / etc.). Fix options: omit the attribute, derive from modelOverride via a model→provider mapping, or wait for spec to stabilize. Needs scope discussion.
DeepSeek /review round 4 landed 8 unique inline comments (posted twice by the bot). All actionable — none required design tradeoffs. Landed (CRITICAL): - agent.ts bgBody: fold SHUTDOWN into the CANCELLED branch on the registry side so `registry.finalizeCancelled()` matches the span's `status='cancelled'` classification. Previously the span said "cancelled" while the registry/UI said "failed" — every graceful arena/team-session shutdown produced a mismatch. wenshao @ #3291876034. Landed (SUGGESTION): - session-tracing.ts `clearSessionTracingForTesting`: also reset `subagentContext` via `enterWith(undefined)` so leaked ALS frames don't silently re-parent subsequent tests' spans (subagentContext is checked BEFORE interactionContext). wenshao @ #3291876036. - session-tracing.ts `startLLMRequestSpan`: tri-state the `llm_request.context` attribute (`subagent` / `interaction` / `standalone`) so subagent-parented LLM calls don't get mis-classified as "interaction". wenshao @ #3291876044. - agent.ts foreground `runSubagentWithHooks`: `resultSummaryPresent` now uses raw `subagent.getFinalText()` (not finalText with stop-hook warning) so a subagent that produced no result but hit a stop-hook block doesn't false-positive as having a summary. Matches the bgBody pattern. wenshao @ #3291876047. - session-tracing.ts `runInSubagentSpanContext`: also clear `toolContext` for the body's duration so hooks fired inside a subagent (e.g. SubagentStart, before any inner tool) parent under the subagent span instead of the outer AGENT tool span. Inner tool calls re-set toolContext via runInToolSpanContext, so inner-tool parenting stays correct. wenshao @ #3291876051. - agent.ts `deriveSubagentOutcomeMetadata`: populate `error`/ `errorType` for the non-throwing failure path (ERROR/MAX_TURNS/ TIMEOUT) so endSubagentSpan sets standard OTel exception attrs instead of the generic 'subagent failed' placeholder. wenshao @ #3291876053. Tests (3 new): - session-tracing.test.ts: background TTL test (mirror of fork TTL test — guards against accidental removal of 'background' from LONG_TTL_SUBAGENT_KINDS). wenshao @ #3291876056. - session-tracing.test.ts: startHookSpan inside runInSubagentSpanContext (no inner tool) parents under subagent span — regression-guards the toolContext-clear fix. wenshao @ #3291876055. - agent.test.ts: SHUTDOWN terminateMode and ERROR-with-error-attrs tests for deriveSubagentOutcomeMetadata. Verification: 88 agent tests + 74 session-tracing tests pass; tsc + eslint clean. (1 flaky failure in the unrelated pre-existing git- isolation test under parallel runs — passes in isolation.)
wenshao
left a comment
There was a problem hiding this comment.
Additional suggestions (test coverage):
-
packages/core/src/tools/agent/agent.test.ts—resultSummaryPresent=falsebranch is untested. All new tests havemockAgent.getFinalTextreturning non-empty strings, soresultSummaryPresentis alwaystrue. If someone mistakenly replacessubagentRawTextwithfinalText(which has the stop-hook warning appended), empty-result subagents would be incorrectly marked. Add a test withgetFinalTextreturning empty string, assertingresultSummaryPresent=false. -
packages/core/src/tools/agent/agent.test.ts— The first-write-wins protection (recordedMetadata ??= m) inrunWithSubagentSpanhas no test coverage. If??=is accidentally changed to=, a successful subagent whoseupdateDisplaythrows would be incorrectly reported asfailed. Add a test that mocksupdateOutputto throw after success path, verifyingendSubagentSpanstill receives the success metadata.
DeepSeek /review round 5 landed 5 inline comments; 4 fixed here, 1 overlaps a previously-deferred architectural item. Landed (SUGGESTION): - log-to-span-processor.ts: move `BRIDGE_SKIP_EVENT_NAMES` after all imports — the const was declared between two import blocks, breaking the file's established pattern. wenshao @ #3292370966. - agent.ts `runWithSubagentSpan` finally: default to `status: 'failed'` (instead of 'completed') when `recordOutcome` was never called. A future wiring bug now surfaces as a visible span failure rather than silently masking every miss as a success. wenshao @ #3292370968. - session-tracing.ts `runInSubagentSpanContext`: JSDoc now explicitly documents that the function enters `subagentContext` AND clears `toolContext` for the body's duration. Both are intentional but previously undocumented side effects. wenshao @ #3292370969. - agent.ts `runWithSubagentSpan` JSDoc: documents the rejection-handling contract for void'd callers — body is expected to never reject; the wrapper's catch is defensive only; future void'd call sites must wrap in `.catch()`. Also added defensive `.catch()` handlers on the two existing void'd sites (fork + background) so a freak OTel throw in the finally doesn't become an unhandled-promise process exit. wenshao @ #3292370970 + silent-failure-hunter earlier round. Deferred: - #3292370967 child span TTL inheritance for fork/background. Same as #3287621755b from review-2: child LLM/tool/hook spans inside a 4h subagent still use the 30-min default and get force-swept. Fix needs TTL inheritance propagation (architectural). Deferred to follow-up.
wenshao
left a comment
There was a problem hiding this comment.
[Critical] packages/core/src/tools/agent/agent.ts:~1448 — Foreground SHUTDOWN display/span status mismatch.
deriveSubagentOutcomeMetadata classifies SHUTDOWN as status: 'cancelled' with terminateReason: 'subagent_shutdown', but the foreground display logic (the else branch of if (signal?.aborted)) unconditionally maps any non-GOAL, non-aborted outcome to status: 'failed'. The bgBody path was correctly updated to group SHUTDOWN with CANCELLED, but the foreground this.updateDisplay() was not.
When an arena/team session ends and a foreground subagent receives SHUTDOWN, the span says cancelled while the user-facing UI says failed — a confusing contradiction.
if (signal?.aborted) {
this.updateDisplay({
status: 'cancelled',
terminateReason: 'Agent was cancelled by user',
executionSummary,
}, updateOutput);
} else if (terminateMode === AgentTerminateMode.SHUTDOWN) {
this.updateDisplay({
status: 'cancelled',
terminateReason: 'Agent was gracefully shut down',
executionSummary,
}, updateOutput);
} else {
this.updateDisplay({
status: success ? 'completed' : 'failed',
terminateReason: terminateMode,
result: finalText,
executionSummary,
}, updateOutput);
}
[Suggestion] packages/core/src/telemetry/session-tracing.ts:~258 — ensureCleanupInterval does not store the interval ID, making it impossible to stop programmatically. clearSessionTracingForTesting() does not clear the interval either. Consider storing the ID and exposing a stopCleanupInterval() function. Impact is low today (interval.unref() is called) but would make future cleanup logic harder.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
DeepSeek /review round 6 landed 6 suggestions; all 6 fixed. Landed (SUGGESTION): - agent.ts bgBody: drop `?` from `recordSpanOutcome` parameter — it is always passed and is load-bearing. The optional marker misled callers into thinking omission was safe. Also drop the matching optional-chains at the two call sites inside bgBody. wenshao @ #3292521237. - agent.ts: extract `buildSubagentSpanSpec(hookOpts, subagentConfig, invocationKind)` helper. The spec object was triplicated across foreground/fork/bg call sites with only `invocationKind` differing; the helper makes future spec changes a one-place edit. wenshao @ #3292521238. - agent.ts: change wiring-bug fallback's terminateReason from `'unknown'` to `'wiring_bug_record_outcome_not_called'` so dashboards can separate genuine subagent failures from telemetry wiring defects. wenshao @ #3292521240. - agent.test.ts MAX_TURNS test: add `error`/`errorType` assertions matching the ERROR test shape — regression guard for the non-throwing failure path. wenshao @ #3292521241. - agent.test.ts: rename misleading "endSubagentSpan still fires when body never opts in (default = completed)" to "endSubagentSpan is always called exactly once per invocation"; add a separate "fallback: body that skips recordOutcome" test that stubs out runSubagentWithHooks via instance-level replacement to genuinely exercise the wrapper's defensive default branch. wenshao @ #3292521244. - session-tracing.ts TTL sweep: also stamp the subagent-specific `qwen-code.subagent.duration_ms` on swept subagent spans, mirroring the key endSubagentSpan stamps. Dashboards filtering on the subagent-namespaced duration were silently missing TTL-swept spans. wenshao @ #3292521245. Verification: 89 agent + 74 session-tracing tests pass, tsc + eslint clean.
DeepSeek /review round 7 landed 2 small but real suggestions. Landed: - agent.ts: move `buildSubagentSpanSpec` to BELOW `runWithSubagentSpan`. The previous placement (between `runWithSubagentSpan`'s 28-line JSDoc and its method body) detached the JSDoc from runWithSubagentSpan — TypeScript/IDE tooling treats the immediately-preceding JSDoc as the doc for the next declaration, so the long JSDoc was attaching to `buildSubagentSpanSpec` while runWithSubagentSpan got nothing. wenshao @ #4410 DeepSeek 3292560016. - session-tracing.test.ts: assert `qwen-code.subagent.duration_ms` is set by the TTL sweep on the fork test (mirror of the assertion shape for the generic `qwen-code.span.duration_ms`). Catches the regression where the previous review-6 fix is accidentally reverted. wenshao @ #4410 DeepSeek 3292560017. Verification: typecheck + lint clean, 163 affected tests pass.
DeepSeek /review round 8 landed 3 suggestions; all 3 fixed.
Landed:
- session-tracing.test.ts: add `llm_request.context === 'subagent'`
assertion to the existing "startLLMRequestSpan inside
runInSubagentSpanContext parents under the subagent span" test.
The tri-state attribute (`subagent` / `interaction` / `standalone`)
added in review-4 had no assertion for the 'subagent' value — a
regression to the old binary ternary would have silently misclassified
subagent LLM calls as 'interaction' in dashboards. wenshao @
#3293036596.
- session-tracing.test.ts: add a coverage test for endSubagentSpan's
`'subagent failed'` fallback (when caller passes `{ status: 'failed' }`
without an explicit `error` string). Every prior failure test passes
error, so the fallback message was untested. wenshao @ #3293036600.
- docs/design/telemetry-subagent-spans-design.md: document the
foreground vs background `SubagentStart` hook span parenting
divergence as an "Edge cases" table entry. Foreground fires the hook
inside `runInSubagentSpanContext` (parent = subagent), background
fires before the wrap (parent = AGENT tool span). Architectural fix
(move bg hook fire inside framedBgBody) deferred — needs contextState
threading. wenshao @ #3293036602.
Verification: 75 session-tracing tests pass; tsc + eslint clean.
DeepSeek /review round 9 landed 2 follow-ups to the round-8 doc note. Landed: - docs/design: clean up the round-8 edge case row. Remove the trailing "wenshao @ #4410 DeepSeek 3293036602" provenance suffix (no other row in the 12-row table has review metadata — git log already tracks this). Replace the inaccurate "contextState threading" deferral rationale with the actual reason: moving bg hook fire inside framedBgBody is mechanically simple but changes user-visible semantics (hook fires async-detached after the "Background agent launched" message returns, vs. today's synchronous-before-return). wenshao @ #3293174098. - session-tracing.test.ts: add a regression guard for the documented bg `SubagentStart` parenting asymmetry — hook span fired OUTSIDE runInSubagentSpanContext but inside the AGENT tool's toolContext parents under the tool span, NOT the subagent. Catches a future refactor (or the deferred fix) that changes this. wenshao @ #3293174101. Verification: 76 session-tracing tests pass, tsc + eslint clean.
wenshao
left a comment
There was a problem hiding this comment.
No new issues in this round. The incremental commit cleanly addresses the two prior Suggestions (design doc metadata removal + bg SubagentStart hook parenting regression test). Build passes, 216/216 tests pass across affected files. Prior-round open comments remain for the author's consideration.
— qwen3.7-max via Qwen Code /review
补充:扩展到其余 end-span helper(commit
|
| 函数 | 处理 |
|---|---|
endInteractionSpan |
+debug(用 getSpanId(spanCtx.span),该函数无 spanId 形参) |
endLLMRequestSpan |
+debug |
endToolSpan |
+debug |
endToolExecutionSpan |
+debug |
endToolBlockedOnUserSpan |
+debug(canonical race victim) |
endHookSpan |
+debug |
endSubagentSpan |
保持原有 warn(4h 长 TTL) |
验证同前:typecheck 通过,session-tracing(95)+log-to-span(50)+agent(92)=237 测试全绿,lint 通过。
…Span throw test (PR #4410 review) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
1. Remove 16 opaque DeepSeek bot message IDs (e.g. "DeepSeek bot 3290820352") from comments in session-tracing.ts and agent.ts. These numeric IDs are not dereferenceable outside the original review conversation; the #4410 PR references remain for tracing design decisions. 2. Fix stale JSDoc in agent.ts:1293 — "defaults to completed" was wrong since the behavior changed to "defaults to failed with wiring_bug_record_outcome_not_called" (the comment was not updated when the behavior changed). 3. Fix misleading JSDoc in session-tracing.ts — "causes a TS error" claim about LONG_TTL_SUBAGENT_KINDS was incorrect (Set.has() accepts any union member; a new kind silently falls through to the 30-min default). Addresses wenshao review threads on #4410.
已处理全部 8 条未回复 review thread(0 unresolved 剩余)
Pushed back on #5: Deferred #2/#7/#8:均为测试覆盖扩展,不影响正确性。保留作为 follow-up PR scope。 验证:typecheck 通过,237 受影响测试全绿,lint 通过。 |
BRIDGE_SKIP_EVENT_NAMES was unconditionally suppressing bridge spans for subagent_execution events, but paths going through runForkedAgent (dream, speculation, suggestion, skillReview, extraction agents) emit this event WITHOUT creating a native qwen-code.subagent span — losing all trace representation. Fix: only skip the bridge when isInNativeSubagentSpan() confirms the ALS has an active subagent span context (i.e. code is running inside runInSubagentSpanContext). Paths without native coverage still get bridged. Addresses wenshao CHANGES_REQUESTED [Critical] on #4410.
Fixed: bridge skip regression for forked agents (commit
|
| File | Change |
|---|---|
session-tracing.ts |
Export isInNativeSubagentSpan() accessor |
log-to-span-processor.ts |
Add && isInNativeSubagentSpan() to skip condition |
log-to-span-processor.test.ts |
Test both branches (native active → skip; no native → bridge) |
Verified: typecheck clean, 147 tests pass (51 log-to-span + 96 session-tracing).
wenshao
left a comment
There was a problem hiding this comment.
No review findings on the incremental commit. The bridge-skip conditional on isInNativeSubagentSpan() correctly restores trace visibility for runForkedAgent paths. 256 tests pass across affected files. Downgraded from Approve to Comment: CI still running. — qwen3.7-max via Qwen Code /review
Verification ReportReviewer: wenshao Build
Tests
Code Review Spot Checks
Notes
Verdict✅ LGTM — ready to merge. All PR-introduced code compiles, lints, and passes 519 telemetry tests. The conditional bridge-skip fix (latest commit) correctly preserves observability for forked paths without native span coverage. |
yiliang114
left a comment
There was a problem hiding this comment.
Design is solid — the three-way invocation-kind split (foreground=child, fork/background=linked-root) follows OTel spec, and the SubagentOutcomeSink callback pattern is a clean solution for surfacing terminal state from error-swallowing bodies. ALS context propagation, TTL sweep safety, and first-write-wins recordOutcome all check out.
I left a handful of inline comments — one correctness issue with the bridge skip, plus some cleanup and test coverage items. Nothing blocking on its own, but worth addressing before merge.
Summary:
- 🟠 1× P1: review-trail comment noise (~35 occurrences of
wenshao @ #4410/ DeepSeek bot IDs in production code) - 🟡 5× P2: duration_ms naming inconsistency, bridge skip liveness gap, JSDoc truncation, two test coverage gaps
- 🟢 4× P3: design-choice observations (skipped)
… fixes 1. isInNativeSubagentSpan now checks ctx.ended — after TTL sweep ends the subagent span, the ALS frame persists but the span is dead. Without this check, the bridge skip fires incorrectly and late child events lose all trace representation. 2. Add bare duration_ms alongside qwen-code.subagent.duration_ms in endSubagentSpan for dashboard query consistency with other endXSpan. 3. Fix JSDoc line break that split closing parenthetical across lines. Addresses yiliang114 review on #4410.
已处理 yiliang114 review(commit
|
| # | 建议 | 处理 | Commit |
|---|---|---|---|
| 1 | 清理 41 处 wenshao @ 引用 |
Deferred — follow-up | — |
| 2 | isInNativeSubagentSpan 不检查 span liveness |
Fixed — 加 !ctx.ended 检查 |
fbc71900b |
| 3 | duration_ms 命名不一致 |
Fixed — 加裸 duration_ms |
fbc71900b |
| 4 | JSDoc 换行断裂 | Fixed | fbc71900b |
| 5 | fork/background invocationKind 测试 | Deferred — follow-up | — |
| 6 | resultSummaryPresent: false 测试 |
Deferred — follow-up | — |
验证:typecheck 通过,147 测试全绿。
| const subagentContext = new AsyncLocalStorage<SpanContext | undefined>(); | ||
|
|
||
| export function isInNativeSubagentSpan(): boolean { | ||
| return subagentContext.getStore() !== undefined; |
There was a problem hiding this comment.
[Suggestion] isInNativeSubagentSpan() only checks ALS presence (subagentContext.getStore() !== undefined), but the ALS frame outlives the span itself. After sweepStaleSpans fires (sets ctx.ended = true, removes from activeSpans, calls span.end()), the ALS frame still exists until the body promise resolves. Any subagent_execution LogRecord emitted in the window between sweep and body completion will hit isInNativeSubagentSpan() === true, causing LogToSpanProcessor to skip the bridge span — but the native span is already ended. Result: that event has no span representation in the trace tree at all.
The fix is to also check ctx.ended:
| return subagentContext.getStore() !== undefined; | |
| export function isInNativeSubagentSpan(): boolean { | |
| const ctx = subagentContext.getStore(); | |
| return ctx !== undefined && !ctx.ended; | |
| } |
This way, post-sweep events fall through to bridge creation, preserving at least a log-derived span. The race window is narrow (4h TTL tail), but the fix is a one-liner with no downside.
— qwen3.7-max via Qwen Code /review
| endAttributes['qwen-code.subagent.result_summary_present'] = | ||
| metadata.resultSummaryPresent; | ||
| } | ||
| if (metadata.error !== undefined) { |
There was a problem hiding this comment.
[Suggestion] Two small issues here:
-
Inconsistent guards for
metadata.error: Line 1318 uses!== undefined(truthy for""), while line 1333 uses a truthy check (falsy for""). Ifmetadata.errorwere ever"", the attribute getsexception.message: ""but the status message falls back to'subagent failed'— inconsistent signals for dashboards. -
Double
truncateSpanErrorcall: The same value is truncated independently at line 1319 and line 1334.
Hoist the truncation to fix both:
| if (metadata.error !== undefined) { | |
| const truncatedError = metadata.error !== undefined | |
| ? truncateSpanError(metadata.error) | |
| : undefined; | |
| if (truncatedError !== undefined) { | |
| endAttributes['exception.message'] = truncatedError; | |
| } |
Then use truncatedError in the status block:
message: truncatedError ?? 'subagent failed',— qwen3.7-max via Qwen Code /review
| const record = mockSpans.find((s) => s.name === 'qwen-code.subagent')!; | ||
|
|
||
| mockState.throwOnSetAttributes = true; | ||
| endSubagentSpan(span, { status: 'completed' }); |
There was a problem hiding this comment.
[Suggestion] endSubagentSpan conditionally stamps terminateReason and resultSummaryPresent as span attributes (lines 1310-1317 in session-tracing.ts), but no test in this file ever passes these fields in metadata. The if (metadata.terminateReason !== undefined) and if (metadata.resultSummaryPresent !== undefined) branches are untested at the telemetry layer. (The agent.test.ts span-outcome tests verify the input metadata, but mock away endSubagentSpan itself.)
Consider adding to the completed test:
endSubagentSpan(span, {
status: 'completed',
terminateReason: 'goal',
resultSummaryPresent: true,
});
// ...
expect(record.attributes['qwen-code.subagent.terminate_reason']).toBe('goal');
expect(record.attributes['qwen-code.subagent.result_summary_present']).toBe(true);— qwen3.7-max via Qwen Code /review
| expect(meta.error).toBe('subagent terminated with mode: ERROR'); | ||
| expect(meta.errorType).toBe('ERROR'); | ||
| }); | ||
|
|
There was a problem hiding this comment.
[Suggestion] TIMEOUT is listed as a non-throwing failure path in the comment (line 868: "ERROR/MAX_TURNS/TIMEOUT"), and deriveSubagentOutcomeMetadata maps it via the same catch-all as ERROR/MAX_TURNS. However, only ERROR and MAX_TURNS have explicit test cases; TIMEOUT is missing.
Adding a TIMEOUT test is trivial and guards against a future refactor that accidentally adds a TIMEOUT-specific branch:
it('TIMEOUT terminateMode → status="failed" + terminateReason="timeout"', async () => {
vi.mocked(mockAgent.getTerminateMode).mockReturnValue(
AgentTerminateMode.TIMEOUT,
);
await runForegroundOnce();
const meta = lastEndMeta();
expect(meta.status).toBe('failed');
expect(meta.terminateReason).toBe('timeout');
});— qwen3.7-max via Qwen Code /review
Local Verification ReportPR: #4410 — CI Status
Local TSC Compilation
Notes:
Local Test Results
Total: 147 tests passed across the 2 suites that loaded successfully. Load failures verified as pre-existing on Code Review SummaryArchitecture (Phase 3 subagent spans):
Code quality:
Verdict✅ Ready to merge — Well-designed Phase 3 telemetry extension. 147 tests pass locally, CI passing (Windows pending). TSC errors and test load failures all pre-existing on base. Design doc covers edge cases and deferred work items clearly. |
| const subagentContext = new AsyncLocalStorage<SpanContext | undefined>(); | ||
|
|
||
| export function isInNativeSubagentSpan(): boolean { | ||
| const ctx = subagentContext.getStore(); |
There was a problem hiding this comment.
[Suggestion] The new !ctx.ended guard in isInNativeSubagentSpan() has no direct test coverage. session-tracing.test.ts has zero references to this function, and log-to-span-processor.test.ts fully mocks it as a boolean — neither exercises the liveness check on the real implementation.
Consider adding a test that starts a subagent span, ends it within the ALS frame, and asserts isInNativeSubagentSpan() returns false:
it('isInNativeSubagentSpan returns false after endSubagentSpan within the ALS frame', async () => {
const span = startSubagentSpan({ ...baseOpts, invocationKind: 'foreground' });
await runInSubagentSpanContext(span, async () => {
expect(isInNativeSubagentSpan()).toBe(true);
endSubagentSpan(span, { status: 'completed' });
expect(isInNativeSubagentSpan()).toBe(false);
});
});— qwen3.7-max via Qwen Code /review
| try { | ||
| const duration = Date.now() - spanCtx.startTime; | ||
| const endAttributes: Attributes = { | ||
| duration_ms: duration, |
There was a problem hiding this comment.
[Suggestion] The newly added bare duration_ms attribute is not asserted by any test. The existing duration test (line ~1470) only checks qwen-code.subagent.duration_ms. A one-line assertion would close the gap:
expect(record.attributes['duration_ms'] as number).toBeGreaterThanOrEqual(0);— qwen3.7-max via Qwen Code /review
yiliang114
left a comment
There was a problem hiding this comment.
LGTM — all critical findings across 10+ review rounds resolved. Bridge skip regression fixed with ALS-based liveness check. CI green on all platforms. Remaining P2 items (fork/background test coverage, wenshao-ref cleanup) are fine as follow-up. Solid OTel integration work. Ship it.
… onto main Port the shell notification feature (#4355) from the old BackgroundShellRegistry class into the shell-task.ts free-function module. Update all call sites (Session.ts, useGeminiStream.ts, chatCompressionService.ts) and their tests to use the new module-level setters and unified TaskRegistry API. Also integrates AgentTerminateMode.SHUTDOWN grouping with CANCELLED (#4410) and fork-subagent feature gate (#4574) into the free-function architecture.
… onto main Port the shell notification feature (#4355) from the old BackgroundShellRegistry class into the shell-task.ts free-function module. Update all call sites (Session.ts, useGeminiStream.ts, chatCompressionService.ts) and their tests to use the new module-level setters and unified TaskRegistry API. Also integrates AgentTerminateMode.SHUTDOWN grouping with CANCELLED (#4410) and fork-subagent feature gate (#4574) into the free-function architecture.
Closes Phase 3 of #3731.
Summary
Adds a
qwen-code.subagentspan around every subagent invocation. The LLM/tool/hook spans the subagent emits become a proper subtree, instead of interleaving with concurrent siblings under the parent interaction span.Trace tree change
Before: 3 concurrent AGENT tool calls produced 3
qwen-code.tool [agent]siblings whose LLM/tool children all attached to the shared parent interaction — operators couldn't tell which API call belonged to which subagent.After:
Design
Full doc:
docs/design/telemetry-subagent-spans-design.md(504 lines, 6 explicit decisions each with options + tradeoffs + rationale).Six decisions (all justified against OTel spec, opencode, claude-code precedent):
agent.ts) creates the span, body runs insiderunInSubagentSpanContext(span, fn)viacontext.with. Required for fork/background — by the time fire-and-forget body fires, caller'scontext.active()is already gone.Linkback to invoker). Spec-recommended pattern for 'long running asynchronous data processing operations'. Avoids inflating parent trace duration with hour-long forks.ttlFor(ctx). Swept spans stampedqwen-code.subagent.status='aborted'+terminate_reason='ttl_swept'.qwen-code.subagent_executionLogRecord kept (RUM + metrics depend on it), butLogToSpanProcessorskips bridging that single event name. No duplicate spans.gen_ai.agent.id+qwen-code.subagent.id,gen_ai.agent.name+qwen-code.subagent.name, plusgen_ai.operation.name='invoke_agent',gen_ai.provider.name,gen_ai.conversation.id. Errors use OTel-standardexception.message+error.type. Spec is in Development — dual-emit lets dashboards transition when stable.AgentContext.depth: auto-incremented insiderunWithAgentContext, callers pass nothing. Surfaces asqwen-code.subagent.depthfor recursion-bug detection.What dashboards see
qwen-code.subagentappears betweenqwen-code.tool [agent]and the subagent's LLM/tool children.qwen-code.subagent_executionbridge span no longer emitted (LogRecord still flows to RUM + metrics).Link.Files
docs/design/telemetry-subagent-spans-design.mdconstants.tsSPAN_SUBAGENTconstantsession-tracing.tsstartSubagentSpan/endSubagentSpan/runInSubagentSpanContext+ types; SpanType +='subagent'; type-aware TTL viattlFor(ctx)log-to-span-processor.tsEVENT_SUBAGENT_EXECUTIONtelemetry/index.tsagent-context.tsdepthfield (auto-incremented) +getCurrentAgentDepth()tools/agent/agent.tsrunWithSubagentSpan()helper centralizes the pattern; wired into 3 invocation pathsTest plan
194/194 across affected files (was 173 → +21 new):
session-tracing.test.ts(+14): foreground/fork/background span creation, Link with invoker context, dual-emit attrs, depth/parentAgentId, status mapping (completed/failed/cancelled/aborted), idempotent end, NOOP-when-SDK-uninitialized,truncateSpanErrorintegration on error message, type-aware TTL behavior (fork survives 30min, swept at 4h; foreground swept at 30min)log-to-span-processor.test.ts(+2): skip works forsubagent_execution; non-listed events still bridgeagent-context.test.ts(+5): depth=0 top-level, auto-increment per nesting, sibling isolation, caller-invisibleagent.test.ts(existing 75 unchanged): proves wiring is NOOP-safe under uninit'd SDKtsc --noEmitcleanESLint clean on changed files (auto-fixed import order in
log-to-span-processor.ts)Out-of-scope (deferred follow-ups)
gen_ai.usage.*) → Phase 4qwen-code.subagent_executionLogRecord onto span events → needs 3-system coordination (RUM + metrics + bridge)gen_ai.agent.id/name→ when GenAI spec reaches StableRollback
Single PR, single commit. Revert restores prior behavior 1:1. The new span helpers are only invoked from
agent.ts; dropping the wiring + the LogToSpanProcessor skip restores the original LogRecord bridge behavior.🤖 Generated with Qwen Code