feat(telemetry): trace daemon prompt lifecycle#4556
Conversation
📋 Review SummaryThis PR implements end-to-end OpenTelemetry trace propagation for the daemon mode, connecting HTTP route handling through the ACP bridge to prompt execution spans. The implementation is well-structured with comprehensive test coverage and follows established telemetry patterns. The changes enable LLM/tool telemetry correlation without requiring frontend protocol changes. 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
🔐 Security NotesThe implementation correctly:
One consideration for future review: the |
There was a problem hiding this comment.
Pull request overview
Adds end-to-end OpenTelemetry trace propagation for qwen serve so prompt execution can be correlated from daemon HTTP handling, through the ACP bridge queue/dispatch, into ACP child prompt interaction spans—without requiring client protocol changes (reserving qwen.telemetry.* internal metadata).
Changes:
- Introduces daemon tracing utilities to inject/extract W3C trace context via reserved
_metakeys and to create daemon request/bridge spans. - Initializes telemetry in the serve daemon process and adds an Express middleware to wrap key routes with daemon request spans and error recording.
- Adds a scoped
withInteractionSpanhelper and uses it in ACP child prompt handling to parent interaction spans to the daemon-extracted context; extends bridge to capture/restore context across prompt FIFO queueing.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/telemetry/session-tracing.ts | Adds withInteractionSpan for scoped interaction spans with optional explicit parent context. |
| packages/core/src/telemetry/session-tracing.test.ts | Adds test coverage for scoped interaction spans and parentContext behavior. |
| packages/core/src/telemetry/sdk.ts | Generalizes telemetry initialization to accept TelemetryRuntimeConfig instead of Config. |
| packages/core/src/telemetry/runtime-config.ts | Introduces TelemetryRuntimeConfig interface for non-Config runtimes (e.g., daemon). |
| packages/core/src/telemetry/metrics.ts | Updates metric initialization/perf monitoring to accept TelemetryRuntimeConfig. |
| packages/core/src/telemetry/index.ts | Exports new daemon tracing APIs and runtime config type. |
| packages/core/src/telemetry/daemon-tracing.ts | Adds daemon tracing helpers (request/bridge spans, context capture/restore, traceparent injection/extraction, workspace hashing). |
| packages/core/src/telemetry/daemon-tracing.test.ts | Adds focused tests for trace context extraction/injection behavior and workspace hashing. |
| packages/cli/src/serve/server.ts | Adds daemon request-span middleware and emits daemon telemetry on bridge errors. |
| packages/cli/src/serve/runQwenServe.ts | Initializes telemetry in the daemon process and wires a telemetry seam into the bridge; ensures telemetry shutdown during drain. |
| packages/cli/src/acp-integration/session/Session.ts | Wraps ACP prompt execution in withInteractionSpan and parents to extracted daemon trace context. |
| packages/acp-bridge/src/bridgeOptions.ts | Introduces BridgeTelemetry seam for bridge-level spans/events/context capture and prompt metadata injection. |
| packages/acp-bridge/src/bridge.ts | Uses bridge telemetry seam to span channel/session/prompt operations and restore context across prompt queueing. |
| packages/acp-bridge/src/bridge.test.ts | Adds test validating telemetry seam usage and prompt metadata injection behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f145ab1 to
38f75fb
Compare
1284383 to
1ca0572
Compare
38f75fb to
7e7f6a5
Compare
7e7f6a5 to
79a14b1
Compare
Review fix summary (commit 43071d2)Fixed (10 items)
Not taking (1 item)
Deferred (6 items — follow-up scope)
Ready for re-review. |
wenshao
left a comment
There was a problem hiding this comment.
[Critical] new TraceState(carrier['tracestate']) at line 289 crashes at runtime (TypeError: TraceState is not a constructor). TraceState is a type-only export (interface) from @opentelemetry/api. Build fails (TS1484 + TS2693), and 1 test fails (extracts daemon trace context from reserved prompt metadata keys).
Fix: import createTraceState from @opentelemetry/core and use createTraceState(carrier['tracestate']).
(Re-stating: this was flagged in R4 and remains unfixed at HEAD.)
Fix: TraceState constructor crash (commit 5b39d83)
Ready for re-review. |
Fix round 3 (commit 79aff11)Fixed (3)
Deferred (3)
Ready for re-review. |
Fix round 4 (commit 28a4748)Fixed (5)
Deferred (6)
Not taking (3)
Ready for re-review. |
28a4748 to
4ff18e0
Compare
Fix round 5 (commit 646226b)Fixed (2)
Not taking (6)
Ready for re-review. |
Review round 5 — wenshao's latest comments (May 28)
Fixed 3 / Pushed back 3 / Not taking 2 / Deferred 1 in commit e075c6f. |
Connect qwen serve HTTP routes, ACP bridge dispatch, and ACP child prompt execution through OpenTelemetry context propagation. The daemon injects reserved qwen.telemetry metadata internally so clients do not need to pass trace context. Closes QwenLM#4554 Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Record bridge telemetry events as short daemon bridge spans when they fire outside an active request or prompt context, so asynchronous channel exits remain observable. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- recordDaemonHttpResponse: don't clobber ERROR with OK on non-5xx - finish(): remove signal listeners synchronously before async telemetry shutdown - extractDaemonTraceContext: reject all-zero IDs, include tracestate, set isRemote - propagation.inject: wrap in try/catch for consistency - injectPromptContext: move inside prompt.dispatch span for correct parent - withDaemonSpan: guard on isTelemetrySdkInitialized() - toOtelAttributes: remove identity function, pass attributes directly - injectDaemonTraceContext: early-return when no active span (avoid empty _meta) - emitDaemonLog: remove redundant event.timestamp attribute - NOOP_BRIDGE_TELEMETRY: drop async, add short-circuit for missing keys
TraceState is a type-only export from @opentelemetry/api (not a runtime constructor). The manual fallback path now omits tracestate since the primary propagation.extract path already handles it.
- withDaemonSpan: pass undefined (not getSpan result) when SDK off - stripReservedTraceMeta: skip copy when no reserved keys present - sendBridgeErrorImpl: truncate error.message in emitDaemonLog
- extractDaemonTraceContext: use ROOT_CONTEXT as extraction base to prevent incorrect parent-child when agent has its own active span - extractDaemonTraceContext (manual fallback): already has isRemote:true and ROOT_CONTEXT from previous fix — confirmed consistent - injectDaemonTraceContext: skip _meta assignment when original had no _meta and no trace headers were injected (match NOOP behavior) - withInteractionSpan: cancelled prompts get UNSET instead of OK so dashboards can distinguish cancelled from successful - emitDaemonLog: use OTel built-in timestamp field instead of custom attribute
- Import DAEMON_TRACEPARENT/TRACESTATE_META_KEY from core instead of redeclaring locally in bridge.ts (drift risk) - Add isTelemetrySdkInitialized() guard to event() in createDaemonBridgeTelemetry for consistency with siblings - Remove setStatus(ERROR, "HTTP 500") from recordDaemonHttpResponse to avoid overwriting the descriptive error message already set by recordDaemonError
646226b to
9041d56
Compare
Review round 6 — wenshao's latest CHANGES_REQUESTED (May 28)
Fixed 2/2 in commit 2938ad3. |
Review round 7 — wenshao's CHANGES_REQUESTED (May 28 late)
Fixed 1 / Pushed back 2 / Deferred 7 in commit 6f8cb66. |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Maintainer Post-Merge Verification ReportReviewer: @penwenshao (local build & test) 1. TypeScript Type Check
2. Unit Tests
Total: 723 tests passed, 0 failed 3. Code ReviewArchitecture (daemon-tracing.ts): ✅
Trace context propagation: ✅
Context capture/restore for prompt FIFO: ✅
Error handling: ✅
Span naming conventions: ✅
Test coverage: ✅
Security: ✅
Verdict: ✅ Verified — merge was correctAll 723 tests pass. The OTel propagation design is clean, well-tested, and follows security best practices (workspace hashing, reserved metadata stripping). The |
Summary
Validation
npm run build && npm run typecheckfrom the repo root.Test Files ... passedfor each focused suite and successful root build/typecheck completion.Scope / Risk
Testing Matrix
Testing matrix notes:
Linked Issues / Bugs
Closes #4554