feat(telemetry): inject TRACEPARENT env var into shell child processes#4906
Conversation
When `outboundCorrelation.propagateTraceContext` is enabled, inject a W3C `TRACEPARENT` environment variable into all shell child processes (Bash tool, hooks, monitor) so that CLI tools and Python scripts can participate in distributed tracing. - Extract shared trace-context module from debugLogger's private helpers - Use OTel's `isSpanContextValid` and `INVALID_TRACEID` for validation - Gate injection via module-level setter (set in initializeTelemetry, reset in shutdownTelemetry) - Update settingsSchema description to document shell env var behavior
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
Thanks for the PR, @doudouOUC! The implementation itself looks thoughtful — extracting a shared trace-context.ts module and injecting TRACEPARENT into shell child processes is a clean design.
However, the PR body doesn't follow the PR template. Several required sections are missing or renamed:
What this PR does— present as "Summary" (different heading)Why it's needed— absent entirelyReviewer Test Plan— absent; has "Test plan" instead but missingHow to verify,Evidence (Before & After),Tested onsub-sectionsRisk & Scope— absentLinked Issues— absent中文说明— absent
The template exists so maintainers can quickly find what they need during review. Please restructure the PR body to match the template headings (content can stay the same — just use the expected section names). Once that's updated, I'll continue the review.
中文说明
感谢贡献,@doudouOUC!实现本身设计合理——提取共享的 trace-context.ts 模块并向 shell 子进程注入 TRACEPARENT 是简洁的方案。
但 PR 正文未遵循 PR 模板,缺少多个必需章节:What this PR does(写成了 Summary)、Why it's needed、Reviewer Test Plan(含 How to verify / Evidence / Tested on)、Risk & Scope、Linked Issues、中文说明。
请按模板标题重组正文(内容可保留,只需改用规定的章节名)。更新后我会继续审查。
— Qwen Code · qwen3.7-max
There was a problem hiding this comment.
Pull request overview
Adds W3C trace context propagation into shell-spawned child processes (when outboundCorrelation.propagateTraceContext is enabled) and consolidates trace-context extraction/formatting into a shared telemetry module.
Changes:
- Introduces
telemetry/trace-context.ts(and tests) to centralize trace context extraction (active span → session root fallback) andtraceparentformatting. - Injects
TRACEPARENTinto shell child-process environments when shell trace propagation is enabled. - Refactors
debugLoggerto use the shared trace-context helper and updates tests/mocks accordingly.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/utils/shellContextEnv.ts | Adds optional TRACEPARENT env var injection for shell subprocesses. |
| packages/core/src/utils/shellContextEnv.test.ts | Adds tests for TRACEPARENT injection behavior. |
| packages/core/src/utils/debugLogger.ts | Replaces local trace helpers with shared getTraceContext() utility. |
| packages/core/src/utils/debugLogger.test.ts | Updates tests to mock getTraceContext() rather than raw OTel APIs. |
| packages/core/src/telemetry/trace-context.ts | New shared module for extracting trace context and formatting W3C traceparent. |
| packages/core/src/telemetry/trace-context.test.ts | New test suite covering extraction, fallback behavior, formatting, and shell gate. |
| packages/core/src/telemetry/sdk.ts | Enables/disables shell trace propagation during telemetry init/shutdown. |
| packages/core/src/telemetry/index.ts | Re-exports trace-context utilities/types. |
| packages/cli/src/config/settingsSchema.ts | Updates setting description to include shell TRACEPARENT propagation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
When propagation is enabled but getTraceContext() returns null (e.g., during initialization before any span exists), explicitly set TRACEPARENT to empty string to prevent stale/foreign values inherited from process.env from leaking into child processes.
Review feedback addressed in 8e659a5
Resolved 2/2 review threads. |
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
No review findings. The TRACEPARENT injection and trace-context module extraction are clean — correct lifecycle wiring, proper feature gating, thorough test coverage (46/46 tests pass), tsc/eslint clean. LGTM once CI is green. ✅
Two low-confidence suggestions for human review (not posted as inline comments):
sdk.test.ts— consider adding assertions thatsetShellTracePropagation()is called duringinitializeTelemetry/shutdownTelemetryto prevent silent regressions of this wiring.trace-context.test.ts— consider adding an error-path test forgetSessionRootTraceContext(the analogousgetActiveSpanTraceContexthas one, butgetSessionRootTraceContextdoes not).
— qwen3.7-max via Qwen Code /review
- Remove ZERO_TRACE_ID from barrel export (no external consumers) - Clear TRACESTATE alongside TRACEPARENT when context unavailable - Add zero-spanId rejection test (isSpanContextValid contract) - Add getSessionRootTraceContext error path test - Add sdk.test.ts assertions for setShellTracePropagation wiring
Review round 1 addressed in 28be283
Resolved 5/5 review threads. CI: reran Windows test (flake in unrelated |
Clear TRACESTATE unconditionally when propagation is enabled, not only when trace context is null. Prevents stale vendor state from the parent environment pairing with qwen-code's own traceparent.
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI still running.
All 5 R1 suggestions have been addressed by the author. Build passes, 107/107 tests pass, tsc/eslint clean. The implementation is correct and well-tested — clean extraction of trace-context.ts, proper TRACESTATE clearing, comprehensive test coverage including edge cases added in R1 follow-up.
— qwen3.7-max via Qwen Code /review
|
|
||
| if (isShellTracePropagationEnabled()) { | ||
| const ctx = getTraceContext(); | ||
| env['TRACEPARENT'] = ctx ? formatTraceparent(ctx) : ''; |
There was a problem hiding this comment.
[Suggestion] On Windows, environment variable names are case-insensitive, but Node spreads process.env into the child env as case-sensitive object keys. All four spawn sites build the child env as { ...process.env, ...getShellContextEnvVars() } (shellExecutionService.ts:554 & :1157, hookRunner.ts:591, monitor.ts:370). If the parent environment already holds a differently-cased Traceparent/traceparent — the exact CI-OTel-agent scenario this PR's "Risk & Scope" cites — the merged object keeps both the stale key and the new uppercase TRACEPARENT, and Windows may resolve the stale one, silently defeating the documented overwrite/clear guarantee. Same applies to TRACESTATE.
This is the bug class the repo already fixes for PATH via normalizePathEnvForWindows (utils/windowsPath.ts), which only canonicalizes path-cased keys and won't touch TRACEPARENT. Since the PR marks Windows as "not tested," consider extending that normalization (or stripping case-variant TRACEPARENT/TRACESTATE keys from process.env at the merge sites) plus a win32 test asserting a stale mixed-case Traceparent does not survive.
— claude-opus-4-8 via Claude Code /qreview
There was a problem hiding this comment.
Thanks for the thorough analysis. This is a real Windows edge case, but I'd defer it to a follow-up:
- The de facto standard env var name is
TRACEPARENT(uppercase) — otel-cli, GitHub Actions, Tekton all use this casing. A parent with lowercaseTraceparentwould be non-standard. - The existing
normalizePathEnvForWindowsonly handlesPATHand is called at the spread sites, not ingetShellContextEnvVars(). Extending it for arbitrary env vars is a broader change. - This PR is already marked Windows "not tested" — a proper fix would need a Windows-specific test with case-variant keys.
Happy to open a follow-up issue for Windows env var case normalization if maintainers want it.
✅ Local verification report — PR #4906Built the branch and verified the feature end-to-end on real processes (no mocks), driving the actual Environment: Linux ( 1. End-to-end through the real
|
| Source | trace ID |
|---|---|
Shell child's injected TRACEPARENT |
032d21107c0e2be25a2025b7d52cc183 |
sha256(session.id)[:32], session b82f975e-… |
032d21107c0e2be25a2025b7d52cc183 ✅ |
trace_id of this session's telemetry spans (×24 in outfile) |
032d21107c0e2be25a2025b7d52cc183 ✅ |
All three match. The launching shell had no TRACEPARENT in its environment, so injection is the only possible source — and a random per-run session UUID could never coincidentally equal a pre-existing env value. run_shell_command fired 11× in telemetry, confirming the Bash tool actually executed.
Negative (propagateTraceContext: false, telemetry still on) — same CLI, same command, same model:
(no TRACEPARENT / TRACESTATE lines — grep matched nothing)
run_shell_command still fired (9×), so this is a clean controlled comparison: flipping only the one flag toggles injection on/off. ✅ Matches reviewer-test-plan step 4.
2. Deterministic harness driving the real ShellExecutionService.execute()
To remove all LLM non-determinism, a harness imported the compiled PR code and called the real telemetry init + the real shell-spawn code (the exact path the Bash tool uses), spawning real OS subprocesses. Covered both spawn paths and all behavioral branches:
| Scenario | Path | Result |
|---|---|---|
| propagate ON, clean env | child_process |
✅ TRACEPARENT injected, traceId == sha256(sessionId), TRACESTATE="" |
| propagate ON, clean env | node-pty (executionMethod: lydell-node-pty, confirmed) |
✅ same (PTY path injects too) |
| propagate OFF (default), clean env | child_process |
✅ no TRACEPARENT/TRACESTATE injected |
propagate ON, foreign TRACEPARENT+TRACESTATE pre-seeded in parent env |
child_process |
✅ foreign value overwritten with qwen's own; foreign TRACESTATE cleared |
| propagate OFF, foreign env pre-seeded | child_process |
✅ qwen is hands-off — parent value passes through unchanged |
The override + tracestate-clear cases specifically exercise the round-1/round-2 review fixes ("always clear TRACESTATE", "overwrite existing TRACEPARENT").
3. Automated tests
- PR's 4 named test files (
trace-context,shellContextEnv,debugLogger,sdk): 107/107 pass. - Regression sweep — entire
telemetry/dir + every consumer of the refactored module (shellExecutionService,tools/monitor,hooks/hookRunner,debugLogger,shellContextEnv): 766/766 pass, 0 failures. ThedebugLogger→trace-contextextraction introduced no consumer regressions.
Notes / observations
- All four shell-spawn sites that should carry context were verified to spread
getShellContextEnvVars()last (so it correctly overrides inherited env):shellExecutionServicechild_process:554+ PTY:1157,tools/monitor:370,hooks/hookRunner:577. - Gating is consistent with the docs: when
telemetry.enabled: false,initializeTelemetryearly-returns andsetShellTracePropagationis never called → no injection even ifpropagateTraceContext: true. ("Requirestelemetry.enabled: true" holds.) getTraceContext()is fullytry/catch-guarded andformatTraceparentis pure, so the injection path cannot throw into / break shell execution.- Clearing to empty string (
TRACEPARENT="",TRACESTATE="") is W3C-safe — downstream parsers treat empty as "no context", which is preferable to leaking a stale parent value.
Verdict: behavior matches the PR description exactly; no regressions found. LGTM from a functional-verification standpoint. ✅
中文小结
在隔离 worktree 中构建分支,并用 tmux 跑了真实进程端到端验证(无 mock):
- 真实 CLI 端到端:开启
telemetry.enabled+propagateTraceContext后,让 glm-4.7 驱动真实 Shell 工具执行env | grep TRACEPARENT,子进程收到TRACEPARENT=00-032d2110…-…-01、TRACESTATE=为空。该 traceId 同时等于sha256(session.id)和本次会话遥测 span 的 trace_id(出现 24 次)——三者一致,证明注入的是 qwen-code 自己的 trace,而非环境泄漏。关闭开关后同一命令不再注入(test plan 第 4 步)。 - 确定性 harness:直接调用编译后的真实
ShellExecutionService.execute(),覆盖child_process与node-pty两条路径及全部分支(开/关/覆盖外来值/关闭透传),全部通过。 - 测试:PR 指定的 4 个测试文件 107 个全过;遥测目录 + 所有改动消费方回归 766 个全过,无回归。
补充了 PR 中标记为 not tested 的 🐧 Linux 平台。结论:行为与描述完全一致,未发现回归。✅
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
No review findings. The TRACEPARENT injection and trace-context module extraction are clean — correct lifecycle wiring, proper feature gating, thorough test coverage (107/107 pass), tsc and eslint both clean. LGTM! ✅ — qwen3.7-max via Qwen Code /review
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
CI failure analysis:
|
| Commit | CI result | Time |
|---|---|---|
9e4c87a7e4 (before #4728) |
✅ success | 10:34 UTC |
423cac110c (#4728 merge) |
cancelled | 11:09 UTC |
fb1432cf5c (after #4728) |
❌ failure | 11:12 UTC |
| All subsequent main commits | ❌ failure | — |
This PR does not modify any file in scripts/ — the failure is 100% inherited from main. All other CI jobs (Lint, Test ubuntu, Test macOS, CodeQL) pass.
What this PR does
When
outboundCorrelation.propagateTraceContextis enabled, inject a W3CTRACEPARENTenvironment variable into all shell child processes (Bash tool, hooks, monitor) so that CLI tools and Python scripts can participate in distributed tracing. Also extracts a sharedtrace-context.tsmodule fromdebugLogger.ts's private trace context helpers, eliminating code duplication and using OTel'sisSpanContextValidfor more correct span validation.Why it's needed
Shell child processes (Bash tool commands, hooks, monitor, skill-invoked Python scripts) currently cannot participate in distributed tracing because they receive no trace context. The existing
propagateTraceContextsetting only controls HTTP header injection viaUndiciInstrumentationforfetch-based requests. CLI tools and Python scripts spawned by qwen-code are invisible in trace trees, making it difficult to correlate shell command execution with the parent qwen-code session.Reviewer Test Plan
How to verify
settings.jsonwith"outboundCorrelation": {"propagateTraceContext": true}and a telemetry endpointenv | grep TRACEPARENT00-{32hex}-{16hex}-{2hex}env | grep TRACEPARENTproduces no outputnpm run test -- packages/core/src/telemetry/trace-context.test.ts packages/core/src/utils/shellContextEnv.test.ts packages/core/src/utils/debugLogger.test.ts人工验证
Evidence (Before & After)
N/A — non-UI change (telemetry plumbing). Unit tests verify behavior.
Tested on
Risk & Scope
TRACEPARENTin the parent environment (e.g., from a CI pipeline's OTel agent). Documented in the settings description.propagateTraceContexttoggle.Linked Issues
N/A
中文说明
本 PR 做了什么
当
outboundCorrelation.propagateTraceContext开启时,向所有 shell 子进程(Bash 工具、hooks、monitor)注入 W3CTRACEPARENT环境变量,使 CLI 工具和 Python 脚本能够参与分布式追踪。同时从debugLogger.ts提取共享的trace-context.ts模块,消除代码重复,并使用 OTel 的isSpanContextValid进行更准确的 span 校验。为什么需要
Shell 子进程(Bash 工具命令、hooks、monitor、skill 调用的 Python 脚本)当前无法参与分布式追踪,因为它们没有收到 trace context。现有的
propagateTraceContext设置仅控制通过UndiciInstrumentation对fetch请求的 HTTP header 注入。qwen-code 启动的 CLI 工具和 Python 脚本在 trace 树中不可见,难以将 shell 命令执行与父 qwen-code session 关联。风险与范围
TRACEPARENT(如 CI pipeline 的 OTel agent 注入的值),已在设置描述中说明。propagateTraceContext开关控制。🤖 Generated with Qwen Code