feat(telemetry): client-side HTTP span + opt-in W3C traceparent propagation (#4384)#4390
Conversation
📋 Review SummaryThis PR implements W3C traceparent propagation for outbound LLM requests by wiring 🔍 General FeedbackPositive aspects:
Architectural decisions:
Concerns:
🎯 Specific Feedback🟢 Medium
🔵 Low
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
This PR wires OpenTelemetry’s undici/fetch instrumentation into the core telemetry SDK so outbound LLM HTTP calls made via globalThis.fetch (undici) propagate W3C trace context (traceparent) and produce client HTTP spans, while avoiding tracing OTLP exporter uploads to prevent feedback loops. It also adds a design doc and developer documentation describing the outbound trace propagation behavior.
Changes:
- Register
@opentelemetry/instrumentation-undicialongside existingHttpInstrumentationininitializeTelemetry(). - Add
ignoreRequestHooklogic to skip OTLP exporter endpoints and prevent infinite self-tracing loops. - Add unit tests validating both instrumentations are registered and that OTLP endpoints are ignored correctly.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/telemetry/sdk.ts | Adds UndiciInstrumentation and OTLP endpoint ignore logic to propagate trace context on fetch/undici calls. |
| packages/core/src/telemetry/sdk.test.ts | Adds tests covering instrumentation registration and ignoreRequestHook endpoint matching. |
| packages/core/package.json | Adds @opentelemetry/instrumentation-undici@0.14.0 dependency. |
| package-lock.json | Updates lockfile to include the new OTel undici instrumentation dependency and related dependency graph changes. |
| docs/developers/development/telemetry.md | Documents automatic traceparent propagation on outbound LLM fetch requests and the feedback-loop avoidance approach. |
| docs/design/telemetry-outbound-propagation-design.md | Adds the design document covering outbound propagation (part 1 + planned part 2). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Review feedback on #4390: 1. CI was failing on npm ci because the lockfile was generated with npm 11 locally (it sprinkles `peer: true` annotations npm 10 reads differently and rejects). Regenerated with npm 10 (matching CI's Node 22.x default), so the diff vs main is now 18 lines (the actual instrumentation-undici entry) instead of 105 lines of npm-version drift noise. 2. (Copilot inline at sdk.ts:330) `otlpUrlPrefixes` was derived from raw Config strings, so a settings.json `"otlpEndpoint": "\"http://...\""` (quoted) or trailing `#fragment` would silently miss the prefix match and reintroduce the feedback loop the hook exists to prevent. Replaced the regex-based suffix trim with a WHATWG URL parser: - strips ?query, #fragment, trailing slash - trims symmetric ASCII quotes a user may have placed in settings.json - falls back to safe suffix trimming if URL parsing fails (misconfigured endpoint still gets SOME protection) 3. (CodeQL inline) Replaced the `/\?.*$/` regex in ignoreRequestHook with `indexOf('?')`/`indexOf('#')` slicing for ReDoS hygiene. The regex was linear in practice but flagged as polynomial — using indexOf removes the ambiguity and is arguably simpler. Added 3 tests in sdk.test.ts covering the new normalizations (#fragment on incoming path, quoted endpoint, #fragment on configured endpoint). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Review feedback on #4390: 1. CI was failing on npm ci because the lockfile was generated with npm 11 locally (it sprinkles `peer: true` annotations npm 10 reads differently and rejects). Regenerated with npm 10 (matching CI's Node 22.x default), so the diff vs main is now 18 lines (the actual instrumentation-undici entry) instead of 105 lines of npm-version drift noise. 2. (Copilot inline at sdk.ts:330) `otlpUrlPrefixes` was derived from raw Config strings, so a settings.json `"otlpEndpoint": "\"http://...\""` (quoted) or trailing `#fragment` would silently miss the prefix match and reintroduce the feedback loop the hook exists to prevent. Replaced the regex-based suffix trim with a WHATWG URL parser: - strips ?query, #fragment, trailing slash - trims symmetric ASCII quotes a user may have placed in settings.json - falls back to safe suffix trimming if URL parsing fails (misconfigured endpoint still gets SOME protection) 3. (CodeQL inline) Replaced the `/\?.*$/` regex in ignoreRequestHook with `indexOf('?')`/`indexOf('#')` slicing for ReDoS hygiene. The regex was linear in practice but flagged as polynomial — using indexOf removes the ambiguity and is arguably simpler. Added 3 tests in sdk.test.ts covering the new normalizations (#fragment on incoming path, quoted endpoint, #fragment on configured endpoint). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
…uests Part 2 of #4384. Stacks on top of PR #4390 (traceparent via undici). Adds a product-namespaced HTTP header X-Qwen-Code-Session-Id to every outbound LLM request when telemetry is enabled, so server-side ingestion can correlate observed requests with qwen-code session metric/log records. Pattern matched from claude-code (X-Claude-Code-Session-Id, verified at src/services/api/client.ts:108 in their open-source repo). Critical design decision (design doc section 4.3): the OpenAI / Anthropic providers use a per-request fetch wrapper rather than the SDK defaultHeaders option, because content-generator SDK clients are constructed once and NOT recreated on /clear-triggered session resets (Config.resetSession updates this.sessionId but the contentGenerator keeps using the stale header value). Reading config.getSessionId() from inside the wrapper at request time gives the live value. Gemini provider uses static httpOptions.headers — @google/genai HttpOptions interface does not expose a fetch hook (only headers, baseUrl, apiVersion, timeout, extraParams). This is a known limitation: after session reset, Gemini X-Qwen-Code-Session-Id stays stale until the contentGenerator is recreated. Documented in telemetry.md and the design doc section 8.6; spans/logs continue to carry the live session id for trace/log correlation. Lazy-invalidate fix is a follow-up sub-issue. Header is omitted when telemetry is disabled OR when getSessionId returns an empty string (some HTTP middleware rejects empty header values). Integration sites: - packages/core/src/core/openaiContentGenerator/provider/default.ts (base class — automatically covered by deepseek/minimax/mistral/ modelscope/openrouter subclasses; openrouter calls super.buildHeaders) - packages/core/src/core/openaiContentGenerator/provider/dashscope.ts (overrides buildClient — must be touched separately; QwenContentGenerator inherits via this provider) - packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.ts - packages/core/src/core/geminiContentGenerator/index.ts (factory function, not the GeminiContentGenerator class — no signature change) End-to-end verification (local HTTP server in tmux): PASS: traceparent + X-Qwen-Code-Session-Id on every LLM request PASS: session id refreshes after simulated /clear (staleness regression guarded by llm-correlation-fetch.test.ts) PASS: OTLP upload traffic not traced (no feedback loop — PR A ignoreRequestHook working) Robot generated with Qwen Code https://github.com/QwenLM/qwen-code
…ry safety Adopts 7 review findings from wenshao on #4390 (+ duplicates from now-closed #4393). Critical bugs first, polish second. CRITICAL: 1. tsc TS2322 — wrapper return type incompatible with Anthropic SDK Fetch. `typeof fetch` (Node WHATWG, 2 overloads) is not structurally assignable to Anthropic's narrower `Fetch = (input: RequestInfo, init?) => ...`, even though they're call-compatible at runtime. Make wrapper generic `<TFetch extends FetchLikeLoose>` so callers preserve their exact fetch signature; cast the Anthropic call site through `unknown` with a comment explaining why. 2. tsc TS2352 / TS2493 — `baseFetch.mock.calls[0]![1] as RequestInit` was out-of-bounds when wrapped was called with no init arg. Replaced with a `makeFetchMock()` helper returning typed accessors. 3. normalizeOtlpPrefix catch fallback was DANGEROUS — a config of `"http"` produced prefix `"http"` which `startsWith`-matched every outbound HTTP request → silently disabled ALL instrumentation (no client spans, no correlation header — defeats the entire feature). Fixed: catch returns undefined + diag.warn. Misconfigured endpoint loses its feedback-loop guard (acceptable) instead of disabling all guards (catastrophic). 4. `url.startsWith(prefix)` matching was NOT boundary-safe — port collision (`:4318` matches `:43180`), hostname suffix collision (`otlp.example.com` matches `otlp.example.com.evil.net`), path-segment collision (`/v1` matches `/v1foo/x`). Replaced with origin-equality + path-prefix + boundary-char check (next char must be `/`, `?`, `#`, or end-of-string). 5. HttpInstrumentation also lacked the OTLP feedback-loop guard. The OTLP HTTP exporter (`@opentelemetry/exporter-trace-otlp-http`) uses node:http (patched by HttpInstrumentation, NOT undici). Without this, every OTLP upload batch creates a parasitic client span → feedback loop. Added `ignoreOutgoingRequestHook` that reuses the same `matchesOtlpPrefix` / `stripPathSuffix` helpers as the undici instrumentation. SAFETY: 6. Request input + undefined init dropped the Request's own headers (Authorization etc.) because `new Headers(undefined)` → `{...init, headers}` replaced them with just our session header. Fix: when input is a Request and init.headers is unset, seed from input.headers before adding ours. 7. Wrapped fetch had no try/catch — a throwing Config getter or Headers constructor would propagate as TypeError and break the LLM request path. Wrapped header construction in try/catch; on failure, fall through to baseFetch with original init (no header) + diag.warn. Telemetry must never break the model call. COVERAGE: - 3 new sdk.test.ts boundary tests (port/host/path) - 1 new sdk.test.ts normalizeOtlpPrefix catch-branch coverage - 1 new sdk.test.ts HttpInstrumentation OTLP guard test - 1 new sdk.test.ts proxy-mode wrapped-fetch test (default.test.ts) - 1 new anthropic test asserting wrapped fetch installed on Anthropic SDK - 2 new llm-correlation-fetch.test.ts (Request-headers preservation + try/catch fall-through) All 668 tests pass (1 pre-existing Anthropic User-Agent failure on main is unrelated). tsc clean. Declined: #10 DRY-refactor of baseFetch extraction across 3 sites — the duplication was pre-existing (default/dashscope buildClient was already near-identical), refactoring is a separate cleanup PR not gated by this feature. Will reply on the thread. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
wenshao
left a comment
There was a problem hiding this comment.
All 4 findings from the previous review round are resolved in 00434502a:
- Critical:
normalizeOtlpPrefixcatch block now returnsundefined+diag.warn— no more dangerous"http"fallback - Suggestion:
startsWithreplaced withmatchesOtlpPrefix(origin-exact + path-boundary check) - Suggestion:
HttpInstrumentationnow hasignoreOutgoingRequestHookfor OTLP feedback-loop guard - Suggestion: Comprehensive test coverage added (boundary tests, Anthropic fetch wrapper test, catch-branch test)
tsc clean (0 errors), eslint clean, 215/215 tests pass locally. LGTM ✅
Downgraded from Approve to Comment: CI still running.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
Part 1 of #4384 (sub-issue of #3731 P3 deeper observability). Today qwen-code's only OTel instrumentation is `HttpInstrumentation`, which only patches Node's `http`/`https` modules. The `openai` and `@google/genai` SDKs use `globalThis.fetch` (undici), so outbound LLM requests carry no `traceparent` header and trace context dies at the qwen-code process boundary. Adds `@opentelemetry/instrumentation-undici@0.14.0` (peer-compatible with the installed `@opentelemetry/instrumentation@0.203.0`) and wires it into `initializeTelemetry()` next to the existing `HttpInstrumentation`. Default propagator (W3C tracecontext + baggage) remains unchanged — no explicit `textMapPropagator` needed. `ignoreRequestHook` skips OTLP exporter endpoints to avoid the classic feedback loop (OTel SDK uses fetch to upload OTLP data; without the hook each upload would create a span that gets uploaded, infinitely). Configured `otlpEndpoint` / per-signal endpoints are stripped of trailing slash and query string for robust prefix matching against undici's `request.origin + request.path`. Outbound LLM calls now also produce a client-side HTTP span (separating network TTFB / transfer time from the existing `api.generateContent` total-duration span). Design doc: docs/design/telemetry-outbound-propagation-design.md (Part A — traceparent; Part B — session id header — lands in a follow-up PR per the design's split rationale.) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Review feedback on #4390: 1. CI was failing on npm ci because the lockfile was generated with npm 11 locally (it sprinkles `peer: true` annotations npm 10 reads differently and rejects). Regenerated with npm 10 (matching CI's Node 22.x default), so the diff vs main is now 18 lines (the actual instrumentation-undici entry) instead of 105 lines of npm-version drift noise. 2. (Copilot inline at sdk.ts:330) `otlpUrlPrefixes` was derived from raw Config strings, so a settings.json `"otlpEndpoint": "\"http://...\""` (quoted) or trailing `#fragment` would silently miss the prefix match and reintroduce the feedback loop the hook exists to prevent. Replaced the regex-based suffix trim with a WHATWG URL parser: - strips ?query, #fragment, trailing slash - trims symmetric ASCII quotes a user may have placed in settings.json - falls back to safe suffix trimming if URL parsing fails (misconfigured endpoint still gets SOME protection) 3. (CodeQL inline) Replaced the `/\?.*$/` regex in ignoreRequestHook with `indexOf('?')`/`indexOf('#')` slicing for ReDoS hygiene. The regex was linear in practice but flagged as polynomial — using indexOf removes the ambiguity and is arguably simpler. Added 3 tests in sdk.test.ts covering the new normalizations (#fragment on incoming path, quoted endpoint, #fragment on configured endpoint). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…uests Part 2 of #4384. Stacks on top of PR #4390 (traceparent via undici). Adds a product-namespaced HTTP header X-Qwen-Code-Session-Id to every outbound LLM request when telemetry is enabled, so server-side ingestion can correlate observed requests with qwen-code session metric/log records. Pattern matched from claude-code (X-Claude-Code-Session-Id, verified at src/services/api/client.ts:108 in their open-source repo). Critical design decision (design doc section 4.3): the OpenAI / Anthropic providers use a per-request fetch wrapper rather than the SDK defaultHeaders option, because content-generator SDK clients are constructed once and NOT recreated on /clear-triggered session resets (Config.resetSession updates this.sessionId but the contentGenerator keeps using the stale header value). Reading config.getSessionId() from inside the wrapper at request time gives the live value. Gemini provider uses static httpOptions.headers — @google/genai HttpOptions interface does not expose a fetch hook (only headers, baseUrl, apiVersion, timeout, extraParams). This is a known limitation: after session reset, Gemini X-Qwen-Code-Session-Id stays stale until the contentGenerator is recreated. Documented in telemetry.md and the design doc section 8.6; spans/logs continue to carry the live session id for trace/log correlation. Lazy-invalidate fix is a follow-up sub-issue. Header is omitted when telemetry is disabled OR when getSessionId returns an empty string (some HTTP middleware rejects empty header values). Integration sites: - packages/core/src/core/openaiContentGenerator/provider/default.ts (base class — automatically covered by deepseek/minimax/mistral/ modelscope/openrouter subclasses; openrouter calls super.buildHeaders) - packages/core/src/core/openaiContentGenerator/provider/dashscope.ts (overrides buildClient — must be touched separately; QwenContentGenerator inherits via this provider) - packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.ts - packages/core/src/core/geminiContentGenerator/index.ts (factory function, not the GeminiContentGenerator class — no signature change) End-to-end verification (local HTTP server in tmux): PASS: traceparent + X-Qwen-Code-Session-Id on every LLM request PASS: session id refreshes after simulated /clear (staleness regression guarded by llm-correlation-fetch.test.ts) PASS: OTLP upload traffic not traced (no feedback loop — PR A ignoreRequestHook working) Robot generated with Qwen Code https://github.com/QwenLM/qwen-code
…ry safety Adopts 7 review findings from wenshao on #4390 (+ duplicates from now-closed #4393). Critical bugs first, polish second. CRITICAL: 1. tsc TS2322 — wrapper return type incompatible with Anthropic SDK Fetch. `typeof fetch` (Node WHATWG, 2 overloads) is not structurally assignable to Anthropic's narrower `Fetch = (input: RequestInfo, init?) => ...`, even though they're call-compatible at runtime. Make wrapper generic `<TFetch extends FetchLikeLoose>` so callers preserve their exact fetch signature; cast the Anthropic call site through `unknown` with a comment explaining why. 2. tsc TS2352 / TS2493 — `baseFetch.mock.calls[0]![1] as RequestInit` was out-of-bounds when wrapped was called with no init arg. Replaced with a `makeFetchMock()` helper returning typed accessors. 3. normalizeOtlpPrefix catch fallback was DANGEROUS — a config of `"http"` produced prefix `"http"` which `startsWith`-matched every outbound HTTP request → silently disabled ALL instrumentation (no client spans, no correlation header — defeats the entire feature). Fixed: catch returns undefined + diag.warn. Misconfigured endpoint loses its feedback-loop guard (acceptable) instead of disabling all guards (catastrophic). 4. `url.startsWith(prefix)` matching was NOT boundary-safe — port collision (`:4318` matches `:43180`), hostname suffix collision (`otlp.example.com` matches `otlp.example.com.evil.net`), path-segment collision (`/v1` matches `/v1foo/x`). Replaced with origin-equality + path-prefix + boundary-char check (next char must be `/`, `?`, `#`, or end-of-string). 5. HttpInstrumentation also lacked the OTLP feedback-loop guard. The OTLP HTTP exporter (`@opentelemetry/exporter-trace-otlp-http`) uses node:http (patched by HttpInstrumentation, NOT undici). Without this, every OTLP upload batch creates a parasitic client span → feedback loop. Added `ignoreOutgoingRequestHook` that reuses the same `matchesOtlpPrefix` / `stripPathSuffix` helpers as the undici instrumentation. SAFETY: 6. Request input + undefined init dropped the Request's own headers (Authorization etc.) because `new Headers(undefined)` → `{...init, headers}` replaced them with just our session header. Fix: when input is a Request and init.headers is unset, seed from input.headers before adding ours. 7. Wrapped fetch had no try/catch — a throwing Config getter or Headers constructor would propagate as TypeError and break the LLM request path. Wrapped header construction in try/catch; on failure, fall through to baseFetch with original init (no header) + diag.warn. Telemetry must never break the model call. COVERAGE: - 3 new sdk.test.ts boundary tests (port/host/path) - 1 new sdk.test.ts normalizeOtlpPrefix catch-branch coverage - 1 new sdk.test.ts HttpInstrumentation OTLP guard test - 1 new sdk.test.ts proxy-mode wrapped-fetch test (default.test.ts) - 1 new anthropic test asserting wrapped fetch installed on Anthropic SDK - 2 new llm-correlation-fetch.test.ts (Request-headers preservation + try/catch fall-through) All 668 tests pass (1 pre-existing Anthropic User-Agent failure on main is unrelated). tsc clean. Declined: #10 DRY-refactor of baseFetch extraction across 3 sites — the duplication was pre-existing (default/dashscope buildClient was already near-identical), refactoring is a separate cleanup PR not gated by this feature. Will reply on the thread. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI failing: Test (macos-latest, Node 22.x), Test (ubuntu-latest, Node 22.x), Test (windows-latest, Node 22.x). — gpt-5.5 via Qwen Code /review
0043450 to
a1a8a5a
Compare
Local maintainer validation —
|
| Stage | Command | Result |
|---|---|---|
| Install | npm ci |
✅ exit 0 |
| Build | npm run build |
✅ exit 0 (15 unrelated curly warnings in vscode-ide-companion) |
| Typecheck | npm run typecheck |
✅ exit 0 across all 5 workspaces |
| Lint | npm run lint |
✅ exit 0 |
| PR-touched core tests (6 files: sdk, llm-correlation-fetch, default/dashscope providers, gemini/anthropic generators) | npx vitest run ... |
✅ 214 / 215 (1 pre-existing UA env failure — see Triage) |
Full packages/core suite |
cd packages/core && npx vitest run |
|
Full packages/cli suite |
cd packages/cli && npx vitest run |
✅ 6779 passed / 9 skipped / 0 failed across 377 files |
| End-to-end against built artifact | custom Node harness driving dist/.../llm-correlation-fetch.js |
✅ 14 happy-path assertions pass; |
🚨 PR-caused regression
src/core/contentGenerator.test.ts — 2 of 4 tests fail with:
TypeError: config.getTelemetryEnabled is not a function
❯ staticCorrelationHeaders src/telemetry/llm-correlation-fetch.ts:121:15
❯ createGeminiContentGenerator src/core/geminiContentGenerator/index.ts:48:30
❯ Module.createContentGenerator src/core/contentGenerator.ts:370:21
❯ src/core/contentGenerator.test.ts:31:23
Verified PR-caused (not pre-existing or environmental):
- ✅ Reproduces on PR head
a1a8a5ac—2 failed | 2 passed (4) - ✅ Passes on
main(qwen-code-x3at16f0fde19) —4 passed (4) - ✅ Same failure surface on GitHub Actions Ubuntu (
Test (ubuntu-latest, Node 22.x)—2 failed | 9261 passed)
Root cause (diagnosed by e2e harness item [8]):
staticCorrelationHeaders(config) at llm-correlation-fetch.ts:119-127 calls config.getTelemetryEnabled() synchronously without defensive wrapping:
export function staticCorrelationHeaders(config: Config): Record<string, string> {
if (!config.getTelemetryEnabled()) return {}; // ← throws on legacy mocks
...
}But the sibling wrapFetchWithCorrelation at llm-correlation-fetch.ts:73-100 does wrap the same call in try/catch, with this comment (line 63):
Telemetry must never break the LLM request path — if Config getters throw, header construction fails, etc., we still want the model call to proceed.
The contentGenerator.test.ts mock (line 22 / 60) predates this PR and doesn't stub getTelemetryEnabled. The Gemini factory function (geminiContentGenerator/index.ts:48) now calls staticCorrelationHeaders on every construction, exposing the missing method.
Required follow-up — pick one:
- Defensive fix in
staticCorrelationHeaders(consistent with the wrap function's stated contract — recommended):export function staticCorrelationHeaders(config: Config): Record<string, string> { try { if (!config.getTelemetryEnabled()) return {}; const sid = config.getSessionId(); if (!sid) return {}; return { [SESSION_ID_HEADER]: sid }; } catch { return {}; // telemetry must never break the LLM request path } }
- Fix the test mocks in
contentGenerator.test.ts(addgetTelemetryEnabled: () => falseto both mockConfig blocks). Mechanical, but leaves other downstream Config-mock consumers vulnerable to the same trap.
Triage of the other 20 core failures (NOT caused by this PR)
| File | Fails | Cause | Verified |
|---|---|---|---|
src/skills/skill-manager.test.ts |
2 | .qwen path fixture bug |
Reproduces on PR bases of #4314 / #4345 / #4354 / #4386 — every .qwen/tmp/ worktree |
src/core/anthropicContentGenerator/anthropicContentGenerator.test.ts |
1 | Claude Code UA injection (claude-cli/1.2.3 (external, cli) overrides QwenCode/1.2.3) |
Reproduces on qwen-code-x3 main and every prior PR validation |
src/utils/gitDiff.test.ts |
12 | 5 s testTimeout under parallel load (core + cli running together saturates IO/CPU) |
Same pattern as PR 4345 validation — passes in isolation |
src/utils/filesearch/crawler.test.ts |
5 | Same — git ls-files / ripgrep subprocesses time out under parallel load |
Same pattern as PR 4345 — passes in isolation |
End-to-end proof against built packages/core/dist/.../llm-correlation-fetch.js
Drove the compiled wrapFetchWithCorrelation + staticCorrelationHeaders and asserted:
[0] SESSION_ID_HEADER = "X-Qwen-Code-Session-Id" (matches PR spec)
[1] wrapFetchWithCorrelation: telemetry ON adds session header
✓ request carried X-Qwen-Code-Session-Id: sess-abc
✓ Authorization header preserved
[2] wrapFetchWithCorrelation: telemetry OFF skips header
✓ no X-Qwen-Code-Session-Id header sent when telemetry off
[3] CRITICAL: sessionId is read at request time (not captured at construction)
✓ first=sess-initial, second=sess-after-clear — staleness regression NOT present
[4] Empty session id → header omitted (no empty value)
[5] Request input: Authorization preserved when init.headers undefined
[6] Defensive: throwing Config.getTelemetryEnabled does not break request (wrap path)
[7] staticCorrelationHeaders contract (Gemini fallback path) — happy path holds
[8] staticCorrelationHeaders defensiveness ⚠ NOT defensive — reproduces Ubuntu CI fail
[3] is the critical design property the PR description calls out — confirmed: wrapFetchWithCorrelation reads config.getSessionId() at request time, so after a simulated /clear (mutating sessionId on the same Config object), subsequent requests carry the new session id without the SDK being rebuilt.
[6] confirms the wrap path correctly absorbs a throwing Config.getTelemetryEnabled and still sends the request — exactly what the [8] gap should mirror.
Scope / risk
- Diff: +1709 / −4 across 16 files (2 docs + 4 new/edited core source + 7 core tests + 1 new test file + 1 dep added).
- New dep:
@opentelemetry/instrumentation-undici@0.14.0(peer-compatible with the existing@opentelemetry/instrumentation@0.203.0per the PR description). - The fetch-wrapper-vs-defaultHeaders choice is sound and well-defended in the PR body —
/clearstaleness is a real concern and this PR is the right shape to address it. Gemini fallback tohttpOptions.headersis documented and a follow-up sub-issue is tracked. - OTLP feedback-loop avoidance via
ignoreRequestHook(with WHATWGURLnormalization addressing the Copilot-flagged quoted-endpoint case) is well-handled.
Reviewer recommendation
Hold for a one-block fix. The header-propagation core mechanics are correct and the staleness regression is closed. The only blocking issue is the contentGenerator.test.ts breakage, which has a 1-block fix (either in staticCorrelationHeaders or in the test mocks). Once that lands:
- Local full core suite goes from
22 failed→20 failed(all 20 are the same environmental issues observed on every recent PR validation; they don't affect CI on clean Ubuntu/macOS/Windows runners) - Ubuntu CI should turn green (it only failed on the 2 PR-caused tests)
- macOS CI was still running at validation time; should also pass once the fix lands
The defensive try/catch fix in staticCorrelationHeaders is recommended because it aligns with the stated "telemetry must never break the LLM request path" contract that already governs wrapFetchWithCorrelation, and it future-proofs against the same trap recurring in other Config-mock-using tests.
— Maintainer local validation, run on a1a8a5ac from upstream pull/4390/head.
wenshao
left a comment
There was a problem hiding this comment.
— qwen-latest-series-invite-beta-v36 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI failing: Test (windows-latest, Node 22.x), Test (ubuntu-latest, Node 22.x), Test (macos-latest, Node 22.x). — gpt-5.5 via Qwen Code /review
…ndici Switch from exact pin `0.14.0` to `^0.14.0` for consistency with the rest of the `@opentelemetry/*` deps in this block (all carated). For 0.x semver, npm treats `^0.14.0` as `>=0.14.0 <0.15.0`, so patch updates within the 0.14.x line — which are tied to the same `@opentelemetry/instrumentation@0.203.x` peer — flow in via `npm update` without requiring a manual package.json edit. A bump across the 0.x minor (e.g. 0.15.x) would shift the instrumentation peer compatibility and still requires explicit attention, which the caret correctly blocks. Per review feedback on #4390 (wenshao). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
PR 4390 本地 tmux 验证报告PR: #4390 feat(telemetry): propagate W3C traceparent + X-Qwen-Code-Session-Id to outbound LLM requests 1. 总体结论
合并建议:📋 待修一个小问题再合(详见 §5 "建议动作")。整体方向、测试覆盖、E2E 行为都过关;缺的只是 2. 验证方法在 tmux 会话
关键方法学:所有 "test-core 失败" 都在独立 worktree 3. PR 自身行为验证3.1 PR 新增/修改的 9 个测试文件(隔离运行)3.2 E2E 端到端行为验证(驱动已构建产物)脚本 亮点:[3] 是 PR 设计文档 §4.3 的核心 — 之所以不能用 4. 失败逐项核对(基线 vs PR)测试运行 PR 时报 23 个失败,逐项在 baseline (v0.16.0) 上做同样隔离测试,分类如下: 4.1
|
| 验证 | 结果 |
|---|---|
| baseline (v0.16.0) 跑同一子集 | crawler 44/44 ✓,gitDiff 59/59 ✓ |
| PR worktree 隔离跑同一子集 | crawler 44/44 ✓,gitDiff 59/59 ✓ |
| PR worktree 全套并发跑 | crawler 5 / gitDiff 12 timeout |
根因:这些测试用了真实 git fork + 文件 IO,5s timeout 在多 worker 并发争抢时偶发超时;与 PR telemetry 改动无关。
建议:可考虑给 crawler / gitDiff 加 testTimeout: 15000,但是独立工程问题。
5. 建议动作
5.1 合并前(阻断性 1 项)
- 修
staticCorrelationHeaders防御性:按 §4.1 推荐方案给函数加 try/catch +diag.warn,保持与wrapFetchWithCorrelation的契约对称。改动约 3-5 行 + 加 1 个单测覆盖(构造一个无getTelemetryEnabled的 Config 传入,断言返回{})。
5.2 合并后(跟踪性 3 项,可另开 issue)
- anthropic
treats unset baseURLUA 测试基线失败(与 revert(core): undo x-api-key + Authorization double-emit (#4342) — regresses IdeaLab-style proxies #4385 revert 可能有关) -
skill-manager.test.tsmock 对.qwen路径过度敏感 -
crawler/gitDiff测试 timeout 容易在并发下 flake
5.3 不影响合并、值得点赞
llm-correlation-fetch.ts行 33-36 的FetchLikeLoose+ 行 64 的<TFetch extends FetchLikeLoose>泛型设计,把 openai / anthropic / @google/genai 三个 SDK 各异的 Fetch 类型差异完美隔离- 设计文档 §8.6 明确写了 Gemini staleness 的已知短板和未来子任务,技术诚实
- OTLP 反馈环 boundary-safe 匹配(
origin精确比 +pathname路径边界)超出常规防护
6. 复现指引
# 进入 tmux 会话
tmux attach -t pr4390
# 关键命令
tmux capture-pane -t pr4390:e2e -p -S -200 # E2E 输出
tmux capture-pane -t pr4390:pr-tests -p # PR 自测试输出
tmux capture-pane -t pr4390:baseline -p -S -100 # baseline 对照输出
# 重跑 E2E 脚本
node /tmp/pr4390-e2e-correlation.mjs报告由 Claude Opus 4.7 在本地 tmux 上完整验证,作为维护者 merge 决策参考。
Lint check `Check settings schema is up-to-date` failed because the checked-in `packages/vscode-ide-companion/schemas/settings.schema.json` wasn't regenerated after adding `telemetry.sessionIdHeaderHosts` to `settingsSchema.ts` in commit 1c8528a. Regenerated via `npm run generate:settings-schema`. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…t-allowlist scoping Adds a "修订历史" header table at the top and a new §11 "R3 修订 — Host-Allowlist Scoping for X-Qwen-Code-Session-Id" capturing what changed after LaZzyMan's REQUEST_CHANGES review, why, and how. Inline pointers added at §3.1, §3.2, §4.3, §4.4, §9 (claude-code comparison table) to point readers at §11 — original prose preserved as a record of the decision path rather than rewritten in place. Concretely §11 covers: - The three-step LazzyMan critique and why R1's "broadcast to all providers" was structurally wrong for an open-source multi-provider CLI - The default allowlist (`DEFAULT_SESSION_ID_HEADER_HOSTS`) and its semantic alignment with the DashScope provider detector - Pattern grammar (bare hostname / `*.suffix` dot-anchored), the TLD-suffix attack vectors it rejects, why no regex / port-aware globbing - `wrapFetchWithCorrelation` host gate, wrap-time vs request-time semantics, `[" * "]` whitespace tolerance - `staticCorrelationHeaders` `destinationUrl` parameter, Gemini factory's fail-closed treatment of unset `baseUrl` (avoids the Vertex vs `generativelanguage.googleapis.com` ambiguity) - All R3 file changes mapped to the original §5 file-change list - Mapping of LazzyMan's three concerns to R3's responses - §10 future-work additions: `traceparent` per-destination toggle, `X-Qwen-Code-Request-Id`, IPv6 allowlist syntax No code changes; documentation only. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI still running. — gpt-5.5 via Qwen Code /review
Three issues found by wenshao reviewing the R3 host-allowlist scoping.
1. **[Critical] `broadcastAll` outside safety try/catch**
(llm-correlation-fetch.ts wrapFetchWithCorrelation)
The try/catch only fires when `getTelemetrySessionIdHeaderHosts()`
throws. If it returns a malformed value — a bare string (settings.json
typo `"sessionIdHeaderHosts": "host"` instead of `["host"]`), an array
containing `null`/`undefined`/number entries, or whitespace-padded
entries — `.some((p) => p.trim() === '*')` throws TypeError at
buildClient time, bricking the LLM session before the first prompt.
`staticCorrelationHeaders` already handled this via its end-to-end
try/catch but the sister helper diverged. Settings loader does no
runtime schema validation so this is reachable via a single typo.
Fix: normalize the allowlist at wrap time:
1. catch a throwing getter (existing)
2. reject non-array → default allowlist (NEW — bare string typo)
3. filter out non-string elements (NEW — [null, ...] typo)
4. trim every surviving entry uniformly (NEW — see #2 below)
Then `trustedHosts.includes('*')` instead of `.some((p) => p.trim() === '*')`,
since patterns are already pre-trimmed.
2. **Trim asymmetry between `*` detection and host-pattern match**
(llm-correlation-fetch.ts)
`[" * "]` was tolerated (trimmed before `===` compare) but
`[" dashscope.aliyuncs.com "]` silently never matched. The
normalization above fixes this by trimming uniformly upstream.
3. **Proxy fetch test: only negative assertions**
(openaiContentGenerator/provider/default.test.ts)
The test asserted `callArg.fetch !== proxyFetch` and `!== globalThis.fetch`
but both passed for ANY wrapper, including a buggy one that
accidentally wraps globalThis.fetch instead of proxyFetch. Added a
positive assertion: call the wrapped fetch and verify proxyFetch was
the delegation target.
Tests: 4 new cases — whitespace-padded host pattern, bare-string
malformed config (both wrapper and static), null/number-containing
array malformed config (both wrapper and static), positive proxy fetch
delegation. All pass; pre-existing Anthropic User-Agent failure
unrelated.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
wenshao
left a comment
There was a problem hiding this comment.
No high-confidence issues at commit 7a1b4f8d (round 7). Build passes, 9270/9272 tests pass (2 pre-existing unrelated), tsc/eslint clean. All 7 low-confidence suggestions are incremental quality improvements — see terminal output for details. — qwen3.7-max via Qwen Code /review
LaZzyMan
left a comment
There was a problem hiding this comment.
Review (round 8 follow-up)
Thanks for the substantive response — the host allowlist in 1c8528a56 genuinely addresses constraint #1 by removing the dangerous default, and the defensive normalization in 7a1b4f8d is the right hardening. The original severity: high finding is materially resolved.
But I want to step back from the rename suggestion I drafted earlier and push on a different, harder line, because I think the rename framing concedes too much:
Telemetry is not a container for adjacent features
The traceparent cross-process propagation and the X-Qwen-Code-Session-Id header injection are not telemetry. They are outbound-identity / outbound-correlation work that uses some OTel APIs internally as an implementation detail. The fact that they're convenient to land alongside a telemetry change does not make them telemetry work.
I don't think the right resolution is "rename the namespace inside this PR." I think the right resolution is scope discipline going forward: telemetry PRs should land telemetry, and outbound-correlation work should land in its own PR under its own design discussion, with its own settings tree, its own threat model, and its own user consent flow. Bundling them — even with safe defaults — sets a precedent that future telemetry work can dictate wire-level behavior to third parties, which is exactly the misframing risk that surfaced this whole conversation.
What I'm asking for this round
-
Split this PR. Land in this PR only what is genuinely user-facing observability:
UndiciInstrumentationregistration (provides the client-side HTTP span — pure observability for the qwen-code operator's own collector)- The OTLP feedback-loop guard (necessary side-effect of the above)
- Configure the instrumentation so it does not propagate trace context on the wire (use a no-op propagator on the undici instrumentation, or disable
requestHookinjection — whichever the API supports)
Remove from this PR:
- The entire
X-Qwen-Code-Session-Idapparatus (llm-correlation-fetch.ts,trusted-llm-hosts.ts,telemetry.sessionIdHeaderHosts, the fetch-wrapper plumbing across four provider construction points, the related tests) - The active
traceparentpropagation to upstream LLM providers (keep the trace IDs internal to the user's collector, don't put them on outbound wires)
-
Open a separate issue/PR for outbound correlation headers. If the ARMS+DashScope cross-process trace continuation is a real product requirement (and I believe you that it is), that's a feature on its own merits and deserves its own design discussion. The discussion should cover at minimum:
- Threat model: what's the recipient set, what's the worst-case correlation possibility
- Settings tree: independent of
telemetry.*, named for what it actually does (e.g.,outboundCorrelation.*,llmRequestHeaders.*) - Default: off, per the principle "an open-source client shouldn't broadcast stable identifiers to third parties without explicit user consent"
- Override semantics: same
trustedHostsallowlist mechanism you've built here can carry over unchanged - Audit trail: documented in
docs/users/configuration/settings.mdas a security-relevant setting, not buried under "observability"
-
The work you've already done is not wasted. The fetch wrapper, the host allowlist, the defensive normalization, the test coverage — all of that lifts cleanly into the follow-up PR. You're moving it from one PR to another, not redoing it. What you lose is the ability to land everything in one shot; what you gain is a settings tree whose names match what they do, a separate design doc that names the recipient and threat model, and a precedent that telemetry PRs don't quietly extend wire-level behavior to third parties.
@wenshao @doudouOUC — I'd rather we settle this before I dismiss vs. hold. The principle I'm asking for is "telemetry's scope of work doesn't include sending identifiers to LLM providers, and shouldn't quietly grow to include it." If we accept that principle, the split is mechanical. If we don't, this PR is the wrong place to debate it because the technical fixes are already in.
Verdict
REQUEST_CHANGES (continued) — defaults are now safe, but the scope conflation is the underlying problem. Asking for the outbound-correlation work to be split into a separate PR/issue with its own settings namespace and design discussion, leaving this PR to land the actual telemetry work (client-side HTTP spans + OTLP loop prevention) on its own.
…e (R4) Address LaZzyMan round-8 follow-up review on PR #4390: even though R3's host allowlist made the default behavior safe, the meta-architectural concern remains: telemetry's namespace and consent flow shouldn't quietly extend to wire-level behavior aimed at third-party LLM provider request streams. The recipient sets differ; the consent decisions differ; they deserve separate namespaces, separate threat models, separate PRs. This commit (called R4 in the design doc) collapses the PR scope so it lands ONLY telemetry observability work: REMOVED from this PR: - packages/core/src/telemetry/llm-correlation-fetch.ts(.test.ts) - packages/core/src/telemetry/trusted-llm-hosts.ts(.test.ts) - telemetry.sessionIdHeaderHosts setting + Config getter + resolveTelemetrySettings wiring + settingsSchema entry - wrapFetchWithCorrelation usage from four provider construction points (default.ts, dashscope.ts, anthropicContentGenerator.ts, geminiContentGenerator/index.ts) - All session-id provider tests across the four providers + the contentGenerator.test.ts mock stub - "Session correlation header" section in telemetry.md ADDED: - OutboundCorrelationSettings interface in packages/core/src/config/config.ts, standalone top-level namespace separate from TelemetrySettings — SECURITY-RELEVANT label, all defaults off - Config.getOutboundCorrelationPropagateTraceContext() getter - outboundCorrelation top-level entry in settingsSchema.ts with propagateTraceContext: { default: false } and explicit SECURITY-RELEVANT framing in the description - CLI config-load pipeline passes settings.outboundCorrelation into ConfigParameters - NOOP_PROPAGATOR (TextMapPropagator no-op) in sdk.ts, conditionally installed on NodeSDK when propagateTraceContext is false (default). When true, omits textMapPropagator from NodeSDK options so the SDK keeps its default W3C composite propagator - 2 new sdk.test.ts cases covering the propagator gate behavior UNCHANGED: - UndiciInstrumentation registration + OTLP feedback-loop guard + HttpInstrumentation OTLP guard from R2/R3 stay intact — they are pure telemetry (client HTTP spans into the operator's own OTLP collector), no wire-level data egress - Documentation rewrites telemetry.md to split "client-side HTTP span on outbound fetch" (telemetry) from a new "Outbound correlation (SECURITY-RELEVANT)" top-level section - design doc gets R4 revision row + new §12 "R4 Scope Conflation Split" capturing the rationale and follow-up PR outline The session-id apparatus (R3 code) lives in git history at commits 1c8528a / cb162e7 / 7a1b4f8 / 40e1efc / 106598c; the follow-up PR can cherry-pick or restore those files under the new outboundCorrelation.* namespace as LazzyMan suggested. Vscode-ide-companion settings.schema.json regenerated. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
|
@LaZzyMan — accepting the split. The scope-conflation principle you laid out is correct, and the technical fixes already being in (R3 + R2) means there's no good reason to debate it inside this PR. Took the "split is mechanical" path you described — see commit What's in this PR now
The single boolean here establishes the What moved to a follow-up PRThe entire
The R3 implementation code lives in git history at commits How the new arrangement maps to your three asks
Verdict requestAsking for a re-review at your convenience. Happy to iterate further if there's anything I missed in the split — e.g. if you'd rather the |
…aceContext Self-review pass on R4 commit 9bdd3bd flagged one footgun: both `docs/developers/development/telemetry.md` and the settingsSchema.ts description for `outboundCorrelation.propagateTraceContext` describe the toggle's behavior without noting that the flag is a silent no-op when `telemetry.enabled` is false. An operator who sets only `outboundCorrelation.propagateTraceContext: true` and forgets the telemetry switch gets zero behavior change — no error, no warning, no traceparent. Fix: add the dependency disclosure to both surfaces, plus a JSON example showing both flags wired together for the ARMS+DashScope cross-process trace continuation use case. Also fix a minor comment accuracy nit at `sdk.test.ts:683`: said the SDK installs `W3CTraceContextPropagator` instance when opt-in is true, but the actual default is `CompositePropagator(W3CTraceContextPropagator + W3CBaggagePropagator)` per `@opentelemetry/sdk-node` source. Vscode-ide-companion settings.schema.json regenerated to reflect the expanded description string. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
R4 (commit 9bdd3bd) added the getter but the test file didn't grow a corresponding describe block — sibling telemetry getters all have unit tests but this new one was missed. Add 4 cases covering the security-relevant default-to-false invariant and explicit-set behavior: - omitted outboundCorrelation → false - empty outboundCorrelation: {} → false (the `?? false` collapse on the getter, complementing the same on the constructor) - explicit true → true - explicit false → false PR #4390 review (wenshao). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
R4 (commit 9bdd3bd) was followed by two polish commits that the design doc §12 didn't track: - 0be0df2 (docs): telemetry.enabled dependency disclosure on propagateTraceContext — added to telemetry.md + settingsSchema description because a self-review pass identified the silent-no-op footgun (operator sets propagateTraceContext: true but forgets telemetry.enabled: true, sees zero behavior change with no error). - c0352fd (test): 4 config.test.ts cases covering the getOutboundCorrelationPropagateTraceContext default-false invariant (omitted / {} / explicit true / explicit false) — wenshao review flagged the test gap. Updates §12.4 with a new "Hidden dependency: telemetry.enabled" sub- section explaining the gating relationship and pointing forward at the follow-up PR (future outboundCorrelation.* settings inherit the same dependency). Updates §12.5 implementation table to add the config.test.ts row and clarify the telemetry.md / vscode-schema rows were touched again in the polish pass. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
/simplify review pass on commits 0be0df2 + c0352fd + 62cf6b4 flagged 4 concerns. Fix all 4: 1. **settingsSchema.ts description**: footgun warning ("Depends on telemetry.enabled: true") was at char 600+ of a 650-char description. VS Code settings UI truncates to ~300 chars inline → the most important warning was hidden in the most-glanced view. Hoist to first sentence ("Requires telemetry.enabled: true."). 2. **config.test.ts**: drop the task-narration comment ("PR #4390 R4: keep wire-level toggle out of telemetry namespace.") that just restated the change context. The remaining 2-line comment explaining WHY default-to-false is security-relevant survives. 3. **config.test.ts**: collapse 4 separate `it()` blocks into a single `it.each([...])` covering the same 4 precondition × expectation combinations. Removes boilerplate (`new Config({...baseParams, ...})` repeated 4×) without losing assertion power; case-3 ("explicit false") was a weak duplicate of case-2 ("empty object") since both hit the same `?? false` branch, but keeping all 4 in the parametric table documents intent more clearly than dropping case-3. 4. **design doc §12.4 + §12.5**: strip specific commit SHAs (`0be0df270`, `c0352fd5b`) — design docs should be evergreen, not doubled-up commit logs (those live in `git log`). Keep the design intent ("two panels both document the dependency" / "test block added") without naming the specific commits. Regenerated vscode-ide-companion/schemas/settings.schema.json to reflect the hoisted description sentence. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
LaZzyMan
left a comment
There was a problem hiding this comment.
Review
The R4 split lands cleanly. This PR is now scoped to the two changes that genuinely belong under telemetry.* — the client-side HTTP span from UndiciInstrumentation and the OTLP feedback-loop guard. The whole X-Qwen-Code-Session-Id apparatus (the fetch wrapper, the host allowlist, the related provider-constructor plumbing, the telemetry.sessionIdHeaderHosts setting) is gone from the production diff; only the design-doc narrative retains the R1/R3 history.
traceparent is no longer written on the wire by default. The new outboundCorrelation.propagateTraceContext switch lives in its own top-level namespace (separate from telemetry.*), defaults to false, and installs a no-op propagator on the OTel SDK until explicitly opted in. The schema description leads with SECURITY-RELEVANT. and the inner propagateTraceContext field's description opens with the telemetry.enabled dependency — both well inside the VS Code settings-UI inline truncation window.
Verdict
APPROVE — All three round-8 asks (telemetry-only scope, session-id apparatus removed, traceparent off-by-default under its own namespace) are fully delivered. The single residual nit — propagateTraceContext: true with telemetry.enabled: false silently no-ops without a runtime warning — is docs-disclosed and fine to ship as a follow-up.
wenshao
left a comment
There was a problem hiding this comment.
R8 config test gap fully addressed (c0352fd) — 4-case it.each locks down the default-false security invariant. Description front-loads the telemetry.enabled dependency notice. Design doc documents the footgun. tsc clean, 176 config tests pass. No new findings. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Local Real-Test Verification Report (R4-polish)Re-validating after R4 scope reduction. The earlier R2 verification (#issuecomment-4513932536, 2026-05-21) tested an obsolete shape (always-on Test environment
Suite results
The 4 boundary-safety tests from the R2 review all pass on this commit:
The R4 propagator-gate tests are also green:
Real-network integration testEach phase runs in a fresh node process (otherwise the global propagator from phase 1 bleeds into phase 2 — Result: ✅ 16 / 16 assertions pass. Phase A — default (
|
| ID | Assertion | Detail |
|---|---|---|
| A1 | propagation.fields() empty (NoopPropagator) |
fields=[] |
| A2 | Outbound LLM request carries NO traceparent |
traceparent=null |
| A3 | Outbound LLM request carries NO baggage |
baggage=null |
Phase B — opt-in (outboundCorrelation.propagateTraceContext: true)
[optin] propagation.fields() = ["traceparent","tracestate","baggage"] ← W3C composite installed
[optin] active span: traceId=1f2983b0a47afce44f9bf9e9037651d1 spanId=0e1d05fac956aae2
[optin] LLM-server received 1 request:
POST /v1/chat/completions traceparent=00-1f2983b0a47afce44f9bf9e9037651d1-a318eddf0cee3759-01
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
matches active span's traceId
| ID | Assertion | Detail |
|---|---|---|
| B1 | propagation.fields() contains traceparent |
["traceparent","tracestate","baggage"] |
| B2 | Outbound LLM request carries valid W3C traceparent (00-<32hex>-<16hex>-<2hex>) |
00-1f2983b0…-a318eddf0cee3759-01 |
| B3 | traceparent's traceId == active span's traceId |
both 1f2983b0a47afce44f9bf9e9037651d1 |
The span-id portion (a318eddf0cee3759) differs from the parent span id (0e1d05fac956aae2) because UndiciInstrumentation correctly creates a CLIENT child span for each outbound request and propagates that child's id — standard W3C behavior, this is what we want.
Phase C/D — OTLP feedback-loop guard
UndiciInstrumentation's ignoreRequestHook and HttpInstrumentation's ignoreOutgoingRequestHook both received the same otlpUrlPrefixes set. To prove the guard works end-to-end, the phase script also fetch()s the OTLP endpoint directly inside an active span (simulating any module that might POST to /v1/traces itself).
| ID | Assertion | Detail |
|---|---|---|
| C1 | Default phase: OTLP collector got span uploads | 2 POSTs to /v1/traces |
| C2 | Opt-in phase: OTLP collector got span uploads | 2 POSTs to /v1/traces |
| C3 | Even in opt-in mode, OTLP /v1/traces uploads carry NO traceparent |
both traceparent=null |
| D1 | Manual probe to OTLP endpoint reached collector (default) | 1 body-empty POST |
| D2 | Manual probe to OTLP endpoint reached collector (opt-in) | 1 body-empty POST |
| D3 | Opt-in manual probe carries NO traceparent (because ignoreRequestHook excluded it from instrumentation → propagation.inject was never invoked on its carrier) |
traceparent=null |
Phase E — Client HTTP span exported via OTLP
The mock collector dumps every uploaded OTLP-JSON body to disk. The harness parses them and looks for kind=3 (CLIENT) spans whose scope is @opentelemetry/instrumentation-undici.
// /tmp/pr4390-test/otlp-bodies/0001-_v1_traces.bin (default phase)
"scope":{"name":"@opentelemetry/instrumentation-undici","version":"0.14.0"},
"spans":[{
"traceId":"21e4605424efef838bf3df1d9afa486b",
"spanId":"7bda1ed529b2c791",
"parentSpanId":"cf3261b6c4d0f71b", // ← parent = api.generateContent.default
"name":"POST", "kind":3, // ← CLIENT span
"attributes":[
{"key":"http.request.method","value":{"stringValue":"POST"}},
{"key":"url.full","value":{"stringValue":"http://127.0.0.1:18411/v1/chat/completions"}},
{"key":"http.response.status_code","value":{"intValue":200}},
…
]
}]| ID | Assertion | Detail |
|---|---|---|
| E1 | Default phase: undici emitted CLIENT span for outbound LLM POST /v1/chat/completions (status 200) |
found |
| E2 | Opt-in phase: undici emitted CLIENT span for outbound LLM | found |
| E3 | Default phase: NO undici CLIENT span exists for the OTLP URL (127.0.0.1:18412) |
none — guard works |
| E4 | Opt-in phase: NO undici CLIENT span exists for the OTLP URL | none — guard works |
E3/E4 are the strongest end-to-end proof of the loop guard: even with a manual fetch(MOCK_OTLP/v1/traces) issued inside an active span, the exported span tree contains zero instrumentation-undici entries pointing at the OTLP host. ★ The PR's ignoreRequestHook correctly suppresses span creation, which is the only break in the export → span → export feedback loop.
What I did NOT verify (caveats)
- Real LLM provider request flow (DashScope/OpenAI/Anthropic SDK actually using
globalThis.fetch). I trusted the design doc's audit:openai@5.11.0,@google/genai@1.30.0,@anthropic-ai/sdk— allglobalThis.fetch(undici). Easy to reverify by greppingnode_modulesif anyone is paranoid. - gRPC OTLP. The PR's guard is wired to
HttpInstrumentation(for the HTTP exporter) andUndiciInstrumentation; gRPC traffic doesn't go through either, so no guard is needed there. Not tested locally — would need a gRPC collector. - TUI / Ink-render interaction. Telemetry init runs before Ink, the design doc covers this; I did not run
qweninteractively. - MCP StreamableHTTP propagation. PR description claims propagation also covers MCP; inferred since MCP uses the same
globalThis.fetch. Not exercised by the integration test (only the LLM endpoint was hit). - Pre-existing flake:
anthropicContentGenerator.test.ts > treats unset baseURL as Anthropic-native (SDK default targets api.anthropic.com)fails locally on macOS Node 22.17 (User-Agentmismatch:claude-cli/1.2.3 (external, cli)vs expectedQwenCode/1.2.3). This failure also reproduces onorigin/mainwith PR feat(telemetry): client-side HTTP span + opt-in W3C traceparent propagation (#4384) #4390 reverted — it is not introduced by feat(telemetry): client-side HTTP span + opt-in W3C traceparent propagation (#4384) #4390. CI shows green for the same commit on Linux/macOS/Windows Node 22, so this looks environment-specific. Flagging for separate triage.
Recommendation
Code is correct, well-tested at the unit level (231 tests across the two changed files), and behavior is exactly what's documented in OutboundCorrelationSettings's description and the design doc. The R4 split (telemetry observability ↔ outbound-wire correlation) cleanly addresses @LaZzyMan's scope-conflation concern. OK to merge from a verification standpoint.
Two low-priority follow-ups (not blockers):
- The PR is 27 commits behind main; before merging, recommend a rebase + green CI rerun rather than landing the test-merge as-is.
anthropicContentGeneratormacOS-local User-Agent test failure is pre-existing on main and worth a separate triage issue.
Verification artifacts retained at /tmp/pr4390-test/{result.json,otlp-bodies/} and .pr4390-test/ inside the worktree. Reproducible end-to-end with node .pr4390-test/mock-servers.mjs & then node .pr4390-test/run-real-test.mjs.
Closes #4384 (partial — outbound correlation headers split into follow-up). Sub-issue of #3731 (P3 deeper observability).
This PR was originally split into two stacked PRs (this one for traceparent + #4393 for session id), merged into a single PR per reviewer request, then scope-reduced again in R4 following @LaZzyMan's round-8 follow-up review. See "Scope evolution" below.
Summary
Telemetry observability only. Two changes that genuinely fit under
telemetry.*:Client-side HTTP span on outbound
fetch()— registers@opentelemetry/instrumentation-undici@^0.14.0. UndiciInstrumentation creates a client HTTP span per outboundfetch(LLM SDKs, MCP, WebFetch, IDE-extension out-of-process calls), exported into the operator's own OTLP collector. Lets you separate network latency (TTFB / response body transfer) from upstream model processing time — theapi.generateContentspan alone can't distinguish them.OTLP feedback-loop guard —
HttpInstrumentationandUndiciInstrumentationboth receiveignoreRequestHook/ignoreOutgoingRequestHookthat skip configured OTLP endpoints (telemetry.otlpEndpoint,telemetry.otlpTracesEndpoint, etc.). Without this guard, every OTLP upload would create a parasitic client span → exported → infinite loop. The hook uses WHATWGURLparsing (handles surrounding quotes,#fragment, trailing slash, default-port stripping), boundary-safe origin+path matching, and fail-open on missing protocol — accumulated across multiple wenshao review iterations.Out of scope (split out per R4)
The earlier R1–R3 versions of this PR also injected
X-Qwen-Code-Session-Idandtraceparentonto every outbound LLM request as a default behavior. After @LaZzyMan's round-8 review, that work was separated out under the principle that telemetry's settings namespace shouldn't encompass wire-level behavior aimed at third-party LLM provider request streams:X-Qwen-Code-Session-Idheader: removed from this PR. Will land in a separate follow-up PR under the newoutboundCorrelation.*namespace, with its own threat model, default-off, and own design-doc discussion. The R3 host-allowlist + fetch-wrapper code lives in git history (commits 1c8528a / cb162e7 / 7a1b4f8 / 40e1efc / 106598c) for the follow-up to cherry-pick.traceparentwire propagation: now gated by a single new setting in this PR:When
false(default),NoopTextMapPropagatoris installed onNodeSDKsopropagation.inject()writes nothing into outbound requests. Trace IDs stay internal to the operator's OTLP collector. Whentrue, the SDK's default W3C composite propagator (tracecontext+baggage) is installed andtraceparentis written on every outboundfetch— opt in only when the LLM provider also reports into your collector (e.g. ARMS Tracing + DashScope cross-process trace continuation).The
outboundCorrelation.*top-level namespace is intentionally separate fromtelemetry.*: different recipient (LLM provider vs operator's collector), different consent decision, security-relevant. The single boolean here establishes the namespace for the follow-up session-id PR.Dependency
Adds
@opentelemetry/instrumentation-undici@^0.14.0(peer-compatible with installed@opentelemetry/instrumentation@0.203.0).Settings
Default
false. Seedocs/developers/development/telemetry.md"Outbound correlation (SECURITY-RELEVANT)" section.Test plan
sdk.test.ts: instrumentation registration, OTLP guard correctness (default-port matching, host fallback strip, asymmetric-quote tolerance, missing-protocol fail-open, IPv6 bracket handling), new propagator-gate tests (NoopTextMapPropagator installed by default, undefined when opt-in)settings.schema.jsonregenerated and committedScope evolution (for reviewers re-reading the thread)
🤖 Generated with Qwen Code