feat(telemetry): propagate W3C traceparent + X-Qwen-Code-Session-Id to LLM service calls #4384

## What would you like to be added?

Two coupled HTTP-header propagations on every outbound LLM service request:

1. **W3C `traceparent`** — so qwen-code traces continue across the process boundary into upstream model services (DashScope / OpenAI / Gemini / etc.).
2. **`X-Qwen-Code-Session-Id`** — so server-side ingestion can correlate a qwen-code session with its observed LLM requests, even when the trace backend isn't co-located with the metric/log backend.

The two are coupled because they are injected at the same code site (the SDK construction / fetch wrapper layer) and both serve cross-process correlation. Implementing them together avoids duplicate plumbing.

This sub-issue covers the P3 line *"deeper observability"* in #3731. It is separate from #4365 (resource attributes, merged via #4367), which only addressed *attributes on emitted telemetry*, not headers on outbound requests.

## Why is this needed?

### Today: zero outbound propagation

`packages/core/src/telemetry/sdk.ts:330` only registers `HttpInstrumentation`:

```ts
instrumentations: [new HttpInstrumentation()],
```

`HttpInstrumentation` patches Node's built-in `http`/`https` modules, not the modern `fetch` / undici path. Both LLM SDKs in qwen-code use `globalThis.fetch`:

| SDK | HTTP layer | Covered by `HttpInstrumentation`? |
|---|---|---|
| `openai@5.11.0` (DashScope / OpenAI / DeepSeek / GLM / Kimi providers) | `globalThis.fetch` — see `node_modules/openai/internal/shims.mjs` | ❌ |
| `@google/genai@1.30.0` | `globalThis.fetch` + `new Headers()` — see `dist/node/index.mjs` | ❌ |

In addition, `grep -rn "propagation\.\|setGlobalPropagator\|traceparent\|defaultHeaders.*session" packages/core/src --include="*.ts"` returns nothing: there is no manual propagation and no session-id header injection anywhere.

Consequences:

- The `api.generateContent` span is created locally but the outgoing HTTP request carries no `traceparent`.
- The session id (already on every span/log per #4367) is **not** sent to the LLM service, so cross-system correlation requires the operator to maintain a custom mapping between trace ids and session ids.
- If the LLM service is itself OTel-instrumented (e.g. ARMS Tracing serving DashScope), its trace cannot link to qwen-code's trace.
- Even without server-side tracing, qwen-code's local trace is missing a precise client-side HTTP span. Today you only see the total `api.generateContent` duration; with undici instrumentation you would also see network TTFB, body size, status, and retry attempts in their own spans.

### Reference: how Claude Code does it

| Header | Claude Code mechanism | Source basis |
|---|---|---|
| `X-Claude-Code-Session-Id` (+ `x-app`, `x-client-app`, etc.) | Constant in `defaultHeaders` passed to `new Anthropic({ defaultHeaders: {...} })` | ✅ Verified directly in open source: `src/services/api/client.ts:108` |
| `x-client-request-id` | Per-request `randomUUID()` injected by a custom `buildFetch()` wrapper | ✅ Verified directly in open source: `src/services/api/client.ts:370-390` |
| `traceparent` (+ optional `tracestate`) on outbound LLM calls | Documented behavior at docs.claude.com/docs/en/monitoring-usage.md ("Traces" section) — only sent to the first-party Anthropic API endpoint when tracing is enabled | ⚠️ Implementation NOT in the open-source repo (`grep -rn "propagation\.inject\|traceparent\|UndiciInstrumentation" claude-code/src` returns nothing). Almost certainly in a closed-source build. |

Notes from claude-code's verified pattern that are worth borrowing:

- **Session id is a custom HTTP header, not W3C Baggage.** Backends surface headers natively; Baggage requires special collector wiring. Claude Code chose the pragmatic path.
- **Session id is product-namespaced.** `X-Claude-Code-Session-Id`, not generic `X-Session-Id`, to avoid collision with arbitrary third-party tools.
- **Session id uses the SDK's standard `defaultHeaders` option**, not a custom fetch wrapper.

For `traceparent` we follow the OTel ecosystem's canonical answer (`@opentelemetry/instrumentation-undici`) rather than trying to reverse-engineer claude-code's closed-source path. Either approach is valid; the undici instrumentation also gives us a client span for free.

## Suggested implementation

### Part 1 — W3C `traceparent` via `@opentelemetry/instrumentation-undici`

```ts
// packages/core/src/telemetry/sdk.ts
import { UndiciInstrumentation } from '@opentelemetry/instrumentation-undici';
// ...
instrumentations: [
  new HttpInstrumentation(),
  new UndiciInstrumentation({
    // Avoid feedback loop: don't trace requests we make to the OTLP exporter itself.
    ignoreRequestHook: (request) => {
      const url = request.origin + request.path;
      const otlp = [
        config.getTelemetryOtlpEndpoint(),
        config.getTelemetryOtlpTracesEndpoint(),
        config.getTelemetryOtlpLogsEndpoint(),
        config.getTelemetryOtlpMetricsEndpoint(),
      ].filter((e): e is string => Boolean(e)); // type guard: drop unset endpoints
      return otlp.some((e) => url.startsWith(e));
    },
  }),
],
```

`UndiciInstrumentation` automatically:
- Creates one client span per `fetch()` (giving you network TTFB / TTLB / status / retry visibility)
- Injects `traceparent` + (if populated) `tracestate` into outgoing headers via the SDK's default `W3CTraceContextPropagator`

Dependency add:
```bash
npm install @opentelemetry/instrumentation-undici --workspace=packages/core
```

(~30 KB, official OTel package, production-grade.)
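The prefix check inside `ignoreRequestHook` above can also be written as a standalone predicate, which makes the feedback-loop guard easy to unit-test. The following is a minimal sketch, not the project's implementation: `isOtlpExporterRequest` is a hypothetical name, and comparing by URL origin plus path prefix (instead of a raw string `startsWith`) is an assumption made here so that, for example, port `43180` is not mistaken for port `4318`.

```ts
// Hypothetical helper: returns true when requestUrl targets one of the
// configured OTLP endpoints. Unset or unparsable endpoints are ignored.
function isOtlpExporterRequest(
  requestUrl: string,
  otlpEndpoints: ReadonlyArray<string | undefined>,
): boolean {
  let target: URL;
  try {
    target = new URL(requestUrl);
  } catch {
    return false; // not an absolute URL, so it cannot be an exporter call
  }
  return otlpEndpoints.some((endpoint) => {
    if (!endpoint) return false;
    let base: URL;
    try {
      base = new URL(endpoint);
    } catch {
      return false;
    }
    // Scheme, host, and port must match exactly.
    if (base.origin !== target.origin) return false;
    // Match the endpoint path itself or anything nested below it.
    const basePath = base.pathname.replace(/\/+$/, '');
    return (
      target.pathname === basePath ||
      target.pathname.startsWith(basePath + '/')
    );
  });
}
```

Inside the hook it would be called as `isOtlpExporterRequest(request.origin + request.path, otlp)`.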

### Part 2 — `X-Qwen-Code-Session-Id` via SDK `defaultHeaders`

Per claude-code's pattern (`src/services/api/client.ts:108`), set the header at SDK construction time using the SDK's native `defaultHeaders` option.

**OpenAI-family providers** (`packages/core/src/core/openaiContentGenerator/provider/default.ts:91`, plus `dashscope.ts`, `deepseek.ts`, etc.):

```ts
return new OpenAI({
  apiKey,
  baseURL,
  defaultHeaders: {
    ...existingHeaders,
    'X-Qwen-Code-Session-Id': config.getSessionId(),
    // optional extra correlation matching claude-code:
    // 'X-Qwen-Code-App': 'cli',
  },
});
```

**Google Gemini provider**: `@google/genai` accepts `httpOptions: { headers: {...} }` in `new GoogleGenAI({ ... })`; wire the header in the same way.
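As a rough illustration of the Gemini wiring, the options object could be assembled as below. This is a sketch under assumptions: `GenAiClientOptions` is a local stand-in for the subset of the `@google/genai` constructor options mentioned above, and `buildGenAiOptions` is a hypothetical helper name, not existing qwen-code code.

```ts
// Local stand-in for the subset of the @google/genai constructor options used here.
interface GenAiClientOptions {
  apiKey: string;
  httpOptions?: { headers?: Record<string, string> };
}

// Hypothetical helper: builds constructor options that carry the session header.
function buildGenAiOptions(
  apiKey: string,
  sessionId: string,
  existingHeaders: Record<string, string> = {},
): GenAiClientOptions {
  return {
    apiKey,
    httpOptions: {
      headers: {
        ...existingHeaders,
        // Spread first, then set ours, so the session header is always present.
        'X-Qwen-Code-Session-Id': sessionId,
      },
    },
  };
}

// Intended use (not executed here):
//   new GoogleGenAI(buildGenAiOptions(apiKey, config.getSessionId()))
```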

Either inject directly at each provider site (3-4 call sites), or pull into a small helper:

```ts
// packages/core/src/telemetry/llm-headers.ts
export function llmCorrelationHeaders(config: Config): Record<string, string> {
  return {
    'X-Qwen-Code-Session-Id': config.getSessionId(),
  };
}
```
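One detail the helper may need to handle: HTTP header names are case-insensitive, so a user-supplied `x-qwen-code-session-id` in existing custom headers would sit next to the injected one in a plain object spread. A minimal sketch of a case-insensitive merge follows. `withSessionHeader` is a hypothetical name, and letting the injected value win is an assumption that mirrors the spread order in the OpenAI snippet above; the actual precedence policy is left to the design doc.

```ts
const SESSION_HEADER = 'X-Qwen-Code-Session-Id';

// Hypothetical helper: merges the session header into existing headers,
// removing any differently-cased duplicate of the same header name.
function withSessionHeader(
  existing: Record<string, string> | undefined,
  sessionId: string,
): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const [key, value] of Object.entries(existing ?? {})) {
    // Skip any variant of the session header, whatever its casing.
    if (key.toLowerCase() === SESSION_HEADER.toLowerCase()) continue;
    merged[key] = value;
  }
  merged[SESSION_HEADER] = sessionId; // assumed policy: injected value wins
  return merged;
}
```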

### Header summary (what's added on the wire)

Before this issue:
```http
POST /compatible-mode/v1/chat/completions HTTP/1.1
Authorization: Bearer sk-...
Content-Type: application/json
User-Agent: openai/5.11.0 ...
```

After:
```http
POST /compatible-mode/v1/chat/completions HTTP/1.1
Authorization: Bearer sk-...
Content-Type: application/json
User-Agent: openai/5.11.0 ...
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
X-Qwen-Code-Session-Id: 12345678-90ab-cdef-1234-567890abcdef
```

Two new headers, ~120 bytes in total, the same size on every request.
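For readers less familiar with the `traceparent` value shown above: in version `00` of the W3C Trace Context format it consists of four dash-separated lowercase-hex fields (version, 32-character trace-id, 16-character parent-id, 2-character trace-flags), and the lowest flag bit means "sampled". The sketch below decodes it, which can be handy when asserting on captured headers in a test. `parseTraceparent` and the `TraceParent` shape are hypothetical names, and only version `00` is handled.

```ts
interface TraceParent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

// Hypothetical helper: decodes a version-00 traceparent header value.
// Returns undefined for anything that is not a valid version-00 value.
function parseTraceparent(header: string): TraceParent | undefined {
  const match = /^(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    header.trim(),
  );
  if (!match) return undefined;
  const [, version, traceId, parentId, flags] = match;
  // All-zero trace-id or parent-id is invalid per the spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return undefined;
  return {
    version,
    traceId,
    parentId,
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}
```

Applied to the example above, this yields trace-id `4bf92f3577b34da6a3ce929d0e0e4736`, parent-id `00f067aa0ba902b7`, and `sampled: true`.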

## Why this is low-cost

| Dimension | Cost |
|---|---|
| Code diff | ~10 lines in `sdk.ts`, ~3 lines per provider constructor (~12 lines), 1 tiny helper |
| Dependency | 1 official OTel package (~30 KB) |
| Per-request CPU | ~10 μs span creation + header injection |
| Per-request bytes on wire | +~120 B for both headers — negligible vs LLM payload |
| Per-request span volume | +1 client span per fetch — useful, not noise |
| Architectural risk | Low — both patterns are standard (OTel official instrumentation + SDK-native `defaultHeaders`) |

## Out of scope (future sub-issues)

These match claude-code's documented behavior but are different code paths and should be separate sub-issues:

- **Subprocess `TRACEPARENT` env var inheritance** — claude-code injects `TRACEPARENT` into Bash/PowerShell child processes so external tools running under a tool span can continue the trace. qwen-code's `BashTool` does not do this today.
- **Inbound `TRACEPARENT` / `TRACESTATE` read on startup** — claude-code's `-p` mode and Agent SDK read `TRACEPARENT` from their own env on startup so a parent process can stitch them into a larger trace. qwen-code's `--prompt` mode does not.
- **`tracestate` / `baggage` propagation policy**: when and whether to populate W3C baggage. The standard SDK already propagates baggage if anyone calls `propagation.setBaggage()`; we don't today.

## Acceptance criteria

- [ ] `@opentelemetry/instrumentation-undici` added to `packages/core/package.json`
- [ ] `sdk.ts` registers both `HttpInstrumentation` and `UndiciInstrumentation`
- [ ] `ignoreRequestHook` skips configured OTLP exporter endpoints (no telemetry feedback loop)
- [ ] All outgoing requests from `openai` SDK constructors carry `traceparent` (auto via undici) AND `X-Qwen-Code-Session-Id` (via `defaultHeaders`)
- [ ] Same for `@google/genai` constructor path
- [ ] Streaming requests (chat completions with `stream: true`) work without regression — no truncated streams, no leaked client spans
- [ ] Proxy mode (`setGlobalDispatcher(new ProxyAgent(...))`) still works — undici instrumentation cooperates with the proxy dispatcher
- [ ] Unit test: spies on `fetch()` and asserts both headers are present on a sample call
- [ ] Unit test: asserts the OTLP exporter URL is NOT traced when `ignoreRequestHook` is in effect (regression guard for the feedback-loop fix)
- [ ] E2E verification (extends the existing tmux harness or `/tmp/verify-telemetry-pr-4367.mjs` pattern): real OpenAI provider request with `--telemetry-outfile`, inspect outfile for an `HTTP POST` client span, dump the captured request headers, confirm both new headers present
- [ ] Docs: short paragraph in `docs/developers/development/telemetry.md` under a new "Trace context propagation" subsection — explain both headers + the OTLP-loop skip + claude-code parity

## Design doc

Full design: [`docs/design/telemetry-outbound-propagation-design.md`](https://github.com/QwenLM/qwen-code/blob/main/docs/design/telemetry-outbound-propagation-design.md) (will land in the PR — covers layered architecture, all 4 SDK construction integration points with file:line refs, `customHeaders` precedence policy, OTLP loop avoidance edge cases including auth-token-in-URL, streaming/proxy/retry verification, PR split, and the verified-vs-documented breakdown of the claude-code reference).

