tracing: coordinate Langfuse trace coverage across protoCLI + gateway (thinking, normalizer, upstream) #162

@mabry1985

Goal

Coordinate on Langfuse tracing between protoCLI (primary) and the LiteLLM gateway (homelab-iac) so we get a single, complete trace per turn that includes:

  • protoCLI's existing structure (turn → agent.execute → tool calls → gen_ai chat)
  • Model thinking traces (currently lost on streaming requests)
  • Gateway-side processing decisions (normalizer firing, salvage paths, retries, fallbacks)
  • Upstream model usage/timing as visible to the gateway

Today, only the first bullet is captured. This issue is to align on which sources write what and how they merge in Langfuse.

What's currently happening

protoCLI (this repo, packages/core/src/telemetry/sdk.ts):

  • Initializes OTel SDK with HttpInstrumentation and a Langfuse-specific BatchSpanProcessor that exports to ${LANGFUSE_BASE_URL}/api/public/otel/v1/traces
  • service.name: qwen-code on all spans
  • gen_ai chat ${model} span emitted from packages/core/src/core/openaiContentGenerator/pipeline.ts:569 with attributes for system, model, and usage, plus a gen_ai.content.completion event
  • W3C traceparent header is auto-propagated on outbound HTTP via HttpInstrumentation (so the gateway could read it)
  • pipeline.ts already extracts delta.reasoning_content and delta.reasoning from chunks (lines 156-160), but only for debug logging; neither is added to the span

LiteLLM gateway (protoLabsAI/homelab-iac, stacks/ai/config/litellm/callbacks/thinking_normalizer.py):

  • Strips <think>...</think> markup from streaming delta.content before forwarding to clients (per protoLabsAI/homelab-iac#26)
  • Captures the stripped thinking text and writes it to request_data["metadata"]["thinking"][<choice_idx>] (per protoLabsAI/homelab-iac#28)
  • Has success_callback: ["langfuse", "prometheus"] configured
  • However, I audited the 500 most recent generations in Langfuse: 100% have service.name: qwen-code (protoCLI), and none come from LiteLLM's langfuse callback. The gateway-written metadata.thinking never reaches any visible trace.
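
The gateway's strip-and-capture behaviour described above can be sketched as a small stateful splitter. This is a minimal illustration, not the code in thinking_normalizer.py: the class name ThinkSplitter and its feed/finish methods are hypothetical, and the sketch only assumes that thinking arrives wrapped in <think>...</think> and that a tag may be split across two streamed deltas.

```python
from typing import List


class ThinkSplitter:
    """Hypothetical helper: separates <think>...</think> text from visible text."""

    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self) -> None:
        self.in_think = False            # currently inside a <think> block?
        self.pending = ""                # held-back tail that may be a partial tag
        self.thinking: List[str] = []    # captured thinking fragments
        self.unclosed_salvaged = False   # set when the stream ends mid-<think>

    def feed(self, delta: str) -> str:
        """Consume one streamed delta and return the text that is safe to forward."""
        text, self.pending = self.pending + delta, ""
        visible: List[str] = []
        while text:
            tag = self.CLOSE if self.in_think else self.OPEN
            sink = self.thinking if self.in_think else visible
            idx = text.find(tag)
            if idx != -1:
                sink.append(text[:idx])
                text = text[idx + len(tag):]
                self.in_think = not self.in_think
                continue
            # No complete tag: hold back the longest suffix that could be the
            # start of the tag, in case the rest arrives in the next delta.
            keep = 0
            for n in range(min(len(tag) - 1, len(text)), 0, -1):
                if tag.startswith(text[-n:]):
                    keep = n
                    break
            cut = len(text) - keep
            sink.append(text[:cut])
            self.pending = text[cut:]
            text = ""
        return "".join(visible)

    def finish(self) -> str:
        """Flush held-back text at end of stream; salvage an unclosed <think>."""
        tail, self.pending = self.pending, ""
        if self.in_think:
            self.thinking.append(tail)
            self.unclosed_salvaged = True
            return ""
        return tail

    @property
    def thinking_text(self) -> str:
        return "".join(self.thinking)


splitter = ThinkSplitter()
forwarded = [splitter.feed(d) for d in ["Hello <th", "ink>plan", " steps</thi", "nk> world"]]
forwarded.append(splitter.finish())
```

With the four deltas above, the forwarded text joins to "Hello  world" and the captured thinking is "plan steps". The unclosed_salvaged flag corresponds to the normalizer.unclosed_think_salvaged attribute proposed in the trace layout below.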

Why this is happening

The PR that added metadata.thinking was verified against a direct curl probe where LiteLLM's langfuse callback was the only writer — and it worked. But for actual protoCLI traffic, protoCLI's OTel SDK writes to Langfuse directly, bypassing whatever LiteLLM's callback does. The two paths don't merge.

What full coverage would look like

A single Langfuse trace per protoCLI turn containing:

turn (protoCLI)
├── agent.execute
│   ├── tool/<name>
│   ├── tool/<name>
│   └── gen_ai chat protolabs/smart   ← currently captured by protoCLI
│       ├── attributes:
│       │   ├── gen_ai.system, gen_ai.request.model, ...     [present]
│       │   ├── gen_ai.usage.{input,output,total}_tokens     [present]
│       │   └── gen_ai.response.thinking (e.g., 384 chars)    ← MISSING
│       ├── child span: gateway.normalize                    ← MISSING
│       │   ├── normalizer.thinking_captured (bool)
│       │   ├── normalizer.unclosed_think_salvaged (bool)
│       │   └── normalizer.duration_ms
│       └── child span: gateway.upstream.vllm                ← MISSING
│           └── upstream.duration_ms, upstream.first_token_ms

Three architecture options

Option A: Gateway returns thinking via response, protoCLI captures it as span attribute

  • Gateway adds a custom HTTP response header (e.g., x-llm-thinking) or appends a final SSE chunk with structured metadata
  • pipeline.ts's stream handler already reads delta.reasoning_content / delta.reasoning; extend it to also read the gateway-emitted thinking and set gen_ai.response.thinking as a span attribute
  • Single source (protoCLI), single trace, no merge needed

Pros: simplest from the trace-merging perspective; one writer.
Cons: custom response shape; loses gateway-internal observability (normalizer decisions, upstream timing).
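
To make the "final SSE chunk with structured metadata" variant of Option A concrete, here is a rough sketch of both ends of that exchange. Everything about the chunk shape is an assumption for illustration: the "gateway.metadata" object marker, the "thinking" field, and both function names are hypothetical. Only the "data: " prefix and the "data: [DONE]" terminator follow the usual OpenAI-style streaming convention. The client half is written in Python purely to show the logic; in protoCLI it would live in pipeline.ts.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Hypothetical marker that distinguishes the gateway's trailing chunk
# from ordinary chat completion chunks.
METADATA_OBJECT = "gateway.metadata"


def gateway_metadata_event(thinking: str) -> str:
    """Gateway side: format a trailing SSE event carrying captured thinking."""
    payload = {"object": METADATA_OBJECT, "thinking": thinking}
    return "data: " + json.dumps(payload) + "\n\n"


def split_stream(lines: Iterable[str]) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Client side: separate normal chunks from the gateway metadata chunk."""
    chunks: List[Dict[str, Any]] = []
    thinking: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        event = json.loads(data)
        if event.get("object") == METADATA_OBJECT:
            thinking = event.get("thinking")
        else:
            chunks.append(event)
    return chunks, thinking


stream = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hi"}}]}',
    "",
    gateway_metadata_event("plan the reply").strip(),
    "",
    "data: [DONE]",
]
chunks, thinking = split_stream(stream)
```

A client that does not know about the extra chunk would see an unfamiliar object in the stream, which is one practical form of the "custom response shape" cost listed above.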

Option B: Gateway emits its own OTel spans to the same Langfuse endpoint, joined via traceparent

  • protoCLI already auto-propagates traceparent via HttpInstrumentation
  • The LiteLLM gateway adds an OTel SDK, reads the incoming traceparent, and emits child spans under the same trace_id to the Langfuse OTel endpoint
  • Spans: gateway.normalize (with thinking as attribute), gateway.upstream (with vLLM timing), etc.
  • Both protoCLI and the gateway write to Langfuse, which merges spans by trace_id

Pros: clean parent-child structure; both sides own their data; no custom response shape.
Cons: gateway needs OTel instrumentation added (more code); needs LiteLLM's langfuse callback disabled to avoid double-writing.
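
The join in Option B depends on the gateway reading the incoming traceparent header correctly. The sketch below shows what that involves using only the standard library; in a real deployment an OTel SDK propagator would normally do this. The ParentContext type and both function names are hypothetical. The header layout (version, 32-hex trace-id, 16-hex parent-id, 2-hex flags, with all-zero ids invalid) follows the W3C Trace Context format, and the parser is deliberately strict in accepting only the four-field form.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

# version - trace-id - parent-id - trace-flags, all lowercase hex
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class ParentContext:
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(header: Optional[str]) -> Optional[ParentContext]:
    """Return the parent context, or None if the header is absent or invalid."""
    if not header:
        return None
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff is forbidden; all-zero ids are invalid.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return ParentContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def new_child_span(parent: ParentContext, name: str) -> Dict[str, str]:
    """Describe a gateway span that joins the caller's trace."""
    return {
        "name": name,
        "trace_id": parent.trace_id,              # same trace as protoCLI
        "parent_span_id": parent.parent_span_id,  # the gen_ai chat span
        "span_id": secrets.token_hex(8),          # fresh 16-hex span id
    }


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
child = new_child_span(ctx, "gateway.normalize") if ctx else None
```

If parse_traceparent returns None, the gateway would have to start a new trace, and the span would not appear under the protoCLI turn. That is why the propagation check on the protoCLI side matters.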

Option C: Status quo + small protoCLI capture of available reasoning fields

  • Gateway keeps stripping markup but also passes through a small reasoning_content field on the final chunk (via the OpenAI-extension reasoning field convention)
  • pipeline.ts is updated to add a gen_ai.response.reasoning span attribute when present
  • Doesn't capture gateway-internal decisions

Pros: smallest change, ships fast.
Cons: still single-source; loses gateway internals; works around current architecture rather than fixing it.
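
The capture step in Option C amounts to accumulating the reasoning deltas that the stream handler already sees and attaching the result once the stream ends. The sketch below shows only that logic, in Python for consistency with the gateway-side examples; the real change would be in pipeline.ts. The function names are hypothetical, chunks are assumed to be OpenAI-style dicts, and reasoning from all choices is concatenated for simplicity.

```python
from typing import Any, Dict, Iterable, Optional


def collect_reasoning(chunks: Iterable[Dict[str, Any]]) -> Optional[str]:
    """Join reasoning_content / reasoning deltas; None if the stream had none."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Either field name may carry the text, depending on the upstream.
            text = delta.get("reasoning_content") or delta.get("reasoning")
            if isinstance(text, str) and text:
                parts.append(text)
    return "".join(parts) or None


def reasoning_attributes(
    chunks: Iterable[Dict[str, Any]],
    attr_name: str = "gen_ai.response.reasoning",
) -> Dict[str, str]:
    """Build the span attributes to set; empty when there is no reasoning."""
    reasoning = collect_reasoning(chunks)
    return {attr_name: reasoning} if reasoning is not None else {}


example_chunks = [
    {"choices": [{"index": 0, "delta": {"reasoning_content": "Check the "}}]},
    {"choices": [{"index": 0, "delta": {"reasoning": "tool output."}}]},
    {"choices": [{"index": 0, "delta": {"content": "Done."}}]},
]
attrs = reasoning_attributes(example_chunks)
```

Returning an empty dict when nothing was captured keeps the attribute absent rather than set to an empty string, which makes "no reasoning emitted" distinguishable in Langfuse queries.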

Proposed direction

I recommend Option B for completeness, with Option C as a stepping stone if Option B is too large in scope.

Coordination needed:

Side | Change
---- | ------
protoCLI | Verify W3C traceparent propagation reaches the gateway (should be automatic via HttpInstrumentation; confirm with a debug request). Add span attributes for thinking when present (Option C step).
homelab-iac (gateway) | Add OTel SDK to the LiteLLM container and configure it to export to the same Langfuse OTel endpoint (Option B). Replace the current metadata.thinking write with a gateway span attribute. Disable LiteLLM's langfuse callback to avoid duplicate writes.
Both | Decide canonical attribute names: gen_ai.response.thinking? llm.reasoning_content? Pick one and document it.

Happy to drive the gateway side. Looking for a protoCLI maintainer to:

  1. Confirm/document the current OTel setup (which fields are captured, a sample trace, and the canonical attribute schema)
  2. Discuss whether Option B is the right direction or if there's a different roadmap protoCLI is already pursuing
  3. Pick attribute names so we don't drift

Related context
