
[Feature] Add API↔Engine context propagation for journey tracing (PR #7/9) #15

Merged
sriumcp merged 4 commits into main from pr7ofjourney
Jan 27, 2026

Conversation


@sriumcp commented Jan 27, 2026

Summary

This PR implements W3C Trace Context propagation from API spans to core spans, enabling parent-child linkage in distributed traces. This completes the handshake between PR #6 (API span lifecycle) and PR #2 (core span lifecycle).

Part of journey tracing series: PRs #0-#7 completed, #8-#9 remaining

What Changed

Core Implementation

Added inject_trace_context() helper (vllm/tracing.py, ~30 lines):

  • Injects span context into carrier dict using W3C Trace Context propagator
  • Mirrors existing extract_trace_context() for symmetric API
  • Defensive error handling (returns carrier on failure, no exceptions)
  • Early return when OTEL unavailable (zero overhead)
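A minimal sketch of what such a helper might look like, based only on the bullets above. It is not the code from vllm/tracing.py: the `_OTEL_AVAILABLE` flag name and the exact signature are assumptions, while `set_span_in_context` and `TraceContextTextMapPropagator` are standard OpenTelemetry Python APIs.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)

try:
    from opentelemetry.trace import set_span_in_context
    from opentelemetry.trace.propagation.tracecontext import (
        TraceContextTextMapPropagator,
    )
    _OTEL_AVAILABLE = True  # assumed flag name, for illustration only
except ImportError:
    _OTEL_AVAILABLE = False


def inject_trace_context(
    span: Optional[Any],
    carrier: Optional[Dict[str, str]] = None,
) -> Optional[Dict[str, str]]:
    """Write the span's W3C trace context into carrier; never raise."""
    # Early return before any propagator or dict is created.
    if not _OTEL_AVAILABLE or span is None:
        return carrier
    try:
        target = {} if carrier is None else carrier
        # Make the given span the "current" span of a fresh context,
        # then let the propagator serialize it as a traceparent header.
        ctx = set_span_in_context(span)
        TraceContextTextMapPropagator().inject(target, context=ctx)
        return target
    except Exception:
        # Failures are logged at DEBUG only; the carrier is returned unchanged.
        logger.debug("Trace context injection failed", exc_info=True)
        return carrier
```

With no span, the caller gets back exactly the object it passed in, which is what keeps the disabled path free of allocations in this sketch.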

Added context injection in API layer (chat_completion/serving.py, ~16 lines):

  • Injection occurs immediately after API span creation succeeds
  • Before engine.generate() call (critical ordering preserved)
  • Wrapped in try-except with DEBUG logging
  • The updated trace_headers dict is passed to both the beam_search and engine.generate paths
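The ordering described above (span creation, then injection, then the engine call) can be sketched as follows. Everything here is hypothetical scaffolding: `handle_request`, `create_span`, `inject`, and `generate` are stand-ins passed as callables, not the actual serving.py code.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


def handle_request(
    create_span: Callable[[], Optional[object]],
    inject: Callable[[object, Dict[str, str]], Dict[str, str]],
    generate: Callable[[Optional[Dict[str, str]]], str],
    trace_headers: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical handler showing the span -> inject -> engine order."""
    api_span = create_span()  # step 1: API span creation
    if api_span is not None:  # inject only when an API span exists
        try:
            # Step 2: inject before the engine sees the headers.
            base = trace_headers if trace_headers is not None else {}
            trace_headers = inject(api_span, base)
        except Exception:
            # Injection problems are logged at DEBUG and otherwise ignored.
            logger.debug("Context injection failed", exc_info=True)
    # Step 3: the engine call always happens, with or without a traceparent.
    return generate(trace_headers)


def failing_inject(span: object, headers: Dict[str, str]) -> Dict[str, str]:
    """Stand-in for an injection step that blows up."""
    raise RuntimeError("propagator unavailable")
```

Passing `failing_inject` still yields the engine's result, which mirrors the "request continues" behavior the PR describes.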

Test Coverage

New test file: tests/entrypoints/openai/test_context_propagation.py (~430 lines)

12 tests, all passing:

  • ✅ Basic injection with None carrier
  • ✅ Injection with existing carrier (preserves headers)
  • ✅ Early return when span is None
  • ✅ Early return when OTEL unavailable
  • ✅ Graceful failure handling
  • ✅ W3C traceparent format validity
  • ✅ Existing headers preserved during injection
  • ✅ Conditional injection (only when span exists)
  • ✅ Tracing disabled behavior (backward compatibility)
  • ✅ Integration point verification
  • ✅ Graceful failure (request continues)
  • ✅ Trace ID continuity through Client→API→Core chain

Strengthened test: Mock propagator actually reads span.get_span_context() to prove span context usage.
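One way such a mock could be written is sketched below. `FakePropagator` and `make_span` are hypothetical names, and the `inject` signature is simplified for illustration; the point is only that the traceparent is derived from `span.get_span_context()` rather than hard-coded.

```python
from unittest.mock import MagicMock


def make_span(trace_id: int, span_id: int) -> MagicMock:
    """Build a mock span whose context carries the given IDs."""
    span = MagicMock()
    span.get_span_context.return_value = MagicMock(
        trace_id=trace_id, span_id=span_id, trace_flags=1
    )
    return span


class FakePropagator:
    """Hypothetical stand-in for the W3C propagator used in a test."""

    def inject(self, carrier: dict, span: MagicMock) -> None:
        # Read the IDs from the span, as a real propagator would.
        ctx = span.get_span_context()
        carrier["traceparent"] = (
            f"00-{ctx.trace_id:032x}-{ctx.span_id:016x}-{ctx.trace_flags:02x}"
        )
```

A test can then assert both that `get_span_context()` was called and that the trace_id in the header equals the one on the span.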

Behavioral Guarantees Verified

All unit-testable guarantees from the approved plan:

  • G1: Trace ID Continuity ✅ - API and core spans share same trace_id
  • G2: W3C Format ✅ - traceparent header present with valid format
  • G3: Trace Continuation ✅ - trace_id preserved (not replaced)
  • G4: Graceful Degradation ✅ - Request continues on injection failure
  • G5: No Exception Propagation ✅ - Injection failures never break requests
  • G6: Conditional Injection ✅ - Only when API span exists

Invariants:

  • I1: Backward Compatibility ✅ - Early return when tracing disabled
  • I2: Zero Overhead ✅ - No allocations when disabled
  • I3: No Resource Leaks ✅ - Only modifies existing dict

Test Results

✅ 12/12 new tests pass (test_context_propagation.py)
✅ 17/17 existing API span tests pass (test_api_span_lifecycle.py)
✅ 8/8 API span tracking tests pass (test_api_span_tracking.py)
✅ Total: 37/37 tests pass (100%)

Why This Is Safe

No Lifecycle Risk

  • ✅ Zero new resources (only modifies existing trace_headers dict)
  • ✅ No cleanup obligations (dict managed by request lifecycle)
  • ✅ Stateless transformation (span context → headers)

Defensive Error Handling

  • ✅ All OTEL operations wrapped in try-except
  • ✅ Failures logged at DEBUG level only
  • ✅ Request processing never interrupted

Performance

  • ✅ Early return when tracing disabled (before propagator instantiation)
  • ✅ No allocations when disabled
  • ✅ Single injection point (no redundant operations)

Ordering Safety

  • ✅ Strict sequence preserved: span creation → inject → engine call
  • ✅ Single-threaded request processing (no race conditions)

Edge Cases Handled

  1. Injection failure: Request continues, core span becomes root (no linkage)
  2. Span is None: Early return, no injection attempted
  3. OTEL unavailable: Early return, headers preserved
  4. Tracing disabled: Early return, zero overhead
  5. Incoming client traceparent: trace_id preserved through chain
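Cases 1 and 5 can be illustrated with a small, OTEL-free simulation of the Client→API→Core chain. The helpers `start_span` and `to_traceparent` are hypothetical and only model ID handling: a span that receives a traceparent continues that trace, while one that receives none becomes a root with a fresh trace_id.

```python
import secrets
from typing import Dict, Optional, Tuple


def to_traceparent(trace_id: str, span_id: str) -> str:
    """Serialize IDs as a version-00 traceparent with the sampled flag set."""
    return f"00-{trace_id}-{span_id}-01"


def start_span(
    headers: Optional[Dict[str, str]],
) -> Tuple[str, str, Optional[str]]:
    """Return (trace_id, span_id, parent_span_id) for a new span."""
    parent = (headers or {}).get("traceparent")
    if parent:
        # Continue the incoming trace: keep trace_id, record the parent span.
        _version, trace_id, parent_id, _flags = parent.split("-")
    else:
        # No context arrived (e.g. injection failed): this span is a root.
        trace_id, parent_id = secrets.token_hex(16), None
    return trace_id, secrets.token_hex(8), parent_id


# Client -> API -> Core with an incoming client traceparent.
client_headers = {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
}
api_trace, api_span, api_parent = start_span(client_headers)
core_trace, core_span, core_parent = start_span(
    {"traceparent": to_traceparent(api_trace, api_span)}
)
```

Here `core_trace` equals the client's trace_id and `core_parent` equals `api_span`; calling `start_span({})` instead models the unlinked root span from case 1.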

Polish Fixes Applied

Following code review:

  1. Clarified docstring: Documents when carrier is returned unchanged (not "returns None on failure")
  2. Strengthened test: Propagator now reads span.get_span_context() to generate traceparent (proves G1/G3 without OTLP)

Files Modified

| File | Lines | Purpose |
|------|-------|---------|
| vllm/tracing.py | +30 | inject_trace_context() helper |
| vllm/entrypoints/openai/chat_completion/serving.py | +16 | Context injection call |
| tests/entrypoints/openai/test_context_propagation.py | +430 | Comprehensive test coverage |
| JOURNEY_TRACING_PR_PLAN.md | +13/-4 | Mark PR #7 as completed |

Total: ~489 lines added

Compliance with Approved Plan

✅ All scope constraints met (no new resources)
✅ All hard constraints satisfied (ordering, semantics, defensive behavior)
✅ All testing requirements fulfilled (behavioral properties A-E)
✅ Zero regressions (all existing tests pass)
✅ Follows vLLM coding conventions


Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

sriumcp and others added 4 commits January 27, 2026 14:06
…/9)

This PR implements W3C Trace Context propagation from API spans to core spans,
enabling parent-child linkage in distributed traces. Completes the handshake
between PR #6 (API span lifecycle) and PR #2 (core span lifecycle).

Changes:
- Add inject_trace_context() helper to vllm/tracing.py
- Inject API span context into trace_headers after span creation
- Context flows to engine.generate() and scheduler for parent-child linkage
- Defensive error handling: injection failures never break requests
- Zero overhead when tracing disabled (early return)

Behavioral guarantees verified by tests:
- G1: Trace ID continuity (API and core spans share same trace_id)
- G2: W3C Trace Context format (traceparent header valid)
- G3: Trace continuation (trace_id preserved through Client→API→Core)
- G4: Graceful degradation (request continues on injection failure)
- G5: No exception propagation (injection failures caught)
- G6: Conditional injection (only when API span exists)

Invariants:
- I1: Backward compatibility (early return when tracing disabled)
- I2: Zero overhead when disabled (no propagator access, no allocations)
- I3: No resource leaks (only modifies existing trace_headers dict)

Test coverage:
- 12 new tests (100% pass) covering all unit-testable properties
- 17 existing API span lifecycle tests pass (no regressions)
- Tests focus on behavioral properties, not implementation details

Safety properties:
- Zero new resources (only modifies existing dict)
- No cleanup obligations (dict managed by request lifecycle)
- Stateless transformation (span context → headers)
- Single injection point (strict ordering preserved)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Two quality improvements following code review:

1. Clarify inject_trace_context() docstring:
   - Previous: "or None if injection failed" (misleading)
   - Now: Explicitly documents when carrier is returned unchanged
   - Details all three early-return paths (OTEL unavailable, span None, exception)

2. Strengthen test_trace_id_preserved_through_chain():
   - Mock propagator now actually reads span.get_span_context()
   - Extracts trace_id and span_id from span context
   - Generates traceparent using those values (simulates real OTEL behavior)
   - Asserts get_span_context() was called
   - Better proves G1/G3 guarantees without requiring real OTLP exporter

Test results: All 29 tests pass (12 context propagation + 17 lifecycle)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Updates to reflect PR #7 completion:
- PR sequence table: Mark #7 as COMPLETED with 12 tests
- Dependency chain: Mark #6 and #7 as COMPLETED
- PR #7 section: Add completion status with commit hashes
- Document deliverables: inject_trace_context(), tests, guarantees

Remaining: PRs #8 (API events), #9 (remove buffering)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Srinivasan Parthasarathy <spartha@us.ibm.com>

@sriumcp merged commit c2540af into main Jan 27, 2026