[Feature] Add API↔Engine context propagation for journey tracing (PR #7/9) #15
Merged
…/9)

This PR implements W3C Trace Context propagation from API spans to core spans, enabling parent-child linkage in distributed traces. Completes the handshake between PR #6 (API span lifecycle) and PR #2 (core span lifecycle).

Changes:
- Add inject_trace_context() helper to vllm/tracing.py
- Inject API span context into trace_headers after span creation
- Context flows to engine.generate() and scheduler for parent-child linkage
- Defensive error handling: injection failures never break requests
- Zero overhead when tracing disabled (early return)

Behavioral guarantees verified by tests:
- G1: Trace ID continuity (API and core spans share same trace_id)
- G2: W3C Trace Context format (traceparent header valid)
- G3: Trace continuation (trace_id preserved through Client→API→Core)
- G4: Graceful degradation (request continues on injection failure)
- G5: No exception propagation (injection failures caught)
- G6: Conditional injection (only when API span exists)

Invariants:
- I1: Backward compatibility (early return when tracing disabled)
- I2: Zero overhead when disabled (no propagator/allocation access)
- I3: No resource leaks (only modifies existing trace_headers dict)

Test coverage:
- 12 new tests (100% pass) covering all unit-testable properties
- 17 existing API span lifecycle tests pass (no regressions)
- Tests focus on behavioral properties, not implementation details

Safety properties:
- Zero new resources (only modifies existing dict)
- No cleanup obligations (dict managed by request lifecycle)
- Stateless transformation (span context → headers)
- Single injection point (strict ordering preserved)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
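For readers who want a concrete picture of the helper described in this commit, here is a minimal sketch of what an `inject_trace_context()` with these early-return paths could look like. It is not the code from `vllm/tracing.py`, which is not shown on this page. The parameter names, the `_OTEL_AVAILABLE` flag, and the choice to create a new dict when `carrier` is `None` are assumptions made for illustration. The OpenTelemetry calls used (`trace.set_span_in_context` and `TraceContextTextMapPropagator.inject`) are part of the public `opentelemetry-api` package.

```python
from typing import Any, Dict, Optional

try:
    from opentelemetry import trace
    from opentelemetry.trace.propagation.tracecontext import (
        TraceContextTextMapPropagator,
    )
    _OTEL_AVAILABLE = True  # hypothetical flag name, for this sketch only
except ImportError:
    _OTEL_AVAILABLE = False


def inject_trace_context(
    span: Optional[Any],
    carrier: Optional[Dict[str, str]],
) -> Optional[Dict[str, str]]:
    """Sketch: write the span's W3C trace context into `carrier`.

    Returns the carrier unchanged on each early-return path
    (OTEL unavailable, span is None, exception during injection).
    """
    # Early return 1: OpenTelemetry is not installed, so do nothing.
    if not _OTEL_AVAILABLE:
        return carrier
    # Early return 2: no API span exists, so there is nothing to inject.
    if span is None:
        return carrier
    try:
        # Assumption: allocate a dict only when we actually have a span.
        target = carrier if carrier is not None else {}
        ctx = trace.set_span_in_context(span)
        TraceContextTextMapPropagator().inject(target, context=ctx)
        return target
    except Exception:
        # Early return 3: injection failures must never break the request.
        return carrier
```

A caller would pass the API span and the request's existing `trace_headers` dict, then hand the returned dict on to the engine. With tracing disabled the span is `None`, so the function returns before touching the propagator.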
Two quality improvements following code review:

1. Clarify inject_trace_context() docstring:
   - Previous: "or None if injection failed" (misleading)
   - Now: Explicitly documents when carrier is returned unchanged
   - Details all three early-return paths (OTEL unavailable, span None, exception)

2. Strengthen test_trace_id_preserved_through_chain():
   - Mock propagator now actually reads span.get_span_context()
   - Extracts trace_id and span_id from span context
   - Generates traceparent using those values (simulates real OTEL behavior)
   - Asserts get_span_context() was called
   - Better proves G1/G3 guarantees without requiring real OTLP exporter

Test results: All 29 tests pass (12 context propagation + 17 lifecycle)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
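The strengthened test pattern described in this commit can be illustrated with the standard library alone. The sketch below is not the test from `test_context_propagation.py`. The helper names `make_mock_span` and `mock_inject` are hypothetical, and the IDs are the example values from the W3C Trace Context specification. It shows the idea of a mock propagator that derives `traceparent` from the span context rather than returning a canned string.

```python
from typing import Dict
from unittest.mock import MagicMock


def make_mock_span(trace_id: int, span_id: int) -> MagicMock:
    """Build a mock span whose get_span_context() exposes integer IDs."""
    span_context = MagicMock()
    span_context.trace_id = trace_id
    span_context.span_id = span_id
    span = MagicMock()
    span.get_span_context.return_value = span_context
    return span


def mock_inject(carrier: Dict[str, str], span: MagicMock) -> None:
    """Mock propagator: reads the span context and writes a traceparent."""
    ctx = span.get_span_context()
    # version "00", 32 hex digit trace_id, 16 hex digit span_id, sampled flag
    carrier["traceparent"] = f"00-{ctx.trace_id:032x}-{ctx.span_id:016x}-01"


def run_example() -> Dict[str, str]:
    span = make_mock_span(
        trace_id=0x0AF7651916CD43DD8448EB211C80319C,
        span_id=0xB7AD6B7169203331,
    )
    headers: Dict[str, str] = {}
    mock_inject(headers, span)
    # The mock proves the span context was consulted, not bypassed.
    span.get_span_context.assert_called_once()
    return headers
```

Because the header is computed from the mocked span context, an assertion on the `trace_id` field of the resulting header fails if the code under test ever stops passing the real span through.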
Updates to reflect PR #7 completion:
- PR sequence table: Mark #7 as COMPLETED with 12 tests
- Dependency chain: Mark #6 and #7 as COMPLETED
- PR #7 section: Add completion status with commit hashes
- Document deliverables: inject_trace_context(), tests, guarantees

Remaining: PRs #8 (API events), #9 (remove buffering)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Srinivasan Parthasarathy <spartha@us.ibm.com>
Summary
This PR implements W3C Trace Context propagation from API spans to core spans, enabling parent-child linkage in distributed traces. This completes the handshake between PR #6 (API span lifecycle) and PR #2 (core span lifecycle).
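As background on what is being propagated: a W3C `traceparent` header has four dash-separated lowercase hex fields (version, 32-digit trace ID, 16-digit parent span ID, 2-digit flags), and parent-child linkage depends on the trace ID staying the same across hops. The following standard-library sketch parses a header and compares trace IDs. The names `TraceParent`, `parse_traceparent`, and `same_trace` are hypothetical and not part of vLLM, and the parser is simplified to accept version `00` only.

```python
import re
from typing import NamedTuple, Optional

# version - trace_id (32 hex) - parent_id (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is not a valid version-00 value."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    parsed = TraceParent(*match.groups())
    if parsed.version != "00":
        return None  # simplification: this sketch handles version 00 only
    # All-zero trace or parent IDs are invalid per the specification.
    if parsed.trace_id == "0" * 32 or parsed.parent_id == "0" * 16:
        return None
    return parsed


def same_trace(upstream: str, downstream: str) -> bool:
    """True when both headers are valid and share a trace_id."""
    a, b = parse_traceparent(upstream), parse_traceparent(downstream)
    return a is not None and b is not None and a.trace_id == b.trace_id
```

For example, a header received from a client and the header injected by the API layer should differ in the parent ID field (each hop has its own span) while `same_trace()` returns `True` for the pair.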
Part of journey tracing series: PRs #0-#7 completed, #8-#9 remaining
What Changed
Core Implementation
Added inject_trace_context() helper (vllm/tracing.py, ~30 lines):
- extract_trace_context() for symmetric API

Added context injection in API layer (chat_completion/serving.py, ~16 lines):
- engine.generate() call (critical ordering preserved)
- trace_headers flows to both beam_search and engine.generate paths

Test Coverage
New test file: tests/entrypoints/openai/test_context_propagation.py (~430 lines)
12 tests, all passing:
Strengthened test: Mock propagator actually reads span.get_span_context() to prove span context usage.
Behavioral Guarantees Verified
All unit-testable guarantees from the approved plan:
Invariants:
Test Results
Why This Is Safe
No Lifecycle Risk
- trace_headers dict)

Defensive Error Handling
Performance
Ordering Safety
Edge Cases Handled
Polish Fixes Applied
Following code review:
- span.get_span_context() to generate traceparent (proves G1/G3 without OTLP)

Files Modified
- vllm/tracing.py
- vllm/entrypoints/openai/chat_completion/serving.py
- tests/entrypoints/openai/test_context_propagation.py
- JOURNEY_TRACING_PR_PLAN.md

Total: ~489 lines added
Compliance with Approved Plan
✅ All scope constraints met (no new resources)
✅ All hard constraints satisfied (ordering, semantics, defensive behavior)
✅ All testing requirements fulfilled (behavioral properties A-E)
✅ Zero regressions (all existing tests pass)
✅ Follows vLLM coding conventions
Next Steps
Related PRs
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>