[Feature] Add API parent span lifecycle management (PR #6/9)#14
Merged
[Feature] Add API parent span lifecycle management (PR #6/9)#14
Conversation
Implements complete API-level span lifecycle with guaranteed cleanup on all termination paths. This PR creates parent spans that will be linked to core spans via context propagation in PR #7. Key implementation: - Added _create_api_span() with ARRIVED event emission - Added _finalize_api_span() single finalizer (DEPARTED/ABORTED/cleanup-only) - Added _pop_api_span_info() for explicit pop-at-start idempotence - Wrapped streaming/non-streaming generators with comprehensive error handling - Covered 12 termination paths (6 streaming, 6 non-streaming) Critical fixes applied (post-review): - Guaranteed span.end() via nested try/except/finally (prevents OTEL leaks) - Eliminated brittle test patterns using kwargs-only helper function - All 17 tests passing with robust event detection Design patterns: - Pop-at-start idempotence: only one caller succeeds in finalization - Cleanup-only fallback: outer finally with terminal_event=None - Re-raise pattern: CancelledError/GeneratorExit finalize then re-raise - Cleanup independence: span.end() not gated on is_recording() or flags Test coverage: 17 tests - Span creation (2) - Streaming termination (6) - Non-streaming termination (5) - Idempotence & leak prevention (2) - Cleanup independence (2) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
Implements complete API-level span lifecycle with guaranteed cleanup on all termination paths. This is PR #6 in the 9-PR journey tracing implementation sequence.
This PR creates parent spans (
llm_request) at the API serving layer that will be linked to child spans (llm_core) in the scheduler via W3C Trace Context propagation (implemented in PR #7).What This PR Does
Core Functionality
_create_api_span()creates parent spans with ARRIVED event_finalize_api_span()handles DEPARTED/ABORTED/cleanup-only modes_pop_api_span_info()ensures only one caller succeeds in finalizationTermination Paths Covered (12 Total)
Streaming (6 paths):
Non-Streaming (6 paths):
Critical Design Patterns
1. Pop-At-Start Idempotence
This makes idempotence explicit and avoids any theoretical reentrancy window.
2. Nested Try/Except/Finally (Guaranteed span.end())
This prevents OTEL span leaks even if
set_status()oradd_event()throw.3. Re-raise Pattern
Exceptions that should propagate (CancelledError, GeneratorExit) first finalize with ABORTED, then re-raise. This preserves both observability and propagation semantics.
4. Cleanup Independence
span.end()NOT gated onis_recording()(always called if span exists)is_recording()5. Cleanup-Only Fallback
Outer finally calls
_finalize_api_span(request_id)with no terminal_event (cleanup-only mode). This is an idempotent no-op if already finalized, catching truly uncaught exceptions.Critical Fixes Applied (Post-Review)
Fix #1: Guaranteed span.end() (BLOCKER)
Problem: If
set_status()oradd_event()threw,span.end()at line 485 was never reached → OTEL span leak.Solution: Restructured with nested try/except/finally:
Fix #2: Eliminated Brittle Test Patterns (BLOCKER)
Problem: All tests used
"DEPARTED" in str(call)which depends on mock library's repr format.Solution:
find_events()helper using kwargs-only pattern:call.kwargs.get("name") == "api.DEPARTED"Fix #3: Simplified Partial Fix (QUALITY)
Problem: Mixed positional/keyword arg checking when implementation uses kwargs-only.
Solution:
call.kwargs.get("attributes")Test Coverage (17 Tests, All Passing)
Span Creation (2 tests)
test_api_span_created_when_tracing_enabled- Verify span created and ARRIVED eventtest_span_creation_gated_at_call_site- Verify tracing disabled when flag offStreaming Termination (6 tests)
test_streaming_success_path- DEPARTED before [DONE]test_streaming_departed_emitted_before_done- Verify orderingtest_streaming_cancelled_error- ABORTED + re-raisetest_streaming_generator_exit- ABORTED + re-raisetest_streaming_generic_exception- ABORTED + [DONE]test_full_generation_error_during_response_building- ABORTED with error attributesNon-Streaming Termination (5 tests)
test_full_success_path- DEPARTED on successtest_full_cancelled_error- ABORTED + returntest_full_validation_error- ABORTED(validation_error)test_full_generic_exception- ABORTED(exception)test_full_generator_exit- ABORTED + re-raiseIdempotence & Leak Prevention (2 tests)
test_finalizer_idempotence- Safe to call multiple timestest_no_span_leaks- No dict growth over 100 requestsCleanup Independence (2 tests)
test_cleanup_not_gated_on_is_recording- span.end() called even when is_recording()=Falsetest_cleanup_only_fallback- terminal_event=None mode worksSafety Guarantees
✅ span.end() ALWAYS called (even if set_status/add_event throw)
✅ All 12 termination paths finalize properly (no leaks)
✅ Idempotent via pop-at-start (safe to call multiple times)
✅ Tests use robust kwargs-only pattern (no brittle string matching)
✅ Dict cleanup NOT gated on flags (cleanup independence)
✅ All tests passing (17/17, 100% pass rate)
✅ Zero overhead when tracing disabled (early returns)
✅ Defensive error handling (tracing never breaks requests)
Files Changed
vllm/entrypoints/openai/engine/serving.py(+179 lines) - API span lifecycle methodsvllm/entrypoints/openai/chat_completion/serving.py(modified) - Generator error handlingtests/entrypoints/openai/test_api_span_lifecycle.py(+801 lines) - Comprehensive testsJOURNEY_TRACING_PR_PLAN.md(updated) - DocumentationDependencies
Review Notes
Response to ChatGPT Review
Both identified "merge blockers" have been resolved:
find_events()helperAdditional concerns addressed:
Testing
python -m pytest tests/entrypoints/openai/test_api_span_lifecycle.py -v --tb=short # Result: 17 passed, 2 warnings in 1.68sNext Steps
After merge:
🤖 Generated with Claude Code