Skip to content

[Feature] Add API parent span lifecycle management (PR #6/9)#14

Merged
sriumcp merged 2 commits intomainfrom
pr6ofjourney
Jan 27, 2026
Merged

[Feature] Add API parent span lifecycle management (PR #6/9)#14
sriumcp merged 2 commits intomainfrom
pr6ofjourney

Conversation

@sriumcp
Copy link
Copy Markdown

@sriumcp sriumcp commented Jan 27, 2026

Overview

Implements complete API-level span lifecycle with guaranteed cleanup on all termination paths. This is PR #6 in the 9-PR journey tracing implementation sequence.

This PR creates parent spans (llm_request) at the API serving layer that will be linked to child spans (llm_core) in the scheduler via W3C Trace Context propagation (implemented in PR #7).

What This PR Does

Core Functionality

  • Span Creation: _create_api_span() creates parent spans with ARRIVED event
  • Span Finalization: _finalize_api_span() handles DEPARTED/ABORTED/cleanup-only modes
  • Idempotence Primitive: _pop_api_span_info() ensures only one caller succeeds in finalization
  • Comprehensive Error Handling: Wraps both streaming and non-streaming generators with try/except/finally

Termination Paths Covered (12 Total)

Streaming (6 paths):

  1. Success → DEPARTED before [DONE]
  2. GenerationError → ABORTED(generation_error) + [DONE] + return
  3. CancelledError → ABORTED(client_disconnect) + re-raise
  4. GeneratorExit → ABORTED(generator_exit) + re-raise
  5. Generic Exception → ABORTED(exception) + [DONE] + return
  6. Truly uncaught → outer finally cleanup-only fallback

Non-Streaming (6 paths):

  1. Success → DEPARTED
  2. CancelledError → ABORTED(client_disconnect) + return
  3. GeneratorExit → ABORTED(generator_exit) + re-raise
  4. ValueError → ABORTED(validation_error) + return
  5. Generic Exception → ABORTED(exception) + return
  6. Truly uncaught → outer finally cleanup-only fallback

Critical Design Patterns

1. Pop-At-Start Idempotence

def _finalize_api_span(self, request_id, terminal_event=None, ...):
    # Pop at method start - only one caller succeeds
    api_span_info = self._pop_api_span_info(request_id)
    if api_span_info is None:
        return  # Already finalized or never created

This makes idempotence explicit and avoids any theoretical reentrancy window.

2. Nested Try/Except/Finally (Guaranteed span.end())

try:
    # Step 1: Set status (best-effort, inner try/except)
    # Step 2: Emit event (best-effort, inner try/except)
finally:
    # Step 3: ALWAYS end span (guaranteed by outer finally)
    api_span.end(end_time=time.time_ns())

This prevents OTEL span leaks even if set_status() or add_event() throw.

3. Re-raise Pattern

Exceptions that should propagate (CancelledError, GeneratorExit) first finalize with ABORTED, then re-raise. This preserves both observability and propagation semantics.

4. Cleanup Independence

  • span.end() NOT gated on is_recording() (always called if span exists)
  • Dict cleanup NOT gated on feature flags (cleanup runs regardless)
  • Only event emission is gated on is_recording()

5. Cleanup-Only Fallback

Outer finally calls _finalize_api_span(request_id) with no terminal_event (cleanup-only mode). This is an idempotent no-op if already finalized, catching truly uncaught exceptions.

Critical Fixes Applied (Post-Review)

Fix #1: Guaranteed span.end() (BLOCKER)

Problem: If set_status() or add_event() threw, span.end() at line 485 was never reached → OTEL span leak.

Solution: Restructured with nested try/except/finally:

  • Inner try/except for best-effort operations (set_status, add_event)
  • Outer finally for guaranteed span.end()

Fix #2: Eliminated Brittle Test Patterns (BLOCKER)

Problem: All tests used "DEPARTED" in str(call) which depends on mock library's repr format.

Solution:

  • Added find_events() helper using kwargs-only pattern: call.kwargs.get("name") == "api.DEPARTED"
  • Replaced all 10 brittle instances with helper usage
  • Tests now robust to mock library changes and Python version updates

Fix #3: Simplified Partial Fix (QUALITY)

Problem: Mixed positional/keyword arg checking when implementation uses kwargs-only.

Solution:

  • Changed to pure kwargs: call.kwargs.get("attributes")
  • Reduced from 25 lines to 8 lines using helper
  • Added assertion for error message existence

Test Coverage (17 Tests, All Passing)

Span Creation (2 tests)

  • test_api_span_created_when_tracing_enabled - Verify span created and ARRIVED event
  • test_span_creation_gated_at_call_site - Verify tracing disabled when flag off

Streaming Termination (6 tests)

  • test_streaming_success_path - DEPARTED before [DONE]
  • test_streaming_departed_emitted_before_done - Verify ordering
  • test_streaming_cancelled_error - ABORTED + re-raise
  • test_streaming_generator_exit - ABORTED + re-raise
  • test_streaming_generic_exception - ABORTED + [DONE]
  • test_full_generation_error_during_response_building - ABORTED with error attributes

Non-Streaming Termination (5 tests)

  • test_full_success_path - DEPARTED on success
  • test_full_cancelled_error - ABORTED + return
  • test_full_validation_error - ABORTED(validation_error)
  • test_full_generic_exception - ABORTED(exception)
  • test_full_generator_exit - ABORTED + re-raise

Idempotence & Leak Prevention (2 tests)

  • test_finalizer_idempotence - Safe to call multiple times
  • test_no_span_leaks - No dict growth over 100 requests

Cleanup Independence (2 tests)

  • test_cleanup_not_gated_on_is_recording - span.end() called even when is_recording()=False
  • test_cleanup_only_fallback - terminal_event=None mode works

Safety Guarantees

span.end() ALWAYS called (even if set_status/add_event throw)
All 12 termination paths finalize properly (no leaks)
Idempotent via pop-at-start (safe to call multiple times)
Tests use robust kwargs-only pattern (no brittle string matching)
Dict cleanup NOT gated on flags (cleanup independence)
All tests passing (17/17, 100% pass rate)
Zero overhead when tracing disabled (early returns)
Defensive error handling (tracing never breaks requests)

Files Changed

  • vllm/entrypoints/openai/engine/serving.py (+179 lines) - API span lifecycle methods
  • vllm/entrypoints/openai/chat_completion/serving.py (modified) - Generator error handling
  • tests/entrypoints/openai/test_api_span_lifecycle.py (+801 lines) - Comprehensive tests
  • JOURNEY_TRACING_PR_PLAN.md (updated) - Documentation

Dependencies

Review Notes

Response to ChatGPT Review

Both identified "merge blockers" have been resolved:

  1. span.end() guarantee: ✅ Fixed with nested try/finally
  2. Test brittleness: ✅ Fixed with find_events() helper

Additional concerns addressed:

  • Parser error handlers already have explicit returns (lines 732, 758)
  • Production code already uses keyword-only style consistently
  • [DONE] after errors matches existing vLLM streaming behavior
  • Unused variables (arrival_time, first_response_time) noted but not blocking

Testing

python -m pytest tests/entrypoints/openai/test_api_span_lifecycle.py -v --tb=short
# Result: 17 passed, 2 warnings in 1.68s

Next Steps

After merge:


🤖 Generated with Claude Code

sriumcp and others added 2 commits January 27, 2026 11:03
Implements complete API-level span lifecycle with guaranteed cleanup on all
termination paths. This PR creates parent spans that will be linked to core
spans via context propagation in PR #7.

Key implementation:
- Added _create_api_span() with ARRIVED event emission
- Added _finalize_api_span() single finalizer (DEPARTED/ABORTED/cleanup-only)
- Added _pop_api_span_info() for explicit pop-at-start idempotence
- Wrapped streaming/non-streaming generators with comprehensive error handling
- Covered 12 termination paths (6 streaming, 6 non-streaming)

Critical fixes applied (post-review):
- Guaranteed span.end() via nested try/except/finally (prevents OTEL leaks)
- Eliminated brittle test patterns using kwargs-only helper function
- All 17 tests passing with robust event detection

Design patterns:
- Pop-at-start idempotence: only one caller succeeds in finalization
- Cleanup-only fallback: outer finally with terminal_event=None
- Re-raise pattern: CancelledError/GeneratorExit finalize then re-raise
- Cleanup independence: span.end() not gated on is_recording() or flags

Test coverage: 17 tests
- Span creation (2)
- Streaming termination (6)
- Non-streaming termination (5)
- Idempotence & leak prevention (2)
- Cleanup independence (2)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@sriumcp sriumcp merged commit d01e6a0 into main Jan 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant