Skip to content

[Bugfix] Remove legacy EngineCoreEvent system and restore Prometheus metrics#7

Merged
sriumcp merged 1 commit intomainfrom
removelegacy
Jan 26, 2026
Merged

[Bugfix] Remove legacy EngineCoreEvent system and restore Prometheus metrics#7
sriumcp merged 1 commit intomainfrom
removelegacy

Conversation

@sriumcp
Copy link
Copy Markdown

@sriumcp sriumcp commented Jan 26, 2026

Summary

This PR removes the legacy EngineCoreEvent system (from v0.0.1) which was used only for internal metrics tracking, while preserving the current RequestJourneyEvent system (v0.1.x) used for journey tracing and OTEL export. Additionally, it restores Prometheus metrics by extracting timestamps from journey events instead of the removed legacy events.

Changes

Removed Legacy Code (~100 lines)

  • vllm/v1/engine/__init__.py: Removed EngineCoreEvent and EngineCoreEventType classes
  • vllm/v1/request.py: Removed events field and record_event()/take_events() methods
  • vllm/v1/core/sched/scheduler.py: Removed 3 record_event() calls and event passing to output
  • vllm/v1/metrics/stats.py: Removed update_from_events() method and call
  • vllm/v1/engine/output_processor.py: Removed redundant time delta calculations from do_tracing()

Restored Prometheus Metrics

After removing EngineCoreEvent, Prometheus histogram metrics (queue_time, prefill_time, etc.) were broken because queued_ts and scheduled_ts were no longer populated. Fixed by:

  • Extracting queued_ts from first RequestJourneyEvent.QUEUED event (ts_monotonic field)
  • Extracting scheduled_ts from first RequestJourneyEvent.SCHEDULED event (ts_monotonic field)
  • Using 0.0 sentinel value to prevent overwrites (preemption-safe)
  • Time domain: Both use time.monotonic() (consistent with existing code)

Added Test Coverage

  • tests/v1/engine/test_output_processor.py: New test test_journey_events_populate_prometheus_timestamps
    • Verifies timestamps extracted correctly from journey events
    • Tests preemption safety (resume doesn't overwrite original timestamps)
    • Validates Prometheus delta calculations work correctly

Motivation

The EngineCoreEvent system was a simple 3-event system (QUEUED, SCHEDULED, PREEMPTED) used only for populating internal timestamps. The newer RequestJourneyEvent system is more comprehensive (6 event types with rich context) and is the sole path for OTEL export. Having both systems was redundant and added unnecessary complexity.

Impact

  • Journey tracing: ✅ Fully preserved - all OTEL export functionality intact
  • Prometheus metrics: ✅ Restored via journey event timestamps
  • Codebase: ✅ Simplified - removed ~100 lines of legacy code
  • Performance: ✅ No change - journey events already emitted

Testing

All tests pass:

  • ✅ New test: test_journey_events_populate_prometheus_timestamps (1/1)
  • ✅ Journey tracing integration: test_journey_tracing_integration.py (12/12)
  • ✅ Scheduler tests: test_scheduler.py (81/81)
  • ✅ Verified no orphaned references via grep

Time Domain Verification

Both internal timestamps use time.monotonic():

  • Journey events: ts_monotonic = time.monotonic()
  • Request stats: queued_ts, scheduled_ts, first_token_ts, last_token_ts all monotonic
  • Only arrival_time uses wall time (time.time()) for E2E calculations

Backwards Compatibility

  • ✅ Prometheus metrics histograms work unchanged
  • ✅ Journey tracing OTEL export unchanged
  • ✅ Scheduler behavior unchanged
  • ✅ No API changes

Files Changed

File Lines Change Type
vllm/v1/engine/__init__.py -28 Remove EngineCoreEvent classes
vllm/v1/request.py -16 Remove events field and methods
vllm/v1/core/sched/scheduler.py -13 Remove record_event calls
vllm/v1/metrics/stats.py -37 Remove update_from_events logic
vllm/v1/engine/output_processor.py +18, -31 Add journey timestamp extraction, remove time deltas
tests/v1/engine/test_output_processor.py +184 Add comprehensive test

Total: +200 insertions, -127 deletions

Review Focus Areas

  1. Timestamp extraction logic (output_processor.py:547-564): Verify sentinel pattern and preemption safety
  2. Test coverage (test_output_processor.py:1135-1294): Verify test scenarios are comprehensive
  3. No regressions: Confirm journey tracing OTEL export still works (verified by integration tests)

…metrics

Remove the legacy EngineCoreEvent system (v0.0.1) which was used only for
internal metrics tracking, while preserving the current RequestJourneyEvent
system (v0.1.x) used for journey tracing and OTEL export.

Changes:
- Remove EngineCoreEvent and EngineCoreEventType classes
- Remove events field and record_event/take_events methods from Request
- Remove update_from_events() logic from IterationStats
- Remove redundant time delta calculations from do_tracing()

Fix Prometheus metrics by extracting timestamps from RequestJourneyEvent:
- Extract queued_ts from first QUEUED event (ts_monotonic)
- Extract scheduled_ts from first SCHEDULED event (ts_monotonic)
- Use 0.0 sentinel to prevent overwrites (preemption-safe)
- Maintains backward compatibility with existing Prometheus histograms

Journey tracing OTEL export remains fully functional via do_tracing().

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@sriumcp sriumcp merged commit 23b8d6b into main Jan 26, 2026
sriumcp added a commit that referenced this pull request Jan 27, 2026
Implements complete API-level span lifecycle with guaranteed cleanup on all
termination paths. This PR creates parent spans that will be linked to core
spans via context propagation in PR #7.

Key implementation:
- Added _create_api_span() with ARRIVED event emission
- Added _finalize_api_span() single finalizer (DEPARTED/ABORTED/cleanup-only)
- Added _pop_api_span_info() for explicit pop-at-start idempotence
- Wrapped streaming/non-streaming generators with comprehensive error handling
- Covered 12 termination paths (6 streaming, 6 non-streaming)

Critical fixes applied (post-review):
- Guaranteed span.end() via nested try/except/finally (prevents OTEL leaks)
- Eliminated brittle test patterns using kwargs-only helper function
- All 17 tests passing with robust event detection

Design patterns:
- Pop-at-start idempotence: only one caller succeeds in finalization
- Cleanup-only fallback: outer finally with terminal_event=None
- Re-raise pattern: CancelledError/GeneratorExit finalize then re-raise
- Cleanup independence: span.end() not gated on is_recording() or flags

Test coverage: 17 tests
- Span creation (2)
- Streaming termination (6)
- Non-streaming termination (5)
- Idempotence & leak prevention (2)
- Cleanup independence (2)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 27, 2026
* [Feature] Add API parent span lifecycle management (PR #6/9)

Implements complete API-level span lifecycle with guaranteed cleanup on all
termination paths. This PR creates parent spans that will be linked to core
spans via context propagation in PR #7.

Key implementation:
- Added _create_api_span() with ARRIVED event emission
- Added _finalize_api_span() single finalizer (DEPARTED/ABORTED/cleanup-only)
- Added _pop_api_span_info() for explicit pop-at-start idempotence
- Wrapped streaming/non-streaming generators with comprehensive error handling
- Covered 12 termination paths (6 streaming, 6 non-streaming)

Critical fixes applied (post-review):
- Guaranteed span.end() via nested try/except/finally (prevents OTEL leaks)
- Eliminated brittle test patterns using kwargs-only helper function
- All 17 tests passing with robust event detection

Design patterns:
- Pop-at-start idempotence: only one caller succeeds in finalization
- Cleanup-only fallback: outer finally with terminal_event=None
- Re-raise pattern: CancelledError/GeneratorExit finalize then re-raise
- Cleanup independence: span.end() not gated on is_recording() or flags

Test coverage: 17 tests
- Span creation (2)
- Streaming termination (6)
- Non-streaming termination (5)
- Idempotence & leak prevention (2)
- Cleanup independence (2)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Add PR #14 reference to journey tracing plan

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 27, 2026
…/9)

This PR implements W3C Trace Context propagation from API spans to core spans,
enabling parent-child linkage in distributed traces. Completes the handshake
between PR #6 (API span lifecycle) and PR #2 (core span lifecycle).

Changes:
- Add inject_trace_context() helper to vllm/tracing.py
- Inject API span context into trace_headers after span creation
- Context flows to engine.generate() and scheduler for parent-child linkage
- Defensive error handling: injection failures never break requests
- Zero overhead when tracing disabled (early return)

Behavioral guarantees verified by tests:
- G1: Trace ID continuity (API and core spans share same trace_id)
- G2: W3C Trace Context format (traceparent header valid)
- G3: Trace continuation (trace_id preserved through Client→API→Core)
- G4: Graceful degradation (request continues on injection failure)
- G5: No exception propagation (injection failures caught)
- G6: Conditional injection (only when API span exists)

Invariants:
- I1: Backward compatibility (early return when tracing disabled)
- I2: Zero overhead when disabled (no propagator/allocation access)
- I3: No resource leaks (only modifies existing trace_headers dict)

Test coverage:
- 12 new tests (100% pass) covering all unit-testable properties
- 17 existing API span lifecycle tests pass (no regressions)
- Tests focus on behavioral properties, not implementation details

Safety properties:
- Zero new resources (only modifies existing dict)
- No cleanup obligations (dict managed by request lifecycle)
- Stateless transformation (span context → headers)
- Single injection point (strict ordering preserved)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 27, 2026
Updates to reflect PR #7 completion:
- PR sequence table: Mark #7 as COMPLETED with 12 tests
- Dependency chain: Mark #6 and #7 as COMPLETED
- PR #7 section: Add completion status with commit hashes
- Document deliverables: inject_trace_context(), tests, guarantees

Remaining: PRs #8 (API events), #9 (remove buffering)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 27, 2026
…/9) (#15)

* [Feature] Add API↔Engine context propagation for journey tracing (PR #7/9)

This PR implements W3C Trace Context propagation from API spans to core spans,
enabling parent-child linkage in distributed traces. Completes the handshake
between PR #6 (API span lifecycle) and PR #2 (core span lifecycle).

Changes:
- Add inject_trace_context() helper to vllm/tracing.py
- Inject API span context into trace_headers after span creation
- Context flows to engine.generate() and scheduler for parent-child linkage
- Defensive error handling: injection failures never break requests
- Zero overhead when tracing disabled (early return)

Behavioral guarantees verified by tests:
- G1: Trace ID continuity (API and core spans share same trace_id)
- G2: W3C Trace Context format (traceparent header valid)
- G3: Trace continuation (trace_id preserved through Client→API→Core)
- G4: Graceful degradation (request continues on injection failure)
- G5: No exception propagation (injection failures caught)
- G6: Conditional injection (only when API span exists)

Invariants:
- I1: Backward compatibility (early return when tracing disabled)
- I2: Zero overhead when disabled (no propagator/allocation access)
- I3: No resource leaks (only modifies existing trace_headers dict)

Test coverage:
- 12 new tests (100% pass) covering all unit-testable properties
- 17 existing API span lifecycle tests pass (no regressions)
- Tests focus on behavioral properties, not implementation details

Safety properties:
- Zero new resources (only modifies existing dict)
- No cleanup obligations (dict managed by request lifecycle)
- Stateless transformation (span context → headers)
- Single injection point (strict ordering preserved)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Polish] Improve inject_trace_context docstring and strengthen test

Two quality improvements following code review:

1. Clarify inject_trace_context() docstring:
   - Previous: "or None if injection failed" (misleading)
   - Now: Explicitly documents when carrier is returned unchanged
   - Details all three early-return paths (OTEL unavailable, span None, exception)

2. Strengthen test_trace_id_preserved_through_chain():
   - Mock propagator now actually reads span.get_span_context()
   - Extracts trace_id and span_id from span context
   - Generates traceparent using those values (simulates real OTEL behavior)
   - Asserts get_span_context() was called
   - Better proves G1/G3 guarantees without requiring real OTLP exporter

Test results: All 29 tests pass (12 context propagation + 17 lifecycle)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Mark PR #7 as completed in journey tracing plan

Updates to reflect PR #7 completion:
- PR sequence table: Mark #7 as COMPLETED with 12 tests
- Dependency chain: Mark #6 and #7 as COMPLETED
- PR #7 section: Add completion status with commit hashes
- Document deliverables: inject_trace_context(), tests, guarantees

Remaining: PRs #8 (API events), #9 (remove buffering)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* removing PR7_summary

Signed-off-by: Srinivasan Parthasarathy <spartha@us.ibm.com>

---------

Signed-off-by: Srinivasan Parthasarathy <spartha@us.ibm.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant