[Feature] Remove journey event buffering (PR #9/9)#17

Merged
sriumcp merged 3 commits into main from pr9ofjourney
Jan 28, 2026

sriumcp commented Jan 28, 2026

Summary

Completes the migration to OTEL-based journey tracing by removing all intermediate buffering and export mechanisms introduced in earlier PRs. This is the final PR in the 9-part journey tracing implementation series.

Changes:

  • Remove journey event buffer dictionary and all buffering logic from scheduler
  • Remove journey event flushing in update_from_output()
  • Remove journey event export from output processor's do_tracing()
  • Add direct timestamp capture (queued_ts, scheduled_ts) to Request object using time.monotonic()
  • Propagate timestamps through EngineCoreOutput for Prometheus metrics
  • Preserve backward compatibility with deprecated journey_events parameters
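
The timestamp path in the bullets above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the vLLM code: `SketchRequest`, `SketchEngineCoreOutput`, `SketchRequestStats`, `make_output`, and `to_stats` are hypothetical stand-ins, and the `0.0` "unset" default is an assumption. Only the field names `queued_ts` and `scheduled_ts` and the use of `time.monotonic()` come from this PR.

```python
import time
from dataclasses import dataclass


@dataclass
class SketchRequest:
    """Hypothetical stand-in for the scheduler-side Request object."""
    request_id: str
    queued_ts: float = 0.0     # assumed sentinel: 0.0 means "not captured"
    scheduled_ts: float = 0.0


@dataclass
class SketchEngineCoreOutput:
    """Hypothetical stand-in for the per-request output leaving the engine core."""
    request_id: str
    queued_ts: float
    scheduled_ts: float


@dataclass
class SketchRequestStats:
    """Hypothetical stand-in for the stats record that feeds Prometheus metrics."""
    queued_ts: float = 0.0
    scheduled_ts: float = 0.0

    @property
    def queue_time_s(self) -> float:
        # Both values come from time.monotonic(), so their difference is a
        # duration that is unaffected by wall-clock adjustments.
        return self.scheduled_ts - self.queued_ts


def make_output(req: SketchRequest) -> SketchEngineCoreOutput:
    """Copy the timestamps from the request onto the outgoing output."""
    return SketchEngineCoreOutput(req.request_id, req.queued_ts, req.scheduled_ts)


def to_stats(out: SketchEngineCoreOutput) -> SketchRequestStats:
    """Copy the timestamps from the output into the per-request stats."""
    return SketchRequestStats(out.queued_ts, out.scheduled_ts)


req = SketchRequest("req-0")
req.queued_ts = time.monotonic()      # captured when the request is queued
req.scheduled_ts = time.monotonic()   # captured when it is first scheduled
stats = to_stats(make_output(req))
```

No tracing object appears anywhere in this path, which is the sense in which the metrics are independent of OTEL spans.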

Key Design Decisions:

  • Journey events now emit exclusively as OTEL spans (real-time)
  • Prometheus metrics capture timestamps directly on Request objects (independent of tracing)
  • Monotonic time domain used for all duration measurements
  • Defensive guard ensures scheduled_ts set only once (never overwritten)
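
The set-once guard on `scheduled_ts` could look something like the sketch below. The helper name `mark_scheduled`, the `SketchRequest` class, and the `0.0` "unset" sentinel are assumptions made for illustration; the PR states only that the value is set once and never overwritten.

```python
import time
from dataclasses import dataclass


@dataclass
class SketchRequest:
    """Hypothetical stand-in for the Request object."""
    request_id: str
    scheduled_ts: float = 0.0  # assumed sentinel: 0.0 means "never scheduled"


def mark_scheduled(req: SketchRequest) -> None:
    """Record the first scheduling time; later calls leave it untouched."""
    if req.scheduled_ts == 0.0:
        req.scheduled_ts = time.monotonic()
```

Without the guard, a request that is scheduled again later would have its queue time measured to the latest schedule rather than the first one.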

Test Plan

Added 16 comprehensive tests in tests/v1/core/test_pr9_no_buffering.py:

TestNoBuffering (3 tests)

  • Verify buffer dictionary doesn't exist
  • Verify no buffering during request lifecycle
  • Verify EngineCoreOutputs.journey_events always None

TestSpanInfrastructure (2 tests)

  • Verify span tracking infrastructure exists
  • Verify span cleanup is safe and idempotent
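
One way to make span cleanup idempotent is to remove the span from its tracking table before ending it. The sketch below assumes a per-request dictionary of spans; `SpanRegistry` and `FakeSpan` are hypothetical names, and `FakeSpan` stands in for a real tracing span so the example runs without an OTEL dependency.

```python
from typing import Dict


class FakeSpan:
    """Stand-in for a tracing span; it only counts calls to end()."""

    def __init__(self) -> None:
        self.end_calls = 0

    def end(self) -> None:
        self.end_calls += 1


class SpanRegistry:
    """Hypothetical per-request span table."""

    def __init__(self) -> None:
        self._spans: Dict[str, FakeSpan] = {}

    def start(self, request_id: str) -> FakeSpan:
        span = FakeSpan()
        self._spans[request_id] = span
        return span

    def end(self, request_id: str) -> None:
        # pop() with a default removes the entry if present and never raises,
        # so repeated or unknown request IDs are harmless no-ops.
        span = self._spans.pop(request_id, None)
        if span is not None:
            span.end()
```

Calling `end()` twice for the same request, or for a request that was never tracked, ends each span at most once.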

TestMetricsIndependence (3 tests)

  • Verify metrics work with tracing disabled
  • Verify monotonic time domain used
  • Verify timestamps stored in Request object

TestBackwardCompatibility (2 tests)

  • Verify journey_events parameter accepted
  • Verify EngineCoreOutputs.journey_events field exists
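
The compatibility behaviour these two tests describe might be sketched as below. `SketchEngineCoreOutputs` and `process_outputs` are hypothetical; the PR does not say which call sites still accept `journey_events`, only that the parameter is accepted and the field still exists.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SketchEngineCoreOutputs:
    """Hypothetical stand-in for the batched output container."""
    outputs: List[Any]
    # Field kept so existing readers do not break; it is never populated.
    journey_events: Optional[List[Any]] = None


def process_outputs(
    outputs: SketchEngineCoreOutputs,
    journey_events: Optional[List[Any]] = None,  # deprecated, ignored
) -> int:
    """Accept the deprecated argument so old call sites keep working."""
    del journey_events  # intentionally unused
    return len(outputs.outputs)
```

Old callers that still pass `journey_events=` run unchanged, while nothing downstream reads the value.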

TestTimestampCapture (4 tests)

  • Verify queued_ts captured on add_request
  • Verify scheduled_ts captured on first schedule
  • Verify scheduled_ts not overwritten on subsequent schedules
  • Verify timestamps not captured when log_stats=False
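
For illustration, the `log_stats` gate could be exercised against a toy scheduler like the one below. `ToyScheduler`, `ToyRequest`, and both test functions are hypothetical and much simpler than the real scheduler and the tests in tests/v1/core/test_pr9_no_buffering.py; the `0.0` "unset" sentinel is also an assumption.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class ToyRequest:
    request_id: str
    queued_ts: float = 0.0     # assumed sentinel: 0.0 means "not captured"
    scheduled_ts: float = 0.0


class ToyScheduler:
    """Hypothetical scheduler that captures timestamps only when log_stats is on."""

    def __init__(self, log_stats: bool) -> None:
        self.log_stats = log_stats
        self.waiting: Deque[ToyRequest] = deque()

    def add_request(self, req: ToyRequest) -> None:
        if self.log_stats:
            req.queued_ts = time.monotonic()
        self.waiting.append(req)

    def schedule(self) -> List[ToyRequest]:
        for req in self.waiting:
            # Capture only on the first schedule, and only when stats are enabled.
            if self.log_stats and req.scheduled_ts == 0.0:
                req.scheduled_ts = time.monotonic()
        return list(self.waiting)


def test_timestamps_not_captured_when_log_stats_false() -> None:
    sched = ToyScheduler(log_stats=False)
    req = ToyRequest("r0")
    sched.add_request(req)
    sched.schedule()
    assert req.queued_ts == 0.0 and req.scheduled_ts == 0.0


def test_timestamps_captured_when_log_stats_true() -> None:
    sched = ToyScheduler(log_stats=True)
    req = ToyRequest("r1")
    sched.add_request(req)
    sched.schedule()
    assert 0.0 < req.queued_ts <= req.scheduled_ts
```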

TestZeroOverheadWhenDisabled (2 tests)

  • Verify no tracing structures when disabled
  • Verify metrics independent of tracing

Test Results: All 16 PR #9 tests pass ✓
Regression Check: All existing scheduler tests pass ✓

Dependencies

This PR depends on PRs #1-8 in the journey tracing series.

Files Modified

  • vllm/v1/core/sched/scheduler.py - Remove buffering, add direct timestamp capture
  • vllm/v1/engine/output_processor.py - Remove export, propagate timestamps
  • vllm/v1/engine/async_llm.py - Remove journey event distribution
  • vllm/v1/request.py - Add queued_ts, scheduled_ts fields
  • vllm/v1/engine/__init__.py - Add timestamp fields to EngineCoreOutput
  • tests/v1/core/test_pr9_no_buffering.py - New comprehensive test suite
  • JOURNEY_TRACING_PR_PLAN.md - Update PR #9 design specification

🤖 Generated with Claude Code

sriumcp and others added 3 commits January 27, 2026 20:47
Completes migration to OTEL-based journey tracing by removing all intermediate
buffering and export mechanisms. Journey events are now emitted exclusively as
OTEL spans in real-time, while Prometheus metrics capture timestamps directly
on Request objects using monotonic time.

Changes:
- Remove journey event buffer dictionary and flushing logic from scheduler
- Remove journey event export from output processor
- Add direct timestamp capture (queued_ts, scheduled_ts) to Request
- Preserve backward compatibility with deprecated journey_events parameters
- Add 16 comprehensive tests verifying no buffering, span infrastructure,
  metrics independence, and backward compatibility

All 16 PR #9 tests pass. All existing scheduler tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…tats

Update plan document to reflect actual implementation results vs estimates:

Changes:
- Update total line counts: ~7,528 added / ~1,116 removed (was ~618/~280)
- Update PR #9 stats: 16 tests, ~478 added / ~389 removed (was 4-5 tests)
- Update total test count: 27+ journey tracing tests (was 77)
- Add implementation timeline: Jan 23-27, 2026
- Add "Implementation Status" section with all completed PRs
- Update PR #0 description to clarify metrics restoration evolution
- Add timestamp propagation path diagram for PR #9
- Clarify that journey event buffering removed in PR #9

All stats now match actual git history and test counts.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Additional documentation improvements:

- Add "Implementation Status" section with all completed PRs (PR #0-9)
  with commit hashes and PR numbers
- Add timestamp propagation path diagram showing Request → EngineCoreOutput
  → OutputProcessor → req_state.stats flow
- Update PR #0 description to clarify metrics restoration evolution
  (journey events were interim, replaced by direct capture in PR #9)
- Clarify timeline and completion status

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp merged commit 1d9b9f3 into main Jan 28, 2026