Adds --enable-journey-tracing CLI flag to vllm serve, enabling the v1 scheduler's request journey event tracing feature. Journey events are automatically exported to OpenTelemetry when tracing is configured.

Features:
- CLI flag --enable-journey-tracing (disabled by default)
- Automatic OTEL span event export when tracer is active
- Journey events (QUEUED, SCHEDULED, FIRST_TOKEN, PREEMPTED, FINISHED) exported with full progress snapshots
- Events cleared after export to prevent memory leaks

Critical bug fixes:
- Fixed event loss in AsyncLLM chunking (events now distributed before chunk processing)
- Fixed silent dropping of events for requests without outputs (QUEUED events for non-scheduled requests)
- Fixed event duplication prevention with proper consumption semantics

OTEL integration:
- Journey events exported as span events on request completion
- Proper span.is_recording() guards to respect OTEL state
- None values excluded from attributes (OTEL-compliant)
- Monotonic timestamps preserved as ts.monotonic attribute

Testing:
- 14 comprehensive integration tests covering CLI, OTEL, edge cases
- Tests for AsyncLLM chunking behavior
- Tests for events without corresponding outputs
- Tests for memory leak prevention and duplication

Documentation:
- Updated JOURNEY_TRACING.md with CLI usage examples
- Added event delivery guarantees and caveats section
- Corrected memory overhead claims (minimal vs zero)
- Production deployment recommendations

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
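The export behavior described in this commit message (consume buffered events once, skip export when the span is not recording, drop None attributes, keep the monotonic timestamp as ts.monotonic) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: JourneyEvent, RecordingSpan, export_journey_events, and the num_output_tokens field are hypothetical names. Only is_recording() and add_event() mirror methods that exist on OpenTelemetry spans, and the stand-in span keeps the sketch free of external dependencies.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class JourneyEvent:
    """Hypothetical event record; field names are illustrative only."""
    name: str                                # e.g. "journey.QUEUED"
    ts_monotonic: float                      # monotonic clock reading at emission
    num_output_tokens: Optional[int] = None  # example of an optional snapshot field


class RecordingSpan:
    """Stand-in exposing the two OpenTelemetry Span methods the sketch uses."""

    def __init__(self, recording: bool = True) -> None:
        self._recording = recording
        self.events: list[tuple[str, dict[str, Any]]] = []

    def is_recording(self) -> bool:
        return self._recording

    def add_event(self, name: str, attributes: Optional[dict[str, Any]] = None) -> None:
        self.events.append((name, dict(attributes or {})))


def export_journey_events(
    span: Optional[RecordingSpan],
    pending: dict[str, list[JourneyEvent]],
    request_id: str,
) -> int:
    """Export buffered events for one request; return how many were exported."""
    # pop() consumes the buffer: events are exported at most once, and the
    # buffer is released even when there is no tracer or the span is not recording.
    events = pending.pop(request_id, [])
    if span is None or not span.is_recording():
        return 0
    for event in events:
        attributes = {
            "ts.monotonic": event.ts_monotonic,
            "num_output_tokens": event.num_output_tokens,
        }
        # Attribute values must not be None, so those keys are dropped.
        attributes = {k: v for k, v in attributes.items() if v is not None}
        span.add_event(event.name, attributes=attributes)
    return len(events)
```

Calling export_journey_events a second time for the same request exports nothing, because the first call already removed the buffer.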
sriumcp added a commit that referenced this pull request on Jan 27, 2026:

Extends the centralized cleanup method to handle journey tracing state alongside core span cleanup. Fixes memory leak on natural completion path.

Changes:
- Extend _end_core_span_and_cleanup() with decoupled cleanup logic
- Cleanup #1: Core spans (always runs, independent of flags)
- Cleanup #2: Journey state (only if journey tracing enabled)
- Remove duplicate inline cleanup from finish_requests()
- Add 4 tests verifying state cleanup on all termination paths

Tests:
- test_journey_state_created: Verify state initialization
- test_journey_state_cleaned_on_finish: Explicit abort cleanup
- test_journey_state_cleaned_on_completion: Natural completion cleanup
- test_no_state_leak: No accumulation over 20 iterations

All 95 tests passing (4 new + 91 existing).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
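A minimal sketch of the decoupled cleanup described in this commit, assuming a simplified scheduler. Only the method names _end_core_span_and_cleanup() and finish_requests() come from the commit message; SchedulerSketch, FakeSpan, add_request, complete_request, and the two dictionaries are hypothetical and exist only to make the example runnable.

```python
class FakeSpan:
    """Hypothetical span stand-in that records whether end() was called."""

    def __init__(self) -> None:
        self.ended = False

    def end(self) -> None:
        self.ended = True


class SchedulerSketch:
    """Hypothetical scheduler holding per-request tracing state."""

    def __init__(self, enable_journey_tracing: bool) -> None:
        self.enable_journey_tracing = enable_journey_tracing
        self.core_spans: dict[str, FakeSpan] = {}
        self.journey_state: dict[str, dict] = {}

    def add_request(self, request_id: str) -> None:
        self.core_spans[request_id] = FakeSpan()
        if self.enable_journey_tracing:
            self.journey_state[request_id] = {"first_token_seen": False}

    def _end_core_span_and_cleanup(self, request_id: str) -> None:
        # Cleanup #1: core span, always runs regardless of flags.
        span = self.core_spans.pop(request_id, None)
        if span is not None:
            span.end()
        # Cleanup #2: journey state, only when journey tracing is enabled.
        if self.enable_journey_tracing:
            self.journey_state.pop(request_id, None)

    def finish_requests(self, request_ids: list[str]) -> None:
        # Explicit abort path: no inline cleanup, just the shared helper.
        for request_id in request_ids:
            self._end_core_span_and_cleanup(request_id)

    def complete_request(self, request_id: str) -> None:
        # Natural completion path goes through the same helper.
        self._end_core_span_and_cleanup(request_id)
```

Routing both termination paths through one helper is what keeps either path from leaving per-request state behind.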
sriumcp added a commit that referenced this pull request on Jan 27, 2026:

Updates:
- Mark PR #3 as COMPLETED in PR sequence summary
- Update PR dependencies to show PR #3 complete
- Add PR #3 to Implementation History section with full details
- Document commit hash (f4cf790) and PR number (vllm-project#33126)
- Record test results, code review process, and key achievements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request on Jan 27, 2026:

* [Feature] Add journey state cleanup to scheduler (PR #3/9)

Extends the centralized cleanup method to handle journey tracing state alongside core span cleanup. Fixes memory leak on natural completion path.

Changes:
- Extend _end_core_span_and_cleanup() with decoupled cleanup logic
- Cleanup #1: Core spans (always runs, independent of flags)
- Cleanup #2: Journey state (only if journey tracing enabled)
- Remove duplicate inline cleanup from finish_requests()
- Add 4 tests verifying state cleanup on all termination paths

Tests:
- test_journey_state_created: Verify state initialization
- test_journey_state_cleaned_on_finish: Explicit abort cleanup
- test_journey_state_cleaned_on_completion: Natural completion cleanup
- test_no_state_leak: No accumulation over 20 iterations

All 95 tests passing (4 new + 91 existing).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Mark PR #3 as completed in journey tracing plan

Updates:
- Mark PR #3 as COMPLETED in PR sequence summary
- Update PR dependencies to show PR #3 complete
- Add PR #3 to Implementation History section with full details
- Document commit hash (f4cf790) and PR number (vllm-project#33126)
- Record test results, code review process, and key achievements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request on Jan 28, 2026:

Implements step-level observability with probabilistic sampling:
- CLI flags: --step-tracing-enabled, --step-tracing-sample-rate
- Emits batch summary events per sampled scheduler step
- 16 attributes: queue depths, batch composition, token counts, KV metrics
- O(n) complexity, failure-safe, disabled by default
- 9 comprehensive tests, zero regressions (111/111 tests pass)

Fixes applied based on review:
- Fixed O(n²) → O(n) complexity with dict-based lookup
- Moved emission before _update_after_schedule() for spec compliance
- Removed dead code, explicit endpoint handling, gated timestamp capture
- Documented test assumptions for spec decode compatibility

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
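To illustrate the two ideas in this commit (a probabilistic gate per scheduler step, and a batch summary built in a single O(n) pass using dictionary lookups), here is a hedged sketch. The function names, the (request_id, is_prefill) input shape, and the summary keys are assumptions made for the example; the commit does not list its 16 attributes individually.

```python
import random
from typing import Iterable


def should_sample_step(sample_rate: float, rng: random.Random) -> bool:
    """Return True for roughly `sample_rate` of scheduler steps."""
    # rng.random() lies in [0.0, 1.0), so a rate of 0.0 never samples
    # and a rate of 1.0 always samples.
    return rng.random() < sample_rate


def summarize_batch(
    running: Iterable[tuple[str, bool]],
    num_scheduled_tokens: dict[str, int],
) -> dict[str, int]:
    """Build a batch summary in one pass over the running requests.

    `running` yields (request_id, is_prefill) pairs; both the shape and the
    summary keys are hypothetical.
    """
    summary = {
        "num_running": 0,
        "num_prefill": 0,
        "num_decode": 0,
        "prefill_tokens": 0,
        "decode_tokens": 0,
    }
    for request_id, is_prefill in running:
        # O(1) dict lookup per request. Scanning a list of scheduled entries
        # for each request would make the whole pass O(n^2).
        tokens = num_scheduled_tokens.get(request_id, 0)
        summary["num_running"] += 1
        if is_prefill:
            summary["num_prefill"] += 1
            summary["prefill_tokens"] += tokens
        else:
            summary["num_decode"] += 1
            summary["decode_tokens"] += tokens
    return summary
```

Checking the sampling gate before building the summary means unsampled steps pay only for one random draw.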
sriumcp added a commit that referenced this pull request on Jan 29, 2026:

Implements subsampled per-request detailed progress events with KV metrics:
- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
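The two-stage sampling in this commit ("batch summary sampled AND rich subsampled") can be sketched as two nested gates. This is a rough illustration that assumes the two draws are independent, which the commit message does not state. The function names are hypothetical, and the 0.01 step rate mentioned in the note below is an example value, not a documented default.

```python
import random


def sampling_decisions(
    sample_rate: float,
    rich_subsample_rate: float,
    rng: random.Random,
) -> tuple[bool, bool]:
    """Return (emit_batch_summary, emit_rich_snapshots) for one scheduler step."""
    # Stage 1: decide whether this step emits a batch summary at all.
    emit_batch_summary = rng.random() < sample_rate
    # Stage 2: rich per-request snapshots are considered only for steps that
    # already passed stage 1, and then subsampled again.
    emit_rich = emit_batch_summary and rng.random() < rich_subsample_rate
    return emit_batch_summary, emit_rich


def effective_rich_rate(sample_rate: float, rich_subsample_rate: float) -> float:
    """Expected fraction of steps with rich snapshots, given independent draws."""
    return sample_rate * rich_subsample_rate
```

Under that independence assumption, a step rate of 0.01 combined with the stated default subsample rate of 0.001 would give rich snapshots on about one step in 100,000.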
sriumcp added a commit that referenced this pull request on Jan 29, 2026:

Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860
sriumcp added a commit that referenced this pull request on Jan 29, 2026:

…ty (PR #5) (#27)

* [Feature] Add rich request snapshot stream (PR #5)

Implements subsampled per-request detailed progress events with KV metrics:
- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Update step tracing plan with implementation progress

Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Summary
Adds --enable-journey-tracing CLI flag to vllm serve, enabling the v1 scheduler's request journey event tracing feature with automatic OpenTelemetry export.

Journey tracing was implemented in PR #2 but not exposed via CLI. This PR makes it accessible to users and integrates it with vLLM's existing OTEL infrastructure.
What Changed
CLI Enablement
- --enable-journey-tracing flag to vllm serve (disabled by default)
- ObservabilityConfig.enable_journey_tracing

OTEL Integration
- journey.QUEUED - Request added to scheduler
- journey.SCHEDULED - Request allocated resources
- journey.FIRST_TOKEN - First token generated (TTFT)
- journey.PREEMPTED - Request preempted
- journey.FINISHED - Request completed
- span.is_recording() to respect OTEL state
- ts.monotonic attribute

Critical Bug Fixes
- pop()

How to Use
Basic usage (collection only)
With OTEL export (recommended for production)
    vllm serve meta-llama/Llama-3.2-1B-Instruct \
        --enable-journey-tracing \
        --otlp-traces-endpoint http://localhost:4317

Events will appear in Jaeger/Tempo/Zipkin as span events on the llm_request span.

Programmatic API
Testing
Comprehensive test coverage (14 tests, all passing ✅)
CLI & Config:
- test_enable_journey_tracing_parsing() - Verify flag parsing
- test_enable_journey_tracing_config_plumbing() - Verify config flow

Event Accumulation:
- test_journey_events_accumulation() - Basic accumulation
- test_journey_events_accumulation_across_iterations() - Multi-iteration
- test_journey_events_ignored_for_unknown_requests() - Unknown request handling
- test_journey_events_without_outputs_are_accumulated() - QUEUED events preserved
- test_journey_events_with_async_chunking() - AsyncLLM chunking behavior

OTEL Integration:
- test_otel_journey_events_span_events() - Span event export
- test_otel_journey_events_with_preemption() - Preemption handling
- test_otel_journey_events_without_tracer() - Graceful degradation
- test_otel_journey_events_not_exported_when_span_not_recording() - Respects OTEL state

Memory Safety:
- test_otel_journey_events_no_duplication_across_iterations() - No duplicates
- test_otel_journey_events_cleared_after_each_do_tracing_call() - Cleared after export
- test_journey_events_cleared_on_finish_without_tracer() - Cleared without tracer

Test results
Documentation
- JOURNEY_TRACING.md with CLI usage examples

Performance Impact
When disabled (default):
When enabled:
Breaking Changes
None. Feature is disabled by default and fully backward compatible.
Related PRs
Checklist
Example Output
When viewing traces in Jaeger with --enable-journey-tracing --otlp-traces-endpoint http://localhost:4317:

🤖 Generated with Claude Code