[Feature] Add CLI flag for journey tracing with OTEL integration by sriumcp · Pull Request #3 · inference-sim/vllm

sriumcp · 2026-01-23T21:49:56Z

Summary

Adds --enable-journey-tracing CLI flag to vllm serve, enabling the v1 scheduler's request journey event tracing feature with automatic OpenTelemetry export.

Journey tracing was implemented in PR #2 but not exposed via CLI. This PR makes it accessible to users and integrates it with vLLM's existing OTEL infrastructure.

What Changed

CLI Enablement

Added --enable-journey-tracing flag to vllm serve (disabled by default)
Config flows: CLI → EngineArgs → ObservabilityConfig → Scheduler
Single source of truth in ObservabilityConfig.enable_journey_tracing
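
The config flow above can be pictured with a minimal sketch. The classes below are hypothetical stand-ins (`EngineArgsSketch`, `ObservabilityConfigSketch`, `SchedulerSketch`), not vLLM's real classes; only the flag name and the `enable_journey_tracing` field come from this PR. The point is that the scheduler reads the flag from the config object rather than keeping its own copy.

```python
import argparse
from dataclasses import dataclass


@dataclass
class ObservabilityConfigSketch:
    # Hypothetical stand-in for ObservabilityConfig; holds the single copy of the flag.
    enable_journey_tracing: bool = False


@dataclass
class EngineArgsSketch:
    # Hypothetical stand-in for EngineArgs; mirrors the parsed CLI value.
    enable_journey_tracing: bool = False

    def create_observability_config(self) -> ObservabilityConfigSketch:
        return ObservabilityConfigSketch(
            enable_journey_tracing=self.enable_journey_tracing
        )


class SchedulerSketch:
    def __init__(self, observability_config: ObservabilityConfigSketch):
        self.observability_config = observability_config

    @property
    def journey_tracing_enabled(self) -> bool:
        # Always read through the config so there is one source of truth.
        return self.observability_config.enable_journey_tracing


def build_scheduler(argv: list[str]) -> SchedulerSketch:
    parser = argparse.ArgumentParser()
    # Disabled by default, as described in this PR.
    parser.add_argument("--enable-journey-tracing", action="store_true", default=False)
    ns = parser.parse_args(argv)
    engine_args = EngineArgsSketch(enable_journey_tracing=ns.enable_journey_tracing)
    return SchedulerSketch(engine_args.create_observability_config())
```

Calling `build_scheduler(["--enable-journey-tracing"])` yields a scheduler whose `journey_tracing_enabled` is true; with an empty argument list it stays false.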

OTEL Integration

Journey events automatically exported as OTEL span events when tracing is active
Events exported on request completion with full lifecycle:
- journey.QUEUED - Request added to scheduler
- journey.SCHEDULED - Request allocated resources
- journey.FIRST_TOKEN - First token generated (TTFT)
- journey.PREEMPTED - Request preempted
- journey.FINISHED - Request completed
Proper guards: span.is_recording() to respect OTEL state
None values excluded from attributes (OTEL-compliant)
Monotonic timestamps preserved as ts.monotonic attribute
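
A rough sketch of the export rules listed above (recording guard, None filtering, `ts.monotonic` attribute). `JourneyEventSketch` and `export_journey_events` are illustrative names, not the PR's code, and `RecordingSpanStub` is a stand-in that mimics only the `is_recording()` and `add_event()` methods of an OpenTelemetry span so the example runs without the OTEL SDK installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JourneyEventSketch:
    # Hypothetical event record; field names are assumptions for illustration.
    event_type: str
    ts_monotonic: float
    scheduler_step: Optional[int] = None
    phase: Optional[str] = None


class RecordingSpanStub:
    """Stand-in for an OTEL span, exposing only the two methods used below."""

    def __init__(self, recording: bool = True):
        self._recording = recording
        self.events = []

    def is_recording(self) -> bool:
        return self._recording

    def add_event(self, name, attributes=None, timestamp=None):
        self.events.append((name, dict(attributes or {})))


def export_journey_events(span, events) -> int:
    """Attach events to the span; return how many were exported."""
    # Guard: no tracer/span, or a span that is not recording, means no export.
    if span is None or not span.is_recording():
        return 0
    for ev in events:
        raw = {
            "ts.monotonic": ev.ts_monotonic,
            "scheduler.step": ev.scheduler_step,
            "phase": ev.phase,
        }
        # OTEL attribute values may not be None, so drop unset fields.
        attributes = {k: v for k, v in raw.items() if v is not None}
        span.add_event(f"journey.{ev.event_type}", attributes=attributes)
    return len(events)
```

With this filtering, a QUEUED event that has no scheduler step yet is exported without a `scheduler.step` attribute rather than with a null value.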

Critical Bug Fixes

AsyncLLM chunking event loss: Events now distributed once before chunk processing
Events without outputs dropped: QUEUED events for non-scheduled requests now preserved
Event duplication prevention: Proper consumption semantics with pop()
Memory leak prevention: Events cleared after export or on request finish
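
The duplication and leak fixes both rest on consume-once semantics. A minimal sketch, assuming events are buffered per request ID in a dictionary (the class and method names are hypothetical, not taken from the PR):

```python
from collections import defaultdict


class JourneyEventBufferSketch:
    """Hypothetical per-request buffer for journey events."""

    def __init__(self):
        self._pending = defaultdict(list)

    def accumulate(self, request_id: str, events: list) -> None:
        # Events are kept even if the request produced no output this iteration.
        self._pending[request_id].extend(events)

    def consume(self, request_id: str) -> list:
        # pop() hands back the events and removes the entry in one step,
        # so a second call cannot re-export them and nothing is left behind.
        return self._pending.pop(request_id, [])

    def pending_requests(self) -> int:
        return len(self._pending)
```

Calling `consume()` on export, and again on request finish when no tracer is present, covers both cleanup paths described above.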

How to Use

Basic usage (collection only)

vllm serve meta-llama/Llama-3.2-1B-Instruct --enable-journey-tracing

With OTEL export (recommended for production)

vllm serve meta-llama/Llama-3.2-1B-Instruct \
    --enable-journey-tracing \
    --otlp-traces-endpoint http://localhost:4317

Events will appear in Jaeger/Tempo/Zipkin as span events on the llm_request span.

Programmatic API

from vllm.config import ObservabilityConfig

observability_config = ObservabilityConfig(
    enable_journey_tracing=True,
    otlp_traces_endpoint="http://localhost:4317"
)

Testing

Comprehensive test coverage (14 tests, all passing ✅)

CLI & Config:

test_enable_journey_tracing_parsing() - Verify flag parsing
test_enable_journey_tracing_config_plumbing() - Verify config flow

Event Accumulation:

test_journey_events_accumulation() - Basic accumulation
test_journey_events_accumulation_across_iterations() - Multi-iteration
test_journey_events_ignored_for_unknown_requests() - Unknown request handling
test_journey_events_without_outputs_are_accumulated() - QUEUED events preserved
test_journey_events_with_async_chunking() - AsyncLLM chunking behavior

OTEL Integration:

test_otel_journey_events_span_events() - Span event export
test_otel_journey_events_with_preemption() - Preemption handling
test_otel_journey_events_without_tracer() - Graceful degradation
test_otel_journey_events_not_exported_when_span_not_recording() - Respects OTEL state

Memory Safety:

test_otel_journey_events_no_duplication_across_iterations() - No duplicates
test_otel_journey_events_cleared_after_each_do_tracing_call() - Cleared after export
test_journey_events_cleared_on_finish_without_tracer() - Cleared without tracer

Test results

$ pytest tests/v1/engine/test_journey_tracing_integration.py tests/engine/test_arg_utils.py::test_enable_journey_tracing* -v
================================ 14 passed in 1.81s ================================

Documentation

Updated JOURNEY_TRACING.md with CLI usage examples
Added "Event Delivery Guarantees & Caveats" section
Corrected memory overhead claims (minimal rather than zero)
Production deployment recommendations with OTEL sampling guidance

Performance Impact

When disabled (default):

CPU: Single boolean check per emission point (6 checks/request)
Memory: ~56 bytes per request (empty list in RequestState)
Throughput: Negligible impact

When enabled:

Event creation: O(1) per event
Typical event count: 5-7 events per request (without/with preemption)
Events accumulate until request completion (at most about 5 + 2*preemptions events per request)
Memory cleared on export or finish
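
As a quick worked check of the figures above, the stated bound can be written as a function. The helper name is hypothetical, and reading the factor of 2 as "one PREEMPTED plus one re-SCHEDULED event per preemption" is an assumption, not something the PR states.

```python
def max_journey_events(preemptions: int) -> int:
    """Upper bound on buffered events per request, using the PR's 5 + 2*p figure."""
    if preemptions < 0:
        raise ValueError("preemptions must be non-negative")
    # Assumption: each preemption adds two events on top of the base five.
    return 5 + 2 * preemptions
```

This reproduces the quoted range: 5 events with no preemption and 7 with a single preemption.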

Breaking Changes

None. Feature is disabled by default and fully backward compatible.

Related PRs

Depends on: [Feature] Add request journey event tracing to v1 scheduler #2 (Request journey event tracing implementation)
Depends on: [Feature] Add monotonically increasing step counter to vLLM scheduler #1 (Monotonically increasing step counter)

Checklist

CLI flag added and tested
Config plumbing verified end-to-end
OTEL integration tested with mocks
Critical bugs fixed (AsyncLLM chunking, events without outputs)
Memory leaks prevented
Documentation updated
14 comprehensive tests passing
No performance regression when disabled
Backward compatible (disabled by default)

Example Output

When viewing traces in Jaeger with --enable-journey-tracing --otlp-traces-endpoint http://localhost:4317:

Span: llm_request (request-123)
├─ Span Event: journey.QUEUED (ts.monotonic=1234.5, scheduler.step=null, phase=PREFILL)
├─ Span Event: journey.SCHEDULED (ts.monotonic=1234.6, scheduler.step=1, schedule.kind=FIRST)
├─ Span Event: journey.FIRST_TOKEN (ts.monotonic=1235.2, scheduler.step=5, phase=DECODE)
└─ Span Event: journey.FINISHED (ts.monotonic=1238.9, scheduler.step=42, finish.status=length)

🤖 Generated with Claude Code

Adds --enable-journey-tracing CLI flag to vllm serve, enabling the v1 scheduler's request journey event tracing feature. Journey events are automatically exported to OpenTelemetry when tracing is configured.

Features:
- CLI flag --enable-journey-tracing (disabled by default)
- Automatic OTEL span event export when tracer is active
- Journey events (QUEUED, SCHEDULED, FIRST_TOKEN, PREEMPTED, FINISHED) exported with full progress snapshots
- Events cleared after export to prevent memory leaks

Critical bug fixes:
- Fixed event loss in AsyncLLM chunking (events now distributed before chunk processing)
- Fixed silent dropping of events for requests without outputs (QUEUED events for non-scheduled requests)
- Fixed event duplication prevention with proper consumption semantics

OTEL integration:
- Journey events exported as span events on request completion
- Proper span.is_recording() guards to respect OTEL state
- None values excluded from attributes (OTEL-compliant)
- Monotonic timestamps preserved as ts.monotonic attribute

Testing:
- 14 comprehensive integration tests covering CLI, OTEL, edge cases
- Tests for AsyncLLM chunking behavior
- Tests for events without corresponding outputs
- Tests for memory leak prevention and duplication

Documentation:
- Updated JOURNEY_TRACING.md with CLI usage examples
- Added event delivery guarantees and caveats section
- Corrected memory overhead claims (minimal vs zero)
- Production deployment recommendations

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Feature] Add journey state cleanup to scheduler (PR #3/9)

Extends the centralized cleanup method to handle journey tracing state alongside core span cleanup. Fixes memory leak on natural completion path.

Changes:
- Extend _end_core_span_and_cleanup() with decoupled cleanup logic
- Cleanup #1: Core spans (always runs, independent of flags)
- Cleanup #2: Journey state (only if journey tracing enabled)
- Remove duplicate inline cleanup from finish_requests()
- Add 4 tests verifying state cleanup on all termination paths

Tests:
- test_journey_state_created: Verify state initialization
- test_journey_state_cleaned_on_finish: Explicit abort cleanup
- test_journey_state_cleaned_on_completion: Natural completion cleanup
- test_no_state_leak: No accumulation over 20 iterations

All 95 tests passing (4 new + 91 existing).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Mark PR #3 as completed in journey tracing plan

Updates:
- Mark PR #3 as COMPLETED in PR sequence summary
- Update PR dependencies to show PR #3 complete
- Add PR #3 to Implementation History section with full details
- Document commit hash (f4cf790) and PR number (vllm-project#33126)
- Record test results, code review process, and key achievements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements step-level observability with probabilistic sampling:
- CLI flags: --step-tracing-enabled, --step-tracing-sample-rate
- Emits batch summary events per sampled scheduler step
- 16 attributes: queue depths, batch composition, token counts, KV metrics
- O(n) complexity, failure-safe, disabled by default
- 9 comprehensive tests, zero regressions (111/111 tests pass)

Fixes applied based on review:
- Fixed O(n²) → O(n) complexity with dict-based lookup
- Moved emission before _update_after_schedule() for spec compliance
- Removed dead code, explicit endpoint handling, gated timestamp capture
- Documented test assumptions for spec decode compatibility

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
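
For readers unfamiliar with probabilistic sampling, a minimal sketch of a per-step sampling decision follows. The function name and the use of `random.Random` are assumptions for illustration; the commit message only states that steps are sampled at a configurable rate.

```python
import random


def should_trace_step(sample_rate: float, rng: random.Random) -> bool:
    """Decide whether to emit a batch summary for this scheduler step."""
    # rng.random() is uniform in [0.0, 1.0), so a rate of 0.0 never samples
    # and a rate of 1.0 always samples.
    return rng.random() < sample_rate


def count_sampled_steps(num_steps: int, sample_rate: float, seed: int = 0) -> int:
    # A seeded generator keeps the illustration reproducible.
    rng = random.Random(seed)
    return sum(should_trace_step(sample_rate, rng) for _ in range(num_steps))
```

Over many steps the fraction of sampled steps approaches the configured rate, which keeps tracing overhead proportional to that rate.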

…ty (PR #5) (#27)

* [Feature] Add rich request snapshot stream (PR #5)

Implements subsampled per-request detailed progress events with KV metrics:
- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Update step tracing plan with implementation progress

Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

sriumcp merged commit 995ffad into main Jan 23, 2026

sriumcp mentioned this pull request Jan 27, 2026

[Feature] Add journey state cleanup to scheduler (PR #3/9) #11

Merged

sriumcp mentioned this pull request Jan 28, 2026

[Feature] Add step-level batch summary tracing (PR #3) #22

Merged

sriumcp mentioned this pull request Jan 29, 2026

[Feature] Add rich request snapshot stream for step-level observability (PR #5) #27

Merged

sriumcp merged 1 commit into main from cli