[Feature] Add CLI flag for journey tracing with OTEL integration#3

Merged
sriumcp merged 1 commit into main from cli
Jan 23, 2026

@sriumcp sriumcp commented Jan 23, 2026

Summary

Adds --enable-journey-tracing CLI flag to vllm serve, enabling the v1 scheduler's request journey event tracing feature with automatic OpenTelemetry export.

Journey tracing was implemented in PR #2 but not exposed via CLI. This PR makes it accessible to users and integrates it with vLLM's existing OTEL infrastructure.

What Changed

CLI Enablement

  • Added --enable-journey-tracing flag to vllm serve (disabled by default)
  • Config flows: CLI → EngineArgs → ObservabilityConfig → Scheduler
  • Single source of truth in ObservabilityConfig.enable_journey_tracing
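
The config flow above can be pictured with a minimal sketch. The classes below are simplified stand-ins written for illustration, not vLLM's actual `EngineArgs`, `ObservabilityConfig`, or `Scheduler`, and `build_parser`, `create_observability_config`, and `scheduler_from_cli` are hypothetical helper names.

```python
import argparse
from dataclasses import dataclass


@dataclass
class ObservabilityConfig:
    # Stand-in config class; only the field added by this PR is modeled.
    enable_journey_tracing: bool = False


@dataclass
class EngineArgs:
    # Stand-in engine args; mirrors the CLI flag one-to-one.
    enable_journey_tracing: bool = False

    def create_observability_config(self) -> ObservabilityConfig:
        # Hypothetical helper: the single place where the flag is copied over.
        return ObservabilityConfig(
            enable_journey_tracing=self.enable_journey_tracing)


class Scheduler:
    def __init__(self, observability_config: ObservabilityConfig) -> None:
        # The scheduler reads the flag from the config, never from the CLI.
        self.journey_tracing = observability_config.enable_journey_tracing


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="serve-sketch")
    # store_true keeps the feature disabled unless the flag is passed.
    parser.add_argument("--enable-journey-tracing", action="store_true")
    return parser


def scheduler_from_cli(argv: list[str]) -> Scheduler:
    namespace = build_parser().parse_args(argv)
    engine_args = EngineArgs(
        enable_journey_tracing=namespace.enable_journey_tracing)
    return Scheduler(engine_args.create_observability_config())
```

Because every layer copies the value from the layer before it, a test only needs to check the scheduler's view of the flag to confirm the whole chain.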

OTEL Integration

  • Journey events automatically exported as OTEL span events when tracing is active
  • Events exported on request completion with full lifecycle:
    • journey.QUEUED - Request added to scheduler
    • journey.SCHEDULED - Request allocated resources
    • journey.FIRST_TOKEN - First token generated (TTFT)
    • journey.PREEMPTED - Request preempted
    • journey.FINISHED - Request completed
  • Proper guards: span.is_recording() to respect OTEL state
  • None values excluded from attributes (OTEL-compliant)
  • Monotonic timestamps preserved as ts.monotonic attribute
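
A minimal sketch of the export rules listed above, assuming journey events are buffered as plain dicts with a `type` key. That dict shape and the function name `export_journey_events` are illustrative assumptions, not the PR's code. The only span methods used are `is_recording()` and `add_event(name, attributes=...)`, which exist on OpenTelemetry spans; `FakeSpan` is a test double so the sketch runs without the OTEL SDK installed.

```python
from typing import Any, Optional


def export_journey_events(span: Optional[Any],
                          events: list[dict[str, Any]]) -> int:
    """Attach buffered journey events to `span`; return how many were sent."""
    # No tracer configured, or the span is not recording: export nothing.
    if span is None or not span.is_recording():
        return 0
    for event in events:
        # Attribute values must not be None, so those keys are dropped.
        attributes = {
            key: value
            for key, value in event.items()
            if key != "type" and value is not None
        }
        span.add_event(f"journey.{event['type']}", attributes=attributes)
    return len(events)


class FakeSpan:
    """Test double exposing the two span methods used above."""

    def __init__(self, recording: bool = True) -> None:
        self._recording = recording
        self.events: list[tuple[str, dict[str, Any]]] = []

    def is_recording(self) -> bool:
        return self._recording

    def add_event(self, name: str,
                  attributes: Optional[dict[str, Any]] = None) -> None:
        self.events.append((name, dict(attributes or {})))
```

Passing a `FakeSpan(recording=False)` leaves its `events` list empty, which is the behavior the `span.is_recording()` guard is meant to provide.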

Critical Bug Fixes

  1. AsyncLLM chunking event loss: Events now distributed once before chunk processing
  2. Events without outputs dropped: QUEUED events for non-scheduled requests now preserved
  3. Event duplication prevention: Proper consumption semantics with pop()
  4. Memory leak prevention: Events cleared after export or on request finish
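
The consumption semantics behind fixes 2 to 4 can be sketched as a per-request buffer. `JourneyEventBuffer` and its methods are hypothetical names for illustration; the PR's actual data structures may differ.

```python
from collections import defaultdict


class JourneyEventBuffer:
    """Hypothetical per-request buffer for journey events."""

    def __init__(self) -> None:
        self._pending: dict[str, list[str]] = defaultdict(list)

    def add(self, request_id: str, event: str) -> None:
        # Buffer the event even if the request has produced no output yet,
        # so an early QUEUED event is not dropped.
        self._pending[request_id].append(event)

    def consume(self, request_id: str) -> list[str]:
        # pop() hands the events to exactly one caller and removes the entry,
        # which avoids both duplicate export and unbounded growth.
        return self._pending.pop(request_id, [])

    def __len__(self) -> int:
        return len(self._pending)
```

A second `consume()` for the same request returns an empty list, so calling the export path twice cannot emit the same event twice.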

How to Use

Basic usage (collection only)

vllm serve meta-llama/Llama-3.2-1B-Instruct --enable-journey-tracing

With OTEL export (recommended for production)

vllm serve meta-llama/Llama-3.2-1B-Instruct \
    --enable-journey-tracing \
    --otlp-traces-endpoint http://localhost:4317

Events will appear in Jaeger/Tempo/Zipkin as span events on the llm_request span.

Programmatic API

from vllm.config import ObservabilityConfig

observability_config = ObservabilityConfig(
    enable_journey_tracing=True,
    otlp_traces_endpoint="http://localhost:4317"
)

Testing

Comprehensive test coverage (14 tests, all passing ✅)

CLI & Config:

  • test_enable_journey_tracing_parsing() - Verify flag parsing
  • test_enable_journey_tracing_config_plumbing() - Verify config flow

Event Accumulation:

  • test_journey_events_accumulation() - Basic accumulation
  • test_journey_events_accumulation_across_iterations() - Multi-iteration
  • test_journey_events_ignored_for_unknown_requests() - Unknown request handling
  • test_journey_events_without_outputs_are_accumulated() - QUEUED events preserved
  • test_journey_events_with_async_chunking() - AsyncLLM chunking behavior

OTEL Integration:

  • test_otel_journey_events_span_events() - Span event export
  • test_otel_journey_events_with_preemption() - Preemption handling
  • test_otel_journey_events_without_tracer() - Graceful degradation
  • test_otel_journey_events_not_exported_when_span_not_recording() - Respects OTEL state

Memory Safety:

  • test_otel_journey_events_no_duplication_across_iterations() - No duplicates
  • test_otel_journey_events_cleared_after_each_do_tracing_call() - Cleared after export
  • test_journey_events_cleared_on_finish_without_tracer() - Cleared without tracer

Test results

$ pytest tests/v1/engine/test_journey_tracing_integration.py tests/engine/test_arg_utils.py::test_enable_journey_tracing* -v
================================ 14 passed in 1.81s ================================

Documentation

  • Updated JOURNEY_TRACING.md with CLI usage examples
  • Added "Event Delivery Guarantees & Caveats" section
  • Corrected memory overhead claims (minimal, not zero)
  • Production deployment recommendations with OTEL sampling guidance

Performance Impact

When disabled (default):

  • CPU: Single boolean check per emission point (6 checks/request)
  • Memory: ~56 bytes per request (empty list in RequestState)
  • Throughput: Negligible impact

When enabled:

  • Event creation: O(1) per event
  • Typical event count: 5 events per request without preemption, 7 with one preemption
  • Events accumulate until request completion (at most 5 + 2 × preemptions events per request)
  • Memory cleared on export or finish
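
As a rough illustration of the cost model above, the sketch below shows a guarded emission point and the event bound as a function. `JourneyRecorder` and `max_events` are hypothetical names, and the recorder is a simplification, not the scheduler's real emission code.

```python
import time


class JourneyRecorder:
    """Hypothetical recorder showing the disabled and enabled paths."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled
        self.events: list[tuple[str, float]] = []

    def emit(self, event_type: str) -> None:
        # Disabled path: a single boolean check, no event object is built.
        if not self.enabled:
            return
        # Enabled path: O(1) append with a monotonic timestamp.
        self.events.append((event_type, time.monotonic()))


def max_events(preemptions: int) -> int:
    # Bound quoted in the PR text: 5 + 2 * preemptions.
    return 5 + 2 * preemptions
```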

Breaking Changes

None. Feature is disabled by default and fully backward compatible.

Related PRs

Checklist

  • CLI flag added and tested
  • Config plumbing verified end-to-end
  • OTEL integration tested with mocks
  • Critical bugs fixed (AsyncLLM chunking, events without outputs)
  • Memory leaks prevented
  • Documentation updated
  • 14 comprehensive tests passing
  • No performance regression when disabled
  • Backward compatible (disabled by default)

Example Output

When viewing traces in Jaeger with --enable-journey-tracing --otlp-traces-endpoint http://localhost:4317:

Span: llm_request (request-123)
├─ Span Event: journey.QUEUED (ts.monotonic=1234.5, phase=PREFILL)
├─ Span Event: journey.SCHEDULED (ts.monotonic=1234.6, scheduler.step=1, schedule.kind=FIRST)
├─ Span Event: journey.FIRST_TOKEN (ts.monotonic=1235.2, scheduler.step=5, phase=DECODE)
└─ Span Event: journey.FINISHED (ts.monotonic=1238.9, scheduler.step=42, finish.status=length)

🤖 Generated with Claude Code

Adds --enable-journey-tracing CLI flag to vllm serve, enabling the v1
scheduler's request journey event tracing feature. Journey events are
automatically exported to OpenTelemetry when tracing is configured.

Features:
- CLI flag --enable-journey-tracing (disabled by default)
- Automatic OTEL span event export when tracer is active
- Journey events (QUEUED, SCHEDULED, FIRST_TOKEN, PREEMPTED, FINISHED)
  exported with full progress snapshots
- Events cleared after export to prevent memory leaks

Critical bug fixes:
- Fixed event loss in AsyncLLM chunking (events now distributed before
  chunk processing)
- Fixed silent dropping of events for requests without outputs (QUEUED
  events for non-scheduled requests)
- Prevented event duplication with proper consumption semantics

OTEL integration:
- Journey events exported as span events on request completion
- Proper span.is_recording() guards to respect OTEL state
- None values excluded from attributes (OTEL-compliant)
- Monotonic timestamps preserved as ts.monotonic attribute

Testing:
- 14 comprehensive integration tests covering CLI, OTEL, edge cases
- Tests for AsyncLLM chunking behavior
- Tests for events without corresponding outputs
- Tests for memory leak prevention and duplication

Documentation:
- Updated JOURNEY_TRACING.md with CLI usage examples
- Added event delivery guarantees and caveats section
- Corrected memory overhead claims (minimal, not zero)
- Production deployment recommendations

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@sriumcp sriumcp merged commit 995ffad into main Jan 23, 2026
sriumcp added a commit that referenced this pull request Jan 27, 2026
Extends the centralized cleanup method to handle journey tracing state
alongside core span cleanup. Fixes memory leak on natural completion path.

Changes:
- Extend _end_core_span_and_cleanup() with decoupled cleanup logic
  - Cleanup #1: Core spans (always runs, independent of flags)
  - Cleanup #2: Journey state (only if journey tracing enabled)
- Remove duplicate inline cleanup from finish_requests()
- Add 4 tests verifying state cleanup on all termination paths

Tests:
- test_journey_state_created: Verify state initialization
- test_journey_state_cleaned_on_finish: Explicit abort cleanup
- test_journey_state_cleaned_on_completion: Natural completion cleanup
- test_no_state_leak: No accumulation over 20 iterations

All 95 tests passing (4 new + 91 existing).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
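
The decoupled cleanup described in this commit might look roughly like the sketch below. Only the method name `_end_core_span_and_cleanup` comes from the commit message; the class, its fields, and `add_request` are assumptions made for illustration.

```python
class SchedulerSketch:
    """Hypothetical scheduler holding per-request tracing state."""

    def __init__(self, journey_tracing_enabled: bool) -> None:
        self.journey_tracing_enabled = journey_tracing_enabled
        self.core_spans: dict[str, object] = {}
        self.journey_state: dict[str, list[str]] = {}

    def add_request(self, request_id: str) -> None:
        self.core_spans[request_id] = object()
        if self.journey_tracing_enabled:
            self.journey_state[request_id] = []

    def _end_core_span_and_cleanup(self, request_id: str) -> None:
        # Cleanup #1: core span state, always runs regardless of flags.
        self.core_spans.pop(request_id, None)
        # Cleanup #2: journey state, only when journey tracing is enabled.
        if self.journey_tracing_enabled:
            self.journey_state.pop(request_id, None)
```

Routing both the abort path and the natural completion path through this one method is what keeps either dict from accumulating entries.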
sriumcp added a commit that referenced this pull request Jan 27, 2026
Updates:
- Mark PR #3 as COMPLETED in PR sequence summary
- Update PR dependencies to show PR #3 complete
- Add PR #3 to Implementation History section with full details
- Document commit hash (f4cf790) and PR number (vllm-project#33126)
- Record test results, code review process, and key achievements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 27, 2026
* [Feature] Add journey state cleanup to scheduler (PR #3/9)

Extends the centralized cleanup method to handle journey tracing state
alongside core span cleanup. Fixes memory leak on natural completion path.

Changes:
- Extend _end_core_span_and_cleanup() with decoupled cleanup logic
  - Cleanup #1: Core spans (always runs, independent of flags)
  - Cleanup #2: Journey state (only if journey tracing enabled)
- Remove duplicate inline cleanup from finish_requests()
- Add 4 tests verifying state cleanup on all termination paths

Tests:
- test_journey_state_created: Verify state initialization
- test_journey_state_cleaned_on_finish: Explicit abort cleanup
- test_journey_state_cleaned_on_completion: Natural completion cleanup
- test_no_state_leak: No accumulation over 20 iterations

All 95 tests passing (4 new + 91 existing).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Mark PR #3 as completed in journey tracing plan

Updates:
- Mark PR #3 as COMPLETED in PR sequence summary
- Update PR dependencies to show PR #3 complete
- Add PR #3 to Implementation History section with full details
- Document commit hash (f4cf790) and PR number (vllm-project#33126)
- Record test results, code review process, and key achievements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 28, 2026
Implements step-level observability with probabilistic sampling:
- CLI flags: --step-tracing-enabled, --step-tracing-sample-rate
- Emits batch summary events per sampled scheduler step
- 16 attributes: queue depths, batch composition, token counts, KV metrics
- O(n) complexity, failure-safe, disabled by default
- 9 comprehensive tests, zero regressions (111/111 tests pass)

Fixes applied based on review:
- Fixed O(n²) → O(n) complexity with dict-based lookup
- Moved emission before _update_after_schedule() for spec compliance
- Removed dead code, explicit endpoint handling, gated timestamp capture
- Documented test assumptions for spec decode compatibility

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
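
Two ideas from this commit message, probabilistic step sampling and replacing a nested scan with a dict lookup, can be sketched as follows. The function names and input shapes are hypothetical and chosen only for illustration.

```python
import random


def should_sample(sample_rate: float, rng: random.Random) -> bool:
    # rng.random() is uniform on [0.0, 1.0), so a rate of 0.0 never samples
    # and a rate of 1.0 always does.
    return rng.random() < sample_rate


def batch_token_counts(running_ids: list[str],
                       scheduled: list[tuple[str, int]]) -> list[int]:
    # Build the index once (O(n)) instead of scanning `scheduled`
    # for every running request (O(n^2)).
    tokens_by_id = dict(scheduled)
    return [tokens_by_id.get(request_id, 0) for request_id in running_ids]
```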
sriumcp added a commit that referenced this pull request Jan 29, 2026
Implements subsampled per-request detailed progress events with KV metrics:

- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
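
The two-stage sampling mentioned above can be sketched as two independent draws, where the rich snapshot is considered only for steps that already passed the batch-summary draw. This reading of "sampled AND subsampled" is an assumption, and both function names are hypothetical.

```python
import random


def sampling_decisions(step_rate: float, rich_rate: float,
                       rng: random.Random) -> tuple[bool, bool]:
    # Stage 1: decide whether this scheduler step emits a batch summary.
    batch_sampled = rng.random() < step_rate
    # Stage 2: only sampled steps are eligible for rich per-request snapshots.
    rich_sampled = batch_sampled and rng.random() < rich_rate
    return batch_sampled, rich_sampled


def effective_rich_rate(step_rate: float, rich_rate: float) -> float:
    # With independent draws, the overall rich-snapshot rate is the product.
    return step_rate * rich_rate
```

Under this assumption, the default `step_tracing_rich_subsample_rate` of 0.001 means roughly one in a thousand of the already-sampled steps would carry rich snapshots.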
sriumcp added a commit that referenced this pull request Jan 29, 2026
Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860
sriumcp added a commit that referenced this pull request Jan 29, 2026
…ty (PR #5) (#27)

* [Feature] Add rich request snapshot stream (PR #5)

Implements subsampled per-request detailed progress events with KV metrics:

- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [Docs] Update step tracing plan with implementation progress

Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>