[Refactor] Use SpanAttributes constants for journey event attributes#4
Merged
Refactored hard-coded journey event attribute strings to use centralized constants in SpanAttributes class. Added 11 new JOURNEY_* constants to vllm/tracing.py and updated output_processor.py to use them. This improves maintainability and consistency with existing OTEL attribute patterns. Also added Claude Code developer guidance to JOURNEY_TRACING.md. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request on Jan 26, 2026
This commit migrates span attributes from the deprecated do_tracing() method to the new dual-stream API span architecture (partial completion - API side only).

API Span Attributes Added:

**Request Metadata (set at ARRIVED):**
- GEN_AI_RESPONSE_MODEL - model name
- GEN_AI_USAGE_PROMPT_TOKENS - prompt token count
- GEN_AI_REQUEST_TEMPERATURE - sampling param (if not None)
- GEN_AI_REQUEST_TOP_P - sampling param (if not None)
- GEN_AI_REQUEST_MAX_TOKENS - sampling param (if not None)
- GEN_AI_REQUEST_N - sampling param (if not None)
- GEN_AI_REQUEST_ID - already set at span creation

**Completion Metrics (set at DEPARTED):**
- GEN_AI_LATENCY_E2E - end-to-end latency (DEPARTED - ARRIVED)
- GEN_AI_LATENCY_TIME_TO_FIRST_TOKEN - time to first token (FIRST_RESPONSE - ARRIVED)
- GEN_AI_USAGE_COMPLETION_TOKENS - completion token count

Implementation Details:
1. Added _set_api_span_request_attributes() helper method
   - Sets model, prompt tokens, and sampling params on API span
   - Called after sampling_params are computed (line ~430)
2. Added timestamp tracking to RequestResponseMetadata
   - arrival_time: monotonic time when span created
   - first_response_time: monotonic time when first output received
   - Used for calculating latencies at DEPARTED
3. Updated both streaming and non-streaming paths
   - Track first_response_time in result_generator iteration
   - Calculate and set latencies at DEPARTED event
   - Set completion tokens from final_usage_info

Remaining Work (Core Span):
- GEN_AI_LATENCY_TIME_IN_QUEUE (scheduler)
- GEN_AI_LATENCY_TIME_IN_MODEL_PREFILL (scheduler)
- GEN_AI_LATENCY_TIME_IN_MODEL_DECODE (scheduler)
- GEN_AI_LATENCY_TIME_IN_MODEL_INFERENCE (scheduler)

Related: Addresses Task #4 (API side) - full feature parity with old do_tracing()

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
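The latency arithmetic in this commit (E2E = DEPARTED - ARRIVED, TTFT = FIRST_RESPONSE - ARRIVED, both from monotonic timestamps) can be sketched as below. This is a minimal illustration under stated assumptions. `RequestTimestamps`, `mark_first_response`, and `departed_latencies` are hypothetical names. The plain dict keys are placeholders for the real `SpanAttributes` constants. The commit itself only states that `RequestResponseMetadata` gains `arrival_time` and `first_response_time`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RequestTimestamps:
    """Hypothetical stand-in for the timestamps added to RequestResponseMetadata."""

    arrival_time: float  # monotonic time when the API span was created (ARRIVED)
    first_response_time: Optional[float] = None  # monotonic time of the first output

    def mark_first_response(self, now: float) -> None:
        # Only the first output counts; later outputs must not move the timestamp.
        if self.first_response_time is None:
            self.first_response_time = now


def departed_latencies(ts: RequestTimestamps, departed_time: float) -> Dict[str, float]:
    """Latencies to record on the API span at DEPARTED (keys are placeholders)."""
    latencies = {"e2e": departed_time - ts.arrival_time}  # DEPARTED - ARRIVED
    if ts.first_response_time is not None:
        # FIRST_RESPONSE - ARRIVED; skipped if the request produced no output.
        latencies["time_to_first_token"] = ts.first_response_time - ts.arrival_time
    return latencies
```

In a real request path, the `now` and `departed_time` values would come from `time.monotonic()`. Monotonic time is used so that wall-clock adjustments cannot produce negative latencies.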
sriumcp added a commit that referenced this pull request on Jan 27, 2026
Add journey event emission directly to OpenTelemetry spans in parallel with existing buffering. Events (QUEUED, SCHEDULED, PREEMPTED, FIRST_TOKEN, FINISHED) are now emitted to core spans with full progress snapshots.

Changes:
- Extended _emit_journey_event() to accept optional span parameter
- Added span emission logic with defensive error handling
- Updated all 6 call sites to pass span from _core_spans dict
- Added FINISHED emission in natural completion path (update_from_output)
- Extended _compute_progress_snapshot() to support WAITING phase
- Changed QUEUED scheduler_step from None to counter (typically 0)
- Added 9 comprehensive tests covering all event types and edge cases

Safety properties:
- No new resources created (uses existing spans from PR#2)
- Defensive programming (try/except around all OTEL calls)
- Zero overhead when disabled (feature flag gate)
- Legacy buffering preserved (parallel operation until PR#9)

Tests: 9 new tests (328 lines), all passing
Size: ~113 lines production code, 328 lines test code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
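The emission pattern described above can be sketched as follows: an optional span parameter, a feature-flag gate, legacy buffering kept in parallel, and try/except around the OTEL call. This is a hypothetical sketch, not the actual `_emit_journey_event()`. The function name, the `journey.<TYPE>` event name, and the `"scheduler.step"` key are assumptions. The span is modeled as anything with an OpenTelemetry-style `add_event(name, attributes=...)` method, so the sketch runs without an OTEL install.

```python
import logging
from typing import Any, Dict, List, Optional, Protocol

logger = logging.getLogger(__name__)


class SpanLike(Protocol):
    """The one span method this sketch needs (OTEL spans provide add_event)."""

    def add_event(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None: ...


def emit_journey_event(
    event_type: str,
    progress: Dict[str, Any],
    legacy_buffer: List[Dict[str, Any]],
    span: Optional[SpanLike] = None,
    enabled: bool = True,
) -> None:
    """Hypothetical sketch: buffer the event and, if a span is given, mirror it there."""
    if not enabled:
        return  # feature flag off: no buffering, no span calls
    event = {"event.type": event_type, **progress}
    legacy_buffer.append(event)  # legacy buffering keeps running in parallel
    if span is None:
        return  # callers without a core span still get buffering
    try:
        span.add_event(f"journey.{event_type}", attributes=event)
    except Exception:
        # Tracing must never break request processing.
        logger.debug("Failed to emit journey event to span", exc_info=True)
```

Keeping the span optional means every existing call site stays valid while span emission is rolled out, which mirrors the "parallel operation" property listed above.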
sriumcp added a commit that referenced this pull request on Jan 27, 2026
Updated JOURNEY_TRACING_PR_PLAN.md to reflect PR #4 completion:
- Updated PR sequence summary table (PR #4: COMPLETED)
- Updated PR dependencies diagram (PR #4: ✅ COMPLETED)
- Added detailed completion status to PR #4 section
- Listed all 9 tests implemented
- Documented actual sizes: ~113 lines production, 328 lines test code
sriumcp added a commit that referenced this pull request on Jan 27, 2026
* [Feature] Emit journey events to core spans (PR #4/9)
* [Docs] Mark PR #4 as completed in journey tracing plan

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This was referenced Jan 27, 2026
sriumcp added a commit that referenced this pull request on Jan 28, 2026
Add read-only KV cache observability helper module for step-level tracing. Provides utilities to extract per-request and per-step KV cache metrics using only existing exposed interfaces.

Key additions:
- vllm/v1/core/kv_cache_observability.py: PerRequestKVMetrics and StepKVSummary dataclasses with query functions
- tests/v1/core/test_kv_cache_observability.py: 18 unit tests with minimal fakes (17 fake-based + 1 smoke test)

Design principles:
- Read-only access to existing KV cache state
- Defensive programming (never raises exceptions)
- Aggregates across all KV cache groups (multi-group support)
- Guaranteed GPU metrics + optional best-effort fields
- No changes to KV cache behavior, scheduler, or Request fields
- No new APIs or expensive scans
- Python 3.9+ compatible (uses __future__ annotations)

Implementation details:
- Aggregates blocks across all single_type_managers (not just [0])
- Defensive clamping for blocks_total (prevents negative values)
- Conservative usage_ratio fallback (0.0 when unmeasurable)
- Tests use minimal fakes (no scheduler coupling)
- Fast, deterministic tests (2.48s, no heuristics)

All 18 tests passing. Zero impact on existing functionality.

Part of Step-Level Tracing implementation (PR #4 of 5).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
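The aggregation rules listed above can be illustrated with a small sketch: sum over every group, clamp negative values, and fall back to a 0.0 usage ratio without ever raising. This is one possible reading, not the module's code. `KVSummarySketch`, `summarize_kv_groups`, and the per-group attributes `num_total_blocks` / `num_free_blocks` are all hypothetical. The groups are faked with `SimpleNamespace`, much as the commit's tests use minimal fakes.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Iterable


@dataclass
class KVSummarySketch:
    """Hypothetical per-step summary (not the real StepKVSummary)."""

    blocks_total: int
    blocks_free: int
    usage_ratio: float


def summarize_kv_groups(groups: Iterable[Any]) -> KVSummarySketch:
    """Aggregate block counts over every KV cache group; never raises.

    Each group is assumed to expose `num_total_blocks` and `num_free_blocks`
    (hypothetical attribute names used only for this sketch).
    """
    total = free = 0
    try:
        for group in groups:  # all groups, not just the first one
            total += max(0, int(group.num_total_blocks))  # clamp negatives to 0
            free += max(0, int(group.num_free_blocks))
    except Exception:
        # Unreadable state: report an empty, zero-usage summary instead of raising.
        return KVSummarySketch(blocks_total=0, blocks_free=0, usage_ratio=0.0)
    free = min(free, total)  # free blocks can never exceed the total
    # Conservative fallback: usage is 0.0 when it cannot be measured.
    usage = (total - free) / total if total > 0 else 0.0
    return KVSummarySketch(blocks_total=total, blocks_free=free, usage_ratio=usage)


if __name__ == "__main__":
    fake_groups = [
        SimpleNamespace(num_total_blocks=100, num_free_blocks=25),
        SimpleNamespace(num_total_blocks=50, num_free_blocks=25),
    ]
    print(summarize_kv_groups(fake_groups))
```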
sriumcp added a commit that referenced this pull request on Jan 29, 2026
Implements subsampled per-request detailed progress events with KV metrics:
- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
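The two-stage gate ("batch summary sampled AND rich subsampled") means a rich snapshot is only considered once the batch summary has already been sampled. The effective rate is therefore roughly `sample_rate * rich_subsample_rate`. The sketch below shows that conjunction. `should_emit_rich_snapshot` is a hypothetical name, and drawing both decisions from one `random.Random` per call is an assumption about granularity, not a description of the scheduler code.

```python
import random
from typing import Optional


def should_emit_rich_snapshot(
    sample_rate: float,
    rich_subsample_rate: float = 0.001,  # mirrors the stated default (0.1%)
    rng: Optional[random.Random] = None,
) -> bool:
    """Hypothetical two-stage gate: rich output requires both stages to pass."""
    if rng is None:
        rng = random.Random()
    # Stage 1: is the batch summary sampled at all?
    if rng.random() >= sample_rate:
        return False
    # Stage 2: among sampled steps, emit the rich snapshot only rarely.
    return rng.random() < rich_subsample_rate
```

Because `random.random()` returns values in [0, 1), a rate of 1.0 always passes and 0.0 never does. This keeps the two boundary settings exact.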
sriumcp added a commit that referenced this pull request on Jan 29, 2026
Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860
sriumcp added a commit that referenced this pull request on Jan 29, 2026
…ty (PR #5) (#27)

* [Feature] Add rich request snapshot stream (PR #5)
* [Docs] Update step tracing plan with implementation progress

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Summary
Refactors hard-coded journey event attribute strings in `output_processor.py` to use centralized constants from the `SpanAttributes` class. This change improves code maintainability and brings journey event attributes in line with the existing pattern used for all other OpenTelemetry attributes in the codebase.
Motivation
Journey event attributes were the only OTEL attributes using hard-coded strings. All other OTEL attributes (25+ span attributes such as `GEN_AI_USAGE_COMPLETION_TOKENS`, `GEN_AI_LATENCY_E2E`, etc.) use constants defined in the `SpanAttributes` class in `vllm/tracing.py`.
Issues with hard-coded strings:
Changes
1. Added Constants to `SpanAttributes` (`vllm/tracing.py`)
Added 11 new constants with the `JOURNEY_` prefix:
2. Updated Usage (`vllm/v1/engine/output_processor.py`)
Replaced all hard-coded attribute strings with constants:
Before:
After:
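A minimal sketch of the before/after pattern, assuming a trimmed-down `SpanAttributes` with two entries. Only the `"event.type"` value comes from this PR. The constant names `JOURNEY_EVENT_TYPE` and `JOURNEY_SCHEDULER_STEP`, the `"scheduler.step"` value, and both helper functions are hypothetical.

```python
from typing import Any, Dict


class SpanAttributes:
    """Illustrative subset; the real class in vllm/tracing.py defines many more."""

    JOURNEY_EVENT_TYPE = "event.type"  # value taken from the PR's test example
    JOURNEY_SCHEDULER_STEP = "scheduler.step"  # hypothetical name and value


def build_attrs_before(event_type: str, step: int) -> Dict[str, Any]:
    # Before: keys spelled out as string literals at each call site.
    return {"event.type": event_type, "scheduler.step": step}


def build_attrs_after(event_type: str, step: int) -> Dict[str, Any]:
    # After: keys come from the single SpanAttributes definition.
    return {
        SpanAttributes.JOURNEY_EVENT_TYPE: event_type,
        SpanAttributes.JOURNEY_SCHEDULER_STEP: step,
    }
```

Both functions produce identical dictionaries. The refactor changes where the key strings are defined, not what gets exported, so a typo in a key now fails as an `AttributeError` on the class instead of silently emitting a misspelled attribute.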
3. Documentation (`JOURNEY_TRACING.md`)
Added Claude Code developer guidance to help contributors explore the journey tracing feature.
Safety & Testing
Zero Behavioral Changes ✅
Test Results ✅
All 20 journey tracing tests pass:
- `tests/v1/core/test_journey_events.py`
- `tests/v1/engine/test_journey_tracing_integration.py`

Key test validated:
- `test_otel_journey_events_span_events`: verifies attributes are exported correctly to OTEL
Why Tests Pass Without Modification
Tests check attributes by their string keys (e.g., `assert attrs["event.type"] == "QUEUED"`). Because the new constants evaluate to exactly these strings, the tests see no difference, which confirms that the refactoring is purely internal.
Verification Performed
Benefits
`SpanAttributes` class)
Backward Compatibility
✅ Fully backward compatible
Example Usage
Developers can now reference journey attributes consistently:
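One way a consumer might use the constants is to select journey events by type from exported event attributes, keyed through `SpanAttributes` instead of a literal. This is a hedged sketch. The stand-in class, the `JOURNEY_EVENT_TYPE` name, and the helper `events_of_type` are hypothetical. Only the `"event.type"` key and event type names such as `QUEUED` appear in this PR.

```python
from typing import Any, List, Mapping


class SpanAttributes:
    # Hypothetical stand-in; only the "event.type" value appears in the PR text.
    JOURNEY_EVENT_TYPE = "event.type"


def events_of_type(
    events: List[Mapping[str, Any]], event_type: str
) -> List[Mapping[str, Any]]:
    """Select journey events whose type attribute matches, via the shared constant."""
    return [e for e in events if e.get(SpanAttributes.JOURNEY_EVENT_TYPE) == event_type]
```

Events built with literal `"event.type"` keys are matched just the same. This is the same reason the existing tests pass unchanged.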
Scope
Files modified: 3
- `vllm/tracing.py` (+12 lines)
- `vllm/v1/engine/output_processor.py` (22 changes)
- `JOURNEY_TRACING.md` (+1 line)

Total changes: 24 insertions, 11 deletions
This is a focused refactoring with minimal scope and maximum safety.
🤖 Generated with Claude Code