[Refactor] Use SpanAttributes constants for journey event attributes by sriumcp · Pull Request #4 · inference-sim/vllm

sriumcp · 2026-01-24T14:26:29Z

Summary

Refactors hard-coded journey event attribute strings in output_processor.py to use centralized constants from the SpanAttributes class. This change improves code maintainability and brings journey event attributes in line with the existing pattern used for all other OpenTelemetry attributes in the codebase.

Motivation

Journey event attributes were the only OTEL attributes in the codebase that still used hard-coded strings. All other OTEL attributes (more than 25 span attributes, such as GEN_AI_USAGE_COMPLETION_TOKENS and GEN_AI_LATENCY_E2E) use constants defined in the SpanAttributes class in vllm/tracing.py.

Issues with hard-coded strings:

Inconsistency: Journey attributes didn't follow established patterns
Maintainability: Risk of typos when referencing attribute names
Discoverability: Developers couldn't find attribute definitions in one place

Changes

1. Added Constants to `SpanAttributes` (`vllm/tracing.py`)

Added 11 new constants with the JOURNEY_ prefix:

# Journey event attributes (for request lifecycle span events)
JOURNEY_EVENT_TYPE = "event.type"
JOURNEY_TS_MONOTONIC = "ts.monotonic"
JOURNEY_PHASE = "phase"
JOURNEY_PREFILL_DONE_TOKENS = "prefill.done_tokens"
JOURNEY_PREFILL_TOTAL_TOKENS = "prefill.total_tokens"
JOURNEY_DECODE_DONE_TOKENS = "decode.done_tokens"
JOURNEY_DECODE_MAX_TOKENS = "decode.max_tokens"
JOURNEY_NUM_PREEMPTIONS = "num_preemptions"
JOURNEY_SCHEDULER_STEP = "scheduler.step"
JOURNEY_SCHEDULE_KIND = "schedule.kind"
JOURNEY_FINISH_STATUS = "finish.status"

2. Updated Usage (`vllm/v1/engine/output_processor.py`)

Replaced all hard-coded attribute strings with constants:

Before:

attributes = {
    "event.type": event.event_type.name,
    "ts.monotonic": event.ts_monotonic,
    "scheduler.step": event.scheduler_step,
    # ... etc
}

After:

attributes = {
    SpanAttributes.JOURNEY_EVENT_TYPE: event.event_type.name,
    SpanAttributes.JOURNEY_TS_MONOTONIC: event.ts_monotonic,
    SpanAttributes.JOURNEY_SCHEDULER_STEP: event.scheduler_step,
    # ... etc
}

3. Documentation (`JOURNEY_TRACING.md`)

Added Claude Code developer guidance to help contributors explore the journey tracing feature.

Safety & Testing

Zero Behavioral Changes ✅

Attribute string values remain byte-for-byte identical
OTEL exporters see the exact same attribute names
No changes to exported data format or API

Test Results ✅

All 20 journey tracing tests pass:

✅ 8/8 tests in tests/v1/core/test_journey_events.py
✅ 12/12 tests in tests/v1/engine/test_journey_tracing_integration.py

Key test validated:

test_otel_journey_events_span_events - Verifies attributes are exported correctly to OTEL

Why Tests Pass Without Modification

Tests check attributes by their string keys (e.g., assert attrs["event.type"] == "QUEUED"). Since our constants evaluate to these exact strings, the tests see no difference. This confirms the refactoring is purely internal.
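A minimal sketch of why the string-keyed assertions are unaffected. The SpanAttributes class here is a stand-in that reproduces two of the constants listed above, and the two build_* helpers are hypothetical names used only for illustration; they are not functions from output_processor.py.

```python
class SpanAttributes:
    # Stand-in for the real class in vllm/tracing.py; only two of the
    # eleven JOURNEY_* constants are reproduced here.
    JOURNEY_EVENT_TYPE = "event.type"
    JOURNEY_SCHEDULER_STEP = "scheduler.step"


def build_with_literals(event_type, step):
    # Hypothetical helper: the "before" style with hard-coded keys.
    return {"event.type": event_type, "scheduler.step": step}


def build_with_constants(event_type, step):
    # Hypothetical helper: the "after" style with SpanAttributes constants.
    return {
        SpanAttributes.JOURNEY_EVENT_TYPE: event_type,
        SpanAttributes.JOURNEY_SCHEDULER_STEP: step,
    }


attrs = build_with_constants("QUEUED", 0)
# The same string-keyed lookup that the tests use still works,
# because the constant is just the string "event.type".
assert attrs["event.type"] == "QUEUED"
assert attrs == build_with_literals("QUEUED", 0)
```

Because a class attribute holding a str is the same dictionary key as the equal literal, both dictionaries compare equal.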

Verification Performed

✅ Verified all 11 constants evaluate to correct string values
✅ Confirmed no hard-coded journey attribute strings remain
✅ Validated dictionary key equivalence (constants → strings)
✅ All journey event tests pass
✅ OTEL export test confirms correct attribute names
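The first two checks above could be scripted roughly as follows. This is a sketch, not the procedure actually used for this PR: the SpanAttributes class is a stand-in carrying the eleven values listed earlier, and mismatched_constants and hardcoded_literals are hypothetical helper names.

```python
class SpanAttributes:
    """Stand-in for the class in vllm/tracing.py (journey constants only)."""
    JOURNEY_EVENT_TYPE = "event.type"
    JOURNEY_TS_MONOTONIC = "ts.monotonic"
    JOURNEY_PHASE = "phase"
    JOURNEY_PREFILL_DONE_TOKENS = "prefill.done_tokens"
    JOURNEY_PREFILL_TOTAL_TOKENS = "prefill.total_tokens"
    JOURNEY_DECODE_DONE_TOKENS = "decode.done_tokens"
    JOURNEY_DECODE_MAX_TOKENS = "decode.max_tokens"
    JOURNEY_NUM_PREEMPTIONS = "num_preemptions"
    JOURNEY_SCHEDULER_STEP = "scheduler.step"
    JOURNEY_SCHEDULE_KIND = "schedule.kind"
    JOURNEY_FINISH_STATUS = "finish.status"


# Expected exported names, copied from the constant list in this PR.
EXPECTED = {
    "JOURNEY_EVENT_TYPE": "event.type",
    "JOURNEY_TS_MONOTONIC": "ts.monotonic",
    "JOURNEY_PHASE": "phase",
    "JOURNEY_PREFILL_DONE_TOKENS": "prefill.done_tokens",
    "JOURNEY_PREFILL_TOTAL_TOKENS": "prefill.total_tokens",
    "JOURNEY_DECODE_DONE_TOKENS": "decode.done_tokens",
    "JOURNEY_DECODE_MAX_TOKENS": "decode.max_tokens",
    "JOURNEY_NUM_PREEMPTIONS": "num_preemptions",
    "JOURNEY_SCHEDULER_STEP": "scheduler.step",
    "JOURNEY_SCHEDULE_KIND": "schedule.kind",
    "JOURNEY_FINISH_STATUS": "finish.status",
}


def mismatched_constants(attr_cls, expected):
    """Return {constant name: actual value} for each missing or wrong constant."""
    return {
        name: getattr(attr_cls, name, None)
        for name, value in expected.items()
        if getattr(attr_cls, name, None) != value
    }


def hardcoded_literals(source, values):
    """Return the attribute names that still appear as quoted literals in source."""
    return sorted(v for v in values if f'"{v}"' in source or f"'{v}'" in source)


assert mismatched_constants(SpanAttributes, EXPECTED) == {}
```

hardcoded_literals is meant to be run on the text of the consuming module (here, output_processor.py), not on vllm/tracing.py, where the quoted values legitimately appear as the constant definitions.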

Benefits

Consistency: Journey attributes now follow the same pattern as all other OTEL attributes
Maintainability: Centralized definitions reduce risk of typos
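Type Safety: IDE autocomplete for attribute names; a misspelled constant name fails with an AttributeError (or a static-checker error) instead of silently producing a wrong key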
Discoverability: All OTEL attribute definitions in one location (SpanAttributes class)
Documentation: Constants serve as self-documenting code
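To illustrate the maintainability point, the sketch below contrasts a typo in a literal key with a typo in a constant name. The SpanAttributes class is a one-constant stand-in, and the misspellings are deliberate.

```python
class SpanAttributes:
    # Stand-in for the real class in vllm/tracing.py.
    JOURNEY_EVENT_TYPE = "event.type"


# A typo in a literal produces a wrong key and no error at all.
attrs_literal = {"evnt.type": "QUEUED"}
assert "event.type" not in attrs_literal

# The same typo in a constant name fails at the point of use.
try:
    attrs_constant = {SpanAttributes.JOURNEY_EVNT_TYPE: "QUEUED"}
except AttributeError as exc:
    error = str(exc)
else:
    error = None

assert error is not None
```

A static checker or IDE would typically flag the misspelled attribute before the code runs, which is not possible for an arbitrary string literal.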

Backward Compatibility

✅ Fully backward compatible

No changes to exported attribute names or values
Existing OTEL collectors and dashboards work without modification
Tests require no updates

Example Usage

Developers can now reference journey attributes consistently:

from vllm.tracing import SpanAttributes

# Clear, discoverable, type-safe
span.add_event(
    name="journey.SCHEDULED",
    attributes={
        SpanAttributes.JOURNEY_EVENT_TYPE: "SCHEDULED",
        SpanAttributes.JOURNEY_SCHEDULER_STEP: step_number,
    }
)

Scope

Files modified: 3

vllm/tracing.py (+12 lines)
vllm/v1/engine/output_processor.py (22 changes)
JOURNEY_TRACING.md (+1 line)

Total changes: 24 insertions, 11 deletions

This is a focused refactoring with minimal scope and maximum safety.

🤖 Generated with Claude Code

Refactored hard-coded journey event attribute strings to use centralized constants in the SpanAttributes class. Added 11 new JOURNEY_* constants to vllm/tracing.py and updated output_processor.py to use them. This improves maintainability and consistency with existing OTEL attribute patterns. Also added Claude Code developer guidance to JOURNEY_TRACING.md.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit migrates span attributes from the deprecated do_tracing() method to the new dual-stream API span architecture (partial completion - API side only).

API Span Attributes Added:

**Request Metadata (set at ARRIVED):**
- GEN_AI_RESPONSE_MODEL - model name
- GEN_AI_USAGE_PROMPT_TOKENS - prompt token count
- GEN_AI_REQUEST_TEMPERATURE - sampling param (if not None)
- GEN_AI_REQUEST_TOP_P - sampling param (if not None)
- GEN_AI_REQUEST_MAX_TOKENS - sampling param (if not None)
- GEN_AI_REQUEST_N - sampling param (if not None)
- GEN_AI_REQUEST_ID - already set at span creation

**Completion Metrics (set at DEPARTED):**
- GEN_AI_LATENCY_E2E - end-to-end latency (DEPARTED - ARRIVED)
- GEN_AI_LATENCY_TIME_TO_FIRST_TOKEN - time to first token (FIRST_RESPONSE - ARRIVED)
- GEN_AI_USAGE_COMPLETION_TOKENS - completion token count

Implementation Details:

1. Added _set_api_span_request_attributes() helper method
   - Sets model, prompt tokens, and sampling params on API span
   - Called after sampling_params are computed (line ~430)
2. Added timestamp tracking to RequestResponseMetadata
   - arrival_time: monotonic time when span created
   - first_response_time: monotonic time when first output received
   - Used for calculating latencies at DEPARTED
3. Updated both streaming and non-streaming paths
   - Track first_response_time in result_generator iteration
   - Calculate and set latencies at DEPARTED event
   - Set completion tokens from final_usage_info

Remaining Work (Core Span):
- GEN_AI_LATENCY_TIME_IN_QUEUE (scheduler)
- GEN_AI_LATENCY_TIME_IN_MODEL_PREFILL (scheduler)
- GEN_AI_LATENCY_TIME_IN_MODEL_DECODE (scheduler)
- GEN_AI_LATENCY_TIME_IN_MODEL_INFERENCE (scheduler)

Related: Addresses Task #4 (API side) - full feature parity with old do_tracing()

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
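The two latency formulas in this commit message (DEPARTED - ARRIVED and FIRST_RESPONSE - ARRIVED) can be sketched as below. The dataclass is a simplified stand-in that keeps only the two timestamp fields named in the message; latencies_at_departed is a hypothetical helper, and the dictionary keys are the constant names used as labels, not the exported attribute strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestResponseMetadata:
    # Simplified stand-in: only the two timestamps named in the commit message.
    arrival_time: float                          # monotonic time at span creation
    first_response_time: Optional[float] = None  # monotonic time of first output


def latencies_at_departed(meta, departed_time):
    """Compute the latency values that would be set at the DEPARTED event."""
    latencies = {"GEN_AI_LATENCY_E2E": departed_time - meta.arrival_time}
    # Time to first token is only defined if an output was actually received.
    if meta.first_response_time is not None:
        latencies["GEN_AI_LATENCY_TIME_TO_FIRST_TOKEN"] = (
            meta.first_response_time - meta.arrival_time
        )
    return latencies
```

In real use the timestamps would come from time.monotonic(), so the differences are unaffected by wall-clock adjustments.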

Add journey event emission directly to OpenTelemetry spans in parallel with existing buffering. Events (QUEUED, SCHEDULED, PREEMPTED, FIRST_TOKEN, FINISHED) are now emitted to core spans with full progress snapshots.

Changes:
- Extended _emit_journey_event() to accept optional span parameter
- Added span emission logic with defensive error handling
- Updated all 6 call sites to pass span from _core_spans dict
- Added FINISHED emission in natural completion path (update_from_output)
- Extended _compute_progress_snapshot() to support WAITING phase
- Changed QUEUED scheduler_step from None to counter (typically 0)
- Added 9 comprehensive tests covering all event types and edge cases

Safety properties:
- No new resources created (uses existing spans from PR#2)
- Defensive programming (try/except around all OTEL calls)
- Zero overhead when disabled (feature flag gate)
- Legacy buffering preserved (parallel operation until PR#9)

Tests: 9 new tests (328 lines), all passing
Size: ~113 lines production code, 328 lines test code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
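The safety properties listed in this commit message (feature-flag gate, try/except around OTEL calls) might look roughly like the sketch below. It is not the body of _emit_journey_event(): the function name, its parameters, and the two span classes are hypothetical stand-ins that only mimic the add_event(name, attributes) method of an OpenTelemetry span.

```python
import logging

logger = logging.getLogger(__name__)


class RecordingSpan:
    """Hypothetical stand-in that records events instead of exporting them."""

    def __init__(self):
        self.events = []

    def add_event(self, name, attributes=None):
        self.events.append((name, dict(attributes or {})))


class FailingSpan:
    """Hypothetical stand-in whose add_event always raises."""

    def add_event(self, name, attributes=None):
        raise RuntimeError("exporter unavailable")


def emit_journey_event_to_span(span, enabled, event_type, scheduler_step):
    """Best-effort emission; returns True only if the event reached the span."""
    # Feature-flag gate: do nothing when tracing is off or no span exists.
    if not enabled or span is None:
        return False
    try:
        span.add_event(
            name=f"journey.{event_type}",
            attributes={"event.type": event_type, "scheduler.step": scheduler_step},
        )
        return True
    except Exception:
        # Tracing failures must never propagate into request handling.
        logger.debug("journey event emission failed", exc_info=True)
        return False
```

Returning a boolean instead of raising keeps the caller's control flow identical whether or not tracing is enabled or healthy.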

Updated JOURNEY_TRACING_PR_PLAN.md to reflect PR #4 completion:
- Updated PR sequence summary table (PR #4: COMPLETED)
- Updated PR dependencies diagram (PR #4: ✅ COMPLETED)
- Added detailed completion status to PR #4 section
- Listed all 9 tests implemented
- Documented actual sizes: ~113 lines production, 328 lines test code


Add read-only KV cache observability helper module for step-level tracing. Provides utilities to extract per-request and per-step KV cache metrics using only existing exposed interfaces.

Key additions:
- vllm/v1/core/kv_cache_observability.py: PerRequestKVMetrics and StepKVSummary dataclasses with query functions
- tests/v1/core/test_kv_cache_observability.py: 18 unit tests with minimal fakes (17 fake-based + 1 smoke test)

Design principles:
- Read-only access to existing KV cache state
- Defensive programming (never raises exceptions)
- Aggregates across all KV cache groups (multi-group support)
- Guaranteed GPU metrics + optional best-effort fields
- No changes to KV cache behavior, scheduler, or Request fields
- No new APIs or expensive scans
- Python 3.9+ compatible (uses __future__ annotations)

Implementation details:
- Aggregates blocks across all single_type_managers (not just [0])
- Defensive clamping for blocks_total (prevents negative values)
- Conservative usage_ratio fallback (0.0 when unmeasurable)
- Tests use minimal fakes (no scheduler coupling)
- Fast, deterministic tests (2.48s, no heuristics)

All 18 tests passing. Zero impact on existing functionality.

Part of Step-Level Tracing implementation (PR #4 of 5).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
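The clamping and fallback rules named in this commit message can be sketched as follows. Only the names StepKVSummary, blocks_total, and usage_ratio come from the message; the blocks_used field, the summarize_kv_step function, and its per-group list inputs are assumptions made for illustration and do not reflect the actual module's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepKVSummary:
    # Simplified stand-in; blocks_used is an assumed field name.
    blocks_total: int
    blocks_used: int
    usage_ratio: float


def summarize_kv_step(total_per_group, used_per_group):
    """Aggregate per-group block counts without ever raising."""
    try:
        # Sum across all KV cache groups, then clamp so totals are never negative.
        blocks_total = max(0, sum(total_per_group))
        blocks_used = min(max(0, sum(used_per_group)), blocks_total)
        # Conservative fallback: report 0.0 when the ratio cannot be measured.
        usage_ratio = blocks_used / blocks_total if blocks_total > 0 else 0.0
        return StepKVSummary(blocks_total, blocks_used, usage_ratio)
    except Exception:
        # Observability helpers should degrade to empty metrics, not fail.
        return StepKVSummary(0, 0, 0.0)
```

Returning an all-zero summary on malformed input is one way to honor a "never raises" contract; the real module may degrade differently.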


Implements subsampled per-request detailed progress events with KV metrics:
- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
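The two-stage sampling rule ("batch summary sampled AND rich subsampled") can be sketched as below. Both function names are hypothetical, and the sketch assumes one random draw per step; whether the actual scheduler draws per step or per request is not stated in the commit message.

```python
import random


def should_emit_rich_snapshots(batch_summary_sampled, rich_subsample_rate, rng):
    """Second-stage decision: only steps already sampled for the batch
    summary are eligible for rich per-request snapshots."""
    if not batch_summary_sampled:
        return False
    # rng.random() is in [0.0, 1.0), so a rate of 0.0 never fires
    # and a rate of 1.0 always fires.
    return rng.random() < rich_subsample_rate


def effective_rich_rate(step_sample_rate, rich_subsample_rate):
    """Expected fraction of steps that emit rich snapshots, assuming the
    two stages are sampled independently."""
    return step_sample_rate * rich_subsample_rate


rng = random.Random(0)  # seeded only to make the sketch repeatable
emit = should_emit_rich_snapshots(True, 0.001, rng)
```

Under the independence assumption, the default rich subsample rate of 0.001 means roughly one in a thousand sampled steps would also carry per-request snapshots.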

Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860


sriumcp merged commit 9f3b156 into main Jan 24, 2026

sriumcp mentioned this pull request Jan 27, 2026

[Feature] Add journey state cleanup to scheduler (PR #3/9) #11

Merged

sriumcp mentioned this pull request Jan 27, 2026

[Feature] Emit journey events to core spans (PR #4/9) #12

Merged

9 tasks

This was referenced Jan 27, 2026

[Feature] Add API span tracking infrastructure (PR #5/9) #13

Merged

[Feature] Remove journey event buffering (PR #9/9) #17

Merged

Journey Tracing: Complete Implementation (PRs #0-#9) + Regression Audit #18

Merged

sriumcp mentioned this pull request Jan 28, 2026

[Feature] Add KV cache metrics utilities for observability (PR #4) #24

Merged

sriumcp mentioned this pull request Jan 29, 2026

[Feature] Add rich request snapshot stream for step-level observability (PR #5) #27

Merged

sriumcp merged 1 commit into main from understandjourneytracing
