[Refactor] Use SpanAttributes constants for journey event attributes #4

Merged
sriumcp merged 1 commit into main from understandjourneytracing
Jan 24, 2026

sriumcp commented Jan 24, 2026

Summary

Refactors hard-coded journey event attribute strings in output_processor.py to use centralized constants from the SpanAttributes class. This change improves code maintainability and brings journey event attributes in line with the existing pattern used for all other OpenTelemetry attributes in the codebase.

Motivation

Journey event attributes were the only OTEL attributes using hard-coded strings throughout the codebase. All other OTEL attributes (25+ span attributes like GEN_AI_USAGE_COMPLETION_TOKENS, GEN_AI_LATENCY_E2E, etc.) use constants defined in the SpanAttributes class in vllm/tracing.py.

Issues with hard-coded strings:

  • Inconsistency: Journey attributes didn't follow established patterns
  • Maintainability: Risk of typos when referencing attribute names
  • Discoverability: Developers couldn't find attribute definitions in one place

Changes

1. Added Constants to SpanAttributes (vllm/tracing.py)

Added 11 new constants with the JOURNEY_ prefix:

# Journey event attributes (for request lifecycle span events)
JOURNEY_EVENT_TYPE = "event.type"
JOURNEY_TS_MONOTONIC = "ts.monotonic"
JOURNEY_PHASE = "phase"
JOURNEY_PREFILL_DONE_TOKENS = "prefill.done_tokens"
JOURNEY_PREFILL_TOTAL_TOKENS = "prefill.total_tokens"
JOURNEY_DECODE_DONE_TOKENS = "decode.done_tokens"
JOURNEY_DECODE_MAX_TOKENS = "decode.max_tokens"
JOURNEY_NUM_PREEMPTIONS = "num_preemptions"
JOURNEY_SCHEDULER_STEP = "scheduler.step"
JOURNEY_SCHEDULE_KIND = "schedule.kind"
JOURNEY_FINISH_STATUS = "finish.status"

2. Updated Usage (vllm/v1/engine/output_processor.py)

Replaced all hard-coded attribute strings with constants:

Before:

attributes = {
    "event.type": event.event_type.name,
    "ts.monotonic": event.ts_monotonic,
    "scheduler.step": event.scheduler_step,
    # ... etc
}

After:

attributes = {
    SpanAttributes.JOURNEY_EVENT_TYPE: event.event_type.name,
    SpanAttributes.JOURNEY_TS_MONOTONIC: event.ts_monotonic,
    SpanAttributes.JOURNEY_SCHEDULER_STEP: event.scheduler_step,
    # ... etc
}

3. Documentation (JOURNEY_TRACING.md)

Added Claude Code developer guidance to help contributors explore the journey tracing feature.

Safety & Testing

Zero Behavioral Changes ✅

  • Attribute string values remain byte-for-byte identical
  • OTEL exporters see the exact same attribute names
  • No changes to exported data format or API

Test Results ✅

All 20 journey tracing tests pass:

  • 8/8 tests in tests/v1/core/test_journey_events.py
  • 12/12 tests in tests/v1/engine/test_journey_tracing_integration.py

Key test validated:

  • test_otel_journey_events_span_events - Verifies attributes are exported correctly to OTEL

Why Tests Pass Without Modification

Tests check attributes by their string keys (e.g., assert attrs["event.type"] == "QUEUED"). Since our constants evaluate to these exact strings, the tests see no difference. This confirms the refactoring is purely internal.
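
The key-equivalence argument can be illustrated with a minimal, self-contained sketch. The `SpanAttributes` class here is a cut-down stand-in holding three of the constants listed in this PR, and `build_attributes` is a hypothetical helper, not the code in output_processor.py:

```python
class SpanAttributes:
    # Subset of the JOURNEY_* constants from this PR (stand-in class).
    JOURNEY_EVENT_TYPE = "event.type"
    JOURNEY_TS_MONOTONIC = "ts.monotonic"
    JOURNEY_SCHEDULER_STEP = "scheduler.step"


def build_attributes(event_type: str, ts: float, step: int) -> dict:
    """Hypothetical helper: build the dict keyed by constants, not literals."""
    return {
        SpanAttributes.JOURNEY_EVENT_TYPE: event_type,
        SpanAttributes.JOURNEY_TS_MONOTONIC: ts,
        SpanAttributes.JOURNEY_SCHEDULER_STEP: step,
    }


attrs = build_attributes("QUEUED", 12.5, 0)

# A test that indexes by the literal string still finds the value,
# because each constant is exactly that string.
assert attrs["event.type"] == "QUEUED"
assert attrs == {"event.type": "QUEUED", "ts.monotonic": 12.5, "scheduler.step": 0}
```

Because the constants are plain `str` objects, the resulting dictionary is indistinguishable from one written with literal keys.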

Verification Performed

  1. ✅ Verified all 11 constants evaluate to correct string values
  2. ✅ Confirmed no hard-coded journey attribute strings remain
  3. ✅ Validated dictionary key equivalence (constants → strings)
  4. ✅ All journey event tests pass
  5. ✅ OTEL export test confirms correct attribute names
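
One way to reproduce check 2 is a small scan for quoted journey attribute literals. This is a sketch under assumptions: the PR does not say how the check was actually performed, and `find_hardcoded` is a hypothetical helper.

```python
import re

# Attribute string values copied from the constants added in this PR.
JOURNEY_VALUES = [
    "event.type", "ts.monotonic", "phase",
    "prefill.done_tokens", "prefill.total_tokens",
    "decode.done_tokens", "decode.max_tokens",
    "num_preemptions", "scheduler.step", "schedule.kind", "finish.status",
]

QUOTE = "[\"']"  # matches either quote character
PATTERN = re.compile(
    QUOTE + "(" + "|".join(re.escape(v) for v in JOURNEY_VALUES) + ")" + QUOTE
)


def find_hardcoded(source: str) -> list[tuple[int, str]]:
    """Return (line_number, value) for each quoted journey attribute literal."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


before = 'attributes = {"event.type": name, "phase": phase}'
after = "attributes = {SpanAttributes.JOURNEY_EVENT_TYPE: name}"
assert find_hardcoded(before) == [(1, "event.type"), (1, "phase")]
assert find_hardcoded(after) == []
```

Such a scan would also flag the constant definitions themselves, so it is only meaningful when run on the consuming modules rather than on vllm/tracing.py.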

Benefits

  1. Consistency: Journey attributes now follow the same pattern as all other OTEL attributes
  2. Maintainability: Centralized definitions reduce risk of typos
  3. Type Safety: IDE autocomplete for attribute names; a misspelled constant raises AttributeError instead of silently exporting a wrong key
  4. Discoverability: All OTEL attribute definitions in one location (SpanAttributes class)
  5. Documentation: Constants serve as self-documenting code

Backward Compatibility

Fully backward compatible

  • No changes to exported attribute names or values
  • Existing OTEL collectors and dashboards work without modification
  • Tests require no updates

Example Usage

Developers can now reference journey attributes consistently:

from vllm.tracing import SpanAttributes

# Clear, discoverable, type-safe
span.add_event(
    name="journey.SCHEDULED",
    attributes={
        SpanAttributes.JOURNEY_EVENT_TYPE: "SCHEDULED",
        SpanAttributes.JOURNEY_SCHEDULER_STEP: step_number,
    }
)

Scope

Files modified: 3

  • vllm/tracing.py (+12 lines)
  • vllm/v1/engine/output_processor.py (22 changes)
  • JOURNEY_TRACING.md (+1 line)

Total changes: 24 insertions, 11 deletions

This is a focused refactoring with minimal scope and maximum safety.


🤖 Generated with Claude Code

Refactored hard-coded journey event attribute strings to use centralized
constants in SpanAttributes class. Added 11 new JOURNEY_* constants to
vllm/tracing.py and updated output_processor.py to use them. This improves
maintainability and consistency with existing OTEL attribute patterns.

Also added Claude Code developer guidance to JOURNEY_TRACING.md.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp merged commit 9f3b156 into main Jan 24, 2026
sriumcp added a commit that referenced this pull request Jan 26, 2026
This commit migrates span attributes from the deprecated do_tracing() method
to the new dual-stream API span architecture (partial completion - API side only).

API Span Attributes Added:

**Request Metadata (set at ARRIVED):**
- GEN_AI_RESPONSE_MODEL - model name
- GEN_AI_USAGE_PROMPT_TOKENS - prompt token count
- GEN_AI_REQUEST_TEMPERATURE - sampling param (if not None)
- GEN_AI_REQUEST_TOP_P - sampling param (if not None)
- GEN_AI_REQUEST_MAX_TOKENS - sampling param (if not None)
- GEN_AI_REQUEST_N - sampling param (if not None)
- GEN_AI_REQUEST_ID - already set at span creation

**Completion Metrics (set at DEPARTED):**
- GEN_AI_LATENCY_E2E - end-to-end latency (DEPARTED - ARRIVED)
- GEN_AI_LATENCY_TIME_TO_FIRST_TOKEN - time to first token (FIRST_RESPONSE - ARRIVED)
- GEN_AI_USAGE_COMPLETION_TOKENS - completion token count

Implementation Details:

1. Added _set_api_span_request_attributes() helper method
   - Sets model, prompt tokens, and sampling params on API span
   - Called after sampling_params are computed (line ~430)

2. Added timestamp tracking to RequestResponseMetadata
   - arrival_time: monotonic time when span created
   - first_response_time: monotonic time when first output received
   - Used for calculating latencies at DEPARTED

3. Updated both streaming and non-streaming paths
   - Track first_response_time in result_generator iteration
   - Calculate and set latencies at DEPARTED event
   - Set completion tokens from final_usage_info
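
The latency arithmetic described above can be sketched as follows. `RequestTimestamps` and `compute_latencies` are hypothetical names used only for illustration; they mirror the `arrival_time` and `first_response_time` fields mentioned in the commit but are not the actual RequestResponseMetadata code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestTimestamps:
    """Hypothetical stand-in for the timestamp fields described above."""
    arrival_time: float                          # monotonic time at ARRIVED
    first_response_time: Optional[float] = None  # monotonic time of first output


def compute_latencies(ts: RequestTimestamps, departed_time: float) -> dict:
    """Compute latencies at DEPARTED from monotonic timestamps."""
    latencies = {"e2e": departed_time - ts.arrival_time}
    # Time to first token is only defined if an output was ever received.
    if ts.first_response_time is not None:
        latencies["time_to_first_token"] = (
            ts.first_response_time - ts.arrival_time
        )
    return latencies


ts = RequestTimestamps(arrival_time=10.0, first_response_time=10.25)
assert compute_latencies(ts, departed_time=12.0) == {
    "e2e": 2.0,
    "time_to_first_token": 0.25,
}
```

Monotonic clocks fit here because only differences between timestamps matter, and those differences are unaffected by wall-clock adjustments.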

Remaining Work (Core Span):
- GEN_AI_LATENCY_TIME_IN_QUEUE (scheduler)
- GEN_AI_LATENCY_TIME_IN_MODEL_PREFILL (scheduler)
- GEN_AI_LATENCY_TIME_IN_MODEL_DECODE (scheduler)
- GEN_AI_LATENCY_TIME_IN_MODEL_INFERENCE (scheduler)

Related: Addresses Task #4 (API side) - full feature parity with old do_tracing()

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 27, 2026
Add journey event emission directly to OpenTelemetry spans in parallel
with existing buffering. Events (QUEUED, SCHEDULED, PREEMPTED, FIRST_TOKEN,
FINISHED) are now emitted to core spans with full progress snapshots.

Changes:
- Extended _emit_journey_event() to accept optional span parameter
- Added span emission logic with defensive error handling
- Updated all 6 call sites to pass span from _core_spans dict
- Added FINISHED emission in natural completion path (update_from_output)
- Extended _compute_progress_snapshot() to support WAITING phase
- Changed QUEUED scheduler_step from None to counter (typically 0)
- Added 9 comprehensive tests covering all event types and edge cases

Safety properties:
- No new resources created (uses existing spans from PR#2)
- Defensive programming (try/except around all OTEL calls)
- Zero overhead when disabled (feature flag gate)
- Legacy buffering preserved (parallel operation until PR#9)
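
A minimal sketch of the defensive, feature-gated emission pattern listed above. The function name, the `enabled` flag, and the two fake span classes are assumptions for illustration; only `span.add_event(name=..., attributes=...)` corresponds to the OpenTelemetry span API.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def emit_journey_event(
    span: Optional[Any], enabled: bool, name: str, attributes: dict
) -> bool:
    """Emit an event to the span; return True only if it was handed over."""
    # Feature-flag gate: do no work at all when tracing is disabled.
    if not enabled or span is None:
        return False
    try:
        span.add_event(name=name, attributes=attributes)
        return True
    except Exception:
        # Tracing must never break request processing; log and continue.
        logger.debug("journey event emission failed", exc_info=True)
        return False


class RecordingSpan:
    """Fake span that records events (test double, not an OTEL class)."""
    def __init__(self) -> None:
        self.events = []

    def add_event(self, name: str, attributes: dict) -> None:
        self.events.append((name, attributes))


class BrokenSpan:
    """Fake span whose add_event always fails."""
    def add_event(self, name: str, attributes: dict) -> None:
        raise RuntimeError("exporter unavailable")


good = RecordingSpan()
assert emit_journey_event(good, True, "journey.QUEUED", {"scheduler.step": 0})
assert not emit_journey_event(good, False, "journey.QUEUED", {})
assert not emit_journey_event(BrokenSpan(), True, "journey.QUEUED", {})
assert good.events == [("journey.QUEUED", {"scheduler.step": 0})]
```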

Tests: 9 new tests (328 lines), all passing
Size: ~113 lines production code, 328 lines test code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 27, 2026
Updated JOURNEY_TRACING_PR_PLAN.md to reflect PR #4 completion:
- Updated PR sequence summary table (PR #4: COMPLETED)
- Updated PR dependencies diagram (PR #4: ✅ COMPLETED)
- Added detailed completion status to PR #4 section
- Listed all 9 tests implemented
- Documented actual sizes: ~113 lines production, 328 lines test code
sriumcp added a commit that referenced this pull request Jan 27, 2026
* [Feature] Emit journey events to core spans (PR #4/9)

* [Docs] Mark PR #4 as completed in journey tracing plan

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 28, 2026
Add read-only KV cache observability helper module for step-level tracing.
Provides utilities to extract per-request and per-step KV cache metrics
using only existing exposed interfaces.

Key additions:
- vllm/v1/core/kv_cache_observability.py: PerRequestKVMetrics and
  StepKVSummary dataclasses with query functions
- tests/v1/core/test_kv_cache_observability.py: 18 unit tests with
  minimal fakes (17 fake-based + 1 smoke test)

Design principles:
- Read-only access to existing KV cache state
- Defensive programming (never raises exceptions)
- Aggregates across all KV cache groups (multi-group support)
- Guaranteed GPU metrics + optional best-effort fields
- No changes to KV cache behavior, scheduler, or Request fields
- No new APIs or expensive scans
- Python 3.9+ compatible (uses __future__ annotations)

Implementation details:
- Aggregates blocks across all single_type_managers (not just [0])
- Defensive clamping for blocks_total (prevents negative values)
- Conservative usage_ratio fallback (0.0 when unmeasurable)
- Tests use minimal fakes (no scheduler coupling)
- Fast, deterministic tests (2.48s, no heuristics)
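
The aggregation, clamping, and fallback rules above can be sketched like this. `StepSummary` and `summarize_step` are hypothetical and deliberately simplified; they are not the `StepKVSummary` dataclass or the query functions from this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSummary:
    """Hypothetical simplified summary (not the commit's StepKVSummary)."""
    blocks_total: int
    blocks_used: int
    usage_ratio: float


def summarize_step(per_group_used: list[int], blocks_total: int) -> StepSummary:
    """Aggregate block usage across all KV cache groups, defensively."""
    # Clamp so a bad upstream value can never yield a negative total.
    total = max(0, blocks_total)
    # Sum over every group rather than reading only the first one.
    used = max(0, sum(per_group_used))
    # Conservative fallback: report 0.0 when the ratio is unmeasurable.
    ratio = used / total if total > 0 else 0.0
    return StepSummary(blocks_total=total, blocks_used=used, usage_ratio=ratio)


assert summarize_step([3, 5], 16) == StepSummary(16, 8, 0.5)
assert summarize_step([1], -4) == StepSummary(0, 1, 0.0)
```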

All 18 tests passing. Zero impact on existing functionality.

Part of Step-Level Tracing implementation (PR #4 of 5).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 29, 2026
Implements subsampled per-request detailed progress events with KV metrics:

- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()
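
Two-stage sampling as described above can be sketched as follows, assuming independent random draws per stage (the commit does not state how the decision is made). `sample_step` is a hypothetical helper.

```python
import random


def sample_step(
    rng: random.Random, batch_rate: float, rich_rate: float
) -> tuple[bool, bool]:
    """Return (emit_batch_summary, emit_rich_snapshots) for one step."""
    # Stage 1: is this step sampled for the batch summary?
    batch = rng.random() < batch_rate
    # Stage 2: rich snapshots are only considered for sampled steps.
    rich = batch and rng.random() < rich_rate
    return batch, rich


rng = random.Random(0)
# random() is in [0.0, 1.0), so rate 1.0 always fires and rate 0.0 never does.
assert sample_step(rng, 1.0, 1.0) == (True, True)
assert sample_step(rng, 1.0, 0.0) == (True, False)
assert sample_step(rng, 0.0, 1.0) == (False, False)
```

Under this assumption the effective rich-snapshot rate is the product of the two rates, so the 0.001 default applies only to steps that were already sampled.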

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp added a commit that referenced this pull request Jan 29, 2026
Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860
sriumcp added a commit that referenced this pull request Jan 29, 2026
…ty (PR #5) (#27)

* [Feature] Add rich request snapshot stream (PR #5)

* [Docs] Update step tracing plan with implementation progress

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>