[Feature] Add rich request snapshot stream for step-level observability (PR #5) by sriumcp · Pull Request #27 · inference-sim/vllm

sriumcp · 2026-01-29T11:28:55Z

Summary

This PR implements PR #5: Rich Request Snapshot Stream from the step-level tracing plan, adding per-request observability events within sampled scheduler steps.

Part of the step-level tracing series:

PR #3: Step-level batch summary tracing ✅ (merged)
PR #4: KV cache metrics utilities ✅ (merged)
PR #5: Rich request snapshot stream ⬅️ this PR

What This PR Adds

1. Per-Request Snapshot Events

Emits step.REQUEST_SNAPSHOT events containing detailed state for each running request:

Request identity: request.id
Execution phase: request.phase (PREFILL/DECODE)
Token counts: prompt, computed, output tokens
Scheduling info: tokens scheduled this step, preemption count
KV cache metrics: allocated blocks, cached blocks, effective prompt length
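To make the event shape concrete, here is a minimal sketch of flattening one running request into snapshot attributes. Only `request.id` and `request.phase` come from this PR; `RunningRequest`, `build_request_snapshot`, and every other attribute key are hypothetical stand-ins, and the real SpanAttributes constants may be named differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class RunningRequest:
    """Hypothetical stand-in for a request in scheduler.running."""

    request_id: str
    num_prompt_tokens: int
    num_computed_tokens: int
    num_output_tokens: int
    num_preemptions: int = 0


def build_request_snapshot(
    req: RunningRequest,
    num_scheduled_tokens: int,
    kv_metrics: dict[str, Any],
) -> dict[str, Any]:
    """Flatten one request's state into span-event attributes."""
    attrs: dict[str, Any] = {
        # Keys taken from the PR description.
        "request.id": req.request_id,
        "request.phase": "PREFILL" if req.num_output_tokens == 0 else "DECODE",
        # Hypothetical keys; the real SpanAttributes constants may differ.
        "request.num_prompt_tokens": req.num_prompt_tokens,
        "request.num_computed_tokens": req.num_computed_tokens,
        "request.num_output_tokens": req.num_output_tokens,
        "request.num_scheduled_tokens": num_scheduled_tokens,
        "request.num_preemptions": req.num_preemptions,
    }
    # KV fields (allocated/cached blocks, effective prompt length) come from
    # a separate metrics helper and are merged in unchanged.
    attrs.update(kv_metrics)
    return attrs
```

An emitter would call this once per request in the running queue and attach each resulting dict to its own step.REQUEST_SNAPSHOT event.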

2. Two-Stage Probabilistic Sampling

Implements hierarchical sampling to control event volume:

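Step sampling (default 1%): a step must first be sampled for the batch summary
Rich subsampling (default 0.1%): each sampled step then passes a second, independent draw at this rate before per-request snapshots are emitted

Result: with the defaults, 0.01 × 0.001 = 0.00001, so ~0.001% of all steps (about 1 in 100,000) emit detailed snapshots, i.e. 0.1% of the steps that already emit a batch summary (a 1000x reduction)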
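A minimal sketch of the two-stage gate, assuming the second draw is independent and only taken on steps that already passed the first (our reading of "batch summary sampled AND rich subsampled" in the commit message). `decide_step_sampling`, `SamplingDecision`, and `effective_rich_rate` are hypothetical names; the injectable `rng` is there so a test can supply a deterministic sequence, in the spirit of test_rich_snapshot_deterministic_sampling.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SamplingDecision:
    emit_batch_summary: bool
    emit_rich_snapshots: bool


def decide_step_sampling(
    sample_rate: float,
    rich_subsample_rate: float,
    rng: Callable[[], float] = random.random,
) -> SamplingDecision:
    """Hypothetical two-stage gate: rich snapshots only on summary steps."""
    # Stage 1: sample this step for the batch summary.
    emit_summary = rng() < sample_rate
    # Stage 2: drawn only if stage 1 passed (short-circuit), so the rich
    # stream is always a subset of the summary stream.
    emit_rich = emit_summary and rng() < rich_subsample_rate
    return SamplingDecision(emit_summary, emit_rich)


def effective_rich_rate(sample_rate: float, rich_subsample_rate: float) -> float:
    # Independent draws: P(summary AND rich) = p_step * p_rich.
    return sample_rate * rich_subsample_rate
```

Because `random.random()` returns values in [0, 1), a rate of 0.0 never fires and a rate of 1.0 always fires, which matches the 0.0 and 1.0 edge cases exercised by the rate tests.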

3. Configuration

New CLI flag and config field:

vllm serve model --step-tracing-enabled \
                  --step-tracing-sample-rate 0.01 \
                  --step-tracing-rich-subsample-rate 0.001

4. Bug Fix: CLI Flag Wiring

Critical fix: we discovered that PR #3's CLI flags were parsed but never wired to ObservabilityConfig. This PR wires up all three step tracing flags:

--step-tracing-enabled
--step-tracing-sample-rate
--step-tracing-rich-subsample-rate (new)

All three flags now properly flow: CLI → EngineArgs → ObservabilityConfig
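The flow can be sketched with plain `argparse` and dataclasses. `EngineArgsSketch` and `ObservabilityConfigSketch` are hypothetical stand-ins, not vLLM's classes; the defaults for the two rates come from this PR, while the `False` default for the enable flag is an assumption. Per the commit message, PR #3 lacked both the EngineArgs fields and the pass-through to the ObservabilityConfig constructor, so both hops appear here.

```python
import argparse
from dataclasses import dataclass


@dataclass
class ObservabilityConfigSketch:
    """Hypothetical stand-in for vLLM's ObservabilityConfig."""

    step_tracing_enabled: bool = False
    step_tracing_sample_rate: float = 0.01
    step_tracing_rich_subsample_rate: float = 0.001


@dataclass
class EngineArgsSketch:
    """Hypothetical stand-in for EngineArgs (step tracing fields only)."""

    step_tracing_enabled: bool = False
    step_tracing_sample_rate: float = 0.01
    step_tracing_rich_subsample_rate: float = 0.001

    @staticmethod
    def add_cli_args(parser: argparse.ArgumentParser) -> None:
        # argparse maps --step-tracing-sample-rate to args.step_tracing_sample_rate.
        parser.add_argument("--step-tracing-enabled", action="store_true")
        parser.add_argument("--step-tracing-sample-rate", type=float, default=0.01)
        parser.add_argument(
            "--step-tracing-rich-subsample-rate", type=float, default=0.001
        )

    @classmethod
    def from_cli_args(cls, args: argparse.Namespace) -> "EngineArgsSketch":
        # Hop 1: CLI -> EngineArgs. A field missing here silently drops the flag.
        return cls(
            step_tracing_enabled=args.step_tracing_enabled,
            step_tracing_sample_rate=args.step_tracing_sample_rate,
            step_tracing_rich_subsample_rate=args.step_tracing_rich_subsample_rate,
        )

    def create_observability_config(self) -> ObservabilityConfigSketch:
        # Hop 2: EngineArgs -> config. Without this pass-through the config
        # would keep its defaults no matter what was passed on the CLI.
        return ObservabilityConfigSketch(
            step_tracing_enabled=self.step_tracing_enabled,
            step_tracing_sample_rate=self.step_tracing_sample_rate,
            step_tracing_rich_subsample_rate=self.step_tracing_rich_subsample_rate,
        )
```

A wiring regression test then only needs to parse non-default values and check that they reach the config object, which catches a break at either hop.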

Implementation Details

Files Modified

vllm/config/observability.py: Add step_tracing_rich_subsample_rate field (default 0.001)
vllm/tracing.py: Add 10 new SpanAttributes constants for per-request metrics
vllm/v1/core/sched/scheduler.py:
- Add _emit_rich_request_snapshots() method
- Implement two-stage emission logic at end of schedule()
vllm/engine/arg_utils.py: Wire all 3 step tracing CLI flags to config
tests/v1/core/utils.py: Add rich subsample rate parameter to test utilities
tests/v1/core/test_step_tracing.py: Add 6 new tests (5 rich snapshot + 1 CLI wiring)

Key Design Decisions

Emission Point: After batch summary, before _update_after_schedule()

Guarantees scheduler.running represents the executed batch
Matches the exact state used to construct SchedulerOutput

Source of Truth: scheduler.running at SchedulerOutput construction

Stable queue: no modifications between construction and emission
Accurate representation of what the scheduler actually executed
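The ordering argument can be illustrated with a toy scheduler. `ToyScheduler` and `ToyRequest` are hypothetical and much simpler than vLLM's Scheduler; in particular, having the post-schedule update advance each request's computed-token count is a toy assumption used only to show why snapshotting before that update reports the state the batch was actually built from.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    request_id: str
    num_computed_tokens: int


@dataclass
class ToyScheduler:
    """Hypothetical toy scheduler; not vLLM's Scheduler."""

    running: list[ToyRequest]
    events: list[tuple[str, int]] = field(default_factory=list)

    def schedule(self, tokens_per_request: int) -> dict[str, int]:
        # 1. Build the step's output from the running queue.
        scheduled = {r.request_id: tokens_per_request for r in self.running}
        # 2. Batch summary (batch size only, for brevity).
        self.events.append(("summary", len(scheduled)))
        # 3. Per-request snapshots: `running` is still exactly the batch in
        #    `scheduled`, and no counters have been advanced yet.
        for r in self.running:
            self.events.append((r.request_id, r.num_computed_tokens))
        # 4. Only now mutate request state.
        self._update_after_schedule(scheduled)
        return scheduled

    def _update_after_schedule(self, scheduled: dict[str, int]) -> None:
        # Toy assumption: the post-schedule update advances computed tokens.
        for r in self.running:
            r.num_computed_tokens += scheduled[r.request_id]
```

Moving step 3 after step 4 would make every snapshot report counts that already include tokens the model has not yet run.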

KV Metrics: Uses PR #4's get_per_request_kv_metrics() utility

Defensive: never raises exceptions, returns minimal metrics on failure
Accesses internal req_to_blocks and num_cached_block structures
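A minimal sketch of the "never raises" contract, assuming the two internal structures behave like mappings keyed by request id. `safe_kv_metrics` and its `kv.*` keys are hypothetical; this is not the signature of PR #4's get_per_request_kv_metrics().

```python
from typing import Any, Mapping, Sequence


def safe_kv_metrics(
    request_id: str,
    req_to_blocks: Mapping[str, Sequence[Any]],
    num_cached_block: Mapping[str, int],
) -> dict[str, Any]:
    """Hypothetical defensive KV-metrics helper; never raises."""
    try:
        # .get() rather than [] so a missing id cannot raise KeyError or,
        # if the mapping is a defaultdict, silently insert an empty entry.
        blocks = req_to_blocks.get(request_id, ())
        return {
            "kv.available": True,
            "kv.blocks_allocated": len(blocks),
            "kv.blocks_cached": num_cached_block.get(request_id, 0),
        }
    except Exception:
        # Observability must not break scheduling: degrade to a minimal record.
        return {"kv.available": False}
```

Catching `Exception` broadly is deliberate here: a tracing helper that reads internal structures should degrade to a minimal record rather than take down a scheduler step.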

Phase Detection: num_output_tokens == 0 → PREFILL, else DECODE

num_output_tokens is a @property that computes len(_output_token_ids)
Always live, no staleness risk
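The phase rule and the reason it cannot go stale fit in a few lines. `Request` here is a hypothetical minimal stand-in that keeps only the `_output_token_ids` list and the `num_output_tokens` property named above, and `request_phase` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical minimal stand-in for the scheduler's Request."""

    request_id: str
    _output_token_ids: list[int] = field(default_factory=list)

    @property
    def num_output_tokens(self) -> int:
        # Recomputed on every access from the live list, so no cached
        # counter can drift out of sync with it.
        return len(self._output_token_ids)


def request_phase(req: Request) -> str:
    return "PREFILL" if req.num_output_tokens == 0 else "DECODE"
```

Under this rule any request with at least one output token is labeled DECODE, including one that may be recomputing earlier tokens after a preemption.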

Test Coverage

Added 6 comprehensive tests (all passing ✅):

test_rich_snapshot_rate_zero: Verifies 0.0 rate produces no events
test_rich_snapshot_enabled: Verifies 1.0 rate emits all running requests
test_rich_snapshot_gated_on_batch_summary: Verifies two-stage sampling gate
test_rich_snapshot_deterministic_sampling: Uses deterministic sampler for reproducibility
test_rich_snapshot_with_zero_running_requests: Handles empty running queue gracefully
test_step_tracing_cli_wiring: Regression test for CLI flag → config wiring

All step tracing tests passing (10 from PR #3 + 6 new).

Verification

# Run step tracing tests
pytest tests/v1/core/test_step_tracing.py -v

# Run CLI wiring test specifically  
pytest tests/v1/core/test_step_tracing.py::test_step_tracing_cli_wiring -v

Backward Compatibility

New config field has sensible default (0.001 = 0.1%)
New CLI flag is optional
No changes to existing APIs
AsyncScheduler inherits functionality automatically (no override needed)

Dependencies

Depends on:

✅ PR #3: scheduler_step counter and step span infrastructure
✅ PR #4: get_per_request_kv_metrics() utility

Future Work

These per-request snapshots enable:

Request-level performance analysis
KV cache efficiency tracking per request
Correlation between batch-level and request-level metrics
Understanding of preemption impact on individual requests

🤖 Generated with Claude Code

Implements subsampled per-request detailed progress events with KV metrics:
- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860

sriumcp and others added 2 commits January 29, 2026 06:15

sriumcp merged commit d10ad59 into main Jan 29, 2026

sriumcp merged 2 commits into main from pr5ofstepstream
