[Feature] Add rich request snapshot stream for step-level observability (PR #5)#27

Merged: sriumcp merged 2 commits into main from pr5ofstepstream on Jan 29, 2026

@sriumcp sriumcp commented Jan 29, 2026

Summary

This PR implements PR #5: Rich Request Snapshot Stream from the step-level tracing plan, adding per-request observability events within sampled scheduler steps.

Part of the step-level tracing series:

What This PR Adds

1. Per-Request Snapshot Events

Emits step.REQUEST_SNAPSHOT events containing detailed state for each running request:

  • Request identity: request.id
  • Execution phase: request.phase (PREFILL/DECODE)
  • Token counts: prompt, computed, output tokens
  • Scheduling info: tokens scheduled this step, preemption count
  • KV cache metrics: allocated blocks, cached blocks, effective prompt length
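As a rough illustration of the payload described above, here is a minimal sketch of a snapshot record flattened into event attributes. Only request.id and request.phase come from this PR description; the class name, field names, and the remaining attribute keys are hypothetical and are not the SpanAttributes constants added in vllm/tracing.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSnapshot:
    """Hypothetical container for one request's state in a sampled step."""

    request_id: str
    phase: str  # "PREFILL" or "DECODE"
    num_prompt_tokens: int
    num_computed_tokens: int
    num_output_tokens: int
    num_scheduled_tokens: int  # tokens scheduled this step
    num_preemptions: int
    kv_blocks_allocated: int
    kv_blocks_cached: int
    effective_prompt_len: int

    def to_attributes(self) -> dict[str, object]:
        """Flatten the snapshot into a dict usable as event attributes."""
        return {
            # These two keys are named in the PR description.
            "request.id": self.request_id,
            "request.phase": self.phase,
            # The keys below are illustrative assumptions only.
            "request.num_prompt_tokens": self.num_prompt_tokens,
            "request.num_computed_tokens": self.num_computed_tokens,
            "request.num_output_tokens": self.num_output_tokens,
            "request.num_scheduled_tokens": self.num_scheduled_tokens,
            "request.num_preemptions": self.num_preemptions,
            "kv.blocks_allocated": self.kv_blocks_allocated,
            "kv.blocks_cached": self.kv_blocks_cached,
            "kv.effective_prompt_len": self.effective_prompt_len,
        }


snapshot = RequestSnapshot(
    request_id="req-0",
    phase="DECODE",
    num_prompt_tokens=128,
    num_computed_tokens=130,
    num_output_tokens=2,
    num_scheduled_tokens=1,
    num_preemptions=0,
    kv_blocks_allocated=9,
    kv_blocks_cached=4,
    effective_prompt_len=64,
)
attrs = snapshot.to_attributes()
```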

2. Two-Stage Probabilistic Sampling

Implements hierarchical sampling to control event volume:

  1. Step sampling (default 1%): The step must first be sampled for a batch summary
  2. Rich subsampling (default 0.1%): Only 0.1% of sampled steps also get per-request snapshots

Result: ~0.001% of steps emit detailed snapshots (0.01 × 0.001 = 1e-5, a 1000x reduction from batch summaries)
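The gating can be sketched as two independent draws, where the second draw only happens if the first succeeds. This is a simplified illustration, not the scheduler code: the function names and the injected rng are assumptions, and the real sampler in the scheduler may be structured differently.

```python
import random


def should_emit(
    step_rate: float, rich_rate: float, rng: random.Random
) -> tuple[bool, bool]:
    """Return (emit_batch_summary, emit_rich_snapshots) for one step."""
    # Stage 1: decide whether this step gets a batch summary at all.
    batch_sampled = rng.random() < step_rate
    # Stage 2: rich snapshots are only considered for sampled steps.
    rich_sampled = batch_sampled and rng.random() < rich_rate
    return batch_sampled, rich_sampled


def effective_rich_rate(step_rate: float, rich_rate: float) -> float:
    """Expected fraction of all steps that emit per-request snapshots."""
    return step_rate * rich_rate


rng = random.Random(0)
always = should_emit(1.0, 1.0, rng)       # both stages pass
summary_only = should_emit(1.0, 0.0, rng)  # rich stage never passes
never = should_emit(0.0, 1.0, rng)         # gated out at stage 1
```

Because the two rates multiply, the rich subsample rate is a fraction of sampled steps rather than of all steps, so lowering the step sample rate also lowers snapshot volume.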

3. Configuration

New CLI flag and config field:

vllm serve model --step-tracing-enabled \
                  --step-tracing-sample-rate 0.01 \
                  --step-tracing-rich-subsample-rate 0.001

4. Bug Fix: CLI Flag Wiring

Critical fix: PR #3's CLI flags were parsed but never wired to ObservabilityConfig. This PR wires the two existing flags and the new one:

  • --step-tracing-enabled
  • --step-tracing-sample-rate
  • --step-tracing-rich-subsample-rate (new)

All three flags now properly flow: CLI → EngineArgs → ObservabilityConfig
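The wiring path can be illustrated with a small standalone sketch. The flag names and the config field names come from this PR; SketchEngineArgs and SketchObservabilityConfig are simplified stand-ins, not the real vLLM classes, and the False default for the enable flag is an assumption.

```python
import argparse
from dataclasses import dataclass


@dataclass
class SketchObservabilityConfig:
    """Stand-in for the real config; only the step tracing fields."""

    step_tracing_enabled: bool = False  # assumed default
    step_tracing_sample_rate: float = 0.01
    step_tracing_rich_subsample_rate: float = 0.001


@dataclass
class SketchEngineArgs:
    """Stand-in for EngineArgs: holds parsed values until config creation."""

    step_tracing_enabled: bool = False
    step_tracing_sample_rate: float = 0.01
    step_tracing_rich_subsample_rate: float = 0.001

    @classmethod
    def from_cli_args(cls, ns: argparse.Namespace) -> "SketchEngineArgs":
        return cls(
            step_tracing_enabled=ns.step_tracing_enabled,
            step_tracing_sample_rate=ns.step_tracing_sample_rate,
            step_tracing_rich_subsample_rate=ns.step_tracing_rich_subsample_rate,
        )

    def create_observability_config(self) -> SketchObservabilityConfig:
        # The step that was missing: forward every field to the config.
        return SketchObservabilityConfig(
            step_tracing_enabled=self.step_tracing_enabled,
            step_tracing_sample_rate=self.step_tracing_sample_rate,
            step_tracing_rich_subsample_rate=self.step_tracing_rich_subsample_rate,
        )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--step-tracing-enabled", action="store_true")
    parser.add_argument("--step-tracing-sample-rate", type=float, default=0.01)
    parser.add_argument(
        "--step-tracing-rich-subsample-rate", type=float, default=0.001
    )
    return parser


ns = build_parser().parse_args(
    [
        "--step-tracing-enabled",
        "--step-tracing-sample-rate", "0.5",
        "--step-tracing-rich-subsample-rate", "0.25",
    ]
)
config = SketchEngineArgs.from_cli_args(ns).create_observability_config()
```

A regression test in this style asserts on the final config object rather than on the parsed namespace, which is what catches a flag that parses correctly but is dropped along the way.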

Implementation Details

Files Modified

  • vllm/config/observability.py: Add step_tracing_rich_subsample_rate field (default 0.001)
  • vllm/tracing.py: Add 10 new SpanAttributes constants for per-request metrics
  • vllm/v1/core/sched/scheduler.py:
    • Add _emit_rich_request_snapshots() method
    • Implement two-stage emission logic at end of schedule()
  • vllm/engine/arg_utils.py: Wire all 3 step tracing CLI flags to config
  • tests/v1/core/utils.py: Add rich subsample rate parameter to test utilities
  • tests/v1/core/test_step_tracing.py: Add 6 new tests (5 rich snapshot + 1 CLI wiring)

Key Design Decisions

Emission Point: After batch summary, before _update_after_schedule()

  • Guarantees scheduler.running represents the executed batch
  • Matches exact state used to construct SchedulerOutput

Source of Truth: scheduler.running at SchedulerOutput construction

  • Stable queue: no modifications between construction and emission
  • Accurate representation of what the scheduler actually executed

KV Metrics: Uses PR #4's get_per_request_kv_metrics() utility

  • Defensive: never raises exceptions, returns minimal metrics on failure
  • Accesses internal req_to_blocks and num_cached_block structures
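The "never raises" contract can be illustrated with a generic wrapper. This is not the get_per_request_kv_metrics() implementation from PR #4, whose signature is not shown here; the lookup callable, the metric keys, and the fallback values below are all hypothetical.

```python
from typing import Callable

# Hypothetical fallback returned when the lookup fails for any reason.
MINIMAL_KV_METRICS = {"blocks_allocated": 0, "blocks_cached": 0}


def safe_kv_metrics(lookup: Callable[[str], dict], request_id: str) -> dict:
    """Return KV metrics for a request, falling back to minimal metrics.

    Tracing must not break scheduling, so any failure while reading
    internal structures is swallowed and replaced with the fallback.
    """
    try:
        raw = lookup(request_id)
        return {
            "blocks_allocated": int(raw["blocks_allocated"]),
            "blocks_cached": int(raw["blocks_cached"]),
        }
    except Exception:
        # Return a copy so callers cannot mutate the shared fallback.
        return dict(MINIMAL_KV_METRICS)


def good_lookup(request_id: str) -> dict:
    return {"blocks_allocated": 9, "blocks_cached": 4}


def broken_lookup(request_id: str) -> dict:
    raise KeyError(request_id)


ok = safe_kv_metrics(good_lookup, "req-0")
fallback = safe_kv_metrics(broken_lookup, "req-0")
```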

Phase Detection: num_output_tokens == 0 → PREFILL, else DECODE

  • num_output_tokens is a @property that computes len(_output_token_ids)
  • Always live, no staleness risk
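A minimal sketch of the phase rule, using a stand-in request class. FakeRequest and detect_phase are illustrative names; only num_output_tokens, _output_token_ids, and the PREFILL/DECODE rule are taken from the description above.

```python
class FakeRequest:
    """Stand-in exposing only the attribute the phase rule needs."""

    def __init__(self) -> None:
        self._output_token_ids: list[int] = []

    @property
    def num_output_tokens(self) -> int:
        # Computed on every access, so it always reflects current state.
        return len(self._output_token_ids)


def detect_phase(request: FakeRequest) -> str:
    """PREFILL until the first output token exists, DECODE afterwards."""
    return "PREFILL" if request.num_output_tokens == 0 else "DECODE"


req = FakeRequest()
phase_before = detect_phase(req)
req._output_token_ids.append(42)  # simulate the first generated token
phase_after = detect_phase(req)
```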

Test Coverage

Added 6 comprehensive tests (all passing ✅):

  1. test_rich_snapshot_rate_zero: Verifies 0.0 rate produces no events
  2. test_rich_snapshot_enabled: Verifies 1.0 rate emits all running requests
  3. test_rich_snapshot_gated_on_batch_summary: Verifies two-stage sampling gate
  4. test_rich_snapshot_deterministic_sampling: Uses deterministic sampler for reproducibility
  5. test_rich_snapshot_with_zero_running_requests: Handles empty running queue gracefully
  6. test_step_tracing_cli_wiring: Regression test for CLI flag → config wiring

All step tracing tests pass (the existing PR #3 tests plus the 6 new ones).

Verification

# Run step tracing tests
pytest tests/v1/core/test_step_tracing.py -v

# Run CLI wiring test specifically  
pytest tests/v1/core/test_step_tracing.py::test_step_tracing_cli_wiring -v

Backward Compatibility

  • New config field has sensible default (0.001 = 0.1%)
  • New CLI flag is optional
  • No changes to existing APIs
  • AsyncScheduler inherits functionality automatically (no override needed)

Dependencies

Depends on:

Future Work

These per-request snapshots enable:

  • Request-level performance analysis
  • KV cache efficiency tracking per request
  • Correlation between batch-level and request-level metrics
  • Understanding of preemption impact on individual requests

🤖 Generated with Claude Code

sriumcp and others added 2 commits January 29, 2026 06:15
Implements subsampled per-request detailed progress events with KV metrics:

- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860
@sriumcp sriumcp merged commit d10ad59 into main Jan 29, 2026