[Feature] Add rich request snapshot stream for step-level observability (PR #5)#27
Merged
Implements subsampled per-request detailed progress events with KV metrics:
- Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%)
- Emit step.REQUEST_SNAPSHOT events for running requests when subsampled
- Use PR #4 get_per_request_kv_metrics() for KV cache data
- Two-stage sampling: batch summary sampled AND rich subsampled
- SpanAttributes: 10 new constants for per-request metrics
- Emission after batch summary, before _update_after_schedule()

Also fixes PR #3 CLI wiring bug:
- Wire step_tracing_enabled/sample_rate through EngineArgs
- Add fields to EngineArgs dataclass
- Pass to ObservabilityConfig constructor
- Add test_step_tracing_cli_wiring() for regression prevention

Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Refresh plan to capture completed PRs #3, #4, #5 with accurate history:

Progress tracking:
- Add Implementation Progress section with status table
- Mark PR #3, #4, #5 as complete with commit hashes
- Mark PR #1, #2 as deferred (low priority, orthogonal)
- Update dependency graph with status indicators

Historical corrections:
- PR #3: CLI args defined but wiring missing (fixed in PR #5)
- PR #5: Added CLI wiring fix for all 3 step tracing flags
- Add NOTE in PR #3 section about wiring gap
- Update PR #5 behavioral contract to document CLI fix

Technical corrections:
- Fix output tokens source: len(_output_token_ids) → num_output_tokens (property)
- Update test file references: test_scheduler.py → test_step_tracing.py
- Change test count "15/15" → "test suite passing" (future-proof)

Verification updates:
- Mark all PR #3, #4, #5 checklist items as complete
- Add CLI wiring regression test item to PR #5 checklist

Current state: PR #5 ready for merge at commit f951860
Summary
This PR implements PR #5: Rich Request Snapshot Stream from the step-level tracing plan, adding per-request observability events within sampled scheduler steps.
Part of the step-level tracing series:
What This PR Adds
1. Per-Request Snapshot Events
Emits `step.REQUEST_SNAPSHOT` events containing detailed state for each running request:
- `request.id`
- `request.phase` (PREFILL/DECODE)

2. Two-Stage Probabilistic Sampling
Implements hierarchical sampling to control event volume:
Result: with the rates shown in the Configuration example (0.01 × 0.001), about 0.001% of steps emit detailed snapshots, a 1000x reduction from batch summaries.
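The two-stage decision can be sketched as follows. This is a minimal illustration rather than the scheduler code in this PR: the function names and the `simulate` helper are hypothetical, and the only behavior taken from the description is that a rich snapshot requires the step to pass batch-summary sampling AND the rich subsample.

```python
import random


def should_emit_batch_summary(sample_rate: float, rng: random.Random) -> bool:
    """Stage 1 (hypothetical helper): is this scheduler step sampled at all?"""
    return rng.random() < sample_rate


def should_emit_rich_snapshots(
    batch_sampled: bool, rich_subsample_rate: float, rng: random.Random
) -> bool:
    """Stage 2 (hypothetical helper): only steps that passed stage 1 are eligible."""
    return batch_sampled and rng.random() < rich_subsample_rate


def simulate(steps: int, sample_rate: float, rich_subsample_rate: float, seed: int = 0):
    """Count how many steps would emit a batch summary and rich snapshots."""
    rng = random.Random(seed)
    summaries = 0
    snapshots = 0
    for _ in range(steps):
        batch_sampled = should_emit_batch_summary(sample_rate, rng)
        summaries += batch_sampled
        snapshots += should_emit_rich_snapshots(batch_sampled, rich_subsample_rate, rng)
    return summaries, snapshots
```

Because stage 2 runs only on steps that passed stage 1, the expected fraction of steps emitting rich snapshots is the product of the two rates, and snapshot steps are always a subset of batch-summary steps.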
3. Configuration
New CLI flag and config field:
    vllm serve model --step-tracing-enabled \
        --step-tracing-sample-rate 0.01 \
        --step-tracing-rich-subsample-rate 0.001

4. Bug Fix: CLI Flag Wiring
Critical fix: Discovered that PR #3's CLI flags were parsed but never wired to `ObservabilityConfig`. This PR fixes:
- `--step-tracing-enabled`
- `--step-tracing-sample-rate`
- `--step-tracing-rich-subsample-rate` (new)

All three flags now properly flow: CLI → `EngineArgs` → `ObservabilityConfig`

Implementation Details
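The CLI → `EngineArgs` → `ObservabilityConfig` flow fixed in this PR can be pictured with the simplified sketch below. The classes are stand-ins that share only their names and the three field names with the real code. `build_parser`, `from_cli_args`, and `to_observability_config` are hypothetical helpers, and the 0.01 sample-rate default is a placeholder (only the 0.001 rich subsample default is stated in this PR).

```python
import argparse
from dataclasses import dataclass


@dataclass
class ObservabilityConfig:
    """Simplified stand-in for the real config class."""
    step_tracing_enabled: bool
    step_tracing_sample_rate: float
    step_tracing_rich_subsample_rate: float


@dataclass
class EngineArgs:
    """Simplified stand-in holding the three step tracing fields."""
    step_tracing_enabled: bool = False
    step_tracing_sample_rate: float = 0.01  # placeholder default (assumption)
    step_tracing_rich_subsample_rate: float = 0.001  # default stated in this PR

    @classmethod
    def from_cli_args(cls, ns: argparse.Namespace) -> "EngineArgs":
        # Copy each parsed flag onto the dataclass.
        return cls(
            step_tracing_enabled=ns.step_tracing_enabled,
            step_tracing_sample_rate=ns.step_tracing_sample_rate,
            step_tracing_rich_subsample_rate=ns.step_tracing_rich_subsample_rate,
        )

    def to_observability_config(self) -> ObservabilityConfig:
        # Forward every field to the config constructor (the step that was missing).
        return ObservabilityConfig(
            step_tracing_enabled=self.step_tracing_enabled,
            step_tracing_sample_rate=self.step_tracing_sample_rate,
            step_tracing_rich_subsample_rate=self.step_tracing_rich_subsample_rate,
        )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--step-tracing-enabled", action="store_true")
    parser.add_argument("--step-tracing-sample-rate", type=float, default=0.01)
    parser.add_argument("--step-tracing-rich-subsample-rate", type=float, default=0.001)
    return parser
```

A regression test in the spirit of `test_step_tracing_cli_wiring()` would parse non-default flag values and assert that they arrive unchanged on the resulting config object, which catches any flag that is parsed but never forwarded.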
Files Modified
- `step_tracing_rich_subsample_rate` field (default 0.001)
- `SpanAttributes` constants for per-request metrics
- `_emit_rich_request_snapshots()` method
- `schedule()`

Key Design Decisions
Emission Point: After batch summary, before `_update_after_schedule()`
- `scheduler.running` represents the executed batch
- `SchedulerOutput`

Source of Truth: `scheduler.running` at SchedulerOutput construction

KV Metrics: Uses PR #4's `get_per_request_kv_metrics()` utility
- `req_to_blocks` and `num_cached_block` structures

Phase Detection: `num_output_tokens == 0` → PREFILL, else DECODE
- `num_output_tokens` is a `@property` that computes `len(_output_token_ids)`

Test Coverage
Added 6 comprehensive tests (all passing ✅):
All step tracing tests passing (10 from PR #3 + 6 new).
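To illustrate the kind of behavior the rich snapshot tests exercise, here is a minimal sketch of the phase-detection rule described under Key Design Decisions. `FakeRequest`, `detect_phase`, and `build_snapshot` are hypothetical names used only for this example. The attribute keys `request.id` and `request.phase` and the `num_output_tokens` property come from the description above; everything else is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a running request."""
    request_id: str
    _output_token_ids: list = field(default_factory=list)

    @property
    def num_output_tokens(self) -> int:
        # Mirrors the description: a property computing len(_output_token_ids).
        return len(self._output_token_ids)


def detect_phase(request: FakeRequest) -> str:
    """A request with no output tokens yet is in PREFILL, otherwise DECODE."""
    return "PREFILL" if request.num_output_tokens == 0 else "DECODE"


def build_snapshot(request: FakeRequest) -> dict:
    """Build a reduced snapshot payload with only the two attributes shown above."""
    return {
        "request.id": request.request_id,
        "request.phase": detect_phase(request),
    }
```

A request starts in PREFILL and flips to DECODE as soon as its first output token is appended, so a test can append one token id and check that the snapshot's `request.phase` changes accordingly.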
Verification
Backward Compatibility
Dependencies
Depends on:
- `scheduler_step` counter and step span infrastructure
- `get_per_request_kv_metrics()` utility

Future Work
These per-request snapshots enable:
🤖 Generated with Claude Code