[Bugfix] Fix prefill progress tracking for chunked prefill preemption#5
Merged
[Bugfix] Fix prefill progress tracking for chunked prefill preemption#5
Conversation
The prefill hi-water mark was only updated when scheduling from the waiting queue, not when scheduling from the running queue. This caused chunked prefill requests (scheduled multiple times across iterations) to report stale progress in PREEMPTED events. Fix: Centralize hiwater update in _update_after_schedule() to cover all scheduling paths (waiting, running, resume). This ensures accurate prefill progress tracking for chunked prefill scenarios. Tests: Add comprehensive tests for chunked prefill hiwater updates, preemption accuracy, and monotonicity invariants. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This was referenced Jan 26, 2026
sriumcp
added a commit
that referenced
this pull request
Jan 27, 2026
Add span tracking dict and helper methods to OpenAIServing for future API-side journey tracing. This PR adds infrastructure only - no spans are created or managed yet (deferred to PR #6). Changes: - Add _api_spans dict to track (span, arrival_time, first_response_time) - Add _cached_is_tracing_enabled flag with defensive error handling - Add helper methods: _get_is_tracing_enabled, _store_api_span, _get_api_span_info, _cleanup_api_span - Add 8 comprehensive unit tests for dict operations and caching All state is private and lives outside Pydantic models to avoid serialization issues. Safe to merge independently - introduces no lifecycle obligations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp
added a commit
that referenced
this pull request
Jan 27, 2026
Mark PR #5 as completed with implementation summary: - Status: Completed (PR #13 merged) - Branch: pr5ofjourney - Size: 67 lines production code, 177 lines test code - Tests: 8 comprehensive tests, all passing - Implementation details: Added _api_spans dict, helper methods, and defensive error handling Updated code examples to match actual implementation including error handling in _get_is_tracing_enabled() method.
sriumcp
added a commit
that referenced
this pull request
Jan 27, 2026
Reduce Implementation History from 229 lines to 53 lines (~77% reduction): - Condensed PR #0-#5 to essential info only (commit, PR number, line stats) - Removed verbose implementation details (already in individual PR sections) - Added PR #5 completion entry - Added summary line: 5 PRs completed, ~350 production lines, ~1022 test lines, 31 tests Total file size: 2266 → 2089 lines (177 line reduction)
sriumcp
added a commit
that referenced
this pull request
Jan 27, 2026
* [Feature] Add API span tracking infrastructure (PR #5/9) Add span tracking dict and helper methods to OpenAIServing for future API-side journey tracing. This PR adds infrastructure only - no spans are created or managed yet (deferred to PR #6). Changes: - Add _api_spans dict to track (span, arrival_time, first_response_time) - Add _cached_is_tracing_enabled flag with defensive error handling - Add helper methods: _get_is_tracing_enabled, _store_api_span, _get_api_span_info, _cleanup_api_span - Add 8 comprehensive unit tests for dict operations and caching All state is private and lives outside Pydantic models to avoid serialization issues. Safe to merge independently - introduces no lifecycle obligations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * [Docs] Update journey tracing plan with PR #5 completion Mark PR #5 as completed with implementation summary: - Status: Completed (PR #13 merged) - Branch: pr5ofjourney - Size: 67 lines production code, 177 lines test code - Tests: 8 comprehensive tests, all passing - Implementation details: Added _api_spans dict, helper methods, and defensive error handling Updated code examples to match actual implementation including error handling in _get_is_tracing_enabled() method. * [Docs] Condense Implementation History section Reduce Implementation History from 229 lines to 53 lines (~77% reduction): - Condensed PR #0-#5 to essential info only (commit, PR number, line stats) - Removed verbose implementation details (already in individual PR sections) - Added PR #5 completion entry - Added summary line: 5 PRs completed, ~350 production lines, ~1022 test lines, 31 tests Total file size: 2266 → 2089 lines (177 line reduction) --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This was referenced Jan 27, 2026
sriumcp
added a commit
that referenced
this pull request
Jan 29, 2026
Implements subsampled per-request detailed progress events with KV metrics: - Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%) - Emit step.REQUEST_SNAPSHOT events for running requests when subsampled - Use PR #4 get_per_request_kv_metrics() for KV cache data - Two-stage sampling: batch summary sampled AND rich subsampled - SpanAttributes: 10 new constants for per-request metrics - Emission after batch summary, before _update_after_schedule() Also fixes PR #3 CLI wiring bug: - Wire step_tracing_enabled/sample_rate through EngineArgs - Add fields to EngineArgs dataclass - Pass to ObservabilityConfig constructor - Add test_step_tracing_cli_wiring() for regression prevention Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp
added a commit
that referenced
this pull request
Jan 29, 2026
Refresh plan to capture completed PRs #3, #4, #5 with accurate history: Progress tracking: - Add Implementation Progress section with status table - Mark PR #3, #4, #5 as complete with commit hashes - Mark PR #1, #2 as deferred (low priority, orthogonal) - Update dependency graph with status indicators Historical corrections: - PR #3: CLI args defined but wiring missing (fixed in PR #5) - PR #5: Added CLI wiring fix for all 3 step tracing flags - Add NOTE in PR #3 section about wiring gap - Update PR #5 behavioral contract to document CLI fix Technical corrections: - Fix output tokens source: len(_output_token_ids) → num_output_tokens (property) - Update test file references: test_scheduler.py → test_step_tracing.py - Change test count "15/15" → "test suite passing" (future-proof) Verification updates: - Mark all PR #3, #4, #5 checklist items as complete - Add CLI wiring regression test item to PR #5 checklist Current state: PR #5 ready for merge at commit f951860
sriumcp
added a commit
that referenced
this pull request
Jan 29, 2026
…ty (PR #5) (#27) * [Feature] Add rich request snapshot stream (PR #5) Implements subsampled per-request detailed progress events with KV metrics: - Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%) - Emit step.REQUEST_SNAPSHOT events for running requests when subsampled - Use PR #4 get_per_request_kv_metrics() for KV cache data - Two-stage sampling: batch summary sampled AND rich subsampled - SpanAttributes: 10 new constants for per-request metrics - Emission after batch summary, before _update_after_schedule() Also fixes PR #3 CLI wiring bug: - Wire step_tracing_enabled/sample_rate through EngineArgs - Add fields to EngineArgs dataclass - Pass to ObservabilityConfig constructor - Add test_step_tracing_cli_wiring() for regression prevention Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * [Docs] Update step tracing plan with implementation progress Refresh plan to capture completed PRs #3, #4, #5 with accurate history: Progress tracking: - Add Implementation Progress section with status table - Mark PR #3, #4, #5 as complete with commit hashes - Mark PR #1, #2 as deferred (low priority, orthogonal) - Update dependency graph with status indicators Historical corrections: - PR #3: CLI args defined but wiring missing (fixed in PR #5) - PR #5: Added CLI wiring fix for all 3 step tracing flags - Add NOTE in PR #3 section about wiring gap - Update PR #5 behavioral contract to document CLI fix Technical corrections: - Fix output tokens source: len(_output_token_ids) → num_output_tokens (property) - Update test file references: test_scheduler.py → test_step_tracing.py - Change test count "15/15" → "test suite passing" (future-proof) Verification updates: - Mark all PR #3, #4, #5 checklist items as complete - Add CLI wiring regression test item to PR #5 checklist Current state: PR #5 ready for merge at commit f951860 --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes a bug where prefill progress tracking was incomplete for chunked prefill requests that get preempted after multiple scheduling iterations. This caused
PREEMPTEDjourney events to show stale prefill progress (e.g., 256 tokens instead of 768 tokens actually scheduled).Problem
The prefill hi-water mark (
_journey_prefill_hiwater) was only updated when scheduling requests from the waiting queue, not from the running queue.For chunked prefill (when
long_prefill_token_thresholdis set), requests are:This affected:
long_prefill_token_threshold)Solution
Centralized the hiwater update in
_update_after_schedule()method, which runs for ALL scheduling paths:This ensures hiwater accurately tracks cumulative prefill progress across all scheduling iterations.
Key changes:
_update_after_schedule()(lines 933-940 in scheduler.py)num_computed_tokensis incremented, ensuring correct valuesTesting
Added 3 comprehensive tests covering the bug and edge cases:
test_chunked_prefill_running_queue_hiwater: Verifies hiwater updates across multiple running queue scheduling iterations (chunked prefill)test_chunked_prefill_preemption_accurate_progress: Verifies PREEMPTED events show correct prefill progress after multiple chunks (core bug fix validation)test_hiwater_monotonic_across_preemption_cycles: Verifies hiwater survives preemption and never decreasesUpdated 1 existing test to reflect new timing (hiwater updated after event emission).
Test results:
Impact
Example
Before fix:
After fix:
🤖 Generated with Claude Code