[Bugfix] Fix prefill progress tracking for chunked prefill preemption by sriumcp · Pull Request #5 · inference-sim/vllm

sriumcp · 2026-01-24T15:45:38Z

Summary

Fixes a bug where prefill progress tracking was incomplete for chunked prefill requests that get preempted after multiple scheduling iterations. This caused PREEMPTED journey events to show stale prefill progress (e.g., 256 tokens instead of 768 tokens actually scheduled).

Problem

The prefill hi-water mark (_journey_prefill_hiwater) was only updated when scheduling requests from the waiting queue, not from the running queue.

For chunked prefill (when long_prefill_token_threshold is set), requests are:

Scheduled from waiting queue for first chunk (e.g., 256 tokens) → hiwater updated ✓
Moved to running queue with status=RUNNING
Scheduled again from running queue for next chunk (e.g., 256 more tokens) → hiwater NOT updated ✗
If preempted at this point, the PREEMPTED event shows only 256 tokens progress instead of 512+

This affected:

Workloads with long prompts (> long_prefill_token_threshold)
High preemption pressure scenarios
OpenTelemetry trace consumers relying on accurate prefill progress

Solution

Centralized the hiwater update in _update_after_schedule() method, which runs for ALL scheduling paths:

Waiting queue schedules
Running queue schedules (chunked prefill continuation)
Resumed requests after preemption

This ensures hiwater accurately tracks cumulative prefill progress across all scheduling iterations.

Key changes:

Added hiwater update logic to _update_after_schedule() (lines 933-940 in scheduler.py)
Removed redundant update from waiting queue loop to avoid timing inconsistencies
Update happens after num_computed_tokens is incremented, ensuring correct values

Testing

Added 3 comprehensive tests covering the bug and edge cases:

test_chunked_prefill_running_queue_hiwater: Verifies hiwater updates across multiple running queue scheduling iterations (chunked prefill)
test_chunked_prefill_preemption_accurate_progress: Verifies PREEMPTED events show correct prefill progress after multiple chunks (core bug fix validation)
test_hiwater_monotonic_across_preemption_cycles: Verifies hiwater survives preemption and never decreases

Updated 1 existing test to reflect new timing (hiwater updated after event emission).

Test results:

✅ All 11 journey event tests pass
✅ All 81 scheduler tests pass
✅ No regressions

Impact

Backward compatible: No API/schema changes, only data accuracy improvements
Performance: Near-zero overhead when journey tracing is disabled (single boolean check)
Correctness: PREEMPTED events now show accurate prefill progress for chunked prefill scenarios
Coverage: Fixes apply to all request types (standard, prefix-cached, KV-transferred, speculative)

Example

Before fix:

Iteration 1: Schedule 256 tokens → hiwater = 256
Iteration 2: Schedule 256 tokens → hiwater = 256 (stale)
Iteration 3: Schedule 256 tokens → hiwater = 256 (stale)
Preemption → PREEMPTED event shows 256/1000 tokens (25.6% progress) ❌

After fix:

Iteration 1: Schedule 256 tokens → hiwater = 256
Iteration 2: Schedule 256 tokens → hiwater = 512
Iteration 3: Schedule 256 tokens → hiwater = 768
Preemption → PREEMPTED event shows 768/1000 tokens (76.8% progress) ✓

🤖 Generated with Claude Code

The prefill hi-water mark was only updated when scheduling from the waiting queue, not when scheduling from the running queue. This caused chunked prefill requests (scheduled multiple times across iterations) to report stale progress in PREEMPTED events. Fix: Centralize hiwater update in _update_after_schedule() to cover all scheduling paths (waiting, running, resume). This ensures accurate prefill progress tracking for chunked prefill scenarios. Tests: Add comprehensive tests for chunked prefill hiwater updates, preemption accuracy, and monotonicity invariants. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add span tracking dict and helper methods to OpenAIServing for future API-side journey tracing. This PR adds infrastructure only - no spans are created or managed yet (deferred to PR #6). Changes: - Add _api_spans dict to track (span, arrival_time, first_response_time) - Add _cached_is_tracing_enabled flag with defensive error handling - Add helper methods: _get_is_tracing_enabled, _store_api_span, _get_api_span_info, _cleanup_api_span - Add 8 comprehensive unit tests for dict operations and caching All state is private and lives outside Pydantic models to avoid serialization issues. Safe to merge independently - introduces no lifecycle obligations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Mark PR #5 as completed with implementation summary: - Status: Completed (PR #13 merged) - Branch: pr5ofjourney - Size: 67 lines production code, 177 lines test code - Tests: 8 comprehensive tests, all passing - Implementation details: Added _api_spans dict, helper methods, and defensive error handling Updated code examples to match actual implementation including error handling in _get_is_tracing_enabled() method.

Reduce Implementation History from 229 lines to 53 lines (~77% reduction): - Condensed PR #0-#5 to essential info only (commit, PR number, line stats) - Removed verbose implementation details (already in individual PR sections) - Added PR #5 completion entry - Added summary line: 5 PRs completed, ~350 production lines, ~1022 test lines, 31 tests Total file size: 2266 → 2089 lines (177 line reduction)

* [Feature] Add API span tracking infrastructure (PR #5/9) Add span tracking dict and helper methods to OpenAIServing for future API-side journey tracing. This PR adds infrastructure only - no spans are created or managed yet (deferred to PR #6). Changes: - Add _api_spans dict to track (span, arrival_time, first_response_time) - Add _cached_is_tracing_enabled flag with defensive error handling - Add helper methods: _get_is_tracing_enabled, _store_api_span, _get_api_span_info, _cleanup_api_span - Add 8 comprehensive unit tests for dict operations and caching All state is private and lives outside Pydantic models to avoid serialization issues. Safe to merge independently - introduces no lifecycle obligations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * [Docs] Update journey tracing plan with PR #5 completion Mark PR #5 as completed with implementation summary: - Status: Completed (PR #13 merged) - Branch: pr5ofjourney - Size: 67 lines production code, 177 lines test code - Tests: 8 comprehensive tests, all passing - Implementation details: Added _api_spans dict, helper methods, and defensive error handling Updated code examples to match actual implementation including error handling in _get_is_tracing_enabled() method. * [Docs] Condense Implementation History section Reduce Implementation History from 229 lines to 53 lines (~77% reduction): - Condensed PR #0-#5 to essential info only (commit, PR number, line stats) - Removed verbose implementation details (already in individual PR sections) - Added PR #5 completion entry - Added summary line: 5 PRs completed, ~350 production lines, ~1022 test lines, 31 tests Total file size: 2266 → 2089 lines (177 line reduction) --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements subsampled per-request detailed progress events with KV metrics: - Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%) - Emit step.REQUEST_SNAPSHOT events for running requests when subsampled - Use PR #4 get_per_request_kv_metrics() for KV cache data - Two-stage sampling: batch summary sampled AND rich subsampled - SpanAttributes: 10 new constants for per-request metrics - Emission after batch summary, before _update_after_schedule() Also fixes PR #3 CLI wiring bug: - Wire step_tracing_enabled/sample_rate through EngineArgs - Add fields to EngineArgs dataclass - Pass to ObservabilityConfig constructor - Add test_step_tracing_cli_wiring() for regression prevention Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Refresh plan to capture completed PRs #3, #4, #5 with accurate history: Progress tracking: - Add Implementation Progress section with status table - Mark PR #3, #4, #5 as complete with commit hashes - Mark PR #1, #2 as deferred (low priority, orthogonal) - Update dependency graph with status indicators Historical corrections: - PR #3: CLI args defined but wiring missing (fixed in PR #5) - PR #5: Added CLI wiring fix for all 3 step tracing flags - Add NOTE in PR #3 section about wiring gap - Update PR #5 behavioral contract to document CLI fix Technical corrections: - Fix output tokens source: len(_output_token_ids) → num_output_tokens (property) - Update test file references: test_scheduler.py → test_step_tracing.py - Change test count "15/15" → "test suite passing" (future-proof) Verification updates: - Mark all PR #3, #4, #5 checklist items as complete - Add CLI wiring regression test item to PR #5 checklist Current state: PR #5 ready for merge at commit f951860

…ty (PR #5) (#27) * [Feature] Add rich request snapshot stream (PR #5) Implements subsampled per-request detailed progress events with KV metrics: - Add step_tracing_rich_subsample_rate config (default 0.001 = 0.1%) - Emit step.REQUEST_SNAPSHOT events for running requests when subsampled - Use PR #4 get_per_request_kv_metrics() for KV cache data - Two-stage sampling: batch summary sampled AND rich subsampled - SpanAttributes: 10 new constants for per-request metrics - Emission after batch summary, before _update_after_schedule() Also fixes PR #3 CLI wiring bug: - Wire step_tracing_enabled/sample_rate through EngineArgs - Add fields to EngineArgs dataclass - Pass to ObservabilityConfig constructor - Add test_step_tracing_cli_wiring() for regression prevention Tests: 6 new tests (5 rich snapshot + 1 CLI wiring), all 15 pass Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * [Docs] Update step tracing plan with implementation progress Refresh plan to capture completed PRs #3, #4, #5 with accurate history: Progress tracking: - Add Implementation Progress section with status table - Mark PR #3, #4, #5 as complete with commit hashes - Mark PR #1, #2 as deferred (low priority, orthogonal) - Update dependency graph with status indicators Historical corrections: - PR #3: CLI args defined but wiring missing (fixed in PR #5) - PR #5: Added CLI wiring fix for all 3 step tracing flags - Add NOTE in PR #3 section about wiring gap - Update PR #5 behavioral contract to document CLI fix Technical corrections: - Fix output tokens source: len(_output_token_ids) → num_output_tokens (property) - Update test file references: test_scheduler.py → test_step_tracing.py - Change test count "15/15" → "test suite passing" (future-proof) Verification updates: - Mark all PR #3, #4, #5 checklist items as complete - Add CLI wiring regression test item to PR #5 checklist Current state: PR #5 ready for merge at commit f951860 --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

sriumcp merged commit a8f3dfc into main Jan 24, 2026

This was referenced Jan 26, 2026

[Feature] Initialize OTEL tracer in scheduler for journey tracing (PR #1/9) #8

Merged

[Feature] Add journey state cleanup to scheduler (PR #3/9) #11

Merged

[Feature] Emit journey events to core spans (PR #4/9) #12

Merged

sriumcp mentioned this pull request Jan 27, 2026

[Feature] Add API span tracking infrastructure (PR #5/9) #13

Merged

This was referenced Jan 27, 2026

[Feature] Add API parent span lifecycle management (PR #6/9) #14

Merged

[Feature] Remove journey event buffering (PR #9/9) #17

Merged

Journey Tracing: Complete Implementation (PRs #0-#9) + Regression Audit #18

Merged

sriumcp mentioned this pull request Jan 29, 2026

[Feature] Add rich request snapshot stream for step-level observability (PR #5) #27

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bugfix] Fix prefill progress tracking for chunked prefill preemption#5

[Bugfix] Fix prefill progress tracking for chunked prefill preemption#5
sriumcp merged 1 commit intomainfrom
candidateissues

sriumcp commented Jan 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sriumcp commented Jan 24, 2026

Summary

Problem

Solution

Testing

Impact

Example

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant