Skip to content

[Feature] Add journey state cleanup to scheduler (PR #3/9)#11

Merged
sriumcp merged 2 commits intomainfrom
pr3ofjourney
Jan 27, 2026
Merged

[Feature] Add journey state cleanup to scheduler (PR #3/9)#11
sriumcp merged 2 commits intomainfrom
pr3ofjourney

Conversation

@sriumcp
Copy link
Copy Markdown

@sriumcp sriumcp commented Jan 27, 2026

Overview

Extends the centralized cleanup method to handle journey tracing state alongside core span cleanup. Fixes memory leak on natural completion path.

Problem Solved

Memory leak identified: Journey state (_first_token_emitted, _journey_prefill_hiwater) was not being cleaned up on natural completion path, only on explicit abort path.

Solution

Centralized all cleanup in _end_core_span_and_cleanup() with proper decoupling:

Changes

Production Code (vllm/v1/core/sched/scheduler.py)

  • Extended _end_core_span_and_cleanup() with decoupled cleanup logic
  • Removed duplicate inline cleanup from finish_requests()
  • Added comprehensive docstring explaining the two independent cleanup concerns

Tests (tests/v1/core/test_scheduler.py)

Added 4 comprehensive tests:

  1. test_journey_state_created - Verify state initialization
  2. test_journey_state_cleaned_on_finish - Explicit abort cleanup
  3. test_journey_state_cleaned_on_completion - Natural completion cleanup ⭐ Critical test
  4. test_no_state_leak - No accumulation over 20 iterations (focused unit test)

Test Results

✅ All 95 tests passing (4 new + 91 existing)
✅ No regressions
✅ Memory leak fixed
✅ Both termination paths verified

Key Design Principles

  • Decoupling: Span cleanup NEVER gated behind feature flags
  • Centralization: Single method handles all cleanup
  • Idempotency: Safe to call multiple times
  • Defensive: Cleanup runs even on exceptions

Part of Journey Tracing Initiative

This is PR #3 of 9 in the dual-stream journey tracing implementation plan:

Review Notes

  • Production code reviewed and approved by ChatGPT (2 rounds)
  • Tests improved based on feedback: simpler, faster, more focused
  • No breaking changes
  • Backward compatible

🤖 Generated with Claude Code

sriumcp and others added 2 commits January 26, 2026 19:27
Extends the centralized cleanup method to handle journey tracing state
alongside core span cleanup. Fixes memory leak on natural completion path.

Changes:
- Extend _end_core_span_and_cleanup() with decoupled cleanup logic
  - Cleanup #1: Core spans (always runs, independent of flags)
  - Cleanup #2: Journey state (only if journey tracing enabled)
- Remove duplicate inline cleanup from finish_requests()
- Add 4 tests verifying state cleanup on all termination paths

Tests:
- test_journey_state_created: Verify state initialization
- test_journey_state_cleaned_on_finish: Explicit abort cleanup
- test_journey_state_cleaned_on_completion: Natural completion cleanup
- test_no_state_leak: No accumulation over 20 iterations

All 95 tests passing (4 new + 91 existing).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updates:
- Mark PR #3 as COMPLETED in PR sequence summary
- Update PR dependencies to show PR #3 complete
- Add PR #3 to Implementation History section with full details
- Document commit hash (f4cf790) and PR number (vllm-project#33126)
- Record test results, code review process, and key achievements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@sriumcp sriumcp merged commit 35c46c3 into main Jan 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant