fix(l1_manager): propagate extra_count through prefetch path to prevent premature eviction #2725

Merged
maobaolong merged 1 commit into LMCache:dev from liuyumoye:dev on Mar 11, 2026

Conversation

@liuyumoye (Contributor) commented Mar 10, 2026


What this PR does / why we need it:
When a prefetch request completes and the loaded keys transition from write-locked to read-locked, finish_write_and_reserve_read always acquired exactly one read lock per key, regardless of the extra_count passed by the caller. This caused premature eviction in Tensor Parallelism (TP > 1) scenarios (e.g. MLA models), where each TP worker needs to consume its own read lock: with only one lock held, the first worker's finish_read would drop the ref-count to 0 and trigger eviction before the remaining workers could access the data.
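To make the ref-count arithmetic concrete, here is a minimal, self-contained sketch of the failure mode with TP = 2; ReadLock and the eviction check are hypothetical stand-ins for the real ref-counting in l1_manager.py:

    # Hypothetical stand-in for the real read lock in l1_manager.py.
    class ReadLock:
        def __init__(self):
            self.count = 0

        def lock(self):
            self.count += 1

        def unlock(self):
            self.count -= 1
            return self.count == 0  # True -> entry becomes evictable

    lock = ReadLock()
    lock.lock()  # old behavior: exactly 1 read lock, extra_count silently dropped

    # TP worker 0 finishes reading:
    if lock.unlock():
        print("evicted")  # ref-count hits 0 -> premature eviction

    # TP worker 1 now finds the entry gone before it could read the data.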

This PR propagates extra_count through the entire prefetch path:

StorageManager.submit_prefetch_task → PrefetchController.submit_prefetch_request → _submission_queue / _pending_queue → _start_lookup_phase → InFlightPrefetchRequest.extra_count

On prefetch completion: finish_write_and_reserve_read(loaded_keys, extra_count=request.extra_count) acquires 1 + extra_count read locks per key

Non-prefix loaded keys that are immediately released also pass extra_count to finish_read

L1Manager.finish_write_and_reserve_read now accepts extra_count and calls entry.read_lock.lock() a total of 1 + extra_count times per key
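Putting the pieces together, here is a self-contained sketch of the corrected lock handoff; _Lock and _Entry are hypothetical stand-ins, and only the 1 + extra_count counting is taken from the PR description:

    from dataclasses import dataclass, field

    @dataclass
    class _Lock:  # hypothetical stand-in for the real lock objects
        count: int = 0

        def lock(self):
            self.count += 1

        def unlock(self):
            self.count -= 1

    @dataclass
    class _Entry:  # hypothetical stand-in for an L1 cache entry
        write_lock: _Lock = field(default_factory=_Lock)
        read_lock: _Lock = field(default_factory=_Lock)

    def finish_write_and_reserve_read(entries, keys, extra_count=0):
        if extra_count < 0:
            raise ValueError("extra_count must be non-negative")
        total = 1 + extra_count  # base reader plus one per extra TP worker
        for key in keys:
            entry = entries[key]
            entry.write_lock.unlock()  # write phase is done
            for _ in range(total):
                entry.read_lock.lock()  # reserve every reader up front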

Special notes for your reviewers:
The extra_count semantics are unchanged from the existing submit_prefetch_task / finish_read contract — this PR only ensures the value is correctly threaded through the prefetch pipeline rather than silently dropped.

_pending_queue and _submission_queue tuple types are widened from (..., MemoryLayoutDesc) to (..., MemoryLayoutDesc, int)

No behavior change when extra_count=0 (the default), so existing single-TP deployments are unaffected.
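As a hypothetical caller-side illustration (names follow the PR description; the real call site and signature may differ), a TP = 4 deployment would reserve one read lock per worker:

    # keys, layout_desc, and storage_manager are placeholders for the
    # caller's actual objects; only the extra_count arithmetic matters here.
    tp_size = 4
    storage_manager.submit_prefetch_task(
        keys,
        layout_desc,
        extra_count=tp_size - 1,  # 3 extra read locks; extra_count=0 keeps old behavior
    )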

If applicable:

  • this PR contains user-facing changes: docs added
  • this PR contains unit tests

@gemini-code-assist (Contributor) commented:

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue causing premature eviction of prefetched keys in distributed environments utilizing Tensor Parallelism (TP > 1). Previously, only a single read lock was acquired per key upon prefetch completion, leading to data eviction before all TP workers could access it. The fix involves systematically propagating the extra_count parameter, which specifies additional required read locks, through the entire prefetch pipeline. This ensures that the correct number of read locks is held for each key, preventing premature eviction and maintaining data availability for all consuming workers.

Highlights

  • L1Manager Lock Acquisition: The L1Manager.finish_write_and_reserve_read method now accepts an extra_count parameter, enabling it to acquire 1 + extra_count read locks per key, addressing premature eviction in Tensor Parallelism (TP > 1) scenarios.
  • Prefetch Path Propagation: The extra_count parameter is now correctly propagated through the entire prefetch pipeline, including StorageManager.submit_prefetch_request, PrefetchController's internal queues (_submission_queue, _pending_queue), InFlightPrefetchRequest, and _start_lookup_phase.
  • InFlightPrefetchRequest Update: The InFlightPrefetchRequest data structure was updated to include the extra_count field, ensuring that the required number of locks is tracked throughout the prefetch process.
  • No Behavior Change for Default Cases: The changes introduce no behavior modification when extra_count is 0 (the default), ensuring existing single-TP deployments remain unaffected.


Changelog
  • lmcache/v1/distributed/l1_manager.py
    • Updated finish_write_and_reserve_read method signature to include extra_count parameter.
    • Added validation for the extra_count parameter.
    • Modified the read lock acquisition logic to loop 1 + extra_count times for each key.
  • lmcache/v1/distributed/storage_controllers/prefetch_controller.py
    • Extended InFlightPrefetchRequest dataclass with a new extra_count field.
    • Updated type hints for _pending_queue and _submission_queue to include extra_count in their tuple structures.
    • Modified submit_prefetch_request method signature to accept extra_count.
    • Updated _start_lookup_phase method signature to accept extra_count.
    • Ensured _finalize_load passes request.extra_count to l1_mgr.finish_write_and_reserve_read and l1_mgr.finish_read.
  • lmcache/v1/distributed/storage_manager.py
    • Modified submit_prefetch_task to pass the extra_count argument to the _prefetch_controller.submit_prefetch_request call.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request addresses a bug causing premature eviction in multi-worker scenarios by propagating the extra_count parameter through the prefetch path. However, it misses a critical propagation point in the read_prefetched_results context manager, which could lead to lock leaks. Additionally, the lack of type validation for the new parameter at API boundaries poses a risk of crashing background processing threads if malformed input is provided.

    request_id = self._prefetch_controller.submit_prefetch_request(
        remaining_keys,
        layout_desc,
        extra_count=extra_count,
    )

Severity: medium (security)

The propagation of the extra_count parameter is incomplete. While it is correctly passed to the PrefetchController in submit_prefetch_task, the read_prefetched_results context manager (lines 173-233) was not updated to accept or use this parameter.

In its finally block (line 230), it calls self._l1_manager.finish_read(good_keys) with the default extra_count=0. If the objects were originally prefetched with an extra_count > 0, this call will only release one of the multiple read locks acquired. This results in a lock leak, which prevents the objects from being evicted or deleted from the L1 cache, eventually leading to resource exhaustion and a Denial of Service of the cache manager.
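A minimal sketch of the change this comment is asking for; the context manager's internals are paraphrased from the comment rather than the actual diff, and the readability filter is a hypothetical stand-in:

    from contextlib import contextmanager

    @contextmanager
    def read_prefetched_results(l1_manager, keys, extra_count=0):
        # Hypothetical filter standing in for however good_keys is computed.
        good_keys = [k for k in keys if l1_manager.contains(k)]
        try:
            yield good_keys
        finally:
            # Release the same number of locks that prefetch reserved
            # (1 + extra_count); otherwise extra_count locks leak per key.
            l1_manager.finish_read(good_keys, extra_count=extra_count)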

Comment on lines +540 to +541

    extra_count = _validate_extra_count(extra_count)
    total = 1 + extra_count

Severity: medium (security)

The extra_count parameter is not validated to be an integer before being used in calculations and loops. While _validate_extra_count performs range checking, it does not verify the type.

In the PrefetchController, this value is queued and processed by a background thread. If an invalid type (e.g., a string or a float) is passed to the public submit_prefetch_request API, it will eventually cause the background thread to crash with a TypeError when it attempts to perform comparisons (line 101), string formatting in the logger (line 103), or use the value in range() (line 569). A crash in the background loop disables prefetch functionality for the entire application. It is recommended to validate that extra_count is a non-negative integer at the entry points of the public APIs.
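A minimal sketch of the boundary validation being requested; _validate_extra_count is named in the diff excerpt above, and the isinstance check is the suggested addition:

    def _validate_extra_count(extra_count) -> int:
        # Reject wrong types at the public API so a bad value cannot reach
        # the background prefetch thread and crash it in range() or logging.
        if isinstance(extra_count, bool) or not isinstance(extra_count, int):
            raise TypeError(
                f"extra_count must be an int, got {type(extra_count).__name__}"
            )
        if extra_count < 0:
            raise ValueError(f"extra_count must be non-negative, got {extra_count}")
        return extra_count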

@ApostaC (Contributor) commented Mar 10, 2026

Hey @liuyumoye, can you fix the DCO issue by running the following?

    git commit --amend --signoff
    git push -f

Thanks!

@liuyumoye (Contributor, Author) commented:

> Hey @liuyumoye, can you fix the DCO issue by running the following?
>
>     git commit --amend --signoff
>     git push -f
>
> Thanks!

Thanks for the reminder! Fixed the DCO issue and force-pushed.

@liuyumoye reopened this Mar 10, 2026
@ApostaC (Contributor) left a comment

Can we add some unit tests covering the extra_count path in the prefetch controller?

Otherwise LGTM!

@liuyumoye (Contributor, Author) commented Mar 10, 2026

> Can we add some unit tests covering the extra_count path in the prefetch controller?
>
> Otherwise LGTM!

Thanks for the review! I've added a TestExtraCountPrefetch test class in tests/v1/distributed/test_prefetch_controller.py covering the extra_count path (a condensed sketch of the two-lock case follows the list):

  • test_extra_count_zero_default_behavior – verifies extra_count=0 degrades to the original single-lock behavior: 1 lock acquired, 1 finish_read fully releases it

  • test_extra_count_one_requires_two_finish_reads – verifies extra_count=1 acquires 2 locks; the key remains readable after the 1st finish_read and is only released after the 2nd

  • test_extra_count_three_requires_four_finish_reads – same logic with extra_count=3 (4 locks total); the key stays readable through 3 finish_read calls and is released on the 4th

  • test_extra_count_with_prefix_trim – verifies that when L2 has a gap, only the contiguous prefix keys get 1 + extra_count locks; keys beyond the gap are never loaded into L1

  • test_extra_count_non_prefix_loaded_keys_fully_released – verifies that _finalize_load correctly releases all 1 + extra_count locks for keys that were loaded but fall outside the prefix, preventing lock leaks
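For illustration, here is a condensed, self-contained sketch of the two-lock case; the _ReadLock stand-in is hypothetical, while the real test exercises the project's L1Manager and prefetch controller fixtures:

    class _ReadLock:  # hypothetical stand-in for the L1 entry's read lock
        def __init__(self):
            self.count = 0

        def lock(self):
            self.count += 1

        def unlock(self):
            self.count -= 1
            return self.count == 0  # True -> key becomes evictable

    def test_extra_count_one_requires_two_finish_reads():
        lock, extra_count = _ReadLock(), 1
        for _ in range(1 + extra_count):  # prefetch completion reserves 2 locks
            lock.lock()
        assert lock.unlock() is False  # 1st finish_read: key still readable
        assert lock.unlock() is True   # 2nd finish_read: key now evictable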

@maobaolong (Collaborator) commented:

@liuyumoye Maybe you have to update your branch to dev; otherwise, the UT will not pass.

fix(l1_manager): propagate extra_count through prefetch path to prevent premature eviction

Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
@ApostaC (Contributor) left a comment

LGTM!

@maobaolong (Collaborator) left a comment

LGTM, @liuyumoye. Thanks for this fix.

@maobaolong enabled auto-merge (squash) March 11, 2026 02:12
@github-actions (Bot) added the "full: Run comprehensive tests on this PR" label Mar 11, 2026
@maobaolong merged commit 038cd7c into LMCache:dev Mar 11, 2026
26 of 29 checks passed
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
fix(l1_manager): propagate extra_count through prefetch path to prevent premature eviction (LMCache#2725)

Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>
Signed-off-by: shaoxiawjc <wjc2800@163.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
fix(l1_manager): propagate extra_count through prefetch path to prevent premature eviction (LMCache#2725)

Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>
Signed-off-by: Aaron Wu <aaron.wu@dell.com>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
fix(l1_manager): propagate extra_count through prefetch path to prevent premature eviction (LMCache#2725)

Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>

Labels

full: Run comprehensive tests on this PR


3 participants