
[Add] L2 Prefetch Controller and StorageManager integration#2667

Merged
ApostaC merged 2 commits into LMCache:dev from ApostaC:local-dev/l2-adapter-5
Mar 3, 2026

Conversation

@ApostaC
Contributor

@ApostaC ApostaC commented Mar 2, 2026

Summary

  • PrefetchController: New event-driven background controller that asynchronously prefetches KV cache data from L2 adapters into L1 memory. Uses a two-phase state machine (LOOKUP → PLAN_AND_LOAD) with select.poll() on adapter eventfds, max-in-flight request limiting, and comprehensive L1/L2 lock management.
  • PrefetchPolicy ABC + DefaultPrefetchPolicy: Policy interface for deciding which adapter loads which key when multiple adapters have the same data. Default policy assigns each key to the first (lowest-indexed) adapter that has it.
  • StorageManager integration: Wires PrefetchController into StorageManager. submit_prefetch_task now checks L1 for prefix hits first, then delegates remaining keys to PrefetchController for L2 prefetch. query_prefetch_status combines L1 + L2 results with latency logging.
  • L1Manager.finish_write_and_reserve_read(): Atomic write-to-read lock transition, preventing eviction between L2 load completion and read lock acquisition.
  • L1Manager.clear(force=False): Safe clear that skips locked objects by default, protecting in-flight store/prefetch operations.
  • L2 adapter eventfd contract: Added docstrings requiring globally unique event fds across all adapters and operation types.
  • Design doc: l2_adapters/DESIGN.md documenting the full store/prefetch controller architecture, data flows, lock invariants, and assumptions.
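The two-phase flow above can be sketched with a minimal poll()-driven loop. This is an illustrative sketch only: the names (`Phase`, `PrefetchRequest`, `advance`) are assumptions, not the actual LMCache API, and a pipe stands in for an adapter eventfd.

```python
# Hedged sketch of an event-driven two-phase state machine
# (LOOKUP -> PLAN_AND_LOAD -> DONE). All names are illustrative;
# a pipe stands in for a poll()-able adapter eventfd.
import enum
import os
import select

class Phase(enum.Enum):
    LOOKUP = 1          # waiting on lookup_and_lock results from adapters
    PLAN_AND_LOAD = 2   # load plan built, waiting on load completions
    DONE = 3

class PrefetchRequest:
    def __init__(self):
        self.phase = Phase.LOOKUP

def advance(fd_to_request, fd):
    """Route an event fd back to its request and advance it one phase."""
    req = fd_to_request[fd]
    os.read(fd, 8)  # drain the notification (eventfd payloads are 8 bytes)
    if req.phase is Phase.LOOKUP:
        req.phase = Phase.PLAN_AND_LOAD
    elif req.phase is Phase.PLAN_AND_LOAD:
        req.phase = Phase.DONE

r, w = os.pipe()
poller = select.poll()
poller.register(r, select.POLLIN)
req = PrefetchRequest()
fd_to_request = {r: req}

for _ in range(2):  # two events: lookup completed, then load completed
    os.write(w, b"\x01" * 8)
    for fd, _events in poller.poll(1000):
        advance(fd_to_request, fd)

print(req.phase)
```

The globally unique eventfd contract in the PR is what makes the `fd_to_request` routing above unambiguous: if two adapters shared an fd, the controller could not tell whose operation completed.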

Key Design Decisions

  • Prefix-only loading: Only the contiguous prefix of found keys is loaded from L2. Gaps break the prefix since vLLM requires contiguous KV cache. This is enforced by trim_load_plan_to_prefix().
  • Two-phase L2 unlock: Phase 1 unlocks keys not in the load plan (after planning); Phase 2 unlocks all plan keys (after load completes). Ensures L2 locks are always released.
  • Temporary L1 write buffers: Prefetch write reservations use is_temporary=True to allow eviction controller reclaim if needed.
  • Fire-and-forget unlock: submit_unlock is never retried by the controller — the adapter must guarantee eventual success internally.
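The prefix-only rule can be illustrated with a hypothetical version of `trim_load_plan_to_prefix()`; the real signature in the PR may differ. It assumes keys arrive in request order and `found` marks L2 hits.

```python
# Illustrative sketch (not the actual implementation): keep only the
# contiguous prefix of found keys, since vLLM requires a contiguous KV cache.
def trim_load_plan_to_prefix(keys, found):
    """Return the longest prefix of `keys` whose members are all in `found`.

    The first miss ends the plan: later hits are dropped because loading
    them would leave a gap in the KV cache prefix.
    """
    plan = []
    for key in keys:
        if key not in found:
            break
        plan.append(key)
    return plan

print(trim_load_plan_to_prefix(["k0", "k1", "k2", "k3"], {"k0", "k1", "k3"}))
# ['k0', 'k1'] -- k2 is missing, so k3 is dropped despite being found
```

Keys dropped by the trim are exactly the ones released in Phase 1 of the two-phase L2 unlock described above.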

Files Changed

| File | Change |
| --- | --- |
| storage_controllers/prefetch_controller.py | New: full PrefetchController implementation (675 lines) |
| storage_controllers/prefetch_policy.py | New: PrefetchPolicy ABC + DefaultPrefetchPolicy |
| storage_controllers/__init__.py | Export PrefetchController |
| storage_manager.py | Wire PrefetchController, update submit/query/close, add timing logs, safe clear() |
| l1_manager.py | Add finish_write_and_reserve_read(), safe clear(force) |
| l2_adapters/base.py | Add eventfd uniqueness contract docstrings |
| l2_adapters/DESIGN.md | New: architecture and invariants design doc |
| tests/.../test_prefetch_controller.py | New: 18 unit tests for PrefetchController |
| tests/.../test_distributed_storage_manager.py | Add 5 L2 prefetch integration tests |

Test Plan

  • pytest -xvs tests/v1/distributed/test_prefetch_controller.py — 18 tests covering lifecycle, single/multi adapter prefetch, overlap dedup, no-hits, max-in-flight queuing, partial load failure, query result pop semantics
  • pytest -xvs tests/v1/distributed/test_distributed_storage_manager.py — 5 new integration tests: L2 round-trip, mixed L1/L2 hits, no L2 hits, partial prefix, L1+L2 continuation
  • pytest -xvs tests/v1/distributed/ — all 208 tests pass
  • E2E test with vLLM + LMCache MP server + mock L2 adapter: 3.7x TTFT speedup (0.910s cold → 0.246s warm)

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the caching system by introducing an L2 Prefetch Controller, enabling asynchronous prefetching of KV cache data from L2 storage into L1 memory. This integration with the StorageManager aims to improve performance by proactively loading data, reducing latency for subsequent requests. The changes include a new event-driven controller, a flexible prefetch policy, and critical updates to L1 and L2 management to ensure data consistency and efficient resource utilization during prefetch operations.

Highlights

  • L2 Prefetch Controller: Introduced a new event-driven background controller that asynchronously prefetches KV cache data from L2 adapters into L1 memory. It uses a two-phase state machine (LOOKUP → PLAN_AND_LOAD), limits in-flight requests, and manages L1/L2 locks.
  • Prefetch Policy: Added a PrefetchPolicy abstract base class and a DefaultPrefetchPolicy for deciding which adapter loads which key when multiple adapters have the same data. The default policy assigns keys to the lowest-indexed adapter.
  • StorageManager Integration: Integrated the new PrefetchController into StorageManager. The submit_prefetch_task now checks L1 for prefix hits first, then delegates remaining keys to the PrefetchController. query_prefetch_status combines L1 and L2 results and includes latency logging.
  • L1Manager Enhancements: Modified L1Manager.clear() to optionally skip locked objects, protecting in-flight store/prefetch operations. The PR description also mentions L1Manager.finish_write_and_reserve_read() for atomic write-to-read lock transition, which is a key component for prefetching.
  • L2 Adapter Eventfd Contract: Updated L2AdapterInterface docstrings to explicitly require globally unique event file descriptors across all adapters and operation types to prevent misrouting of events.
  • Design Documentation: Added a comprehensive design document (l2_adapters/DESIGN.md) detailing the store/prefetch controller architecture, data flows, lock invariants, and assumptions.
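A minimal sketch of the lowest-indexed-adapter assignment described above. The function name `assign_keys` and the `lookup_results` shape are assumptions for illustration, not the actual PrefetchPolicy interface.

```python
# Hedged sketch of the DefaultPrefetchPolicy behavior: each key goes to the
# first (lowest-indexed) adapter that reported it. Names are illustrative.
def assign_keys(lookup_results):
    """lookup_results: list, in adapter index order, of sets of found keys.

    Returns {key: adapter_index}, assigning each key to the first adapter
    that has it; later adapters holding the same key are ignored.
    """
    assignment = {}
    for idx, found in enumerate(lookup_results):
        for key in found:
            assignment.setdefault(key, idx)  # first writer wins
    return assignment

# "b" is present on both adapters; the policy deduplicates it to adapter 0.
print(assign_keys([{"a", "b"}, {"b", "c"}]))
```

This first-adapter-wins rule is what the overlap-deduplication unit tests in the Test Plan exercise.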


Changelog
  • lmcache/v1/distributed/l1_manager.py
    • Modified clear method to accept a force parameter, allowing safe clearing of unlocked objects by default while protecting locked objects, or a forced clear of all objects.
  • lmcache/v1/distributed/l2_adapters/DESIGN.md
    • Added a new design document outlining the architecture, data flow, and lock invariants for L2 store and prefetch controllers.
  • lmcache/v1/distributed/l2_adapters/base.py
    • Updated docstrings for get_store_event_fd, get_lookup_and_lock_event_fd, and get_load_event_fd to emphasize the requirement for globally distinct event file descriptors.
  • lmcache/v1/distributed/storage_controllers/__init__.py
    • Exported the new PrefetchController.
  • lmcache/v1/distributed/storage_controllers/prefetch_controller.py
    • Added a new file implementing the PrefetchController, including event-driven loop, two-phase request state machine (LOOKUP, PLAN_AND_LOAD), prefix-only loading logic, and L2 lock management.
  • lmcache/v1/distributed/storage_manager.py
    • Imported and initialized the PrefetchController.
    • Updated PrefetchHandle dataclass to include submit_time for latency logging.
    • Modified submit_prefetch_task to delegate remaining keys (after L1 check) to the PrefetchController.
    • Updated query_prefetch_status to combine L1 and L2 prefetch results and log completion latency.
    • Modified clear method to pass the force parameter to the underlying L1 manager.
    • Added _prefetch_controller.stop() to the close method for proper shutdown.
  • tests/v1/distributed/test_distributed_storage_manager.py
    • Added wait_for_condition and wait_for_prefetch_status helper functions.
    • Added a new pytest fixture l2_storage_manager_config for L2 testing.
    • Added a new test class TestStorageManagerL2Prefetch with integration tests for prefetching from L2, mixed L1/L2 hits, no L2 hits, partial prefix handling, and L1+L2 continuation.
  • tests/v1/distributed/test_prefetch_controller.py
    • Added a new file containing comprehensive unit tests for the PrefetchController, covering lifecycle, single/multi-adapter prefetch, overlap deduplication, no-hits scenarios, max-in-flight queuing, partial load failure, and query result semantics.
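The safe-clear behavior from the changelog can be sketched as follows; the `Obj` class and its `locked` attribute are illustrative stand-ins for the real L1 object state, not the actual LMCache types.

```python
# Hedged sketch of clear(force=False): skip locked objects by default to
# protect in-flight store/prefetch operations; force=True removes everything.
class Obj:
    def __init__(self, locked=False):
        self.locked = locked  # stand-in for real read/write lock state

def clear(store, force=False):
    """Remove objects from `store`; without force, keep any with a lock."""
    for key in list(store):          # snapshot keys before mutating
        if store[key].locked and not force:
            continue                 # protect in-flight operations
        del store[key]
    return store

s = {"a": Obj(), "b": Obj(locked=True)}
clear(s)
print(list(s))  # ['b'] survives the safe clear
```

A forced `clear(s, force=True)` would then remove `"b"` as well, matching the changelog's "forced clear of all objects".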
Activity
  • A new PrefetchController was implemented to handle asynchronous L2 prefetching.
  • The StorageManager was updated to integrate with the new prefetch controller, including changes to submit_prefetch_task and query_prefetch_status.
  • The L1Manager.clear() method was modified to support safe clearing of unlocked objects.
  • Docstrings in l2_adapters/base.py were updated to clarify eventfd uniqueness requirements.
  • A detailed design document (l2_adapters/DESIGN.md) was added to explain the new architecture.
  • Extensive unit tests for PrefetchController (18 tests) were added in tests/v1/distributed/test_prefetch_controller.py.
  • Five new integration tests for L2 prefetching were added to tests/v1/distributed/test_distributed_storage_manager.py.
  • All 208 existing distributed tests were confirmed to pass.
  • An end-to-end test with vLLM + LMCache MP server + mock L2 adapter demonstrated a 3.7x TTFT speedup (0.910s cold → 0.246s warm).

@ApostaC ApostaC requested review from KuntaiDu and YaoJiayi March 2, 2026 06:41

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant new feature, the PrefetchController, for asynchronously loading data from L2 cache into L1. The implementation is comprehensive, including a detailed design document, robust controller logic with a state machine, and thorough integration with the existing StorageManager. The addition of extensive unit and integration tests is excellent and covers many edge cases. My feedback includes a few suggestions to improve the robustness of the tests by replacing fixed-time sleeps with condition polling, and a minor refactoring for code clarity. Overall, this is a high-quality contribution.

Comment on lines +337 to +342

```python
        if handle.request_id != -1:
            l2_r = self._prefetch_controller.query_prefetch_result(handle.request_id)

            if l2_r is None:
                return None
            l2_result = l2_r  # Just to make linter happy
```

medium

The comment # Just to make linter happy suggests a potential code smell or a workaround for a linter issue. This part of the code can be refactored for better clarity and to remove the need for such a comment.

Suggested change

```diff
-        if handle.request_id != -1:
-            l2_r = self._prefetch_controller.query_prefetch_result(handle.request_id)
-            if l2_r is None:
-                return None
-            l2_result = l2_r  # Just to make linter happy
+        if handle.request_id != -1:
+            l2_hits = self._prefetch_controller.query_prefetch_result(handle.request_id)
+            if l2_hits is None:
+                return None
+            l2_result = l2_hits
```

Comment on lines +408 to +410

```python
        # Brief sleep to let StoreController release read locks
        # after L2 store completion, then clear L1
        time.sleep(0.05)
```

medium

Using time.sleep() in tests can introduce flakiness. It's more robust to poll for the specific condition you're waiting for. In this case, you're waiting for the StoreController to release L1 read locks so that sm.clear() can successfully remove the objects. You can use the existing wait_for_condition helper to poll the lock state of the keys.

Suggested change

```diff
-        # Brief sleep to let StoreController release read locks
-        # after L2 store completion, then clear L1
-        time.sleep(0.05)
+        # Wait for StoreController to release read locks after L2 store completion.
+        ok = wait_for_condition(
+            lambda: all(
+                not sm._l1_manager.get_object_state(k).read_lock.is_locked()
+                for k in keys
+                if sm._l1_manager.get_object_state(k) is not None
+            ),
+            timeout=5.0,
+        )
+        assert ok, "StoreController should have released L1 read locks"
```
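A `wait_for_condition` helper of the kind referenced in this suggestion can be sketched as below; the actual helper added to the test file may have a different signature.

```python
# Hedged sketch of a condition-polling test helper: repeatedly evaluate a
# predicate until it holds or the timeout elapses, instead of a fixed sleep.
import time

def wait_for_condition(predicate, timeout=5.0, interval=0.01):
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one final check at the deadline

# Demo: a condition that only becomes true on the third poll.
state = {"n": 0}
def bump_then_check():
    state["n"] += 1
    return state["n"] >= 3

print(wait_for_condition(bump_then_check, timeout=1.0))  # True
```

Polling like this bounds worst-case wait time by the timeout while succeeding as soon as the lock is released, which is why it is less flaky than `time.sleep(0.05)`.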

Comment on lines +481 to +483

```python
        # Brief sleep to let StoreController release read locks
        # after L2 store completion, then clear L1
        time.sleep(0.05)
```

medium

Similar to the other test, using time.sleep() can make the test flaky. It's more robust to poll for the condition that the StoreController has released the L1 read locks on the written keys before clearing L1.

Suggested change

```diff
-        # Brief sleep to let StoreController release read locks
-        # after L2 store completion, then clear L1
-        time.sleep(0.05)
+        # Wait for StoreController to release read locks after L2 store completion.
+        ok = wait_for_condition(
+            lambda: all(
+                not sm._l1_manager.get_object_state(k).read_lock.is_locked()
+                for k in keys_to_write
+                if sm._l1_manager.get_object_state(k) is not None
+            ),
+            timeout=5.0,
+        )
+        assert ok, "StoreController should have released L1 read locks"
```

@ApostaC ApostaC added the mp Buildkite trigger for multi-processing mode test label Mar 2, 2026
Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC ApostaC enabled auto-merge (squash) March 3, 2026 01:12
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 3, 2026

@KuntaiDu KuntaiDu left a comment


LGTM!

@ApostaC ApostaC merged commit c885a52 into LMCache:dev Mar 3, 2026
33 of 34 checks passed
oferki pushed a commit to oferki/LMCache that referenced this pull request Mar 3, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Ofer Kiselov Nahman <ofer.kiselovnahman@weka.io>
oferki pushed a commit to oferki/LMCache that referenced this pull request Mar 3, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC ApostaC mentioned this pull request Mar 4, 2026
8 tasks
mauryaavinash95 pushed a commit to mauryaavinash95/LMCache that referenced this pull request Mar 7, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: shaoxiawjc <wjc2800@163.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Aaron Wu <aaron.wu@dell.com>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>

Labels

  • full — Run comprehensive tests on this PR
  • mp — Buildkite trigger for multi-processing mode test
