
[Add] L2 Prefetch Controller and StorageManager integration#2667

Merged
ApostaC merged 2 commits into LMCache:dev from ApostaC:local-dev/l2-adapter-5
Mar 3, 2026

Conversation

@ApostaC
Contributor

@ApostaC ApostaC commented Mar 2, 2026

Summary

  • PrefetchController: New event-driven background controller that asynchronously prefetches KV cache data from L2 adapters into L1 memory. Uses a two-phase state machine (LOOKUP → PLAN_AND_LOAD) with select.poll() on adapter eventfds, max-in-flight request limiting, and comprehensive L1/L2 lock management.
  • PrefetchPolicy ABC + DefaultPrefetchPolicy: Policy interface for deciding which adapter loads which key when multiple adapters have the same data. Default policy assigns each key to the first (lowest-indexed) adapter that has it.
  • StorageManager integration: Wires PrefetchController into StorageManager. submit_prefetch_task now checks L1 for prefix hits first, then delegates remaining keys to PrefetchController for L2 prefetch. query_prefetch_status combines L1 + L2 results with latency logging.
  • L1Manager.finish_write_and_reserve_read(): Atomic write-to-read lock transition, preventing eviction between L2 load completion and read lock acquisition.
  • L1Manager.clear(force=False): Safe clear that skips locked objects by default, protecting in-flight store/prefetch operations.
  • L2 adapter eventfd contract: Added docstrings requiring globally unique event fds across all adapters and operation types.
  • Design doc: l2_adapters/DESIGN.md documenting the full store/prefetch controller architecture, data flows, lock invariants, and assumptions.
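The two-phase flow above can be sketched with a minimal poll()-driven loop. This is an illustrative sketch only: the names (`Phase`, `PrefetchRequest`, `advance`) are assumptions, not the actual LMCache API, and a pipe stands in for an adapter eventfd.

```python
# Hedged sketch of an event-driven two-phase state machine
# (LOOKUP -> PLAN_AND_LOAD -> DONE). All names are illustrative;
# a pipe stands in for a poll()-able adapter eventfd.
import enum
import os
import select

class Phase(enum.Enum):
    LOOKUP = 1          # waiting on lookup_and_lock results from adapters
    PLAN_AND_LOAD = 2   # load plan built, waiting on load completions
    DONE = 3

class PrefetchRequest:
    def __init__(self):
        self.phase = Phase.LOOKUP

def advance(fd_to_request, fd):
    """Route an event fd back to its request and advance it one phase."""
    req = fd_to_request[fd]
    os.read(fd, 8)  # drain the notification (eventfd payloads are 8 bytes)
    if req.phase is Phase.LOOKUP:
        req.phase = Phase.PLAN_AND_LOAD
    elif req.phase is Phase.PLAN_AND_LOAD:
        req.phase = Phase.DONE

r, w = os.pipe()
poller = select.poll()
poller.register(r, select.POLLIN)
req = PrefetchRequest()
fd_to_request = {r: req}

for _ in range(2):  # two events: lookup completed, then load completed
    os.write(w, b"\x01" * 8)
    for fd, _events in poller.poll(1000):
        advance(fd_to_request, fd)

print(req.phase)
```

The globally unique eventfd contract in the PR is what makes the `fd_to_request` routing above unambiguous: if two adapters shared an fd, the controller could not tell whose operation completed.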

Key Design Decisions

  • Prefix-only loading: Only the contiguous prefix of found keys is loaded from L2. Gaps break the prefix since vLLM requires contiguous KV cache. This is enforced by trim_load_plan_to_prefix().
  • Two-phase L2 unlock: Phase 1 unlocks keys not in the load plan (after planning); Phase 2 unlocks all plan keys (after load completes). Ensures L2 locks are always released.
  • Temporary L1 write buffers: Prefetch write reservations use is_temporary=True to allow eviction controller reclaim if needed.
  • Fire-and-forget unlock: submit_unlock is never retried by the controller — the adapter must guarantee eventual success internally.
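The prefix-only rule can be illustrated with a hypothetical version of `trim_load_plan_to_prefix()`; the real signature in the PR may differ. It assumes keys arrive in request order and `found` marks L2 hits.

```python
# Illustrative sketch (not the actual implementation): keep only the
# contiguous prefix of found keys, since vLLM requires a contiguous KV cache.
def trim_load_plan_to_prefix(keys, found):
    """Return the longest prefix of `keys` whose members are all in `found`.

    The first miss ends the plan: later hits are dropped because loading
    them would leave a gap in the KV cache prefix.
    """
    plan = []
    for key in keys:
        if key not in found:
            break
        plan.append(key)
    return plan

print(trim_load_plan_to_prefix(["k0", "k1", "k2", "k3"], {"k0", "k1", "k3"}))
# ['k0', 'k1'] -- k2 is missing, so k3 is dropped despite being found
```

Keys dropped by the trim are exactly the ones released in Phase 1 of the two-phase L2 unlock described above.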

Files Changed

| File | Change |
| --- | --- |
| storage_controllers/prefetch_controller.py | New: full PrefetchController implementation (675 lines) |
| storage_controllers/prefetch_policy.py | New: PrefetchPolicy ABC + DefaultPrefetchPolicy |
| storage_controllers/__init__.py | Export PrefetchController |
| storage_manager.py | Wire PrefetchController, update submit/query/close, add timing logs, safe clear() |
| l1_manager.py | Add finish_write_and_reserve_read(), safe clear(force) |
| l2_adapters/base.py | Add eventfd uniqueness contract docstrings |
| l2_adapters/DESIGN.md | New: architecture and invariants design doc |
| tests/.../test_prefetch_controller.py | New: 18 unit tests for PrefetchController |
| tests/.../test_distributed_storage_manager.py | Add 5 L2 prefetch integration tests |

Test Plan

  • pytest -xvs tests/v1/distributed/test_prefetch_controller.py — 18 tests covering lifecycle, single/multi adapter prefetch, overlap dedup, no-hits, max-in-flight queuing, partial load failure, query result pop semantics
  • pytest -xvs tests/v1/distributed/test_distributed_storage_manager.py — 5 new integration tests: L2 round-trip, mixed L1/L2 hits, no L2 hits, partial prefix, L1+L2 continuation
  • pytest -xvs tests/v1/distributed/ — all 208 tests pass
  • E2E test with vLLM + LMCache MP server + mock L2 adapter: 3.7x TTFT speedup (0.910s cold → 0.246s warm)

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the caching system by introducing an L2 Prefetch Controller, enabling asynchronous prefetching of KV cache data from L2 storage into L1 memory. This integration with the StorageManager aims to improve performance by proactively loading data, reducing latency for subsequent requests. The changes include a new event-driven controller, a flexible prefetch policy, and critical updates to L1 and L2 management to ensure data consistency and efficient resource utilization during prefetch operations.

Highlights

  • L2 Prefetch Controller: Introduced a new event-driven background controller that asynchronously prefetches KV cache data from L2 adapters into L1 memory. It uses a two-phase state machine (LOOKUP → PLAN_AND_LOAD), limits in-flight requests, and manages L1/L2 locks.
  • Prefetch Policy: Added a PrefetchPolicy abstract base class and a DefaultPrefetchPolicy for deciding which adapter loads which key when multiple adapters have the same data. The default policy assigns keys to the lowest-indexed adapter.
  • StorageManager Integration: Integrated the new PrefetchController into StorageManager. The submit_prefetch_task now checks L1 for prefix hits first, then delegates remaining keys to the PrefetchController. query_prefetch_status combines L1 and L2 results and includes latency logging.
  • L1Manager Enhancements: Modified L1Manager.clear() to optionally skip locked objects, protecting in-flight store/prefetch operations. The PR description also mentions L1Manager.finish_write_and_reserve_read() for atomic write-to-read lock transition, which is a key component for prefetching.
  • L2 Adapter Eventfd Contract: Updated L2AdapterInterface docstrings to explicitly require globally unique event file descriptors across all adapters and operation types to prevent misrouting of events.
  • Design Documentation: Added a comprehensive design document (l2_adapters/DESIGN.md) detailing the store/prefetch controller architecture, data flows, lock invariants, and assumptions.
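A minimal sketch of the lowest-indexed-adapter assignment described above. The function name `assign_keys` and the `lookup_results` shape are assumptions for illustration, not the actual PrefetchPolicy interface.

```python
# Hedged sketch of the DefaultPrefetchPolicy behavior: each key goes to the
# first (lowest-indexed) adapter that reported it. Names are illustrative.
def assign_keys(lookup_results):
    """lookup_results: list, in adapter index order, of sets of found keys.

    Returns {key: adapter_index}, assigning each key to the first adapter
    that has it; later adapters holding the same key are ignored.
    """
    assignment = {}
    for idx, found in enumerate(lookup_results):
        for key in found:
            assignment.setdefault(key, idx)  # first writer wins
    return assignment

# "b" is present on both adapters; the policy deduplicates it to adapter 0.
print(assign_keys([{"a", "b"}, {"b", "c"}]))
```

This first-adapter-wins rule is what the overlap-deduplication unit tests in the Test Plan exercise.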


Changelog
  • lmcache/v1/distributed/l1_manager.py
    • Modified clear method to accept a force parameter, allowing safe clearing of unlocked objects by default while protecting locked objects, or a forced clear of all objects.
  • lmcache/v1/distributed/l2_adapters/DESIGN.md
    • Added a new design document outlining the architecture, data flow, and lock invariants for L2 store and prefetch controllers.
  • lmcache/v1/distributed/l2_adapters/base.py
    • Updated docstrings for get_store_event_fd, get_lookup_and_lock_event_fd, and get_load_event_fd to emphasize the requirement for globally distinct event file descriptors.
  • lmcache/v1/distributed/storage_controllers/__init__.py
    • Exported the new PrefetchController.
  • lmcache/v1/distributed/storage_controllers/prefetch_controller.py
    • Added a new file implementing the PrefetchController, including event-driven loop, two-phase request state machine (LOOKUP, PLAN_AND_LOAD), prefix-only loading logic, and L2 lock management.
  • lmcache/v1/distributed/storage_manager.py
    • Imported and initialized the PrefetchController.
    • Updated PrefetchHandle dataclass to include submit_time for latency logging.
    • Modified submit_prefetch_task to delegate remaining keys (after L1 check) to the PrefetchController.
    • Updated query_prefetch_status to combine L1 and L2 prefetch results and log completion latency.
    • Modified clear method to pass the force parameter to the underlying L1 manager.
    • Added _prefetch_controller.stop() to the close method for proper shutdown.
  • tests/v1/distributed/test_distributed_storage_manager.py
    • Added wait_for_condition and wait_for_prefetch_status helper functions.
    • Added a new pytest fixture l2_storage_manager_config for L2 testing.
    • Added a new test class TestStorageManagerL2Prefetch with integration tests for prefetching from L2, mixed L1/L2 hits, no L2 hits, partial prefix handling, and L1+L2 continuation.
  • tests/v1/distributed/test_prefetch_controller.py
    • Added a new file containing comprehensive unit tests for the PrefetchController, covering lifecycle, single/multi-adapter prefetch, overlap deduplication, no-hits scenarios, max-in-flight queuing, partial load failure, and query result semantics.
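The safe-clear behavior from the changelog can be sketched as follows; the `Obj` class and its `locked` attribute are illustrative stand-ins for the real L1 object state, not the actual LMCache types.

```python
# Hedged sketch of clear(force=False): skip locked objects by default to
# protect in-flight store/prefetch operations; force=True removes everything.
class Obj:
    def __init__(self, locked=False):
        self.locked = locked  # stand-in for real read/write lock state

def clear(store, force=False):
    """Remove objects from `store`; without force, keep any with a lock."""
    for key in list(store):          # snapshot keys before mutating
        if store[key].locked and not force:
            continue                 # protect in-flight operations
        del store[key]
    return store

s = {"a": Obj(), "b": Obj(locked=True)}
clear(s)
print(list(s))  # ['b'] survives the safe clear
```

A forced `clear(s, force=True)` would then remove `"b"` as well, matching the changelog's "forced clear of all objects".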
Activity
  • A new PrefetchController was implemented to handle asynchronous L2 prefetching.
  • The StorageManager was updated to integrate with the new prefetch controller, including changes to submit_prefetch_task and query_prefetch_status.
  • The L1Manager.clear() method was modified to support safe clearing of unlocked objects.
  • Docstrings in l2_adapters/base.py were updated to clarify eventfd uniqueness requirements.
  • A detailed design document (l2_adapters/DESIGN.md) was added to explain the new architecture.
  • Extensive unit tests for PrefetchController (18 tests) were added in tests/v1/distributed/test_prefetch_controller.py.
  • Five new integration tests for L2 prefetching were added to tests/v1/distributed/test_distributed_storage_manager.py.
  • All 208 existing distributed tests were confirmed to pass.
  • An end-to-end test with vLLM + LMCache MP server + mock L2 adapter demonstrated a 3.7x TTFT speedup (0.910s cold → 0.246s warm).

@ApostaC ApostaC requested review from KuntaiDu and YaoJiayi March 2, 2026 06:41

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant new feature, the PrefetchController, for asynchronously loading data from L2 cache into L1. The implementation is comprehensive, including a detailed design document, robust controller logic with a state machine, and thorough integration with the existing StorageManager. The addition of extensive unit and integration tests is excellent and covers many edge cases. My feedback includes a few suggestions to improve the robustness of the tests by replacing fixed-time sleeps with condition polling, and a minor refactoring for code clarity. Overall, this is a high-quality contribution.

Comment on lines +337 to +342

```python
        if handle.request_id != -1:
            l2_r = self._prefetch_controller.query_prefetch_result(handle.request_id)

            if l2_r is None:
                return None
            l2_result = l2_r  # Just to make linter happy
```

medium

The comment # Just to make linter happy suggests a potential code smell or a workaround for a linter issue. This part of the code can be refactored for better clarity and to remove the need for such a comment.

Suggested change

```diff
-        if handle.request_id != -1:
-            l2_r = self._prefetch_controller.query_prefetch_result(handle.request_id)
-            if l2_r is None:
-                return None
-            l2_result = l2_r  # Just to make linter happy
+        if handle.request_id != -1:
+            l2_hits = self._prefetch_controller.query_prefetch_result(handle.request_id)
+            if l2_hits is None:
+                return None
+            l2_result = l2_hits
```

Comment on lines +408 to +410

```python
        # Brief sleep to let StoreController release read locks
        # after L2 store completion, then clear L1
        time.sleep(0.05)
```

medium

Using time.sleep() in tests can introduce flakiness. It's more robust to poll for the specific condition you're waiting for. In this case, you're waiting for the StoreController to release L1 read locks so that sm.clear() can successfully remove the objects. You can use the existing wait_for_condition helper to poll the lock state of the keys.

Suggested change

```diff
-        # Brief sleep to let StoreController release read locks
-        # after L2 store completion, then clear L1
-        time.sleep(0.05)
+        # Wait for StoreController to release read locks after L2 store completion.
+        ok = wait_for_condition(
+            lambda: all(
+                not sm._l1_manager.get_object_state(k).read_lock.is_locked()
+                for k in keys
+                if sm._l1_manager.get_object_state(k) is not None
+            ),
+            timeout=5.0,
+        )
+        assert ok, "StoreController should have released L1 read locks"
```
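A `wait_for_condition` helper of the kind referenced in this suggestion can be sketched as below; the actual helper added to the test file may have a different signature.

```python
# Hedged sketch of a condition-polling test helper: repeatedly evaluate a
# predicate until it holds or the timeout elapses, instead of a fixed sleep.
import time

def wait_for_condition(predicate, timeout=5.0, interval=0.01):
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one final check at the deadline

# Demo: a condition that only becomes true on the third poll.
state = {"n": 0}
def bump_then_check():
    state["n"] += 1
    return state["n"] >= 3

print(wait_for_condition(bump_then_check, timeout=1.0))  # True
```

Polling like this bounds worst-case wait time by the timeout while succeeding as soon as the lock is released, which is why it is less flaky than `time.sleep(0.05)`.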

Comment on lines +481 to +483

```python
        # Brief sleep to let StoreController release read locks
        # after L2 store completion, then clear L1
        time.sleep(0.05)
```

medium

Similar to the other test, using time.sleep() can make the test flaky. It's more robust to poll for the condition that the StoreController has released the L1 read locks on the written keys before clearing L1.

Suggested change

```diff
-        # Brief sleep to let StoreController release read locks
-        # after L2 store completion, then clear L1
-        time.sleep(0.05)
+        # Wait for StoreController to release read locks after L2 store completion.
+        ok = wait_for_condition(
+            lambda: all(
+                not sm._l1_manager.get_object_state(k).read_lock.is_locked()
+                for k in keys_to_write
+                if sm._l1_manager.get_object_state(k) is not None
+            ),
+            timeout=5.0,
+        )
+        assert ok, "StoreController should have released L1 read locks"
```

@ApostaC ApostaC added the mp Buildkite trigger for multi-processing mode test label Mar 2, 2026
Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC ApostaC enabled auto-merge (squash) March 3, 2026 01:12
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 3, 2026

@KuntaiDu KuntaiDu left a comment


LGTM!

@ApostaC ApostaC merged commit c885a52 into LMCache:dev Mar 3, 2026
33 of 34 checks passed
oferki pushed a commit to oferki/LMCache that referenced this pull request Mar 3, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Ofer Kiselov Nahman <ofer.kiselovnahman@weka.io>
oferki pushed a commit to oferki/LMCache that referenced this pull request Mar 3, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC ApostaC mentioned this pull request Mar 4, 2026
8 tasks
mauryaavinash95 pushed a commit to mauryaavinash95/LMCache that referenced this pull request Mar 7, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: shaoxiawjc <wjc2800@163.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Aaron Wu <aaron.wu@dell.com>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
…2667)

* finish L2 prefetch controller

Signed-off-by: ApostaC <yihua98@uchicago.edu>

Labels

  • full — Run comprehensive tests on this PR
  • mp — Buildkite trigger for multi-processing mode test
