Sync lookup, and move prefetch to retrieve #2769
yoo-kumaneko wants to merge 2 commits into LMCache:dev
Conversation
Align the LMCache MP server's lookup/prefetch/retrieve pipeline with the OffloadingConnector design:

- Add a SYNC_LOOKUP protocol message: a blocking RPC that performs the L1 prefix scan and the L2 existence check (with pin) in a single round-trip, returning the hit count directly. This eliminates the QUERY_PREFETCH_STATUS polling loop from the scheduler hot path.
- Merge L2-to-L1 prefetch into RETRIEVE: when the RETRIEVE RPC arrives, it first executes any pending L2→L1 data movement (from the prior SYNC_LOOKUP), then performs the L1→GPU copy. This overlaps the full cache load with the forward pass of other scheduled requests.
- Add synchronous APIs to PrefetchController (synchronous_lookup, execute_load_phase, unlock_lookup_results) and StorageManager (synchronous_lookup_and_lock, execute_prefetch_load, unlock_l2_lookups).
- Update the adapter to use SYNC_LOOKUP and store hit counts directly instead of job IDs, removing the two-phase lookup/poll pattern (see the sketch below).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
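To make the new flow concrete, here is a minimal sketch of the single round-trip lookup and the merged retrieve path. `SYNC_LOOKUP`, `RETRIEVE`, and the hit-count semantics come from this PR; the `MPClient` transport, the function signatures, and the argument names are illustrative assumptions, not the actual LMCache API.

```python
from typing import Any, Dict, List, Protocol


class MPClient(Protocol):
    """Hypothetical blocking RPC transport to the LMCache MP server."""

    def call(self, op: str, **kwargs: Any) -> Any: ...


def scheduler_lookup(client: MPClient, request_id: str,
                     block_hashes: List[bytes],
                     hit_counts: Dict[str, int]) -> int:
    # One blocking round-trip: the server runs the L1 prefix scan and the
    # L2 existence check (pinning L2 entries) and returns the hit count.
    # No job id and no QUERY_PREFETCH_STATUS polling in the scheduler hot path.
    hits: int = client.call("SYNC_LOOKUP", hashes=block_hashes)
    hit_counts[request_id] = hits  # the adapter stores the count directly
    return hits


def worker_retrieve(client: MPClient, block_hashes: List[bytes]) -> int:
    # RETRIEVE first drains any pending L2->L1 movement recorded by the
    # prior SYNC_LOOKUP, then performs the L1->GPU copy, overlapping the
    # whole cache load with the forward pass of other scheduled requests.
    return client.call("RETRIEVE", hashes=block_hashes)
```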
… read lock count

In buffer-only mode with MLA (tp > world_size), the L2-to-L1 load phase acquired only one read lock per key, but the tp workers each independently called finish_read. The first release deleted the temporary object, and the remaining workers hit "finish read on non-existing key" warnings. Propagate extra_count from the pending lookup state through execute_prefetch_load → execute_load_phase → finish_write_and_reserve_read so that 1 + extra_count locks are acquired, matching the number of consuming workers (see the sketch below).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
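A hedged sketch of the fix: `extra_count`, `execute_load_phase`, `finish_read`, and `finish_write_and_reserve_read` are names from this PR, but the class shapes and bodies below are illustrative stand-ins, not the real LMCache implementation.

```python
import threading
from dataclasses import dataclass


@dataclass
class PendingLookup:
    """Illustrative stand-in for the pending lookup state kept per key."""
    key: str
    extra_count: int  # consuming tp workers beyond the first


class TempL1Object:
    """Toy temporary L1 object loaded from L2 during the prefetch load phase."""

    def __init__(self, read_locks: int) -> None:
        self._mu = threading.Lock()
        self._read_locks = read_locks
        self.freed = False

    def finish_read(self) -> None:
        # Each tp worker releases one read lock; the last release frees the
        # object. Before the fix only one lock existed, so every worker after
        # the first saw "finish read on non-existing key".
        with self._mu:
            self._read_locks -= 1
            if self._read_locks == 0:
                self.freed = True


def finish_write_and_reserve_read(key: str, extra_count: int = 0) -> TempL1Object:
    # One base lock plus extra_count, matching the number of consuming workers.
    return TempL1Object(read_locks=1 + extra_count)


def execute_load_phase(pending: PendingLookup) -> TempL1Object:
    # Propagate extra_count from the pending lookup state so that
    # 1 + extra_count read locks are acquired (buffer-only MLA mode,
    # tp > world_size).
    return finish_write_and_reserve_read(pending.key,
                                         extra_count=pending.extra_count)
```

With four tp workers consuming the same key, extra_count = 3 yields four read locks, so the temporary object is freed only after the fourth finish_read rather than the first.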
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the cache lookup mechanism within LMCache, moving from an asynchronous, two-phase lookup protocol to a more streamlined synchronous approach. This change simplifies the client-side interaction by providing immediate hit-count results, while the server-side components now include dedicated synchronous methods for managing L1 and L2 cache lookups and data loading. The overall goal is to improve the efficiency and reduce the complexity of cache prefetching and retrieval operations.
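For contrast, a sketch of the two-phase pattern the PR removes. `QUERY_PREFETCH_STATUS` is the protocol message named in the PR; the phase-1 message name, the return shape, and the polling interval are assumptions made for illustration.

```python
import time
from typing import Any, List


def old_two_phase_lookup(client: Any, block_hashes: List[bytes]) -> int:
    # Phase 1 (assumed shape): enqueue an async lookup and get a job id back.
    job_id = client.call("LOOKUP", hashes=block_hashes)

    # Phase 2: poll QUERY_PREFETCH_STATUS from the scheduler hot path until
    # the prefetch resolves -- the round-trips SYNC_LOOKUP collapses to one.
    while True:
        done, hits = client.call("QUERY_PREFETCH_STATUS", job_id=job_id)
        if done:
            return hits
        time.sleep(0.001)  # busy-wait interval, purely illustrative
```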