[MP][Feat] Query lookup-phase status for MP mode#2818
Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement to the prefetching mechanism by allowing early access to lookup-phase results. The primary goal is to decouple the availability of prefix-hit information from the completion of the entire prefetch operation, thereby enabling more agile and informed scheduling decisions. This change provides a more granular view into the prefetch process, allowing systems to react faster based on initial lookup outcomes rather than waiting for the full data load.
Code Review
This pull request introduces a query_prefetch_lookup_hits API to allow for earlier insight into prefetch status. The changes are well-contained and include corresponding updates to the multiprocess protocol and new unit tests. My review identifies two key issues in the PrefetchController: a critical memory leak vulnerability due to a fragile cleanup mechanism that relies on the caller, and a high-severity race condition that could lead to inconsistent state. The feedback provides suggestions to make the new API more robust and thread-safe.
```python
    Note:
        This function does not pop the result. The caller needs to make sure
        to call query_prefetch_result after calling this function; otherwise
        nobody will clean up the completed-lookups dictionary, causing a
        memory leak.
    """
```
The note in this docstring highlights a significant design issue that can lead to memory leaks. Relying on the caller to invoke query_prefetch_result for cleanup makes the API fragile. If a client submits a request but never queries the final result (e.g., due to a crash or logic error), the request ID and its result will be permanently stored in _completed_lookups and _completed_results, causing unbounded memory growth over time.
To make the controller more robust, I strongly recommend implementing an automatic cleanup mechanism within PrefetchController itself. A good approach would be to associate a timestamp with each completed request and have a periodic task (e.g., within _prefetch_loop) that purges entries older than a certain TTL. This would ensure resources are eventually reclaimed without depending on caller behavior.
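The suggested TTL-based cleanup could look roughly like the sketch below. This is illustrative only: the attribute names `_completed_lookups` and `_lookup_results_lock` follow the PR, but the timestamping scheme, the `_purge_stale_lookups` helper, and the TTL value are assumptions, not the actual `PrefetchController` implementation.

```python
import threading
import time


class PrefetchControllerSketch:
    """Hypothetical sketch of TTL-based purging for completed lookup results."""

    def __init__(self, ttl_seconds: float = 300.0):
        self._ttl = ttl_seconds
        self._lookup_results_lock = threading.Lock()
        # request_id -> (lookup result, completion timestamp)
        self._completed_lookups = {}

    def _record_lookup_result(self, request_id, result):
        # Store the result together with a monotonic timestamp so age
        # can be computed later without clock-adjustment surprises.
        with self._lookup_results_lock:
            self._completed_lookups[request_id] = (result, time.monotonic())

    def _purge_stale_lookups(self):
        # Called periodically (e.g. from _prefetch_loop) to drop entries
        # that were never consumed by query_prefetch_result.
        now = time.monotonic()
        with self._lookup_results_lock:
            stale = [
                rid
                for rid, (_, ts) in self._completed_lookups.items()
                if now - ts > self._ttl
            ]
            for rid in stale:
                del self._completed_lookups[rid]
```

With this in place, a client that crashes before querying the final result only leaks an entry until the next purge pass, bounding memory growth.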
```python
with self._prefetch_results_lock:
    result = self._completed_results.pop(request_id, None)
    if result is not None:
        with self._lookup_results_lock:
            self._completed_lookups.pop(request_id, None)
return result
```
There is a race condition here. The cleanup of _completed_results and _completed_lookups is not atomic because the locks are acquired separately. This can lead to incorrect behavior.
Consider this sequence:

1. Thread 1 calls `query_prefetch_result`, pops from `_completed_results`, and releases `_prefetch_results_lock`.
2. Thread 2 calls `query_lookup_result` for the same `request_id`. It will successfully get the result from `_completed_lookups`.
3. Thread 1 proceeds to pop from `_completed_lookups`.

This violates the documented contract that `query_lookup_result` should return `None` if the request has been consumed by `query_prefetch_result`.
To fix this, you should acquire both locks together to ensure atomicity. Since no other part of the code holds both locks, you can establish a consistent locking order (e.g., _prefetch_results_lock then _lookup_results_lock) to prevent deadlocks.
```diff
-with self._prefetch_results_lock:
-    result = self._completed_results.pop(request_id, None)
-    if result is not None:
-        with self._lookup_results_lock:
-            self._completed_lookups.pop(request_id, None)
-return result
+with self._prefetch_results_lock, self._lookup_results_lock:
+    result = self._completed_results.pop(request_id, None)
+    if result is not None:
+        self._completed_lookups.pop(request_id, None)
+return result
```
@ApostaC The blend server test is failing
Signed-off-by: ApsotaC <yihua98@uchicago.edu>
Force-pushed from c193fbf to f753f17
@YaoJiayi Should be fixed now

@maobaolong @yoo-kumaneko PTAL, this is related to #2735 and #2769
```python
        self._prefetch_jobs[job_id] = job
        return job_id

    def query_prefetch_lookup_hits(
```
Also normalize: `found_count = found_count // job.world_size`, like `query_prefetch_status`?
to keep the two APIs consistent
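The suggested normalization can be sketched as follows. This is illustrative only: the reasoning that each of the job's `world_size` workers reports hits for the same token range (so the raw sum overcounts by a factor of `world_size`) is an inference from the reviewer's comment, and the `normalize_hits` helper is hypothetical, not code from the PR.

```python
def normalize_hits(found_count: int, world_size: int) -> int:
    """Divide the aggregated hit count by the number of workers.

    Mirrors the reviewer's suggested line:
        found_count = found_count // job.world_size
    """
    # Integer division matches the suggestion; any remainder from a
    # partially reporting worker is rounded down.
    return found_count // world_size


# Example: 4 workers each reporting 2 hits for the same tokens.
assert normalize_hits(8, 4) == 2
```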
Is there a related PR for the vLLM-side changes? @ApostaC
@yoo-kumaneko Would you like to take a look at this PR? Thanks!
Signed-off-by: ApsotaC <yihua98@uchicago.edu>
* fix dco and failed tests Signed-off-by: ApsotaC <yihua98@uchicago.edu>
What this PR does / why we need it:

Adds a new `query_prefetch_lookup_hits` API that lets callers check L2 prefix hits as soon as the lookup phase finishes, without waiting for the full prefetch (load) to complete. This enables earlier scheduling decisions in the engine when only the hit count matters.

Changes:

- `PrefetchController`: new `query_lookup_result()` method with a separate lock and dict for lookup-phase results; `query_prefetch_result()` cleans up lookup entries
- `StorageManager`: new `query_prefetch_lookup_hits()` combining L1 + L2 hit counts
- `MPCacheEngine`: new `query_prefetch_lookup_hits()` handler
- New `QUERY_PREFETCH_LOOKUP_HITS` request type; `QUERY_PREFETCH_STATUS` handler type changed from SYNC to BLOCKING
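The two-phase contract this PR documents (a non-consuming lookup-phase peek, then a final query that atomically cleans up both dictionaries) can be modeled with a small self-contained mock. This is a sketch, not the real `PrefetchController`: the class name `TwoPhaseResults`, the `complete` helper, and the use of a single lock in place of the controller's two locks are all assumptions for illustration.

```python
import threading


class TwoPhaseResults:
    """Mock of the lookup-phase / prefetch-phase query contract."""

    def __init__(self):
        # One lock stands in for the controller's two locks; holding it
        # across both dicts gives the atomicity the review asks for.
        self._lock = threading.Lock()
        self._completed_lookups = {}   # request_id -> lookup-phase hit count
        self._completed_results = {}   # request_id -> final prefetch result

    def complete(self, request_id, hits, result):
        with self._lock:
            self._completed_lookups[request_id] = hits
            self._completed_results[request_id] = result

    def query_lookup_result(self, request_id):
        # Early peek: does NOT pop, so it can be called repeatedly
        # while the full prefetch is still in flight.
        with self._lock:
            return self._completed_lookups.get(request_id)

    def query_prefetch_result(self, request_id):
        # Final query: pops both dicts in one critical section, so a
        # concurrent query_lookup_result never sees a half-cleaned state.
        with self._lock:
            result = self._completed_results.pop(request_id, None)
            if result is not None:
                self._completed_lookups.pop(request_id, None)
            return result
```

After `query_prefetch_result` consumes a request, `query_lookup_result` for the same id returns `None`, matching the contract described in the review.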