[MP][Core] Update the workflow for lookup to avoid busy loop#2710

Merged
ApostaC merged 6 commits into LMCache:dev from ApostaC:local-dev/mp-lookup-optimization
Mar 7, 2026

Conversation

Contributor

@ApostaC ApostaC commented Mar 7, 2026

What this PR does / why we need it:

Splits the MP lookup operation into a two-phase non-blocking protocol to eliminate the busy-loop in MPCacheEngine.lookup().

Previously, lookup() called StorageManager.submit_prefetch_task() then busy-looped on query_prefetch_status() until the result was ready. With max_workers=1 (the default), this blocked the sole thread pool worker, preventing all other BLOCKING requests (STORE, RETRIEVE, FREE_LOOKUP_LOCKS, END_SESSION) from executing concurrently.

Now:

  • LOOKUP (BLOCKING handler): hashes keys, submits the prefetch task, stores a PrefetchHandle server-side, and returns a prefetch job ID immediately — no busy loop.
  • QUERY_PREFETCH_STATUS (new SYNC handler): takes a job ID, polls the prefetch status, and returns int | None. Runs in the main ZMQ loop so it never consumes a thread pool worker. Auto-cleans the job entry when a non-None result is returned (exactly-once semantics).

The client adapter (LMCacheMPSchedulerAdapter) blocks on LOOKUP to get the job ID in maybe_submit_lookup_request(), then sends blocking QUERY_PREFETCH_STATUS polls in check_lookup_result().
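The two-phase flow described above can be sketched as follows. This is a minimal, self-contained illustration of the protocol shape, not LMCache's actual implementation: `FakeEngine`, `client_flow`, the two-poll delay, and the hit count of 42 are all invented for the example.

```python
import time

# Hypothetical stand-in for the server side: lookup() registers a job and
# returns its ID immediately; query_prefetch_status() returns None while the
# prefetch is still running, then the result exactly once.
class FakeEngine:
    def __init__(self):
        self._jobs = {}          # job_id -> remaining polls before completion
        self._next_id = 0

    def lookup(self, request_id: str) -> int:
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = 2   # pretend the prefetch needs two polls
        return job_id            # returns immediately -- no busy loop

    def query_prefetch_status(self, job_id: int):
        remaining = self._jobs.get(job_id)
        if remaining is None:
            return 0             # unknown or already-consumed job
        if remaining > 0:
            self._jobs[job_id] = remaining - 1
            return None          # still prefetching
        del self._jobs[job_id]   # exactly-once cleanup on completion
        return 42                # hypothetical prefix hit count

def client_flow(engine: FakeEngine) -> int:
    # Phase 1: blocking LOOKUP returns a job ID immediately.
    job_id = engine.lookup("req-0")
    # Phase 2: poll QUERY_PREFETCH_STATUS until a non-None result arrives.
    while (result := engine.query_prefetch_status(job_id)) is None:
        time.sleep(0)            # yield between polls instead of spinning hot
    return result
```

Because `QUERY_PREFETCH_STATUS` is a SYNC handler running in the main ZMQ loop, each poll is cheap on the server side even though the client still polls.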

Special notes for your reviewers:

  • Telemetry event name changed from "lookup" to "lookup_and_prefetch" since the span now covers both phases.
  • Fault tolerance: if a client crashes after LOOKUP but never calls QUERY_PREFETCH_STATUS, the _prefetch_jobs entry leaks. A TODO is left for periodic TTL-based cleanup in a future PR.
  • The blend server's cb_lookup_pre_computed still has its own internal busy-loop — out of scope for this PR.

If applicable:

  • this PR contains user-facing changes (docs added)
  • this PR contains unit tests

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a performance issue in LMCache's lookup operation by replacing a blocking busy-loop with a non-blocking, two-phase protocol. The server now submits prefetch tasks and immediately returns a job ID, letting clients poll asynchronously for results without monopolizing thread pool workers. The refactoring ensures that other essential blocking operations can execute concurrently, making the cache server more responsive and efficient.

Highlights

  • Two-Phase Non-Blocking Lookup Protocol: Introduced a new two-phase lookup mechanism to replace the previous busy-loop in MPCacheEngine.lookup(), significantly improving concurrency and preventing blocking of thread pool workers.
  • New Request Types and Handlers: Separated the lookup operation into a LOOKUP request (submits prefetch task, returns job ID) and a new QUERY_PREFETCH_STATUS request (polls for the prefetch result), with corresponding server-side handlers.
  • Client-Side Adapter Updates: The LMCacheMPSchedulerAdapter was updated to implement the new two-phase client-side logic, first requesting a job ID and then polling for completion.
  • Server-Side Job Tracking: Implemented server-side tracking for prefetch jobs using a new _PrefetchJob dataclass, including automatic cleanup of job entries upon completion.


Changelog
  • lmcache/integration/vllm/vllm_multi_process_adapter.py
    • Replaced lookup_futures with _lookup_job_ids for tracking prefetch job IDs.
    • Updated maybe_submit_lookup_request to send a LOOKUP request and store the returned job ID.
    • Modified check_lookup_result to send QUERY_PREFETCH_STATUS requests and poll for the prefetch result.
    • Adjusted cleanup_lookup_result to remove the job ID from _lookup_job_ids.
  • lmcache/v1/multiprocess/blend_server.py
    • Registered the new QUERY_PREFETCH_STATUS request type with its corresponding handler engine.query_prefetch_status.
  • lmcache/v1/multiprocess/protocols/base.py
    • Added QUERY_PREFETCH_STATUS as a new RequestType enum member.
  • lmcache/v1/multiprocess/protocols/engine.py
    • Updated documentation for LOOKUP to reflect its new role of returning a prefetch job ID.
    • Added a new protocol definition for QUERY_PREFETCH_STATUS, specifying its payload, response, and SYNC handler type.
    • Included QUERY_PREFETCH_STATUS in the list of exported protocol definitions.
  • lmcache/v1/multiprocess/server.py
    • Imported dataclass and PrefetchHandle.
    • Defined a new _PrefetchJob dataclass to encapsulate prefetch job details.
    • Added _prefetch_jobs dictionary, _next_prefetch_job_id, and _prefetch_job_lock to MPCacheEngine for tracking prefetch jobs.
    • Modified lookup method to submit a prefetch task and immediately return a job ID via _register_prefetch_job.
    • Implemented _register_prefetch_job to assign and store new prefetch jobs.
    • Added query_prefetch_status method to poll the status of a prefetch job and clean up completed entries.
    • Updated telemetry event name from "lookup" to "lookup_and_prefetch".
  • tests/v1/multiprocess/test_blend_server.py
    • Modified test_cb_store_final_then_normal_lookup_retrieve to use the two-phase lookup protocol, first getting a job ID and then polling for the result.
  • tests/v1/multiprocess/test_cache_server.py
    • Updated the lookup_all helper function to reflect the new two-phase lookup logic, submitting a LOOKUP request and then polling with QUERY_PREFETCH_STATUS.
  • tests/v1/multiprocess/test_free_locks.py
    • Adjusted mock objects in tests to reference _lookup_job_ids instead of lookup_futures and to mock future.result() for job IDs.
Activity
  • Unit tests have been added to cover the new two-phase lookup logic.
  • The telemetry event name for lookup operations has been changed from "lookup" to "lookup_and_prefetch" to reflect the expanded scope.
  • A known issue regarding potential _prefetch_jobs entry leaks if a client crashes after LOOKUP but before QUERY_PREFETCH_STATUS is noted, with a TODO for future TTL-based cleanup.
  • The existing busy-loop in the blend server's cb_lookup_pre_computed is acknowledged as out of scope for this PR.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the lookup operation into a two-phase, non-blocking protocol, resolving the busy-loop issue and keeping the thread pool worker free. However, the new stateful mechanism on the server introduces security concerns: resource exhaustion (memory leak) and predictable resource identifiers (IDOR). Prefetch jobs are stored in a dictionary without a TTL or size limit, and their IDs are easily guessable, which could lead to cross-client interference and denial of service. Additionally, I suggest refactoring some duplicated code to improve maintainability.

# Prefetch job tracking for two-phase lookup
# TODO: implement periodic cleanup of stale _prefetch_jobs entries
# for crash resilience (e.g., client calls lookup but never queries)
self._prefetch_jobs: dict[int, _PrefetchJob] = {}
Severity: medium (security)

The _prefetch_jobs dictionary stores state for every LOOKUP request, but entries are only removed when query_prefetch_status is called and returns a non-None result. A malicious or buggy client can exhaust server memory by sending many LOOKUP requests without querying their status, leading to a Denial of Service (DoS). The PR description acknowledges this with a TODO, but the vulnerability is introduced by this change.
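The TTL-based cleanup that the PR's TODO defers could look roughly like the sweep below. This is a hedged sketch of one possible approach, not the planned LMCache implementation: `sweep_stale_jobs`, the dict-of-dicts job shape, and the 60-second TTL are all assumptions for illustration.

```python
# Illustrative TTL sweep: drop job entries whose submit_time is older than
# a cutoff, so entries abandoned by crashed clients cannot accumulate
# without bound. The TTL value is an assumption, not a tuned default.
PREFETCH_JOB_TTL_S = 60.0

def sweep_stale_jobs(jobs: dict, now: float, ttl: float = PREFETCH_JOB_TTL_S) -> int:
    """Remove entries older than ttl seconds; return how many were removed."""
    stale = [jid for jid, job in jobs.items()
             if now - job["submit_time"] > ttl]
    for jid in stale:
        del jobs[jid]
    return len(stale)
```

A periodic call to such a sweep from the main ZMQ loop would bound the size of `_prefetch_jobs` even when clients never poll.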

Comment on lines +575 to +576
job_id = self._next_prefetch_job_id
self._next_prefetch_job_id += 1

Severity: medium (security)

The prefetch_job_id is an incrementing integer, making it easily predictable. Since query_prefetch_status removes the job entry from the server's state upon a successful query, an attacker can guess job IDs and 'steal' the results of other clients' prefetch jobs. This results in a Denial of Service for the legitimate client, as their subsequent query will find no job entry and return 0.
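One common mitigation for guessable identifiers is to issue a random token instead of an incrementing counter. The sketch below is illustrative only; `new_prefetch_job_id` is a hypothetical helper, not part of this PR, and adopting it would change the job-ID type from `int` to `str` across the protocol.

```python
import secrets

# Hypothetical alternative ID scheme: a random 128-bit token is effectively
# unguessable, so one client cannot consume another client's job entry by
# enumerating IDs.
def new_prefetch_job_id() -> str:
    return secrets.token_hex(16)   # 32 hex characters
```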

Comment on lines +533 to 560
return self._register_prefetch_job(
_PrefetchJob(
handle=PrefetchHandle(
request_id=-1,
l1_prefix_hit_count=0,
total_requested_keys=0,
submit_time=time.monotonic(),
),
world_size=1,
request_id=key.request_id,
)
)
return 0

# Prepare for the obj keys
ipc_keys.extend(key.to_hash_keys(self.token_hasher))
if not ipc_keys:
log_telemetry(
make_end_event(
"lookup",
key.request_id,
error="no_ipc_keys_generated",
return self._register_prefetch_job(
_PrefetchJob(
handle=PrefetchHandle(
request_id=-1,
l1_prefix_hit_count=0,
total_requested_keys=0,
submit_time=time.monotonic(),
),
world_size=1,
request_id=key.request_id,
)
)
Severity: medium

There's duplicated code for creating and registering a dummy prefetch job in these error-handling paths. This can be refactored into a helper method to improve code clarity and maintainability.

You can add a helper method like this to the MPCacheEngine class:

    def _create_and_register_dummy_job(self, request_id: str) -> int:
        """Creates and registers a dummy prefetch job for error cases."""
        dummy_job = _PrefetchJob(
            handle=PrefetchHandle(
                request_id=-1,
                l1_prefix_hit_count=0,
                total_requested_keys=0,
                submit_time=time.monotonic(),
            ),
            world_size=1,
            request_id=request_id,
        )
        return self._register_prefetch_job(dummy_job)

Then, you can simplify the lookup method by calling this new helper.

            return self._create_and_register_dummy_job(key.request_id)

        # Prepare for the obj keys
        ipc_keys.extend(key.to_hash_keys(self.token_hasher))
        if not ipc_keys:
            return self._create_and_register_dummy_job(key.request_id)

Contributor

@KuntaiDu KuntaiDu left a comment


LGTM!

Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
@ApostaC ApostaC added the mp Buildkite trigger for multi-processing mode test label Mar 7, 2026
@ApostaC ApostaC added the full Run comprehensive tests on this PR label Mar 7, 2026
@ApostaC ApostaC enabled auto-merge (squash) March 7, 2026 04:51
ApostaC added 2 commits March 6, 2026 21:20
@ApostaC ApostaC merged commit c836003 into LMCache:dev Mar 7, 2026
25 of 28 checks passed
mauryaavinash95 pushed a commit to mauryaavinash95/LMCache that referenced this pull request Mar 7, 2026
…#2710)

* Separate lookup and pefetch to avoid busy loop
Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
…#2710)

* Separate lookup and pefetch to avoid busy loop
Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
Signed-off-by: shaoxiawjc <wjc2800@163.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
…#2710)

* Separate lookup and pefetch to avoid busy loop
Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
Signed-off-by: Aaron Wu <aaron.wu@dell.com>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
…#2710)

* Separate lookup and pefetch to avoid busy loop
Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
