[MP][Core] Update the workflow for lookup to avoid busy loop#2710

Merged
ApostaC merged 6 commits into LMCache:dev from ApostaC:local-dev/mp-lookup-optimization
Mar 7, 2026

Conversation

Contributor

@ApostaC ApostaC commented Mar 7, 2026

What this PR does / why we need it:

Splits the MP lookup operation into a two-phase non-blocking protocol to eliminate the busy-loop in MPCacheEngine.lookup().

Previously, lookup() called StorageManager.submit_prefetch_task() then busy-looped on query_prefetch_status() until the result was ready. With max_workers=1 (the default), this blocked the sole thread pool worker, preventing all other BLOCKING requests (STORE, RETRIEVE, FREE_LOOKUP_LOCKS, END_SESSION) from executing concurrently.

Now:

  • LOOKUP (BLOCKING handler): hashes keys, submits the prefetch task, stores a PrefetchHandle server-side, and returns a prefetch job ID immediately — no busy loop.
  • QUERY_PREFETCH_STATUS (new SYNC handler): takes a job ID, polls the prefetch status, and returns int | None. Runs in the main ZMQ loop so it never consumes a thread pool worker. Auto-cleans the job entry when a non-None result is returned (exactly-once semantics).

The client adapter (LMCacheMPSchedulerAdapter) blocks on LOOKUP to get the job ID in maybe_submit_lookup_request(), then sends blocking QUERY_PREFETCH_STATUS polls in check_lookup_result().
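The two-phase flow described above can be sketched as follows. This is a minimal, self-contained illustration of the protocol shape, not LMCache's actual implementation: `FakeEngine`, `client_flow`, the two-poll delay, and the hit count of 42 are all invented for the example.

```python
import time

# Hypothetical stand-in for the server side: lookup() registers a job and
# returns its ID immediately; query_prefetch_status() returns None while the
# prefetch is still running, then the result exactly once.
class FakeEngine:
    def __init__(self):
        self._jobs = {}          # job_id -> remaining polls before completion
        self._next_id = 0

    def lookup(self, request_id: str) -> int:
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = 2   # pretend the prefetch needs two polls
        return job_id            # returns immediately -- no busy loop

    def query_prefetch_status(self, job_id: int):
        remaining = self._jobs.get(job_id)
        if remaining is None:
            return 0             # unknown or already-consumed job
        if remaining > 0:
            self._jobs[job_id] = remaining - 1
            return None          # still prefetching
        del self._jobs[job_id]   # exactly-once cleanup on completion
        return 42                # hypothetical prefix hit count

def client_flow(engine: FakeEngine) -> int:
    # Phase 1: blocking LOOKUP returns a job ID immediately.
    job_id = engine.lookup("req-0")
    # Phase 2: poll QUERY_PREFETCH_STATUS until a non-None result arrives.
    while (result := engine.query_prefetch_status(job_id)) is None:
        time.sleep(0)            # yield between polls instead of spinning hot
    return result
```

Because `QUERY_PREFETCH_STATUS` is a SYNC handler running in the main ZMQ loop, each poll is cheap on the server side even though the client still polls.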

Special notes for your reviewers:

  • Telemetry event name changed from "lookup" to "lookup_and_prefetch" since the span now covers both phases.
  • Fault tolerance: if a client crashes after LOOKUP but never calls QUERY_PREFETCH_STATUS, the _prefetch_jobs entry leaks. A TODO is left for periodic TTL-based cleanup in a future PR.
  • The blend server's cb_lookup_pre_computed still has its own internal busy-loop — out of scope for this PR.

If applicable:

  • this PR contains user-facing changes (docs added)
  • this PR contains unit tests

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a performance issue in LMCache's lookup operation by replacing a blocking busy-loop with a non-blocking, two-phase protocol. The server now submits prefetch tasks and immediately returns a job ID, letting clients poll asynchronously for results without monopolizing thread pool workers. The refactoring ensures that other essential blocking operations can execute concurrently, making the cache server more responsive and efficient.

Highlights

  • Two-Phase Non-Blocking Lookup Protocol: Introduced a new two-phase lookup mechanism to replace the previous busy-loop in MPCacheEngine.lookup(), significantly improving concurrency and preventing blocking of thread pool workers.
  • New Request Types and Handlers: Separated the lookup operation into a LOOKUP request (submits prefetch task, returns job ID) and a new QUERY_PREFETCH_STATUS request (polls for the prefetch result), with corresponding server-side handlers.
  • Client-Side Adapter Updates: The LMCacheMPSchedulerAdapter was updated to implement the new two-phase client-side logic, first requesting a job ID and then polling for completion.
  • Server-Side Job Tracking: Implemented server-side tracking for prefetch jobs using a new _PrefetchJob dataclass, including automatic cleanup of job entries upon completion.


Changelog
  • lmcache/integration/vllm/vllm_multi_process_adapter.py
    • Replaced lookup_futures with _lookup_job_ids for tracking prefetch job IDs.
    • Updated maybe_submit_lookup_request to send a LOOKUP request and store the returned job ID.
    • Modified check_lookup_result to send QUERY_PREFETCH_STATUS requests and poll for the prefetch result.
    • Adjusted cleanup_lookup_result to remove the job ID from _lookup_job_ids.
  • lmcache/v1/multiprocess/blend_server.py
    • Registered the new QUERY_PREFETCH_STATUS request type with its corresponding handler engine.query_prefetch_status.
  • lmcache/v1/multiprocess/protocols/base.py
    • Added QUERY_PREFETCH_STATUS as a new RequestType enum member.
  • lmcache/v1/multiprocess/protocols/engine.py
    • Updated documentation for LOOKUP to reflect its new role of returning a prefetch job ID.
    • Added a new protocol definition for QUERY_PREFETCH_STATUS, specifying its payload, response, and SYNC handler type.
    • Included QUERY_PREFETCH_STATUS in the list of exported protocol definitions.
  • lmcache/v1/multiprocess/server.py
    • Imported dataclass and PrefetchHandle.
    • Defined a new _PrefetchJob dataclass to encapsulate prefetch job details.
    • Added _prefetch_jobs dictionary, _next_prefetch_job_id, and _prefetch_job_lock to MPCacheEngine for tracking prefetch jobs.
    • Modified lookup method to submit a prefetch task and immediately return a job ID via _register_prefetch_job.
    • Implemented _register_prefetch_job to assign and store new prefetch jobs.
    • Added query_prefetch_status method to poll the status of a prefetch job and clean up completed entries.
    • Updated telemetry event name from "lookup" to "lookup_and_prefetch".
  • tests/v1/multiprocess/test_blend_server.py
    • Modified test_cb_store_final_then_normal_lookup_retrieve to use the two-phase lookup protocol, first getting a job ID and then polling for the result.
  • tests/v1/multiprocess/test_cache_server.py
    • Updated the lookup_all helper function to reflect the new two-phase lookup logic, submitting a LOOKUP request and then polling with QUERY_PREFETCH_STATUS.
  • tests/v1/multiprocess/test_free_locks.py
    • Adjusted mock objects in tests to reference _lookup_job_ids instead of lookup_futures and to mock future.result() for job IDs.
Activity
  • Unit tests have been added to cover the new two-phase lookup logic.
  • The telemetry event name for lookup operations has been changed from "lookup" to "lookup_and_prefetch" to reflect the expanded scope.
  • A known issue regarding potential _prefetch_jobs entry leaks if a client crashes after LOOKUP but before QUERY_PREFETCH_STATUS is noted, with a TODO for future TTL-based cleanup.
  • The existing busy-loop in the blend server's cb_lookup_pre_computed is acknowledged as out of scope for this PR.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the lookup operation into a two-phase, non-blocking protocol, resolving the busy-loop issue and keeping the thread pool worker free. However, the new stateful mechanism on the server introduces security concerns: resource exhaustion (memory leak) and predictable resource identifiers (IDOR). Prefetch jobs are stored in a dictionary without a TTL or size limit, and their IDs are easily guessable, which could lead to cross-client interference and denial of service. Additionally, I suggest refactoring some duplicated code to improve maintainability.

# Prefetch job tracking for two-phase lookup
# TODO: implement periodic cleanup of stale _prefetch_jobs entries
# for crash resilience (e.g., client calls lookup but never queries)
self._prefetch_jobs: dict[int, _PrefetchJob] = {}
Severity: medium (security)

The _prefetch_jobs dictionary stores state for every LOOKUP request, but entries are only removed when query_prefetch_status is called and returns a non-None result. A malicious or buggy client can exhaust server memory by sending many LOOKUP requests without querying their status, leading to a Denial of Service (DoS). The PR description acknowledges this with a TODO, but the vulnerability is introduced by this change.
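The TTL-based cleanup that the PR's TODO defers could look roughly like the sweep below. This is a hedged sketch of one possible approach, not the planned LMCache implementation: `sweep_stale_jobs`, the dict-of-dicts job shape, and the 60-second TTL are all assumptions for illustration.

```python
# Illustrative TTL sweep: drop job entries whose submit_time is older than
# a cutoff, so entries abandoned by crashed clients cannot accumulate
# without bound. The TTL value is an assumption, not a tuned default.
PREFETCH_JOB_TTL_S = 60.0

def sweep_stale_jobs(jobs: dict, now: float, ttl: float = PREFETCH_JOB_TTL_S) -> int:
    """Remove entries older than ttl seconds; return how many were removed."""
    stale = [jid for jid, job in jobs.items()
             if now - job["submit_time"] > ttl]
    for jid in stale:
        del jobs[jid]
    return len(stale)
```

A periodic call to such a sweep from the main ZMQ loop would bound the size of `_prefetch_jobs` even when clients never poll.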

Comment on lines +575 to +576
job_id = self._next_prefetch_job_id
self._next_prefetch_job_id += 1

Severity: medium (security)

The prefetch_job_id is an incrementing integer, making it easily predictable. Since query_prefetch_status removes the job entry from the server's state upon a successful query, an attacker can guess job IDs and 'steal' the results of other clients' prefetch jobs. This results in a Denial of Service for the legitimate client, as their subsequent query will find no job entry and return 0.
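One common mitigation for guessable identifiers is to issue a random token instead of an incrementing counter. The sketch below is illustrative only; `new_prefetch_job_id` is a hypothetical helper, not part of this PR, and adopting it would change the job-ID type from `int` to `str` across the protocol.

```python
import secrets

# Hypothetical alternative ID scheme: a random 128-bit token is effectively
# unguessable, so one client cannot consume another client's job entry by
# enumerating IDs.
def new_prefetch_job_id() -> str:
    return secrets.token_hex(16)   # 32 hex characters
```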

Comment on lines +533 to 560
return self._register_prefetch_job(
_PrefetchJob(
handle=PrefetchHandle(
request_id=-1,
l1_prefix_hit_count=0,
total_requested_keys=0,
submit_time=time.monotonic(),
),
world_size=1,
request_id=key.request_id,
)
)
return 0

# Prepare for the obj keys
ipc_keys.extend(key.to_hash_keys(self.token_hasher))
if not ipc_keys:
log_telemetry(
make_end_event(
"lookup",
key.request_id,
error="no_ipc_keys_generated",
return self._register_prefetch_job(
_PrefetchJob(
handle=PrefetchHandle(
request_id=-1,
l1_prefix_hit_count=0,
total_requested_keys=0,
submit_time=time.monotonic(),
),
world_size=1,
request_id=key.request_id,
)
)
Severity: medium

There's duplicated code for creating and registering a dummy prefetch job in these error-handling paths. This can be refactored into a helper method to improve code clarity and maintainability.

You can add a helper method like this to the MPCacheEngine class:

    def _create_and_register_dummy_job(self, request_id: str) -> int:
        """Creates and registers a dummy prefetch job for error cases."""
        dummy_job = _PrefetchJob(
            handle=PrefetchHandle(
                request_id=-1,
                l1_prefix_hit_count=0,
                total_requested_keys=0,
                submit_time=time.monotonic(),
            ),
            world_size=1,
            request_id=request_id,
        )
        return self._register_prefetch_job(dummy_job)

Then, you can simplify the lookup method by calling this new helper.

            return self._create_and_register_dummy_job(key.request_id)

        # Prepare for the obj keys
        ipc_keys.extend(key.to_hash_keys(self.token_hasher))
        if not ipc_keys:
            return self._create_and_register_dummy_job(key.request_id)

Contributor

@KuntaiDu KuntaiDu left a comment


LGTM!

Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
@ApostaC ApostaC added the mp Buildkite trigger for multi-processing mode test label Mar 7, 2026
@ApostaC ApostaC added the full Run comprehensive tests on this PR label Mar 7, 2026
@ApostaC ApostaC enabled auto-merge (squash) March 7, 2026 04:51
ApostaC added 2 commits March 6, 2026 21:20
@ApostaC ApostaC merged commit c836003 into LMCache:dev Mar 7, 2026
25 of 28 checks passed
mauryaavinash95 pushed a commit to mauryaavinash95/LMCache that referenced this pull request Mar 7, 2026
…#2710)

* Separate lookup and pefetch to avoid busy loop
Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
…#2710)

* Separate lookup and pefetch to avoid busy loop
Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
Signed-off-by: shaoxiawjc <wjc2800@163.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
…#2710)

* Separate lookup and pefetch to avoid busy loop
Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
Signed-off-by: Aaron Wu <aaron.wu@dell.com>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
…#2710)

* Separate lookup and pefetch to avoid busy loop
Signed-off-by: Yihua Cheng <yihua98@uchicago.edu>
