
Revert "[MP] Support MP Server restart (#2713)" #2729

Merged
ApostaC merged 1 commit into dev from revert-2713-mp_server_restart on Mar 10, 2026

Conversation


@ApostaC ApostaC commented Mar 10, 2026

Reverts #2713

See the comments for the reason: #2713 (review)

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses concerns with the previously implemented multi-process server restart support by reverting the entire feature. The core change involves simplifying the client-server communication for cache operations, moving from a detailed OperationStatus enum to a straightforward boolean success indicator. This streamlines the interaction and removes the client-side complexity of handling server restarts and re-registration, effectively removing the server restart resilience capability.

Highlights

  • Revert of Server Restart Support: This pull request fully reverts the changes introduced in [MP] Support MP Server restart #2713, which aimed to add multi-process server restart support.
  • Removal of OperationStatus Enum: The OperationStatus enum, previously used to indicate the outcome of store and retrieve operations, has been entirely removed from the codebase.
  • Simplified Store/Retrieve Results: The return types for store and retrieve operations have been simplified from a status integer (OperationStatus) to a boolean flag, indicating success or failure.
  • Removal of Auto Re-registration Logic: The client-side logic for detecting unregistered instances and automatically re-registering KV caches after a server restart has been removed.
  • Removed Server Restart Tests: The dedicated test file test_server_restart.py, which covered the server restart resilience scenario, has been removed.
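
The move from a status code to a boolean flag described in the highlights can be sketched as follows. This is a minimal illustration, not LMCache code: the member names `SUCCESS` and `NOT_REGISTERED` appear in this PR, but their integer values and the helper `to_flag` are assumptions made for the example.

```python
from enum import IntEnum


class OperationStatus(IntEnum):
    # Member names are taken from this PR; the integer values are assumed.
    SUCCESS = 0
    NOT_REGISTERED = 1


def to_flag(status: int) -> bool:
    """Collapse a status code into the boolean used after the revert.

    Hypothetical helper: it shows that a bool keeps only success/failure
    and drops the reason (for example NOT_REGISTERED) that the enum carried.
    """
    return status == OperationStatus.SUCCESS


# Pre-revert response shape: tuple[bytes, int]
old_response: tuple[bytes, int] = (b"key", int(OperationStatus.NOT_REGISTERED))

# Post-revert response shape: tuple[bytes, bool]
new_response: tuple[bytes, bool] = (old_response[0], to_flag(old_response[1]))
```

With a boolean result, a caller can no longer distinguish "server does not know this instance" from any other failure, which is why the client-side re-registration path has nothing left to trigger on.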


Changelog
  • lmcache/integration/vllm/vllm_multi_process_adapter.py
    • Removed OperationStatus import.
    • Changed StoreResult and RetrieveResult type hints from int to bool.
    • Removed the registered attribute and _do_register method.
    • Modified register_kv_caches to directly send the registration request.
    • Removed auto re-registration logic and need_reregister flag from get_finished.
    • Updated error logging for store/retrieve results to check for boolean failure.
  • lmcache/v1/multiprocess/custom_types.py
    • Removed IntEnum import.
    • Removed the OperationStatus enum definition.
  • lmcache/v1/multiprocess/futures.py
    • Removed self.is_done_.set() and related conditional logic in _on_raw_future_complete.
    • Removed is_done_.is_set() checks in wait and query methods.
    • Replaced if self.event_ is not None: with assert self.event_ is not None.
  • lmcache/v1/multiprocess/protocols/engine.py
    • Updated response_class for STORE and RETRIEVE protocols from tuple[bytes, int] to tuple[bytes, bool].
    • Updated docstrings to reflect boolean success flags for store/retrieve operations.
  • lmcache/v1/multiprocess/server.py
    • Removed OperationStatus import.
    • Changed return type hints for store and retrieve methods to tuple[bytes, bool].
    • Updated docstrings for store and retrieve to reflect boolean success flags.
    • Replaced warning and OperationStatus.NOT_REGISTERED return with an assert for unregistered instances.
    • Changed return values from OperationStatus codes to True or False.
  • tests/v1/multiprocess/test_blend_server.py
    • Removed OperationStatus import.
    • Updated assertions from OperationStatus.SUCCESS to True.
  • tests/v1/multiprocess/test_blend_server_v2.py
    • Removed OperationStatus import.
    • Updated assertions from OperationStatus.SUCCESS to True.
  • tests/v1/multiprocess/test_cache_server.py
    • Removed OperationStatus import.
    • Updated assertions from OperationStatus.SUCCESS to True.
    • Modified retrieve_keys to append the raw boolean result instead of comparing to OperationStatus.SUCCESS.
  • tests/v1/multiprocess/test_mq.py
    • Removed OperationStatus import.
    • Updated docstrings for test_mq_store and test_mq_retrieve to reflect (bytes, bool) return type.
    • Updated expected_response in helper.run_test from OperationStatus.SUCCESS to True.
  • tests/v1/multiprocess/test_mq_handler_helpers.py
    • Removed OperationStatus import.
    • Updated type hints for store_handler and retrieve_handler to tuple[bytes, bool].
    • Updated docstrings to reflect success flag.
    • Changed return values from OperationStatus.SUCCESS to True.
  • tests/v1/multiprocess/test_server_restart.py
    • Removed the entire test file.
Activity
  • This pull request is a direct revert of [MP] Support MP Server restart #2713, indicating that the previous implementation for multi-process server restart support encountered issues or was deemed not ready for production, as highlighted by the linked review comments.

This reverts commit 55622e8.

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@ApostaC ApostaC force-pushed the revert-2713-mp_server_restart branch from 5a39ed1 to b9f439d Compare March 10, 2026 19:41
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request successfully reverts the support for MP Server restart, correctly removing the OperationStatus enum, related logic, and the server restart test file. However, a potential Denial of Service (DoS) vulnerability was identified in the MPCacheEngine.store and MPCacheEngine.retrieve methods in lmcache/v1/multiprocess/server.py, which use assert statements to validate the instance_id from external requests. With assertions enabled, a request from an unregistered instance raises an unhandled AssertionError that could crash the server; with assertions disabled (for example under python -O), the check is skipped entirely and the request proceeds unvalidated. It is recommended to replace these assertions with explicit error handling that returns a failure status to the client. Minor suggestions have also been made to improve logging consistency and fix a typo.
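
The difference between the two validation styles in this review can be sketched with a toy class. Everything here is hypothetical (`ToyEngine`, its methods, and the integer `instance_id`); it is not the `MPCacheEngine` implementation, only an illustration of why an explicit check is more predictable than `assert` for input that comes from another process.

```python
import logging

logger = logging.getLogger(__name__)


class ToyEngine:
    """Hypothetical stand-in for a cache engine; not the LMCache class."""

    def __init__(self) -> None:
        self._registered: set[int] = set()

    def register(self, instance_id: int) -> None:
        self._registered.add(instance_id)

    def store_with_assert(self, instance_id: int, key: bytes) -> tuple[bytes, bool]:
        # Raises AssertionError for an unknown id when assertions are enabled;
        # under `python -O` this line is removed and no check happens at all.
        assert instance_id in self._registered, "instance not registered"
        return key, True

    def store_with_check(self, instance_id: int, key: bytes) -> tuple[bytes, bool]:
        # Explicit validation behaves the same with or without -O and
        # reports the failure to the caller instead of raising.
        if instance_id not in self._registered:
            logger.warning("store from unregistered instance_id=%s", instance_id)
            return key, False
        return key, True
```

In this sketch the explicit variant keeps the `tuple[bytes, bool]` response shape from the revert, so a client would see an ordinary failed store rather than a dropped connection.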

Comment thread lmcache/v1/multiprocess/server.py
Comment thread lmcache/v1/multiprocess/server.py
Comment on lines +507 to 508
"store request for request_id=%s",
request_id,

medium

For consistency with the error logging for retrieve requests, consider including the s_result in the log message. This will be helpful for debugging failed store requests.

Suggested change
- "store request for request_id=%s",
- request_id,
+ "store request for request_id=%s, result=%s",
+ request_id, s_result,
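
As a rough illustration of the suggestion, the sketch below logs the store result alongside the request id using the standard `logging` module. The logger name and the function `log_store_result` are assumptions for the example, and the format string reproduces only the fragment visible in this review; the beginning of the real log message is not shown on this page.

```python
import logging

logger = logging.getLogger("store_demo")  # hypothetical logger name


def log_store_result(request_id: str, s_result: bool) -> str:
    """Log a store outcome and return the formatted text for inspection."""
    # Only the visible fragment of the message from the review is used here.
    template = "store request for request_id=%s, result=%s"
    # Passing arguments separately lets logging defer string formatting
    # until a handler actually emits the record.
    logger.error(template, request_id, s_result)
    return template % (request_id, s_result)
```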

Comment thread lmcache/integration/vllm/vllm_multi_process_adapter.py
@ApostaC ApostaC added the full Run comprehensive tests on this PR label Mar 10, 2026
Contributor

@sammshen sammshen left a comment


LGTM!

Collaborator

@DongDongJu DongDongJu left a comment


LGTM.

@ApostaC ApostaC merged commit 9af5412 into dev Mar 10, 2026
30 of 37 checks passed
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
This reverts commit 55622e8.

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: shaoxiawjc <wjc2800@163.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
This reverts commit 55622e8.

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Aaron Wu <aaron.wu@dell.com>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
This reverts commit 55622e8.

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Labels

full Run comprehensive tests on this PR

4 participants