
[MP][UX] Unified config + argparse for multiprocess mode #2695

Merged
ApostaC merged 1 commit into LMCache:dev from ApostaC:local-dev/mp-config-unify
Mar 5, 2026

Conversation

@ApostaC ApostaC commented Mar 5, 2026

What this PR does / why we need it:

The multiprocess module (server.py, blend_server.py, http_server.py) has fragmented config and argument parsing:

  • server.py defines 5 inline add_argument calls, and run_cache_server() takes 8 individual parameters.
  • http_server.py has a standalone ServerConfig dataclass with a to_storage_manager_config() method that hardcodes L1 settings and eviction policy. It does not support L2 adapters, hash algorithm, or Prometheus from CLI. It also uses --zmq-host/--zmq-port instead of the --host/--port convention used everywhere else.
  • blend_server.py reuses parse_args from server.py (good), but still passes 8 individual params to its own run_cache_server().

This PR creates lmcache/v1/multiprocess/config.py following the established composable (DataclassConfig, add_*_args, parse_args_to_*_config) triple pattern from lmcache/v1/distributed/config.py, unifying the scattered config.
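
The triple pattern named above can be sketched roughly as follows. This is a minimal illustration and not the contents of config.py: the field names and defaults are the ones quoted in the review excerpts on this page, while the function bodies and the Namespace-based signature of parse_args_to_mp_server_config are assumptions.

```python
import argparse
from dataclasses import dataclass


@dataclass
class MPServerConfig:
    """ZMQ server settings (fields and defaults as quoted in the review)."""

    host: str = "localhost"
    port: int = 5555
    chunk_size: int = 256
    max_workers: int = 1
    hash_algorithm: str = "blake3"


def add_mp_server_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Attach the 'MP Server' argument group to an existing parser."""
    defaults = MPServerConfig()
    group = parser.add_argument_group("MP Server")
    group.add_argument("--host", type=str, default=defaults.host)
    group.add_argument("--port", type=int, default=defaults.port)
    group.add_argument("--chunk-size", type=int, default=defaults.chunk_size)
    group.add_argument("--max-workers", type=int, default=defaults.max_workers)
    group.add_argument("--hash-algorithm", type=str, default=defaults.hash_algorithm)
    return parser


def parse_args_to_mp_server_config(args: argparse.Namespace) -> MPServerConfig:
    """Build the config object from parsed arguments (assumed signature)."""
    return MPServerConfig(
        host=args.host,
        port=args.port,
        chunk_size=args.chunk_size,
        max_workers=args.max_workers,
        hash_algorithm=args.hash_algorithm,
    )


parser = add_mp_server_args(argparse.ArgumentParser())
mp_config = parse_args_to_mp_server_config(parser.parse_args(["--port", "6000"]))

# A server entry point can then derive transport details from one object.
bind_url = f"tcp://{mp_config.host}:{mp_config.port}"
```

Because each piece only touches its own argument group, several such triples can share one parser without knowing about each other.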

User-facing changes:

  • --zmq-host / --zmq-port (http_server.py only) are replaced by --host / --port (consistent across all servers). HTTP-specific args are now --http-host / --http-port.
  • --cpu-buffer-size is removed from http_server.py. Users now use --l1-size-gb (from the storage manager arg group) instead.
  • http_server.py now supports the full set of CLI args: L2 adapters, hash algorithm, Prometheus, eviction tuning — previously hardcoded or missing.
  • All three servers (server.py, blend_server.py, http_server.py) now show organized argument groups in --help: MP Server, L1 Memory, Eviction, L2 Adapters, Prometheus (and HTTP Frontend for http_server.py).
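
To illustrate how the renamed flags coexist, here is a small sketch that composes two argument groups on one parser. The group titles and defaults come from this page; the helper bodies are trimmed stand-ins (the storage manager, eviction, L2 and Prometheus groups are omitted), so treat it as an assumption-laden illustration and not the real http_server.py parser.

```python
import argparse


def add_mp_server_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    # Trimmed to host/port for brevity; other MP Server flags are left out.
    group = parser.add_argument_group("MP Server")
    group.add_argument("--host", type=str, default="localhost")
    group.add_argument("--port", type=int, default=5555)
    return parser


def add_http_frontend_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    group = parser.add_argument_group("HTTP Frontend")
    group.add_argument("--http-host", type=str, default="0.0.0.0")
    group.add_argument("--http-port", type=int, default=8000)
    return parser


parser = argparse.ArgumentParser(prog="http_server")
add_http_frontend_args(add_mp_server_args(parser))

# --host/--port address the ZMQ server; --http-* address the HTTP frontend.
args = parser.parse_args(
    ["--host", "127.0.0.1", "--port", "6000", "--http-port", "9000"]
)

# Each group title becomes its own section in the --help output.
help_text = parser.format_help()
```

Since the HTTP flags carry an explicit prefix, the ZMQ flags can keep the plain --host / --port names used by the other servers.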

Special notes for your reviewers:

  • MPCacheEngine constructor is unchanged — it doesn't need host/port/max_workers (those are ZMQ transport concerns). run_cache_server() extracts the relevant fields from MPServerConfig.
  • The _configs dict pattern in http_server.py replaces the old _server_config global; it avoids the need for uninitialized StorageManagerConfig sentinels since run_http_server() always populates it before FastAPI's lifespan starts.

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Signed-off-by: ApostaC <yihua98@uchicago.edu>
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the configuration and argument parsing mechanisms for the multiprocess cache servers. By centralizing server-related settings into a dedicated config.py module and adopting a consistent pattern for argument handling, it streamlines the setup and customization of the LMCache system. This change enhances maintainability, improves the user experience through clearer command-line options, and unlocks full configurability for the HTTP server, addressing previous limitations where certain parameters were hardcoded or inaccessible via the CLI.

Highlights

  • Unified Configuration: Introduced a new config.py module to centralize configuration for multiprocess servers, replacing fragmented settings across server.py, blend_server.py, and http_server.py.
  • Standardized Argument Parsing: Implemented a consistent (DataclassConfig, add_*_args, parse_args_to_*_config) pattern for argument handling, improving maintainability and clarity across all server modules.
  • CLI Argument Consistency: Standardized ZMQ server host/port arguments to --host and --port, and introduced dedicated --http-host and --http-port for the HTTP frontend, ensuring a consistent command-line interface.
  • Enhanced HTTP Server Configurability: Enabled http_server.py to support a comprehensive set of CLI arguments, including L2 adapters, hash algorithms, Prometheus, and eviction tuning, which were previously hardcoded or unavailable.
  • Improved Help Output: Organized command-line arguments into logical groups (MP Server, L1 Memory, Eviction, L2 Adapters, Prometheus, HTTP Frontend) for a clearer and more user-friendly --help output.

Changelog
  • lmcache/v1/multiprocess/blend_server.py
    • Imported MPServerConfig and parse_args_to_mp_server_config for unified configuration.
    • Modified run_cache_server function signature to accept a single mp_config: MPServerConfig object, replacing multiple individual parameters.
    • Updated internal calls within run_cache_server to retrieve configuration values from the mp_config object.
    • Refactored the main execution block (if __name__ == "__main__":) to parse mp_config from command-line arguments and pass it to run_cache_server.
  • lmcache/v1/multiprocess/config.py
    • Added a new file to centralize multiprocess server and HTTP frontend configurations.
    • Defined MPServerConfig dataclass to encapsulate ZMQ server parameters (host, port, chunk_size, max_workers, hash_algorithm).
    • Defined HTTPFrontendConfig dataclass to encapsulate HTTP server parameters (http_host, http_port).
    • Implemented add_mp_server_args and parse_args_to_mp_server_config functions for handling ZMQ server command-line arguments.
    • Implemented add_http_frontend_args and parse_args_to_http_frontend_config functions for handling HTTP frontend command-line arguments.
  • lmcache/v1/multiprocess/http_server.py
    • Removed the custom ServerConfig dataclass and its to_storage_manager_config method.
    • Replaced the global _server_config with a module-level _configs dictionary to store various configuration objects.
    • Updated the lifespan async context manager to retrieve mp_config, storage_manager_config, and prometheus_config from the new _configs dictionary.
    • Modified run_http_server to accept http_config, mp_config, storage_manager_config, and prometheus_config objects directly, and populate the _configs dictionary.
    • Refactored parse_args to utilize the new add_http_frontend_args, add_mp_server_args, add_storage_manager_args, and add_prometheus_args functions for comprehensive argument parsing.
    • Updated the main execution block (if __name__ == "__main__":) to parse all relevant configuration types from arguments and pass them to run_http_server.
  • lmcache/v1/multiprocess/server.py
    • Imported MPServerConfig, add_mp_server_args, and parse_args_to_mp_server_config for unified configuration.
    • Modified run_cache_server function signature to accept a single mp_config: MPServerConfig object, replacing multiple individual parameters.
    • Updated internal calls within run_cache_server to retrieve configuration values from the mp_config object.
    • Refactored parse_args to utilize the new add_mp_server_args, add_storage_manager_args, and add_prometheus_args functions.
    • Updated the main execution block (if __name__ == "__main__":) to parse mp_config from command-line arguments and pass it to run_cache_server.
  • tests/v1/multiprocess/test_blend_server.py
    • Imported MPServerConfig for use in test configurations.
    • Modified the server_process_runner test helper to instantiate MPServerConfig and pass it to the run_cache_server function, aligning with the new API.
  • tests/v1/multiprocess/test_cache_server.py
    • Imported MPServerConfig for use in test configurations.
    • Modified the server_process_runner test helper to instantiate MPServerConfig and pass it to the run_cache_server function, aligning with the new API.
Activity
  • No human activity (comments, reviews, etc.) was provided in the context.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@ApostaC ApostaC requested review from KuntaiDu and Oasis-Git March 5, 2026 05:30
@ApostaC ApostaC changed the title from "Unified config + argparse for multiprocess mode" to "[MP][UX] Unified config + argparse for multiprocess mode" Mar 5, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request unifies the configuration for the multiprocess servers, significantly improving maintainability and consistency through the introduction of lmcache/v1/multiprocess/config.py and its composable dataclass pattern. However, a critical insecure deserialization vulnerability that could lead to Remote Code Execution (RCE) has been identified. This is due to the ZMQ servers' reliance on pickle for deserializing CUDA IPC handles in custom_types.py for inter-process communication. While this PR modifies the configuration controlling server exposure rather than introducing the pickle usage itself, it is strongly recommended to replace pickle with a safe serialization format (like JSON or MessagePack). Additionally, to further enhance maintainability, consider reducing redundancy in argument parsing and avoiding module-level global state for configuration.

Comment thread lmcache/v1/multiprocess/server.py
Comment on lines +686 to +688
bind_url=f"tcp://{mp_config.host}:{mp_config.port}",
context=context,
max_workers=mp_config.max_workers,

security-critical critical

Similar to the main cache server, the blend server initialized here is vulnerable to RCE via insecure pickle deserialization in the CudaIPCWrapper decoder. An attacker who can connect to the ZMQ port can execute arbitrary code on the server. The new configuration system introduced in this PR makes it easier to expose this vulnerable service to the network.
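
For readers unfamiliar with this class of issue, the sketch below shows in general terms why unpickling untrusted bytes is dangerous, followed by one possible schema-checked JSON decoder. It does not reproduce the CudaIPCWrapper code: the Malicious class, the decode_handle helper, and the device/handle_hex message shape are all hypothetical, and the payload only evaluates a harmless arithmetic expression.

```python
import json
import pickle


class Malicious:
    """Payload whose unpickling calls a callable chosen by the sender."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')"; an attacker would pick
        # something far more damaging than an arithmetic expression.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Malicious())

# Deserialization itself runs the callable; no method is ever invoked.
result = pickle.loads(untrusted_bytes)


def decode_handle(raw: bytes) -> dict:
    """Parse a hypothetical JSON handle message and validate its shape."""
    message = json.loads(raw)
    if not isinstance(message, dict) or set(message) != {"device", "handle_hex"}:
        raise ValueError("unexpected message shape")
    device, handle_hex = message["device"], message["handle_hex"]
    if type(device) is not int or not isinstance(handle_hex, str):
        raise ValueError("wrong field types")
    # bytes.fromhex raises ValueError on non-hex input.
    return {"device": device, "handle": bytes.fromhex(handle_hex)}


safe = decode_handle(b'{"device": 0, "handle_hex": "deadbeef"}')
```

A data-only format can still carry malformed input, which is why the decoder checks keys and types before using any field.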

Comment thread lmcache/v1/multiprocess/http_server.py
Comment on lines +49 to +95
def add_mp_server_args(
    parser: argparse.ArgumentParser,
) -> argparse.ArgumentParser:
    """
    Add MP server configuration arguments to an existing parser.

    Args:
        parser: The argument parser to add arguments to.

    Returns:
        The same parser with MP server arguments added.
    """
    mp_group = parser.add_argument_group(
        "MP Server", "Configuration for the ZMQ multiprocess cache server"
    )
    mp_group.add_argument(
        "--host",
        type=str,
        default="localhost",
        help="Host to bind the ZMQ server. Default is localhost.",
    )
    mp_group.add_argument(
        "--port",
        type=int,
        default=5555,
        help="Port to bind the ZMQ server. Default is 5555.",
    )
    mp_group.add_argument(
        "--chunk-size",
        type=int,
        default=256,
        help="Chunk size for KV cache operations. Default is 256.",
    )
    mp_group.add_argument(
        "--max-workers",
        type=int,
        default=1,
        help="Maximum number of worker threads. Default is 1.",
    )
    mp_group.add_argument(
        "--hash-algorithm",
        type=str,
        default="blake3",
        help="Hash algorithm for token-based operations "
        "(builtin, sha256_cbor, blake3). Default is blake3.",
    )
    return parser

medium

The default values for the command-line arguments are hardcoded here, but they are also defined in the MPServerConfig dataclass. This duplication can lead to inconsistencies if one is updated and the other is not. To improve maintainability, you can source the default values from the DEFAULT_MP_SERVER_CONFIG instance. This ensures there is a single source of truth for the default configuration.

def add_mp_server_args(
    parser: argparse.ArgumentParser,
) -> argparse.ArgumentParser:
    """
    Add MP server configuration arguments to an existing parser.

    Args:
        parser: The argument parser to add arguments to.

    Returns:
        The same parser with MP server arguments added.
    """
    mp_group = parser.add_argument_group(
        "MP Server", "Configuration for the ZMQ multiprocess cache server"
    )
    mp_group.add_argument(
        "--host",
        type=str,
        default=DEFAULT_MP_SERVER_CONFIG.host,
        help=f"Host to bind the ZMQ server. Default is {DEFAULT_MP_SERVER_CONFIG.host}.",
    )
    mp_group.add_argument(
        "--port",
        type=int,
        default=DEFAULT_MP_SERVER_CONFIG.port,
        help=f"Port to bind the ZMQ server. Default is {DEFAULT_MP_SERVER_CONFIG.port}.",
    )
    mp_group.add_argument(
        "--chunk-size",
        type=int,
        default=DEFAULT_MP_SERVER_CONFIG.chunk_size,
        help=f"Chunk size for KV cache operations. Default is {DEFAULT_MP_SERVER_CONFIG.chunk_size}.",
    )
    mp_group.add_argument(
        "--max-workers",
        type=int,
        default=DEFAULT_MP_SERVER_CONFIG.max_workers,
        help=f"Maximum number of worker threads. Default is {DEFAULT_MP_SERVER_CONFIG.max_workers}.",
    )
    mp_group.add_argument(
        "--hash-algorithm",
        type=str,
        default=DEFAULT_MP_SERVER_CONFIG.hash_algorithm,
        help="Hash algorithm for token-based operations "
        f"(builtin, sha256_cbor, blake3). Default is {DEFAULT_MP_SERVER_CONFIG.hash_algorithm}.",
    )
    return parser

Comment on lines +119 to +146
def add_http_frontend_args(
    parser: argparse.ArgumentParser,
) -> argparse.ArgumentParser:
    """
    Add HTTP frontend configuration arguments to an existing parser.

    Args:
        parser: The argument parser to add arguments to.

    Returns:
        The same parser with HTTP frontend arguments added.
    """
    http_group = parser.add_argument_group(
        "HTTP Frontend", "Configuration for the HTTP frontend server"
    )
    http_group.add_argument(
        "--http-host",
        type=str,
        default="0.0.0.0",
        help="Host to bind the HTTP server. Default is 0.0.0.0.",
    )
    http_group.add_argument(
        "--http-port",
        type=int,
        default=8000,
        help="Port to bind the HTTP server. Default is 8000.",
    )
    return parser

medium

Similar to add_mp_server_args, the default values for the HTTP frontend arguments are hardcoded here and in the HTTPFrontendConfig dataclass. To avoid this duplication and improve maintainability, you can use the DEFAULT_HTTP_FRONTEND_CONFIG instance as the single source of truth for default values.

def add_http_frontend_args(
    parser: argparse.ArgumentParser,
) -> argparse.ArgumentParser:
    """
    Add HTTP frontend configuration arguments to an existing parser.

    Args:
        parser: The argument parser to add arguments to.

    Returns:
        The same parser with HTTP frontend arguments added.
    """
    http_group = parser.add_argument_group(
        "HTTP Frontend", "Configuration for the HTTP frontend server"
    )
    http_group.add_argument(
        "--http-host",
        type=str,
        default=DEFAULT_HTTP_FRONTEND_CONFIG.http_host,
        help=f"Host to bind the HTTP server. Default is {DEFAULT_HTTP_FRONTEND_CONFIG.http_host}.",
    )
    http_group.add_argument(
        "--http-port",
        type=int,
        default=DEFAULT_HTTP_FRONTEND_CONFIG.http_port,
        help=f"Port to bind the HTTP server. Default is {DEFAULT_HTTP_FRONTEND_CONFIG.http_port}.",
    )
    return parser
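
One way to guard against the drift described in these two comments is a small consistency check. The sketch below is an assumption layered on top of the suggestion: mismatched_defaults is a hypothetical helper, and it relies on each argparse dest matching a dataclass field name. The dataclass fields and defaults are the ones quoted above.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class HTTPFrontendConfig:
    http_host: str = "0.0.0.0"
    http_port: int = 8000


DEFAULT_HTTP_FRONTEND_CONFIG = HTTPFrontendConfig()


def add_http_frontend_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    group = parser.add_argument_group("HTTP Frontend")
    group.add_argument(
        "--http-host", type=str, default=DEFAULT_HTTP_FRONTEND_CONFIG.http_host
    )
    group.add_argument(
        "--http-port", type=int, default=DEFAULT_HTTP_FRONTEND_CONFIG.http_port
    )
    return parser


def mismatched_defaults(parser: argparse.ArgumentParser, config) -> list:
    """Return field names whose parser default differs from the dataclass."""
    return [
        f.name
        for f in fields(config)
        if parser.get_default(f.name) != getattr(config, f.name)
    ]


parser = add_http_frontend_args(argparse.ArgumentParser())
drift = mismatched_defaults(parser, DEFAULT_HTTP_FRONTEND_CONFIG)
```

Run as a unit test, a check like this would flag a hardcoded default that no longer matches the dataclass, whichever way the defaults are wired.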

-_server_config = ServerConfig()
+# Module-level config holders, set by run_http_server() before FastAPI startup.
+# Stored in a dict so the lifespan closure captures the mutable container.
+_configs: dict = {}

medium

Using a module-level dictionary _configs to pass configuration to the FastAPI lifespan function works, but it's a form of global state that can make the application harder to reason about and test. A cleaner approach is to use FastAPI's app.state to store configuration, as it's the designated place for this kind of application-level state.

You can refactor this by attaching the configuration objects to app.state in run_http_server before starting uvicorn, and then accessing them from app.state within the lifespan context manager. This avoids the module-level global and makes the data flow more explicit.

Here's how you could change it:

  1. Remove the _configs dictionary at line 42.

  2. In run_http_server, attach the configs to app.state:

    def run_http_server(
        # ...
    ) -> None:
        # ...
        app.state.mp_config = mp_config
        app.state.storage_manager_config = storage_manager_config
        app.state.prometheus_config = prometheus_config
    
        config = uvicorn.Config(
            app=app,
            host=http_config.http_host,
            port=http_config.http_port,
            # ...
        )
        server = uvicorn.Server(config)
        server.run()

  3. In lifespan, read the configs from app.state:

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # ...
        zmq_server, engine = run_cache_server(
            mp_config=app.state.mp_config,
            storage_manager_config=app.state.storage_manager_config,
            prometheus_config=app.state.prometheus_config,
            return_engine=True,
        )
        # ...

@ApostaC ApostaC added the "full" label (Run comprehensive tests on this PR) Mar 5, 2026
@ApostaC ApostaC enabled auto-merge (squash) March 5, 2026 07:25

@KuntaiDu KuntaiDu left a comment

LGTM!

@ApostaC ApostaC merged commit e7eb75f into LMCache:dev Mar 5, 2026
27 of 29 checks passed
mauryaavinash95 pushed a commit to mauryaavinash95/LMCache that referenced this pull request Mar 7, 2026
Unify the config parsing for mp servers

Signed-off-by: ApostaC <yihua98@uchicago.edu>
shaoxiawjc pushed a commit to shaoxiawjc/LMCache that referenced this pull request Mar 11, 2026
Unify the config parsing for mp servers

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: shaoxiawjc <wjc2800@163.com>
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 20, 2026
Unify the config parsing for mp servers

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Aaron Wu <aaron.wu@dell.com>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
Unify the config parsing for mp servers

Signed-off-by: ApostaC <yihua98@uchicago.edu>
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
Unify the config parsing for mp servers

Signed-off-by: ApostaC <yihua98@uchicago.edu>