[MP][UX] Unified config + argparse for multiprocess mode#2695
[MP][UX] Unified config + argparse for multiprocess mode#2695ApostaC merged 1 commit intoLMCache:devfrom
Conversation
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the configuration and argument parsing mechanisms for the multiprocess cache servers. By centralizing server-related settings into a dedicated Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request unifies the configuration for the multiprocess servers, significantly improving maintainability and consistency through the introduction of lmcache/v1/multiprocess/config.py and its composable dataclass pattern. However, a critical insecure deserialization vulnerability that could lead to Remote Code Execution (RCE) has been identified. This is due to the ZMQ servers' reliance on pickle for deserializing CUDA IPC handles in custom_types.py for inter-process communication. While this PR modifies the configuration controlling server exposure rather than introducing the pickle usage itself, it is strongly recommended to replace pickle with a safe serialization format (like JSON or MessagePack). Additionally, to further enhance maintainability, consider reducing redundancy in argument parsing and avoiding module-level global state for configuration.
| bind_url=f"tcp://{mp_config.host}:{mp_config.port}", | ||
| context=context, | ||
| max_workers=mp_config.max_workers, |
There was a problem hiding this comment.
Similar to the main cache server, the blend server initialized here is vulnerable to RCE via insecure pickle deserialization in the CudaIPCWrapper decoder. An attacker who can connect to the ZMQ port can execute arbitrary code on the server. The new configuration system introduced in this PR makes it easier to expose this vulnerable service to the network.
| def add_mp_server_args( | ||
| parser: argparse.ArgumentParser, | ||
| ) -> argparse.ArgumentParser: | ||
| """ | ||
| Add MP server configuration arguments to an existing parser. | ||
|
|
||
| Args: | ||
| parser: The argument parser to add arguments to. | ||
|
|
||
| Returns: | ||
| The same parser with MP server arguments added. | ||
| """ | ||
| mp_group = parser.add_argument_group( | ||
| "MP Server", "Configuration for the ZMQ multiprocess cache server" | ||
| ) | ||
| mp_group.add_argument( | ||
| "--host", | ||
| type=str, | ||
| default="localhost", | ||
| help="Host to bind the ZMQ server. Default is localhost.", | ||
| ) | ||
| mp_group.add_argument( | ||
| "--port", | ||
| type=int, | ||
| default=5555, | ||
| help="Port to bind the ZMQ server. Default is 5555.", | ||
| ) | ||
| mp_group.add_argument( | ||
| "--chunk-size", | ||
| type=int, | ||
| default=256, | ||
| help="Chunk size for KV cache operations. Default is 256.", | ||
| ) | ||
| mp_group.add_argument( | ||
| "--max-workers", | ||
| type=int, | ||
| default=1, | ||
| help="Maximum number of worker threads. Default is 1.", | ||
| ) | ||
| mp_group.add_argument( | ||
| "--hash-algorithm", | ||
| type=str, | ||
| default="blake3", | ||
| help="Hash algorithm for token-based operations " | ||
| "(builtin, sha256_cbor, blake3). Default is blake3.", | ||
| ) | ||
| return parser |
There was a problem hiding this comment.
The default values for the command-line arguments are hardcoded here, but they are also defined in the MPServerConfig dataclass. This duplication can lead to inconsistencies if one is updated and the other is not. To improve maintainability, you can source the default values from the DEFAULT_MP_SERVER_CONFIG instance. This ensures there is a single source of truth for the default configuration.
def add_mp_server_args(
parser: argparse.ArgumentParser,
) -> argparse.ArgumentParser:
"""
Add MP server configuration arguments to an existing parser.
Args:
parser: The argument parser to add arguments to.
Returns:
The same parser with MP server arguments added.
"""
mp_group = parser.add_argument_group(
"MP Server", "Configuration for the ZMQ multiprocess cache server"
)
mp_group.add_argument(
"--host",
type=str,
default=DEFAULT_MP_SERVER_CONFIG.host,
help=f"Host to bind the ZMQ server. Default is {DEFAULT_MP_SERVER_CONFIG.host}.",
)
mp_group.add_argument(
"--port",
type=int,
default=DEFAULT_MP_SERVER_CONFIG.port,
help=f"Port to bind the ZMQ server. Default is {DEFAULT_MP_SERVER_CONFIG.port}.",
)
mp_group.add_argument(
"--chunk-size",
type=int,
default=DEFAULT_MP_SERVER_CONFIG.chunk_size,
help=f"Chunk size for KV cache operations. Default is {DEFAULT_MP_SERVER_CONFIG.chunk_size}.",
)
mp_group.add_argument(
"--max-workers",
type=int,
default=DEFAULT_MP_SERVER_CONFIG.max_workers,
help=f"Maximum number of worker threads. Default is {DEFAULT_MP_SERVER_CONFIG.max_workers}.",
)
mp_group.add_argument(
"--hash-algorithm",
type=str,
default=DEFAULT_MP_SERVER_CONFIG.hash_algorithm,
help="Hash algorithm for token-based operations "
f"(builtin, sha256_cbor, blake3). Default is {DEFAULT_MP_SERVER_CONFIG.hash_algorithm}.",
)
return parser| def add_http_frontend_args( | ||
| parser: argparse.ArgumentParser, | ||
| ) -> argparse.ArgumentParser: | ||
| """ | ||
| Add HTTP frontend configuration arguments to an existing parser. | ||
|
|
||
| Args: | ||
| parser: The argument parser to add arguments to. | ||
|
|
||
| Returns: | ||
| The same parser with HTTP frontend arguments added. | ||
| """ | ||
| http_group = parser.add_argument_group( | ||
| "HTTP Frontend", "Configuration for the HTTP frontend server" | ||
| ) | ||
| http_group.add_argument( | ||
| "--http-host", | ||
| type=str, | ||
| default="0.0.0.0", | ||
| help="Host to bind the HTTP server. Default is 0.0.0.0.", | ||
| ) | ||
| http_group.add_argument( | ||
| "--http-port", | ||
| type=int, | ||
| default=8000, | ||
| help="Port to bind the HTTP server. Default is 8000.", | ||
| ) | ||
| return parser |
There was a problem hiding this comment.
Similar to add_mp_server_args, the default values for the HTTP frontend arguments are hardcoded here and in the HTTPFrontendConfig dataclass. To avoid this duplication and improve maintainability, you can use the DEFAULT_HTTP_FRONTEND_CONFIG instance as the single source of truth for default values.
def add_http_frontend_args(
parser: argparse.ArgumentParser,
) -> argparse.ArgumentParser:
"""
Add HTTP frontend configuration arguments to an existing parser.
Args:
parser: The argument parser to add arguments to.
Returns:
The same parser with HTTP frontend arguments added.
"""
http_group = parser.add_argument_group(
"HTTP Frontend", "Configuration for the HTTP frontend server"
)
http_group.add_argument(
"--http-host",
type=str,
default=DEFAULT_HTTP_FRONTEND_CONFIG.http_host,
help=f"Host to bind the HTTP server. Default is {DEFAULT_HTTP_FRONTEND_CONFIG.http_host}.",
)
http_group.add_argument(
"--http-port",
type=int,
default=DEFAULT_HTTP_FRONTEND_CONFIG.http_port,
help=f"Port to bind the HTTP server. Default is {DEFAULT_HTTP_FRONTEND_CONFIG.http_port}.",
)
return parser| _server_config = ServerConfig() | ||
| # Module-level config holders, set by run_http_server() before FastAPI startup. | ||
| # Stored in a dict so the lifespan closure captures the mutable container. | ||
| _configs: dict = {} |
There was a problem hiding this comment.
Using a module-level dictionary _configs to pass configuration to the FastAPI lifespan function works, but it's a form of global state that can make the application harder to reason about and test. A cleaner approach is to use FastAPI's app.state to store configuration, as it's the designated place for this kind of application-level state.
You can refactor this by attaching the configuration objects to app.state in run_http_server before starting uvicorn, and then accessing them from app.state within the lifespan context manager. This avoids the module-level global and makes the data flow more explicit.
Here's how you could change it:
-
Remove the
_configsdictionary at line 42. -
In
run_http_server, attach the configs toapp.state:def run_http_server( # ... ) -> None: # ... app.state.mp_config = mp_config app.state.storage_manager_config = storage_manager_config app.state.prometheus_config = prometheus_config config = uvicorn.Config( app=app, host=http_config.http_host, port=http_config.http_port, # ... ) server = uvicorn.Server(config) server.run()
-
In
lifespan, read the configs fromapp.state:@asynccontextmanager async def lifespan(app: FastAPI): # ... zmq_server, engine = run_cache_server( mp_config=app.state.mp_config, storage_manager_config=app.state.storage_manager_config, prometheus_config=app.state.prometheus_config, return_engine=True, ) # ...
Unify the config parsing for mp servers Signed-off-by: ApostaC <yihua98@uchicago.edu>
Unify the config parsing for mp servers Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: shaoxiawjc <wjc2800@163.com>
Unify the config parsing for mp servers Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Aaron Wu <aaron.wu@dell.com>
Unify the config parsing for mp servers Signed-off-by: ApostaC <yihua98@uchicago.edu>
Unify the config parsing for mp servers Signed-off-by: ApostaC <yihua98@uchicago.edu>
What this PR does / why we need it:
The multiprocess module (
server.py,blend_server.py,http_server.py) has fragmented config and argument parsing:server.pydefines 5 inlineadd_argumentcalls, andrun_cache_server()takes 8 individual parameters.http_server.pyhas a standaloneServerConfigdataclass with ato_storage_manager_config()method that hardcodes L1 settings and eviction policy. It does not support L2 adapters, hash algorithm, or Prometheus from CLI. It also uses--zmq-host/--zmq-portinstead of the--host/--portconvention used everywhere else.blend_server.pyreusesparse_argsfromserver.py(good), but still passes 8 individual params to its ownrun_cache_server().This PR creates
lmcache/v1/multiprocess/config.pyfollowing the established composable(DataclassConfig, add_*_args, parse_args_to_*_config)triple pattern fromlmcache/v1/distributed/config.py, unifying the scattered config.User-facing changes:
--zmq-host/--zmq-port(http_server.py only) are replaced by--host/--port(consistent across all servers). HTTP-specific args are now--http-host/--http-port.--cpu-buffer-sizeis removed from http_server.py. Users now use--l1-size-gb(from the storage manager arg group) instead.http_server.pynow supports the full set of CLI args: L2 adapters, hash algorithm, Prometheus, eviction tuning — previously hardcoded or missing.server.py,blend_server.py,http_server.py) now show organized argument groups in--help: MP Server, L1 Memory, Eviction, L2 Adapters, Prometheus (and HTTP Frontend for http_server.py).Special notes for your reviewers:
MPCacheEngineconstructor is unchanged — it doesn't need host/port/max_workers (those are ZMQ transport concerns).run_cache_server()extracts the relevant fields fromMPServerConfig._configsdict pattern inhttp_server.pyreplaces the old_server_configglobal; it avoids the need for uninitializedStorageManagerConfigsentinels sincerun_http_server()always populates it before FastAPI's lifespan starts.If applicable: