[step3] remove unnecessary code in mp adapter #2994
chunxiaozheng merged 2 commits into LMCache:dev from
Conversation
Code Review
This pull request refactors the LMCacheMPSchedulerAdapter and related components to mandate the use of a ParallelStrategy object. It removes legacy parameters such as world_size, kv_rank, and tp_size from constructors, along with their corresponding internal attributes and fallback logic. The properties for accessing these values have been simplified to directly reference the parallel_strategy, improving code cleanliness and removing redundant checks. I have no feedback to provide.
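As a rough illustration, the simplified property pattern this review describes might look like the following minimal sketch (class name from the PR; the `ParallelStrategy` attribute names are assumptions, not the verbatim source):

```python
# Minimal sketch, not the actual LMCache source: properties now read
# straight from the mandatory parallel_strategy, with no stored fallbacks.
class LMCacheMPSchedulerAdapter:
    def __init__(self, parallel_strategy):
        self.parallel_strategy = parallel_strategy  # now required

    @property
    def world_size(self) -> int:
        # pre-PR: fell back to a stored world_size when no strategy was set
        return self.parallel_strategy.world_size

    @property
    def tp_size(self) -> int:
        return self.parallel_strategy.tp_size
```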
Force-pushed from fb92013 to 16bcdf9.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 16bcdf9.
```python
    context: zmq.Context,
    model_name: str,
    world_size: int = 1,
    kv_rank: int = 0,
```
Constructor signature breaks existing callers in same repo
High Severity
Removing world_size, kv_rank, and tp_size parameters from LMCacheMPSchedulerAdapter.__init__ and LMCacheMPWorkerAdapter.__init__ breaks the only in-repo callers in lmcache_mp_connector_0180.py. The create_scheduler_adapter and create_worker_adapter functions still pass the old positional arguments (world_size, kv_rank, block_size), causing world_size to land in vllm_block_size, kv_rank (an int) to land in parallel_strategy (expects ParallelStrategy), and block_size to collide with the keyword mq_timeout, producing a TypeError at construction time.
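A minimal, self-contained repro of that misbinding, using stub classes in place of the real adapter (the parameter order follows the report above; everything else here is an assumption):

```python
class ParallelStrategy:  # stand-in for LMCache's ParallelStrategy
    def __init__(self, world_size=1, kv_rank=0, tp_size=1):
        self.world_size, self.kv_rank, self.tp_size = world_size, kv_rank, tp_size

class LMCacheMPWorkerAdapter:
    # Post-PR signature: parallel_strategy is mandatory, the int params are gone.
    def __init__(self, server_url, context, model_name,
                 vllm_block_size, parallel_strategy, mq_timeout=1.0):
        assert isinstance(parallel_strategy, ParallelStrategy)

# A stale caller still passing the legacy positional shape:
try:
    LMCacheMPWorkerAdapter(
        "tcp://host:5555", None, "model",
        8,               # world_size  -> lands on vllm_block_size
        0,               # kv_rank     -> lands on parallel_strategy
        16,              # block_size  -> lands on mq_timeout ...
        mq_timeout=1.0,  # ... which then collides with this kwarg
    )
except TypeError as e:
    print(e)  # got multiple values for argument 'mq_timeout'
```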
ApostaC left a comment
LGTM! We may need to wait until after tonight for CI to build a new base image using the latest vLLM.
okay
Signed-off-by: idellzheng <idellzheng@tencent.com>
Force-pushed from 82e4b85 to 9772504.
Signed-off-by: idellzheng <idellzheng@tencent.com>
The blend test's decoder has been failing with a 400s port-8200 timeout since LMCache#2994 landed (Apr 18). The actual error hidden in the decoder log is:

    TypeError: LMCacheMPWorkerAdapter.__init__() got multiple values for argument 'mq_timeout'

Root cause: LMCache#2994 simplified `LMCacheMPSchedulerAdapter.__init__` and `LMCacheMPWorkerAdapter.__init__` to take positional args `(..., vllm_block_size, parallel_strategy, mq_timeout, ...)`, removing the backward-compat `(world_size, kv_rank, vllm_block_size, tp_size, parallel_strategy=None)` form. The vllm-bundled caller at vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py line 148 was not updated -- it still passes the legacy shape:

    LMCacheMPWorkerAdapter(
        server_url,
        zmq_context,
        model_name,
        world_size,
        kv_rank,
        vllm_block_size,
        mq_timeout=mq_timeout,
        heartbeat_interval=heartbeat_interval,
    )

After LMCache#2994, positional arg 6 (cache_config.block_size) lands on the `mq_timeout` parameter, and then the explicit `mq_timeout=` kwarg collides -- hence "multiple values". The decoder crashes during EngineCore init and never binds its HTTP port, which surfaces as the generic `wait_for_server` 400s timeout up in run-blend-test.sh.

Because `lmcache_mp_connector.py` is imported lazily and the actual class is resolved via a try/except that prefers `lmcache.integration.vllm.vllm_multi_process_adapter` (this file) over the vllm-bundled fallback, an installed LMCache wheel with the new signature guarantees the crash on every recent vllm nightly. This is why every branch forked from dev since Apr 21 20:58 has failed blend with a mysterious decoder timeout.

Fix: keep the new internal convention (the class stores `parallel_strategy` and derives world_size/tp_size from it) but accept BOTH calling conventions at the __init__ boundary. A small `_coerce_parallel_strategy()` helper inspects arg types at call time:

- If the second polymorphic slot is a `ParallelStrategy`, use the new (vllm_block_size, parallel_strategy) form directly.
- Otherwise treat the three slots as the legacy (world_size, kv_rank, vllm_block_size) and synthesize a minimal `ParallelStrategy` from them plus the `tp_size` kwarg.

Callers currently in-tree (`lmcache_mp_connector_0180.py`) already use the new form, so they keep working. The vllm-bundled caller's legacy positional shape now also works without changes, unblocking the blend test decoder.

Verified locally on the K3s host with the tensormesh/cacheblend image and the PR's source:

- Before fix: decoder EngineCore crash on init, port 8200 never binds, run-blend-test.sh fails at the wait_for_server step.
- After fix: prefiller ready in 43s, decoder ready in 16s, shuffle_doc_qa benchmark completes, "[PASS] Blend integration test completed successfully" in the build log.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
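A hedged sketch of what such a boundary helper could look like, with a stand-in `ParallelStrategy` (the real helper lives in LMCache's `vllm_multi_process_adapter`; its exact signature is not shown in this thread):

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class ParallelStrategy:  # minimal stand-in; the real fields may differ
    world_size: int = 1
    kv_rank: int = 0
    tp_size: int = 1

def _coerce_parallel_strategy(
    slot_a: int,
    slot_b: Union[int, ParallelStrategy],
    slot_c: Optional[int] = None,
    tp_size: int = 1,
) -> tuple[int, ParallelStrategy]:
    """Map both calling conventions onto (vllm_block_size, strategy).

    New form:    (vllm_block_size, parallel_strategy)
    Legacy form: (world_size, kv_rank, vllm_block_size) [+ tp_size kwarg]
    """
    if isinstance(slot_b, ParallelStrategy):
        # New convention: pass it through directly.
        return slot_a, slot_b
    # Legacy convention: synthesize a minimal strategy from the ints.
    world_size, kv_rank, vllm_block_size = slot_a, slot_b, slot_c
    return vllm_block_size, ParallelStrategy(world_size, kv_rank, tp_size)
```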
LMCache#2994 simplified `LMCacheMPSchedulerAdapter.__init__` and `LMCacheMPWorkerAdapter.__init__` signatures, removing the legacy (world_size, kv_rank, vllm_block_size, tp_size, parallel_strategy=None) params. The vllm-bundled caller at vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py:148 still passes the legacy shape (world_size, kv_rank, vllm_block_size positionally + mq_timeout as a kwarg). After LMCache#2994, positional arg 6 lands on the new `mq_timeout` param and the kwarg collides:

    TypeError: LMCacheMPWorkerAdapter.__init__() got multiple values for argument 'mq_timeout'

The decoder crashes during EngineCore init, never binds port 8200, and surfaces as the generic 400s wait_for_server timeout in run-blend-test.sh. Every feature branch forked from dev since Apr 21 20:58 has failed blend this way.

Fix: restore the pre-LMCache#2994 signature with legacy ints + optional `parallel_strategy`, and synthesise a minimal ParallelStrategy from world_size/kv_rank/tp_size when it isn't passed. Everything downstream that reads `self.parallel_strategy` keeps working unchanged.

Verified locally on the K3s host with the tensormesh/cacheblend image: prefiller ready in 43s, decoder ready in 16s, shuffle_doc_qa benchmark completes, "[PASS] Blend integration test completed successfully" in the build log.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
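Sketched out, the restored constructor could look like the following, reusing the stand-in `ParallelStrategy` from the sketch above (the defaults shown here are assumptions, not the verbatim source):

```python
# Sketch of the restored legacy-compatible constructor this commit describes.
class LMCacheMPWorkerAdapter:
    def __init__(self, server_url, context, model_name,
                 world_size: int = 1, kv_rank: int = 0,
                 vllm_block_size: int = 16, tp_size: int = 1,
                 parallel_strategy: Optional[ParallelStrategy] = None,
                 mq_timeout: float = 1.0, heartbeat_interval: float = 5.0):
        if parallel_strategy is None:
            # Legacy ints supplied: synthesize the strategy from them.
            parallel_strategy = ParallelStrategy(world_size, kv_rank, tp_size)
        # Downstream code reads only self.parallel_strategy, so both
        # calling conventions converge here.
        self.parallel_strategy = parallel_strategy
        self.vllm_block_size = vllm_block_size
```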


step1: #2935
step2: vllm-project/vllm#38810
step3: must merge after step2
Note
Medium Risk
Changes constructor signatures and removes legacy fallback fields (`world_size`, `kv_rank`, `tp_size`), which can break callers if not updated. Runtime behavior is mostly the same but now assumes `parallel_strategy` is always provided for keying and MLA/PP-group logic.

Overview
Simplifies the vLLM multiprocess adapter by removing backwards-compatibility paths that allowed `parallel_strategy` to be optional. `LMCacheMPSchedulerAdapter` and `LMCacheMPWorkerAdapter` now require `vllm_block_size` and a non-optional `ParallelStrategy`, and all `world_size`/`tp_size`/`worker_id` accessors and MLA-related checks unconditionally derive from that strategy (dropping stored `world_size`, `kv_rank`, and `tp_size` defaults).

Reviewed by Cursor Bugbot for commit 67aca0e.