
[step3] remove unnecessary code in mp adapter #2994

Merged
chunxiaozheng merged 2 commits into LMCache:dev from chunxiaozheng:mp-optimize-save
Apr 18, 2026

Conversation


@chunxiaozheng chunxiaozheng commented Apr 10, 2026

step1: #2935
step2: vllm-project/vllm#38810
step3: must merge after step2


Note

Medium Risk
Changes constructor signatures and removes legacy fallback fields (world_size, kv_rank, tp_size), which can break callers if not updated. Runtime behavior is mostly the same but now assumes parallel_strategy is always provided for keying and MLA/PP-group logic.

Overview
Simplifies the vLLM multiprocess adapter by removing backwards-compatibility paths that allowed parallel_strategy to be optional.

LMCacheMPSchedulerAdapter and LMCacheMPWorkerAdapter now require vllm_block_size and a non-optional ParallelStrategy, and all world_size/tp_size/worker_id accessors and MLA-related checks unconditionally derive from that strategy (dropping stored world_size, kv_rank, and tp_size defaults).
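A minimal sketch of what the simplified adapter looks like after this change, based on the description above; the exact field names on `ParallelStrategy` and the full constructor parameter list are assumptions, not the actual implementation:

```python
# Sketch only: parallel_strategy is now required, and derived values
# (world_size, tp_size) are read from it instead of stored defaults.
from dataclasses import dataclass


@dataclass
class ParallelStrategy:
    world_size: int
    tp_size: int
    worker_id: int


class LMCacheMPSchedulerAdapter:
    def __init__(self, vllm_block_size: int, parallel_strategy: ParallelStrategy):
        self.vllm_block_size = vllm_block_size
        # Non-optional now: no fallback to stored world_size/kv_rank/tp_size.
        self.parallel_strategy = parallel_strategy

    @property
    def world_size(self) -> int:
        return self.parallel_strategy.world_size

    @property
    def tp_size(self) -> int:
        return self.parallel_strategy.tp_size
```

The point of the refactor is that there is a single source of truth: every accessor unconditionally derives from `parallel_strategy`.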

Reviewed by Cursor Bugbot for commit 67aca0e.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the LMCacheMPSchedulerAdapter and related components to mandate the use of a ParallelStrategy object. It removes legacy parameters such as world_size, kv_rank, and tp_size from constructors, along with their corresponding internal attributes and fallback logic. The properties for accessing these values have been simplified to directly reference the parallel_strategy, improving code cleanliness and removing redundant checks. I have no feedback to provide.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 16bcdf9.

On the removed parameters in the constructor signature:

    context: zmq.Context,
    model_name: str,
    world_size: int = 1,
    kv_rank: int = 0,


Constructor signature breaks existing callers in same repo

High Severity

Removing world_size, kv_rank, and tp_size parameters from LMCacheMPSchedulerAdapter.__init__ and LMCacheMPWorkerAdapter.__init__ breaks the only in-repo callers in lmcache_mp_connector_0180.py. The create_scheduler_adapter and create_worker_adapter functions still pass the old positional arguments (world_size, kv_rank, block_size), causing world_size to land in vllm_block_size, kv_rank (an int) to land in parallel_strategy (expects ParallelStrategy), and block_size to collide with the keyword mq_timeout, producing a TypeError at construction time.
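The misbinding Bugbot describes can be reproduced in isolation. This is a deliberately simplified sketch (parameter lists are abbreviated and the `ParallelStrategy` stub is an assumption), showing how legacy positional arguments land on the wrong parameters of the new signature and collide with the `mq_timeout` keyword:

```python
# Hypothetical, simplified version of the new post-PR signature.
from dataclasses import dataclass


@dataclass
class ParallelStrategy:
    world_size: int = 1
    tp_size: int = 1


class LMCacheMPWorkerAdapter:
    # New shape: (..., vllm_block_size, parallel_strategy, mq_timeout).
    def __init__(self, server_url, context, model_name,
                 vllm_block_size, parallel_strategy, mq_timeout=1.0):
        self.parallel_strategy = parallel_strategy


# Legacy caller shape: (world_size, kv_rank, block_size) positionally.
try:
    LMCacheMPWorkerAdapter(
        "tcp://localhost:5555", None, "llama",
        1,    # world_size -> lands in vllm_block_size
        0,    # kv_rank    -> lands in parallel_strategy (expects ParallelStrategy)
        16,   # block_size -> lands in mq_timeout
        mq_timeout=1.0,  # collides with the positional above
    )
except TypeError as e:
    print(e)  # message reports multiple values for 'mq_timeout'
```

The failure is immediate at construction time, which is why the break is visible on every code path that instantiates the adapter.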

Additional Locations (1)



@ApostaC ApostaC left a comment


LGTM! May need to wait after tonight for CI to build a new base image using latest vllm

@chunxiaozheng
Collaborator Author

LGTM! May need to wait after tonight for CI to build a new base image using latest vllm

okay

@chunxiaozheng chunxiaozheng enabled auto-merge (squash) April 15, 2026 01:54
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Apr 15, 2026
Signed-off-by: idellzheng <idellzheng@tencent.com>
@chunxiaozheng chunxiaozheng merged commit 87ee8a2 into LMCache:dev Apr 18, 2026
36 of 38 checks passed
maobaolong pushed a commit to maobaolong/LMCache that referenced this pull request Apr 20, 2026
Signed-off-by: idellzheng <idellzheng@tencent.com>
sammshen added a commit to sammshen/LMCache that referenced this pull request Apr 22, 2026
The blend test's decoder has been failing with a 400s port-8200
timeout since LMCache#2994 landed (Apr 18). The actual error hidden in the
decoder log is:

    TypeError: LMCacheMPWorkerAdapter.__init__() got
    multiple values for argument 'mq_timeout'

Root cause: LMCache#2994 simplified `LMCacheMPSchedulerAdapter.__init__` and
`LMCacheMPWorkerAdapter.__init__` to take positional args
`(..., vllm_block_size, parallel_strategy, mq_timeout, ...)`, removing
the backward-compat `(world_size, kv_rank, vllm_block_size, tp_size,
parallel_strategy=None)` form. The vllm-bundled caller at
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py
line 148 was not updated -- it still passes the legacy shape:

    LMCacheMPWorkerAdapter(
        server_url, zmq_context, model_name,
        world_size, kv_rank, vllm_block_size,
        mq_timeout=mq_timeout,
        heartbeat_interval=heartbeat_interval,
    )

After LMCache#2994, positional arg 6 (cache_config.block_size) lands on the
`mq_timeout` parameter, and then the explicit `mq_timeout=` kwarg
collides -- hence "multiple values". The decoder crashes during
EngineCore init and never binds its HTTP port, which surfaces as the
generic `wait_for_server` 400s timeout up in run-blend-test.sh.

Because `lmcache_mp_connector.py` is imported lazily and the actual
class is resolved via a try/except that prefers
`lmcache.integration.vllm.vllm_multi_process_adapter` (this file) over
the vllm-bundled fallback, an installed LMCache wheel with the new
signature guarantees the crash on every recent vllm nightly. This is
why every branch forked from dev since Apr 21 20:58 has failed blend
with a mysterious decoder timeout.

Fix: keep the new internal convention (the class stores
`parallel_strategy` and derives world_size/tp_size from it) but
accept BOTH calling conventions at the __init__ boundary. A small
`_coerce_parallel_strategy()` helper inspects arg types at call
time:

- If the second polymorphic slot is a `ParallelStrategy`, use the new
  (vllm_block_size, parallel_strategy) form directly.
- Otherwise treat the three slots as the legacy
  (world_size, kv_rank, vllm_block_size) and synthesize a
  minimal `ParallelStrategy` from them plus the `tp_size` kwarg.

Callers currently in-tree (`lmcache_mp_connector_0180.py`) already
use the new form, so they keep working. The vllm-bundled caller's
legacy positional shape now also works without changes, unblocking
the blend test decoder.

Verified locally on the K3s host with the tensormesh/cacheblend
image and the PR's source:
- Before fix: decoder EngineCore crash on init, port 8200 never
  binds, run-blend-test.sh fails at the wait_for_server step.
- After fix: prefiller ready in 43s, decoder ready in 16s,
  shuffle_doc_qa benchmark completes, "[PASS] Blend integration
  test completed successfully" in the build log.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
sammshen added a commit to sammshen/LMCache that referenced this pull request Apr 22, 2026
LMCache#2994 simplified `LMCacheMPSchedulerAdapter.__init__` and
`LMCacheMPWorkerAdapter.__init__` signatures, removing the legacy
(world_size, kv_rank, vllm_block_size, tp_size, parallel_strategy=None)
params. The vllm-bundled caller at
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py:148
still passes the legacy shape (world_size, kv_rank, vllm_block_size
positionally + mq_timeout as a kwarg). After LMCache#2994, positional arg 6
lands on the new `mq_timeout` param and the kwarg collides:

    TypeError: LMCacheMPWorkerAdapter.__init__() got
    multiple values for argument 'mq_timeout'

The decoder crashes during EngineCore init, never binds port 8200,
and surfaces as the generic 400s wait_for_server timeout in
run-blend-test.sh. Every feature branch forked from dev since
Apr 21 20:58 has failed blend this way.

Fix: restore the pre-LMCache#2994 signature with legacy ints + optional
`parallel_strategy`, and synthesise a minimal ParallelStrategy from
world_size/kv_rank/tp_size when it isn't passed. Everything
downstream that reads `self.parallel_strategy` keeps working
unchanged.
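The alternative fix described here can be sketched as follows; parameter names mirror the legacy shape quoted above, but the concrete defaults and `ParallelStrategy` fields are assumptions:

```python
# Hypothetical sketch: restore the legacy signature with an optional
# parallel_strategy, synthesizing one from the legacy ints when absent.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParallelStrategy:
    world_size: int = 1
    kv_rank: int = 0
    tp_size: int = 1


class LMCacheMPWorkerAdapter:
    def __init__(self, server_url, context, model_name,
                 world_size: int = 1, kv_rank: int = 0,
                 vllm_block_size: int = 16, tp_size: int = 1,
                 parallel_strategy: Optional[ParallelStrategy] = None,
                 mq_timeout: float = 1.0):
        if parallel_strategy is None:
            # Legacy callers: build the strategy from the loose ints.
            parallel_strategy = ParallelStrategy(world_size, kv_rank, tp_size)
        self.parallel_strategy = parallel_strategy
        self.vllm_block_size = vllm_block_size

    @property
    def world_size(self) -> int:
        return self.parallel_strategy.world_size
```

Because downstream code only reads `self.parallel_strategy`, both legacy and new callers resolve to the same internal state.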

Verified locally on the K3s host with the tensormesh/cacheblend
image: prefiller ready in 43s, decoder ready in 16s, shuffle_doc_qa
benchmark completes, "[PASS] Blend integration test completed
successfully" in the build log.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>