
[step3] remove unnecessary code in mp adapter #2994

Merged
chunxiaozheng merged 2 commits into LMCache:dev from chunxiaozheng:mp-optimize-save
Apr 18, 2026

Conversation


@chunxiaozheng chunxiaozheng commented Apr 10, 2026

step1: #2935
step2: vllm-project/vllm#38810
step3: must merge after step2


Note

Medium Risk
Changes constructor signatures and removes legacy fallback fields (world_size, kv_rank, tp_size), which can break callers if not updated. Runtime behavior is mostly the same but now assumes parallel_strategy is always provided for keying and MLA/PP-group logic.

Overview
Simplifies the vLLM multiprocess adapter by removing backwards-compatibility paths that allowed parallel_strategy to be optional.

LMCacheMPSchedulerAdapter and LMCacheMPWorkerAdapter now require vllm_block_size and a non-optional ParallelStrategy, and all world_size/tp_size/worker_id accessors and MLA-related checks unconditionally derive from that strategy (dropping stored world_size, kv_rank, and tp_size defaults).
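A minimal sketch of what the simplified adapter looks like after this change, based on the description above; the exact field names on `ParallelStrategy` and the full constructor parameter list are assumptions, not the actual implementation:

```python
# Sketch only: parallel_strategy is now required, and derived values
# (world_size, tp_size) are read from it instead of stored defaults.
from dataclasses import dataclass


@dataclass
class ParallelStrategy:
    world_size: int
    tp_size: int
    worker_id: int


class LMCacheMPSchedulerAdapter:
    def __init__(self, vllm_block_size: int, parallel_strategy: ParallelStrategy):
        self.vllm_block_size = vllm_block_size
        # Non-optional now: no fallback to stored world_size/kv_rank/tp_size.
        self.parallel_strategy = parallel_strategy

    @property
    def world_size(self) -> int:
        return self.parallel_strategy.world_size

    @property
    def tp_size(self) -> int:
        return self.parallel_strategy.tp_size
```

The point of the refactor is that there is a single source of truth: every accessor unconditionally derives from `parallel_strategy`.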

Reviewed by Cursor Bugbot for commit 67aca0e.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the LMCacheMPSchedulerAdapter and related components to mandate the use of a ParallelStrategy object. It removes legacy parameters such as world_size, kv_rank, and tp_size from constructors, along with their corresponding internal attributes and fallback logic. The properties for accessing these values have been simplified to directly reference the parallel_strategy, improving code cleanliness and removing redundant checks. I have no feedback to provide.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 16bcdf9.

On the removed parameters in the constructor signature:

    context: zmq.Context,
    model_name: str,
    world_size: int = 1,
    kv_rank: int = 0,


Constructor signature breaks existing callers in same repo

High Severity

Removing world_size, kv_rank, and tp_size parameters from LMCacheMPSchedulerAdapter.__init__ and LMCacheMPWorkerAdapter.__init__ breaks the only in-repo callers in lmcache_mp_connector_0180.py. The create_scheduler_adapter and create_worker_adapter functions still pass the old positional arguments (world_size, kv_rank, block_size), causing world_size to land in vllm_block_size, kv_rank (an int) to land in parallel_strategy (expects ParallelStrategy), and block_size to collide with the keyword mq_timeout, producing a TypeError at construction time.
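The misbinding Bugbot describes can be reproduced in isolation. This is a deliberately simplified sketch (parameter lists are abbreviated and the `ParallelStrategy` stub is an assumption), showing how legacy positional arguments land on the wrong parameters of the new signature and collide with the `mq_timeout` keyword:

```python
# Hypothetical, simplified version of the new post-PR signature.
from dataclasses import dataclass


@dataclass
class ParallelStrategy:
    world_size: int = 1
    tp_size: int = 1


class LMCacheMPWorkerAdapter:
    # New shape: (..., vllm_block_size, parallel_strategy, mq_timeout).
    def __init__(self, server_url, context, model_name,
                 vllm_block_size, parallel_strategy, mq_timeout=1.0):
        self.parallel_strategy = parallel_strategy


# Legacy caller shape: (world_size, kv_rank, block_size) positionally.
try:
    LMCacheMPWorkerAdapter(
        "tcp://localhost:5555", None, "llama",
        1,    # world_size -> lands in vllm_block_size
        0,    # kv_rank    -> lands in parallel_strategy (expects ParallelStrategy)
        16,   # block_size -> lands in mq_timeout
        mq_timeout=1.0,  # collides with the positional above
    )
except TypeError as e:
    print(e)  # message reports multiple values for 'mq_timeout'
```

The failure is immediate at construction time, which is why the break is visible on every code path that instantiates the adapter.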

Additional Locations (1)



@ApostaC ApostaC left a comment


LGTM! May need to wait after tonight for CI to build a new base image using latest vllm

@chunxiaozheng
Collaborator Author

LGTM! May need to wait after tonight for CI to build a new base image using latest vllm

okay

@chunxiaozheng chunxiaozheng enabled auto-merge (squash) April 15, 2026 01:54
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Apr 15, 2026
Signed-off-by: idellzheng <idellzheng@tencent.com>
@chunxiaozheng chunxiaozheng merged commit 87ee8a2 into LMCache:dev Apr 18, 2026
36 of 38 checks passed
maobaolong pushed a commit to maobaolong/LMCache that referenced this pull request Apr 20, 2026
Signed-off-by: idellzheng <idellzheng@tencent.com>
sammshen added a commit to sammshen/LMCache that referenced this pull request Apr 22, 2026
The blend test's decoder has been failing with a 400s port-8200
timeout since LMCache#2994 landed (Apr 18). The actual error hidden in the
decoder log is:

    TypeError: LMCacheMPWorkerAdapter.__init__() got
    multiple values for argument 'mq_timeout'

Root cause: LMCache#2994 simplified `LMCacheMPSchedulerAdapter.__init__` and
`LMCacheMPWorkerAdapter.__init__` to take positional args
`(..., vllm_block_size, parallel_strategy, mq_timeout, ...)`, removing
the backward-compat `(world_size, kv_rank, vllm_block_size, tp_size,
parallel_strategy=None)` form. The vllm-bundled caller at
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py
line 148 was not updated -- it still passes the legacy shape:

    LMCacheMPWorkerAdapter(
        server_url, zmq_context, model_name,
        world_size, kv_rank, vllm_block_size,
        mq_timeout=mq_timeout,
        heartbeat_interval=heartbeat_interval,
    )

After LMCache#2994, positional arg 6 (cache_config.block_size) lands on the
`mq_timeout` parameter, and then the explicit `mq_timeout=` kwarg
collides -- hence "multiple values". The decoder crashes during
EngineCore init and never binds its HTTP port, which surfaces as the
generic `wait_for_server` 400s timeout up in run-blend-test.sh.

Because `lmcache_mp_connector.py` is imported lazily and the actual
class is resolved via a try/except that prefers
`lmcache.integration.vllm.vllm_multi_process_adapter` (this file) over
the vllm-bundled fallback, an installed LMCache wheel with the new
signature guarantees the crash on every recent vllm nightly. This is
why every branch forked from dev since Apr 21 20:58 has failed blend
with a mysterious decoder timeout.

Fix: keep the new internal convention (the class stores
`parallel_strategy` and derives world_size/tp_size from it) but
accept BOTH calling conventions at the __init__ boundary. A small
`_coerce_parallel_strategy()` helper inspects arg types at call
time:

- If the second polymorphic slot is a `ParallelStrategy`, use the new
  (vllm_block_size, parallel_strategy) form directly.
- Otherwise treat the three slots as the legacy
  (world_size, kv_rank, vllm_block_size) and synthesize a
  minimal `ParallelStrategy` from them plus the `tp_size` kwarg.

Callers currently in-tree (`lmcache_mp_connector_0180.py`) already
use the new form, so they keep working. The vllm-bundled caller's
legacy positional shape now also works without changes, unblocking
the blend test decoder.

Verified locally on the K3s host with the tensormesh/cacheblend
image and the PR's source:
- Before fix: decoder EngineCore crash on init, port 8200 never
  binds, run-blend-test.sh fails at the wait_for_server step.
- After fix: prefiller ready in 43s, decoder ready in 16s,
  shuffle_doc_qa benchmark completes, "[PASS] Blend integration
  test completed successfully" in the build log.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>
sammshen added a commit to sammshen/LMCache that referenced this pull request Apr 22, 2026
LMCache#2994 simplified `LMCacheMPSchedulerAdapter.__init__` and
`LMCacheMPWorkerAdapter.__init__` signatures, removing the legacy
(world_size, kv_rank, vllm_block_size, tp_size, parallel_strategy=None)
params. The vllm-bundled caller at
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py:148
still passes the legacy shape (world_size, kv_rank, vllm_block_size
positionally + mq_timeout as a kwarg). After LMCache#2994, positional arg 6
lands on the new `mq_timeout` param and the kwarg collides:

    TypeError: LMCacheMPWorkerAdapter.__init__() got
    multiple values for argument 'mq_timeout'

The decoder crashes during EngineCore init, never binds port 8200,
and surfaces as the generic 400s wait_for_server timeout in
run-blend-test.sh. Every feature branch forked from dev since
Apr 21 20:58 has failed blend this way.

Fix: restore the pre-LMCache#2994 signature with legacy ints + optional
`parallel_strategy`, and synthesise a minimal ParallelStrategy from
world_size/kv_rank/tp_size when it isn't passed. Everything
downstream that reads `self.parallel_strategy` keeps working
unchanged.
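The alternative fix described here can be sketched as follows; parameter names mirror the legacy shape quoted above, but the concrete defaults and `ParallelStrategy` fields are assumptions:

```python
# Hypothetical sketch: restore the legacy signature with an optional
# parallel_strategy, synthesizing one from the legacy ints when absent.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParallelStrategy:
    world_size: int = 1
    kv_rank: int = 0
    tp_size: int = 1


class LMCacheMPWorkerAdapter:
    def __init__(self, server_url, context, model_name,
                 world_size: int = 1, kv_rank: int = 0,
                 vllm_block_size: int = 16, tp_size: int = 1,
                 parallel_strategy: Optional[ParallelStrategy] = None,
                 mq_timeout: float = 1.0):
        if parallel_strategy is None:
            # Legacy callers: build the strategy from the loose ints.
            parallel_strategy = ParallelStrategy(world_size, kv_rank, tp_size)
        self.parallel_strategy = parallel_strategy
        self.vllm_block_size = vllm_block_size

    @property
    def world_size(self) -> int:
        return self.parallel_strategy.world_size
```

Because downstream code only reads `self.parallel_strategy`, both legacy and new callers resolve to the same internal state.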

Verified locally on the K3s host with the tensormesh/cacheblend
image: prefiller ready in 43s, decoder ready in 16s, shuffle_doc_qa
benchmark completes, "[PASS] Blend integration test completed
successfully" in the build log.

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>