[Chore][CI] Force vLLM Model Runner V1 in the PD comprehensive test by sammshen · Pull Request #3502 · LMCache/LMCache

sammshen · 2026-06-03T01:04:28Z

The PD comprehensive test (k3-comprehensive-test) started timing out on 2026-06-02 when the floating vLLM nightly bumped from 0.22.1rc1.dev91 to .dev95. That delta includes vllm-project/vllm#43458, which enables Model Runner V2 (MRV2) by default for Llama/Mistral dense models -- including meta-llama/Llama-3.2-1B-Instruct, the model used by pd.yaml.

MRV2 is incompatible with KV cache connectors, so the prefiller/decoder (both driven through LMCacheConnectorV1) hang on the first request and the job is killed by the 30-minute timeout. The vLLM servers start fine; only the disaggregated request path hangs.

Pin the PD prefiller and decoder to MRV1 via VLLM_USE_V2_MODEL_RUNNER=0 until MRV2 either supports KV connectors or reliably falls back to MRV1.
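
As a rough illustration of the pin (not the actual pd.yaml change, whose layout is not shown here), the sketch below builds a process environment with VLLM_USE_V2_MODEL_RUNNER=0 for each of the two PD roles. The helper name `mrv1_env` and the `ROLE` variable are hypothetical and exist only for this example; the only name taken from this PR is VLLM_USE_V2_MODEL_RUNNER.

```python
import os


def mrv1_env(base=None):
    """Return a copy of an environment mapping with Model Runner V1 forced.

    `base` defaults to the current process environment. The input mapping
    is copied, never mutated.
    """
    env = dict(os.environ if base is None else base)
    # Environment values are strings; "0" disables the V2 model runner
    # according to the PR description.
    env["VLLM_USE_V2_MODEL_RUNNER"] = "0"
    return env


# ROLE is a hypothetical placeholder standing in for whatever else each
# container's environment carries.
prefiller_env = mrv1_env({"ROLE": "prefiller"})
decoder_env = mrv1_env({"ROLE": "decoder"})
```

Both environments carry the same pin because, per the description above, the prefiller and the decoder are both driven through LMCacheConnectorV1. Such a mapping could be passed as the `env` argument of `subprocess.Popen` when launching a server by hand.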

Signed-off-by: Samuel Shen <slshen@tensormesh.ai>

gemini-code-assist

Code Review

This pull request updates the Buildkite configuration in .buildkite/configs/pd.yaml to force the use of vLLM Model Runner V1 (VLLM_USE_V2_MODEL_RUNNER=0) for both the docker-prefiller and docker-decoder environments. This change prevents hangs caused by incompatibilities between Model Runner V2 and KV cache connectors. No review comments were provided, and the changes are clear and well-documented.

hlin99

LGTM!

sammshen requested review from ApostaC, deng451e and hickeyma as code owners June 3, 2026 01:04

gemini-code-assist Bot reviewed Jun 3, 2026

sammshen added the "full" label (Run comprehensive tests on this PR) Jun 3, 2026

hlin99 approved these changes Jun 3, 2026

deng451e approved these changes Jun 3, 2026

deng451e enabled auto-merge (squash) June 3, 2026 01:45

deng451e merged commit 9fbd742 into LMCache:dev Jun 3, 2026
32 checks passed

deng451e merged 1 commit into LMCache:dev from sammshen:fix/pd-force-mrv1-model-runner