
[MP]: Support delayed start of the heartbeat thread to avoid unhealthy status while starting vLLM with a huge model warmup. #2798

Merged

ApostaC merged 3 commits into LMCache:dev from maobaolong:health_check on Mar 18, 2026

Conversation

@maobaolong (Collaborator) commented Mar 17, 2026

What this PR does / why we need it:

```mermaid
sequenceDiagram
    participant V as vLLM Engine
    participant W as WorkerAdapter
    participant H as HeartbeatThread
    participant S as LMCache Server

    Note over V: Model loading + DeepGEMM warmup (~5min)
    V->>W: __init__()
    Note over W: Heartbeat NOT started yet.<br/>No PING, no false alarms.

    V->>W: register_kv_caches()
    W->>S: REGISTER_KV_CACHE
    S-->>W: OK
    W->>H: _start_heartbeat()
    Note over H: Heartbeat begins (double-checked locking)

    loop Every 10s
        H->>S: PING
        S-->>H: PONG → healthy ✅
    end

    Note over S: 💥 Server crashes
    H->>S: PING (timeout)
    Note over H: unhealthy → entering degraded mode

    Note over S: 🔄 Server recovers
    H->>S: PING
    S-->>H: PONG
    Note over H: healthy again → resuming normal operation ✅
```
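The deferred start with double-checked locking shown in the diagram can be sketched as follows. This is an illustrative sketch, not LMCache's actual code; the names `WorkerAdapter`, `_ensure_heartbeat_started`, and `_heartbeat_loop` are assumptions.

```python
# Illustrative sketch (not LMCache's actual code) of the deferred,
# thread-safe heartbeat start shown in the diagram above. All names
# (WorkerAdapter, _ensure_heartbeat_started, _heartbeat_loop) are assumed.
import threading
import time


class WorkerAdapter:
    def __init__(self):
        # Deliberately do NOT start the heartbeat here: model loading and
        # DeepGEMM warmup can take minutes, and an early PING loop would
        # report the worker as unhealthy before it ever served a request.
        self._heartbeat_thread = None
        self._heartbeat_lock = threading.Lock()

    def _ensure_heartbeat_started(self):
        # Double-checked locking: a cheap unlocked read first, then a
        # re-check under the lock, so concurrent callers start at most
        # one heartbeat thread.
        if self._heartbeat_thread is not None:
            return
        with self._heartbeat_lock:
            if self._heartbeat_thread is None:
                t = threading.Thread(target=self._heartbeat_loop, daemon=True)
                t.start()
                self._heartbeat_thread = t

    def _heartbeat_loop(self):
        while True:
            # A real implementation would PING the LMCache server here
            # and update a health flag based on PONG vs. timeout.
            time.sleep(10)
```

With this shape, calling `_ensure_heartbeat_started()` from every request path is cheap and idempotent.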

Special notes for your reviewers:

If applicable:

  • this PR contains user-facing changes (docs added)
  • this PR contains unit tests

…tart vllm for a huge module warmup.

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the initialization of the heartbeat thread within the LMCache vLLM integration adapters. By delaying the start of the heartbeat mechanism until vLLM is fully initialized and ready to serve requests, it addresses issues where the system might be marked as unhealthy prematurely during long model warmup phases. This change improves the robustness and reliability of the LMCache server's health monitoring, especially when dealing with large language models.

Highlights

  • Delayed Heartbeat Thread Start: The heartbeat thread, which monitors the LMCache server's health, is no longer started immediately upon initialization of the VLLMMultiProcessAdapter and VLLMAdapter classes. This prevents premature unhealthy states during vLLM's potentially long warmup period for large models.
  • Lazy Heartbeat Initialization for Multi-Process Adapter: For the VLLMMultiProcessAdapter, the heartbeat thread is now lazily started on the first maybe_submit_lookup_request call, ensuring vLLM is ready to handle requests before health checks begin.
  • Post-Warmup Heartbeat Initialization for VLLM Adapter: For the VLLMAdapter, the heartbeat thread is now explicitly started after the register_kv_caches method completes, signifying that vLLM has finished model loading, KV cache allocation, and warmup.
  • Health State Transition Logging: A warning message is now logged when the LMCache server transitions from an unhealthy to a healthy state, providing clearer operational feedback.
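The state-transition logging in the last highlight might look like this minimal sketch; `HealthTracker`, `record_ping_result`, the logger name, and the log messages are hypothetical, not LMCache's actual API.

```python
# Minimal sketch of unhealthy -> healthy transition logging; the class,
# method, and logger names here are hypothetical, not LMCache's API.
import logging

logger = logging.getLogger("lmcache.heartbeat")


class HealthTracker:
    def __init__(self):
        self.healthy = True  # assume healthy until a PING fails

    def record_ping_result(self, ok: bool) -> None:
        if ok and not self.healthy:
            # The transition back to healthy is logged at WARNING level so
            # the recovery is as visible in the logs as the failure was.
            logger.warning("LMCache server is healthy again, resuming normal operation")
        elif not ok and self.healthy:
            logger.warning("LMCache server heartbeat failed, entering degraded mode")
        self.healthy = ok
```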


Changelog
  • lmcache/integration/vllm/vllm_multi_process_adapter.py
    • Added a warning log when the LMCache server becomes healthy again after being unhealthy.
    • Modified the __init__ method in VLLMMultiProcessAdapter to defer the immediate start of the heartbeat thread.
    • Introduced a new private method _ensure_heartbeat_started to lazily initialize and start the heartbeat thread on its first invocation.
    • Integrated _ensure_heartbeat_started into maybe_submit_lookup_request to trigger heartbeat activation upon the first lookup.
    • Updated the __init__ method in VLLMAdapter to also defer the immediate start of its heartbeat thread.
    • Refactored the ConnectionError message formatting within register_kv_caches.
    • Added a new private method _start_heartbeat to idempotently initialize and start the heartbeat thread.
    • Called _start_heartbeat within register_kv_caches to ensure the heartbeat begins only after vLLM's KV caches are registered and warmup is complete.
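Putting the Changelog items together, the `VLLMAdapter` path could be sketched like this; the `server` object, its `register` method, and the class name are hypothetical stand-ins, not LMCache's real interfaces.

```python
# Sketch of the post-warmup start path described in the Changelog: the
# heartbeat begins only once register_kv_caches() succeeds. The `server`
# object and its methods are hypothetical stand-ins.
class VLLMAdapterSketch:
    def __init__(self, server):
        self.server = server
        self._heartbeat_started = False  # heartbeat deferred past __init__

    def register_kv_caches(self, kv_caches):
        # REGISTER_KV_CACHE round-trip; by the time vLLM calls this, model
        # loading, KV cache allocation, and warmup are all complete.
        if not self.server.register(kv_caches):
            raise ConnectionError(
                f"Failed to register KV caches with LMCache server {self.server!r}"
            )
        self._start_heartbeat()

    def _start_heartbeat(self):
        # Idempotent: repeated calls start the heartbeat at most once.
        if self._heartbeat_started:
            return
        self._heartbeat_started = True
        # A real implementation would spawn the PING thread here.
```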

@gemini-code-assist Bot left a comment
Code Review

This pull request aims to delay the start of the heartbeat thread to prevent the system from being marked as unhealthy during long vLLM warmups. The approach of lazily starting the thread is sound. However, I've identified a couple of race conditions in the implementation of the lazy initialization logic for the heartbeat threads in both LMCacheMPSchedulerAdapter and LMCacheMPWorkerAdapter. These could lead to multiple heartbeat threads being started if the initialization methods are called concurrently. I've provided suggestions to make these methods thread-safe using a lock. Additionally, I've pointed out a minor style regression where a modern f-string was replaced with older %-style formatting.
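To see the race the review describes, a deliberately unsynchronized version can be forced into the bad interleaving with a barrier. All names here are hypothetical; this is a demonstration, not the PR's code.

```python
# Demonstration of the race flagged in the review: without a lock, two
# threads can both observe "not started yet" and each start a heartbeat.
# The Barrier forces exactly that interleaving; all names are hypothetical.
import threading


class UnsafeAdapter:
    def __init__(self):
        self.started_threads = []        # records each "heartbeat start"
        self._gate = threading.Barrier(2)

    def ensure_heartbeat_started(self):
        if not self.started_threads:     # both threads pass this check...
            self._gate.wait()            # ...before either records a start
            self.started_threads.append(threading.get_ident())


adapter = UnsafeAdapter()
workers = [threading.Thread(target=adapter.ensure_heartbeat_started) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
# Both threads "started" a heartbeat: len(adapter.started_threads) == 2.
```

Guarding the check-and-start with a lock (as in the double-checked pattern suggested in the review) collapses this to a single start.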

Review comment threads on lmcache/integration/vllm/vllm_multi_process_adapter.py (two marked outdated).
@maobaolong maobaolong requested a review from ApostaC March 17, 2026 08:11
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Contributor

@ApostaC ApostaC left a comment
LGTM!

@ApostaC ApostaC enabled auto-merge (squash) March 18, 2026 01:21
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 18, 2026
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Collaborator

@chunxiaozheng chunxiaozheng left a comment
lgtm

@ApostaC ApostaC merged commit 26ae274 into LMCache:dev Mar 18, 2026
26 of 28 checks passed
This change was subsequently picked up by downstream forks:

  • hyunyul-XCENA pushed a commit to xcena-dev/LMCache referencing this pull request (Mar 20, 2026)
  • realAaronWu pushed a commit to realAaronWu/LMCache referencing this pull request (Mar 20, 2026; Signed-off-by: Aaron Wu <aaron.wu@dell.com>)
  • deng451e pushed commits to deng451e/LMCache referencing this pull request (Mar 21, Mar 25, and Mar 27, 2026)
  • jooho-XCENA pushed two commits to xcena-dev/LMCache referencing this pull request (Apr 2, 2026)

Labels

full Run comprehensive tests on this PR


3 participants