
[MP]: Support delayed start of the heartbeat thread to avoid unhealthy status while starting vLLM with a huge model warmup. #2798

Merged

ApostaC merged 3 commits into LMCache:dev from maobaolong:health_check on Mar 18, 2026

Conversation

@maobaolong (Collaborator) commented Mar 17, 2026

What this PR does / why we need it:

```mermaid
sequenceDiagram
    participant V as vLLM Engine
    participant W as WorkerAdapter
    participant H as HeartbeatThread
    participant S as LMCache Server

    Note over V: Model loading + DeepGEMM warmup (~5min)
    V->>W: __init__()
    Note over W: Heartbeat NOT started yet.<br/>No PING, no false alarms.

    V->>W: register_kv_caches()
    W->>S: REGISTER_KV_CACHE
    S-->>W: OK
    W->>H: _start_heartbeat()
    Note over H: Heartbeat begins (double-checked locking)

    loop Every 10s
        H->>S: PING
        S-->>H: PONG → healthy ✅
    end

    Note over S: 💥 Server crashes
    H->>S: PING (timeout)
    Note over H: unhealthy → entering degraded mode

    Note over S: 🔄 Server recovers
    H->>S: PING
    S-->>H: PONG
    Note over H: healthy again → resuming normal operation ✅
```
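The deferred start with double-checked locking shown in the diagram can be sketched as follows. This is an illustrative sketch, not LMCache's actual code; the names `WorkerAdapter`, `_ensure_heartbeat_started`, and `_heartbeat_loop` are assumptions.

```python
# Illustrative sketch (not LMCache's actual code) of the deferred,
# thread-safe heartbeat start shown in the diagram above. All names
# (WorkerAdapter, _ensure_heartbeat_started, _heartbeat_loop) are assumed.
import threading
import time


class WorkerAdapter:
    def __init__(self):
        # Deliberately do NOT start the heartbeat here: model loading and
        # DeepGEMM warmup can take minutes, and an early PING loop would
        # report the worker as unhealthy before it ever served a request.
        self._heartbeat_thread = None
        self._heartbeat_lock = threading.Lock()

    def _ensure_heartbeat_started(self):
        # Double-checked locking: a cheap unlocked read first, then a
        # re-check under the lock, so concurrent callers start at most
        # one heartbeat thread.
        if self._heartbeat_thread is not None:
            return
        with self._heartbeat_lock:
            if self._heartbeat_thread is None:
                t = threading.Thread(target=self._heartbeat_loop, daemon=True)
                t.start()
                self._heartbeat_thread = t

    def _heartbeat_loop(self):
        while True:
            # A real implementation would PING the LMCache server here
            # and update a health flag based on PONG vs. timeout.
            time.sleep(10)
```

With this shape, calling `_ensure_heartbeat_started()` from every request path is cheap and idempotent.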

Special notes for your reviewers:

If applicable:

  • this PR contains user-facing changes (docs added)
  • this PR contains unit tests

…tart vllm for a huge module warmup.

Signed-off-by: baoloongmao <baoloongmao@tencent.com>
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the initialization of the heartbeat thread within the LMCache vLLM integration adapters. By delaying the start of the heartbeat mechanism until vLLM is fully initialized and ready to serve requests, it addresses issues where the system might be marked as unhealthy prematurely during long model warmup phases. This change improves the robustness and reliability of the LMCache server's health monitoring, especially when dealing with large language models.

Highlights

  • Delayed Heartbeat Thread Start: The heartbeat thread, which monitors the LMCache server's health, is no longer started immediately upon initialization of the VLLMMultiProcessAdapter and VLLMAdapter classes. This prevents premature unhealthy states during vLLM's potentially long warmup period for large models.
  • Lazy Heartbeat Initialization for Multi-Process Adapter: For the VLLMMultiProcessAdapter, the heartbeat thread is now lazily started on the first maybe_submit_lookup_request call, ensuring vLLM is ready to handle requests before health checks begin.
  • Post-Warmup Heartbeat Initialization for VLLM Adapter: For the VLLMAdapter, the heartbeat thread is now explicitly started after the register_kv_caches method completes, signifying that vLLM has finished model loading, KV cache allocation, and warmup.
  • Health State Transition Logging: A warning message is now logged when the LMCache server transitions from an unhealthy to a healthy state, providing clearer operational feedback.
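The state-transition logging in the last highlight might look like this minimal sketch; `HealthTracker`, `record_ping_result`, the logger name, and the log messages are hypothetical, not LMCache's actual API.

```python
# Minimal sketch of unhealthy -> healthy transition logging; the class,
# method, and logger names here are hypothetical, not LMCache's API.
import logging

logger = logging.getLogger("lmcache.heartbeat")


class HealthTracker:
    def __init__(self):
        self.healthy = True  # assume healthy until a PING fails

    def record_ping_result(self, ok: bool) -> None:
        if ok and not self.healthy:
            # The transition back to healthy is logged at WARNING level so
            # the recovery is as visible in the logs as the failure was.
            logger.warning("LMCache server is healthy again, resuming normal operation")
        elif not ok and self.healthy:
            logger.warning("LMCache server heartbeat failed, entering degraded mode")
        self.healthy = ok
```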


Changelog
  • lmcache/integration/vllm/vllm_multi_process_adapter.py
    • Added a warning log when the LMCache server becomes healthy again after being unhealthy.
    • Modified the __init__ method in VLLMMultiProcessAdapter to defer the immediate start of the heartbeat thread.
    • Introduced a new private method _ensure_heartbeat_started to lazily initialize and start the heartbeat thread on its first invocation.
    • Integrated _ensure_heartbeat_started into maybe_submit_lookup_request to trigger heartbeat activation upon the first lookup.
    • Updated the __init__ method in VLLMAdapter to also defer the immediate start of its heartbeat thread.
    • Refactored the ConnectionError message formatting within register_kv_caches.
    • Added a new private method _start_heartbeat to idempotently initialize and start the heartbeat thread.
    • Called _start_heartbeat within register_kv_caches to ensure the heartbeat begins only after vLLM's KV caches are registered and warmup is complete.
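Putting the Changelog items together, the `VLLMAdapter` path could be sketched like this; the `server` object, its `register` method, and the class name are hypothetical stand-ins, not LMCache's real interfaces.

```python
# Sketch of the post-warmup start path described in the Changelog: the
# heartbeat begins only once register_kv_caches() succeeds. The `server`
# object and its methods are hypothetical stand-ins.
class VLLMAdapterSketch:
    def __init__(self, server):
        self.server = server
        self._heartbeat_started = False  # heartbeat deferred past __init__

    def register_kv_caches(self, kv_caches):
        # REGISTER_KV_CACHE round-trip; by the time vLLM calls this, model
        # loading, KV cache allocation, and warmup are all complete.
        if not self.server.register(kv_caches):
            raise ConnectionError(
                f"Failed to register KV caches with LMCache server {self.server!r}"
            )
        self._start_heartbeat()

    def _start_heartbeat(self):
        # Idempotent: repeated calls start the heartbeat at most once.
        if self._heartbeat_started:
            return
        self._heartbeat_started = True
        # A real implementation would spawn the PING thread here.
```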

@gemini-code-assist Bot left a comment
Code Review

This pull request aims to delay the start of the heartbeat thread to prevent the system from being marked as unhealthy during long vLLM warmups. The approach of lazily starting the thread is sound. However, I've identified a couple of race conditions in the implementation of the lazy initialization logic for the heartbeat threads in both LMCacheMPSchedulerAdapter and LMCacheMPWorkerAdapter. These could lead to multiple heartbeat threads being started if the initialization methods are called concurrently. I've provided suggestions to make these methods thread-safe using a lock. Additionally, I've pointed out a minor style regression where a modern f-string was replaced with older %-style formatting.
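To see the race the review describes, a deliberately unsynchronized version can be forced into the bad interleaving with a barrier. All names here are hypothetical; this is a demonstration, not the PR's code.

```python
# Demonstration of the race flagged in the review: without a lock, two
# threads can both observe "not started yet" and each start a heartbeat.
# The Barrier forces exactly that interleaving; all names are hypothetical.
import threading


class UnsafeAdapter:
    def __init__(self):
        self.started_threads = []        # records each "heartbeat start"
        self._gate = threading.Barrier(2)

    def ensure_heartbeat_started(self):
        if not self.started_threads:     # both threads pass this check...
            self._gate.wait()            # ...before either records a start
            self.started_threads.append(threading.get_ident())


adapter = UnsafeAdapter()
workers = [threading.Thread(target=adapter.ensure_heartbeat_started) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
# Both threads "started" a heartbeat: len(adapter.started_threads) == 2.
```

Guarding the check-and-start with a lock (as in the double-checked pattern suggested in the review) collapses this to a single start.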

Review comment threads on lmcache/integration/vllm/vllm_multi_process_adapter.py (two marked outdated).
@maobaolong maobaolong requested a review from ApostaC March 17, 2026 08:11
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Contributor

@ApostaC ApostaC left a comment
LGTM!

@ApostaC ApostaC enabled auto-merge (squash) March 18, 2026 01:21
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 18, 2026
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Collaborator

@chunxiaozheng chunxiaozheng left a comment
lgtm

@ApostaC ApostaC merged commit 26ae274 into LMCache:dev Mar 18, 2026
26 of 28 checks passed
This change was subsequently picked up by downstream forks:

  • hyunyul-XCENA pushed a commit to xcena-dev/LMCache referencing this pull request (Mar 20, 2026)
  • realAaronWu pushed a commit to realAaronWu/LMCache referencing this pull request (Mar 20, 2026; Signed-off-by: Aaron Wu <aaron.wu@dell.com>)
  • deng451e pushed commits to deng451e/LMCache referencing this pull request (Mar 21, Mar 25, and Mar 27, 2026)
  • jooho-XCENA pushed two commits to xcena-dev/LMCache referencing this pull request (Apr 2, 2026)

Labels

full Run comprehensive tests on this PR


3 participants