
fix: relax worker port count assertion#2867

Merged
DongDongJu merged 2 commits into LMCache:dev from can-sun:fix/relax-worker-port-assertion
Apr 1, 2026

Conversation

@can-sun
Contributor

@can-sun can-sun commented Mar 25, 2026

Problem

The strict == assertions on lmcache_worker_ports length in LMCacheWorker reject valid configurations where more ports are provided than the effective world_size or worker count.

This is problematic in two scenarios:

  1. MLA models reduce the effective world_size (vLLM divides by tp_size), so users providing ports based on total GPU count (TP × PP) hit the assertion. For example, on a p5 instance with 8 GPUs, providing 8 ports fails when MLA reduces world_size to 1.
  2. Config reuse across instance types — a single config with 8 ports (for p5) fails on g5.24xlarge which only has 4 GPUs.
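The failure mode in both scenarios can be sketched as follows. This is an illustrative simplification, not the actual worker.py code; the function name and parameters are hypothetical stand-ins for the real config/metadata fields:

```python
# Illustrative sketch of the port-count check (names simplified from worker.py).
def check_ports(worker_ports, world_size, strict=True):
    """Validate the worker port list against the effective world size."""
    if strict:
        # Old behavior: exact match required, so extra ports are rejected.
        assert len(worker_ports) == world_size, "port count mismatch"
    else:
        # New behavior: extra ports are tolerated and simply left unused.
        assert len(worker_ports) >= world_size, "too few ports"


ports = [8100 + i for i in range(8)]  # 8 ports, one per GPU on a p5

# MLA reduces the effective world_size to 1, so the strict check rejects
# an otherwise valid config...
try:
    check_ports(ports, world_size=1, strict=True)
except AssertionError:
    print("strict == check fails")

# ...while the relaxed check accepts it.
check_ports(ports, world_size=1, strict=False)
print("relaxed >= check passes")
```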

Fix

Change == to >= in both assertions in worker.py:

  • len(config.lmcache_worker_ports) >= metadata.world_size
  • len(config.lmcache_worker_ports) >= len(lmcache_worker_ids)

Extra ports are simply unused — they are never bound to sockets or registered with the controller. The port selection logic indexes by worker_id or lmcache_worker_ids.index(), so extra entries are harmless. This is consistent with how P2P ports (p2p_init_ports, p2p_lookup_ports) already work.
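Why extra entries are harmless can be sketched with a toy version of the index-based selection. The helper below is an assumed shape for illustration only; in LMCache the real logic lives inside LMCacheWorker:

```python
# Sketch of index-based port selection (illustrative; the real code indexes
# by worker_id or lmcache_worker_ids.index(), per the PR description).
def select_port(worker_ports, lmcache_worker_ids, worker_id):
    # Each worker picks the port at its own index. Entries past the last
    # worker index are never read, bound, or registered, so a longer port
    # list than worker list is safe.
    return worker_ports[lmcache_worker_ids.index(worker_id)]


ports = [8100, 8101, 8102, 8103, 8104, 8105, 8106, 8107]  # 8 configured
worker_ids = [0, 1]                                       # 2 effective workers

assert len(ports) >= len(worker_ids)       # relaxed >= check passes
print(select_port(ports, worker_ids, 1))   # -> 8101; ports[2:] stay unused
```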

Split out from #2732 per reviewer feedback.


Note

Low Risk
Low risk: this change only loosens config validation for lmcache_worker_ports and does not touch the runtime socket-binding logic. The main risk is that a misconfigured port list that is too short for the selected workers could slip through and only surface later as an index error elsewhere.

Overview
Relaxes LMCacheWorker startup validation so lmcache_worker_ports can contain more entries than the effective world_size or explicit lmcache_worker_ids list.

The strict == length checks are replaced with >=, with updated inline documentation explaining why extra ports are safe (especially for MLA where vLLM may report a reduced world_size).


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the flexibility of LMCacheWorker port configurations by relaxing strict assertion checks. Previously, the system would reject valid setups if the number of provided lmcache_worker_ports did not exactly match the effective world_size or worker count. This change addresses issues encountered with MLA models and configuration reuse across varied instance types, allowing users to provide more ports than strictly required without causing assertion failures.

Highlights

  • Port Assertion Relaxation: The strict equality ("==") assertions for lmcache_worker_ports length in LMCacheWorker have been relaxed to greater-than-or-equal (">="). This allows for more flexible port configurations, preventing errors in scenarios like MLA models reducing world_size or reusing configurations across different instance types.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request modifies lmcache/v1/cache_controller/worker.py to relax the assertion conditions for lmcache_worker_ports. The changes allow the number of configured worker ports to be greater than or equal to the number of active workers, providing more flexibility in port allocation. There is no feedback to provide.

@can-sun can-sun changed the title from "Relax worker port count assertion" to "fix: Relax worker port count assertion" Mar 25, 2026
@can-sun can-sun changed the title from "fix: Relax worker port count assertion" to "fix: relax worker port count assertion" Mar 25, 2026
@sammshen
Contributor

I think this makes sense, but could you add some documentation in the code, above the conditional, on exactly how the number of worker ports is decided and on the semantics of world_size for different models?

Collaborator

@DongDongJu DongDongJu left a comment


So IIUC, this is a user error: the user configured more ports than the actual number of GPUs needed for the workload.
Is that right? If so, I'm not sure why we designed this so strictly in the first place.
@ApostaC Do you have any idea on this?

@sammshen
Contributor

@cursor review

@cursor

cursor Bot commented Mar 31, 2026

Skipping Bugbot: Bugbot is disabled for this repository. Visit the Bugbot dashboard to update your settings.

@sammshen
Contributor

@cursor review


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread lmcache/v1/cache_controller/worker.py
Signed-off-by: Can Sun <sucan@amazon.com>
@can-sun can-sun force-pushed the fix/relax-worker-port-assertion branch from 7f8ada7 to ee262c6 on April 1, 2026 16:35
Contributor

@sammshen sammshen left a comment


LGTM

Collaborator

@DongDongJu DongDongJu left a comment


LGTM

@DongDongJu DongDongJu enabled auto-merge (squash) April 1, 2026 19:56
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Apr 1, 2026
@DongDongJu DongDongJu merged commit 521c23b into LMCache:dev Apr 1, 2026
34 of 35 checks passed
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026
Relax worker port count assertion

Signed-off-by: Can Sun <sucan@amazon.com>
