fix: relax worker port count assertion#2867
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the flexibility of Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request modifies lmcache/v1/cache_controller/worker.py to relax the assertion conditions for lmcache_worker_ports. The changes allow the number of configured worker ports to be greater than or equal to the number of active workers, providing more flexibility in port allocation. There is no feedback to provide.
|
I think this makes sense but could you add some documentation in the code on exactly how the number of worker ports is decided and the semantics of the world_size for different models above the conditional? |
DongDongJu
left a comment
There was a problem hiding this comment.
So IIUC, this is user error/fault which is that user configured more than actual # of gpu needed at workload.
Is this true? then not sure that why we strictly design this at the first time.
@ApostaC Do you have any idea on this?
|
@cursor review |
|
Skipping Bugbot: Bugbot is disabled for this repository. Visit the Bugbot dashboard to update your settings. |
|
@cursor review |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Signed-off-by: Can Sun <sucan@amazon.com>
7f8ada7 to
ee262c6
Compare
Relax worker port count assertion Signed-off-by: Can Sun <sucan@amazon.com>
Relax worker port count assertion Signed-off-by: Can Sun <sucan@amazon.com>

Problem
The strict
==assertions onlmcache_worker_portslength inLMCacheWorkerreject valid configurations where more ports are provided than the effectiveworld_sizeor worker count.This is problematic in two scenarios:
world_size(vLLM divides bytp_size), so users providing ports based on total GPU count (TP × PP) hit the assertion. For example, on a p5 instance with 8 GPUs, providing 8 ports fails when MLA reducesworld_sizeto 1.g5.24xlargewhich only has 4 GPUs.Fix
Change
==to>=in both assertions inworker.py:len(config.lmcache_worker_ports) >= metadata.world_sizelen(config.lmcache_worker_ports) >= len(lmcache_worker_ids)Extra ports are simply unused — they are never bound to sockets or registered with the controller. The port selection logic indexes by
worker_idorlmcache_worker_ids.index(), so extra entries are harmless. This is consistent with how P2P ports (p2p_init_ports,p2p_lookup_ports) already work.Split out from #2732 per reviewer feedback.
Note
Low Risk
Low risk: only loosens config validation for
lmcache_worker_portswithout changing runtime socket binding logic; main risk is allowing misconfigured port lists that are too short for the selected workers to slip through until index errors occur elsewhere.Overview
Relaxes
LMCacheWorkerstartup validation solmcache_worker_portscan contain more entries than the effectiveworld_sizeor explicitlmcache_worker_idslist.The strict
==length checks are replaced with>=, with updated inline documentation explaining why extra ports are safe (especially for MLA where vLLM may report a reducedworld_size).Written by Cursor Bugbot for commit 617c865. This will update automatically on new commits. Configure here.