[Bugfix]: fix get_num_heads for MLA format #2941
MLA format (NL_X_NB_BS_HS) absorbs heads into the hidden dim, so get_num_heads should return 1 instead of raising ValueError. This was preventing all MLA models (e.g. DeepSeek-V2-Lite) from launching.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
  elif gpu_kv_format == lmc_ops.GPUKVFormat.NL_X_NB_BS_HS:
-     raise ValueError(_ATTRIBUTE_NOT_EXIST_ERROR.format(format=gpu_kv_format))
+     # MLA: heads are absorbed into hidden dim, so num_heads = 1
+     return 1
Bug fix lacks required regression test
Low Severity
This bug fix for get_num_heads with NL_X_NB_BS_HS format has no accompanying regression test. The project's AGENTS.md and review rules require that bug fixes include corresponding tests to prevent regressions. The PR's own "this PR contains unit tests" checkbox is unchecked.
Triggered by project rule: LMCache Code Review Style Guide
Code Review
This pull request updates the get_num_heads utility in lmcache/v1/gpu_connector/utils.py to support the MLA GPU KV format by returning 1 instead of raising a ValueError. While the logic change is correct, the reviewer noted that a regression test should be included to comply with the repository style guide regarding bug fixes.
+     # MLA: heads are absorbed into hidden dim, so num_heads = 1
+     return 1
The fix correctly handles the MLA format by returning 1 instead of raising a ValueError. However, according to the repository style guide (line 39), bug fixes should include regression tests. Please consider adding a unit test to verify this behavior and prevent future regressions, especially since this issue was blocking model launches for MLA-based models like DeepSeek-V2.
References
- Bug fixes should include regression tests (line 39).
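A regression test along the lines requested above might look like the following minimal sketch. It is self-contained so it can run without LMCache installed: the `GPUKVFormat` enum and the `get_num_heads` function here are hypothetical stand-ins that model only the MLA branch shown in the diff, and the single-argument signature is an assumption. A real test would import the helper from `lmcache/v1/gpu_connector/utils.py` and use `lmc_ops.GPUKVFormat` instead.

```python
import enum


class GPUKVFormat(enum.Enum):
    """Hypothetical stand-in for lmc_ops.GPUKVFormat (MLA member only)."""

    NL_X_NB_BS_HS = enum.auto()


def get_num_heads(gpu_kv_format):
    """Hypothetical stand-in mirroring only the MLA branch from the diff.

    The real helper's signature and its other branches are not modeled here.
    """
    if gpu_kv_format == GPUKVFormat.NL_X_NB_BS_HS:
        # MLA: heads are absorbed into hidden dim, so num_heads = 1
        return 1
    raise ValueError(f"format not modeled in this sketch: {gpu_kv_format}")


def test_get_num_heads_mla_returns_one():
    # Before the fix this branch raised ValueError; it should now return 1.
    assert get_num_heads(GPUKVFormat.NL_X_NB_BS_HS) == 1
```

The test function follows the pytest naming convention, so a test runner would collect it automatically once the stand-ins are replaced with the real imports.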
maobaolong left a comment
LGTM @sammshen Thanks for this quick fix!


Note
Low Risk
Low risk: changes a single helper to stop raising and to return a constant for the NL_X_NB_BS_HS (MLA) KV format, affecting only head-count introspection for MLA models.
Overview
Fixes get_num_heads to handle the MLA KV cache format (GPUKVFormat.NL_X_NB_BS_HS) by returning 1 (heads folded into hidden dim) instead of raising ValueError, unblocking MLA-based model startup.
Written by Cursor Bugbot for commit 4b74496.