[Bugfix]: fix get_num_heads for MLA format by sammshen · Pull Request #2941 · LMCache/LMCache

sammshen · 2026-04-03T05:53:18Z

MLA format (NL_X_NB_BS_HS) absorbs heads into the hidden dim, so get_num_heads should return 1 instead of raising ValueError. This was preventing all MLA models (e.g. DeepSeek-V2-Lite) from launching.
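To illustrate the reasoning above, here is a minimal sketch of why a head count of 1 is consistent for this layout. It is not the LMCache implementation: `KVFormatSketch`, `get_num_heads_sketch`, the `EXPLICIT_HEADS` member, and the tensor shapes are all hypothetical. Reading `NL_X_NB_BS_HS` as a per-layer shape of (num_blocks, block_size, hidden_size) is an assumption based on the format name.

```python
from enum import Enum, auto


class KVFormatSketch(Enum):
    """Hypothetical stand-in for lmc_ops.GPUKVFormat (two members only)."""

    EXPLICIT_HEADS = auto()  # hypothetical layout with a separate heads axis
    NL_X_NB_BS_HS = auto()   # MLA layout named in this PR


def get_num_heads_sketch(fmt: KVFormatSketch, layer_shape: tuple) -> int:
    """Return the head count implied by one layer's KV tensor shape."""
    if fmt is KVFormatSketch.EXPLICIT_HEADS:
        # Assumed shape: (num_blocks, block_size, num_heads, head_size)
        return layer_shape[2]
    if fmt is KVFormatSketch.NL_X_NB_BS_HS:
        # Assumed shape: (num_blocks, block_size, hidden_size).
        # There is no heads axis, so report a single head.
        return 1
    raise ValueError(f"unsupported format: {fmt}")


mla_shape = (128, 16, 576)  # illustrative sizes, not taken from the PR
num_heads = get_num_heads_sketch(KVFormatSketch.NL_X_NB_BS_HS, mla_shape)
head_size = mla_shape[-1]   # with one head, the head size is the whole hidden dim
```

Under these assumptions, code that computes `num_heads * head_size` still gets the full hidden dimension for the MLA layout.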

What this PR does / why we need it:

Special notes for your reviewers:

If applicable:

this PR contains user facing changes - docs added
this PR contains unit tests

Note

Low Risk
Low risk: changes a single helper to stop raising and to return a constant for the NL_X_NB_BS_HS (MLA) KV format, affecting only head-count introspection for MLA models.

Overview
Fixes get_num_heads to handle the MLA KV cache format (GPUKVFormat.NL_X_NB_BS_HS) by returning 1 (heads folded into hidden dim) instead of raising ValueError, unblocking MLA-based model startup.

Written by Cursor Bugbot for commit 4b74496. This will update automatically on new commits.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-04-03T05:54:18Z

    elif gpu_kv_format == lmc_ops.GPUKVFormat.NL_X_NB_BS_HS:
-        raise ValueError(_ATTRIBUTE_NOT_EXIST_ERROR.format(format=gpu_kv_format))
+        # MLA: heads are absorbed into hidden dim, so num_heads = 1
+        return 1


Bug fix lacks required regression test

Low Severity

This bug fix for get_num_heads with NL_X_NB_BS_HS format has no accompanying regression test. The project's AGENTS.md and review rules require that bug fixes include corresponding tests to prevent regressions. The PR's own "this PR contains unit tests" checkbox is unchecked.

Triggered by project rule: LMCache Code Review Style Guide

gemini-code-assist

Code Review

This pull request updates the get_num_heads utility in lmcache/v1/gpu_connector/utils.py to support the MLA GPU KV format by returning 1 instead of raising a ValueError. While the logic change is correct, the reviewer noted that a regression test should be included to comply with the repository style guide regarding bug fixes.

gemini-code-assist · 2026-04-03T05:55:35Z

+        # MLA: heads are absorbed into hidden dim, so num_heads = 1
+        return 1


The fix correctly handles the MLA format by returning 1 instead of raising a ValueError. However, according to the repository style guide (line 39), bug fixes should include regression tests. Please consider adding a unit test to verify this behavior and prevent future regressions, especially since this issue was blocking model launches for MLA-based models like DeepSeek-V2.

References

Bug fixes should include regression tests (line 39).
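A regression test along the lines the reviewers ask for might look like the sketch below. It is self-contained, so it uses hypothetical stand-ins: `FakeGPUKVFormat` replaces `lmc_ops.GPUKVFormat`, and the local `get_num_heads` mirrors only the patched branch, with an assumed `(kv_caches, gpu_kv_format)` signature. In the repository, the stand-ins would presumably be replaced by the real function from lmcache/v1/gpu_connector/utils.py.

```python
from enum import Enum, auto


class FakeGPUKVFormat(Enum):
    """Hypothetical stand-in for lmc_ops.GPUKVFormat."""

    NL_X_NB_BS_HS = auto()


def get_num_heads(kv_caches, gpu_kv_format):
    """Stand-in mirroring only the patched branch; the signature is assumed."""
    if gpu_kv_format == FakeGPUKVFormat.NL_X_NB_BS_HS:
        # MLA: heads are absorbed into hidden dim, so num_heads = 1
        return 1
    raise ValueError(f"unsupported format: {gpu_kv_format}")


def test_get_num_heads_mla_returns_one():
    # Before the fix this branch raised ValueError; after it, the result is 1.
    kv_caches = []  # placeholder; the patched branch does not inspect the tensors
    assert get_num_heads(kv_caches, FakeGPUKVFormat.NL_X_NB_BS_HS) == 1


test_get_num_heads_mla_returns_one()
```

The test pins both halves of the fix: the call no longer raises, and the returned value is exactly 1.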

deng451e

LGTM

maobaolong

LGTM @sammshen Thanks for this quick fix!

[Bugfix]: fix get_num_heads for MLA format

4b74496

MLA format (NL_X_NB_BS_HS) absorbs heads into the hidden dim, so get_num_heads should return 1 instead of raising ValueError. This was preventing all MLA models (e.g. DeepSeek-V2-Lite) from launching.

sammshen requested review from deng451e, kobe0938 and maobaolong April 3, 2026 05:54

cursor Bot reviewed Apr 3, 2026

gemini-code-assist Bot reviewed Apr 3, 2026

deng451e approved these changes Apr 3, 2026

maobaolong approved these changes Apr 3, 2026

maobaolong enabled auto-merge (squash) April 3, 2026 06:09

github-actions Bot added the "full" label ("Run comprehensive tests on this PR") Apr 3, 2026

maobaolong merged commit 45d4d36 into LMCache:dev Apr 3, 2026
36 checks passed

maobaolong merged 1 commit into LMCache:dev from sammshen:fix-mla-fmt

3 participants