Check KV4 compatibility with attention backends and add KV4 support to the attention_backend doc#14467
Merged — Fridge003 merged 2 commits into sgl-project:main on Dec 12, 2025
Conversation
Fridge003 reviewed these changes on Dec 5, 2025
Force-pushed from 0f97cf2 to b2a2c7f
Fridge003 approved these changes on Dec 9, 2025
Collaborator: /tag-and-rerun-ci keep rerunning
Force-pushed from b2a2c7f to fe51449
Contributor (Author): @Fridge003 I've fixed the code according to your comments, rebased onto origin/main, and resolved the conflicts. Please check again. Thank you~
Force-pushed from fe51449 to b580da2
Contributor (Author): I've fixed the failing stage-a-test-1 test. Please check again. Thanks.
Commits:
- … FP4 note: introduce FP4 KV cache support in the backend matrix; add a note on the FA4 + KV4 scenario. Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
- Add _handle_kv4_compatibility() to validate backend choices for KV4 scenarios: warn on potential edge-case incompatibilities, assert the correct decode_attention_backend for FA4 + MLA/MHA and non-FA4 + MLA/MHA setups, and raise an error if KV4 is used on non-CUDA platforms. Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
Force-pushed from b580da2 to 03eacf3
Prozac614 pushed a commit to Prozac614/sglang referencing this pull request on Dec 17, 2025:
…o the attention_backend doc (sgl-project#14467) Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
YChange01 pushed a commit to YChange01/sglang referencing this pull request on Jan 13, 2026:
…o the attention_backend doc (sgl-project#14467) Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
Motivation
Prevent users from running KV4 with incompatible attention backends by clearly documenting the supported backends and enforcing runtime checks. This improves reliability and reduces runtime errors, since users can no longer accidentally run KV4 with an unsupported attention backend.
Modifications
Ensure the check runs after default settings are applied by placing it in server_args.py.
Description / Changes:
1. Backend documentation updates
• Added an FP4 KV cache column to the MLA/MHA backend table.
• Clarified which backend combinations are supported with FP4 KV caches.
2. ServerArgs updates
• Added _handle_kv4_compatibility() to check KV4 compatibility with attention backends at runtime.
• Logs warnings for potential edge-case incompatibilities.
• Asserts the correct decode_attention_backend for FA4 + MLA/MHA and non-FA4 + MLA/MHA setups.
• Raises an error if KV4 is used on non-CUDA platforms.
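The check described above can be sketched as follows. This is a minimal illustration, not SGLang's actual implementation: the function signature, the `KV4_DECODE_BACKENDS` set, and the string values for dtypes and backends are all assumptions made for the example; the real supported combinations are the ones listed in the attention_backend doc table.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative placeholder set; the authoritative list of backends that
# support an FP4 KV cache lives in the attention_backend doc table.
KV4_DECODE_BACKENDS = {"fa4", "flashinfer", "triton"}


def handle_kv4_compatibility(kv_cache_dtype: str,
                             decode_attention_backend: str,
                             device: str) -> None:
    """Sketch of a KV4 compatibility check, run after defaults are resolved."""
    if kv_cache_dtype != "fp4":
        return  # nothing to validate for non-KV4 configurations
    if device != "cuda":
        # KV4 is CUDA-only in this sketch, mirroring the PR's hard error.
        raise ValueError("KV4 (FP4 KV cache) is only supported on CUDA platforms.")
    # Hard assertion for clearly unsupported backend choices.
    assert decode_attention_backend in KV4_DECODE_BACKENDS, (
        f"decode_attention_backend={decode_attention_backend!r} is not supported "
        f"with an FP4 KV cache; choose one of {sorted(KV4_DECODE_BACKENDS)}"
    )
    # Soft warning for combinations that may hit edge cases.
    if decode_attention_backend != "fa4":
        logger.warning(
            "KV4 with decode_attention_backend=%s may hit edge-case "
            "incompatibilities; see the attention_backend doc.",
            decode_attention_backend,
        )
```

A design point worth noting: running this after defaults are filled in means the check sees the backend the server will actually use, rather than a possibly-unset CLI value.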
Testing
The compatibility results were tested on B200 (sm100), using Qwen3-235B-A22B (MHA) and DeepSeek-R1-0528-FP4 (MLA).
Next (WIP)
Test KV4 with the FA3 and FlashMLA backends on sm90 to complete the table; I will send a follow-up PR for this.
Checklist