[HiCache][HybridModel]: Support mamba state offloading & HybridCacheController #20457
Merged
xiezhq-hermann merged 33 commits into sgl-project:main on Mar 24, 2026
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
dde5a4c to 3556ed6
hzh0425
commented
Mar 15, 2026
ispobock
reviewed
Mar 16, 2026
hzh0425
commented
Mar 16, 2026
Collaborator
Author
/tag-and-rerun-ci
c10a89a to ec983d3
xiezhq-hermann
approved these changes
Mar 21, 2026
Collaborator
Author
/rerun-stage stage-c-test-8-gpu-h200
Contributor
✅ Triggered
Contributor
ShangmingCai
approved these changes
Mar 24, 2026
Collaborator
Author
adityavaid
pushed a commit
to adityavaid/sglang
that referenced
this pull request
Mar 24, 2026
…ontroller (sgl-project#20457) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: ispobock <ispobaoke@gmail.com>
0-693
pushed a commit
to 0-693/sglang
that referenced
this pull request
Mar 25, 2026
…ontroller (sgl-project#20457) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: ispobock <ispobaoke@gmail.com>
johnnycxm
pushed a commit
to johnnycxm/sglang
that referenced
this pull request
Mar 25, 2026
…ontroller (sgl-project#20457) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: ispobock <ispobaoke@gmail.com>
25 tasks
JustinTong0323
pushed a commit
to JustinTong0323/sglang
that referenced
this pull request
Apr 7, 2026
…ontroller (sgl-project#20457) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: ispobock <ispobaoke@gmail.com>
2 tasks
parasol-aser
pushed a commit
to parasol-aser/sglang
that referenced
this pull request
Apr 11, 2026
Implements the HiCacheStorage v2 interface for the 3FS backend so that hybrid models (Mamba/linear-attention, and in the future DSA) can offload both KV pages and auxiliary per-pool state to 3FS via HybridCacheController.

- Introduce _Hf3fsPoolEngine: a per-pool bundle of (file, client list, executor, metadata client, rank namespace, is_zero_copy, skip_backup) so each registered host pool has its own 3FS file and metadata scope.
- Construct the KV engine in __init__ so v1 callers keep working unchanged.
- Implement register_mem_host_pool_v2 to lazily allocate auxiliary (MAMBA/...) engines with their own preallocated files, clients and metadata namespaces. Idempotent and order-agnostic.
- Implement batch_exists_v2 / batch_get_v2 / batch_set_v2 mirroring the HiCacheFile semantics, including ALL_PAGES and TRAILING_PAGES hit policies, min-across-pools final hit, and per-pool result dicts.
- Refactor _batch_get / _batch_set to take an engine argument so both v1 and v2 entry points share the same IO core.
- Key namespacing: auxiliary pools prefix the metadata key with the pool name, KV keeps the bare key for backwards compatibility. MHA zero-copy -k/-v suffixing remains strictly KV-scoped.
- Per-pool skip_backup so MLA rank>0 still skips KV but backs up MAMBA on every rank. Fix a pre-existing bug where skip_backup returned a scalar True instead of a per-key list.
- close() now iterates all engines; _engines is populated before the SIGTERM handler is installed.

Test plan:
- New test/registered/hicache/test_hicache_storage_3fs_hybrid.py uses the mock HF3FS client to cover: construction sanity, KV-only v2 fallback, ALL_PAGES and TRAILING_PAGES exists semantics, v2 set/get round-trip, MHA zero-copy + mamba interplay, MLA skip_backup KV-only scoping, partial-pool failure, and a no-pool error contract.
- Extended test_hicache_storage_3fs_backend.py with TestHf3fsBackendHybrid end-to-end test for a hybrid model, gated on model availability.

Scope: PoolName.KV + PoolName.MAMBA. DSA is deferred until a caller exists (see PLAN.md §3 and Appendix B).

Tracking issue: sgl-project#22572
Reference PRs: sgl-project#21259, sgl-project#20457

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
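For readers skimming the commit message above, here is a minimal Python sketch of three of the behaviours it describes: key namespacing, the min-across-pools final hit, and the per-key skip_backup result. This is not the code from the commit. The PoolName stand-in, the function names, the ":" separator, the string values of the enum members, and the ValueError raised for an empty pool set are all assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List


class PoolName(str, Enum):
    # Hypothetical stand-in for the PoolName enum named in the commit message;
    # the member values here are assumed, not taken from the repository.
    KV = "kv"
    MAMBA = "mamba"


def namespaced_key(pool: PoolName, key: str, sep: str = ":") -> str:
    """Return the metadata key for a pool.

    KV keeps the bare key (backwards compatibility); auxiliary pools are
    prefixed with the pool name. The separator is an assumption.
    """
    if pool is PoolName.KV:
        return key
    return f"{pool.value}{sep}{key}"


def final_hit(per_pool_hits: Dict[PoolName, int]) -> int:
    """Combine per-pool hit counts into one final hit.

    A page only counts as a hit if every registered pool has it, so the
    final hit is the minimum across pools. Raising on an empty mapping is
    an assumed way to model the "no-pool error contract".
    """
    if not per_pool_hits:
        raise ValueError("no pools registered")
    return min(per_pool_hits.values())


def skip_backup_result(keys: List[str]) -> List[bool]:
    """Report success for every key when backup is skipped.

    Returns one bool per key rather than a scalar True, matching the
    per-key list shape the commit message says the fix restores.
    """
    return [True] * len(keys)
```

With these definitions, namespaced_key(PoolName.MAMBA, "abc") gives "mamba:abc" while the KV key stays "abc", and final_hit({PoolName.KV: 5, PoolName.MAMBA: 3}) gives 3.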
13 tasks
yhyang201
pushed a commit
to yhyang201/sglang
that referenced
this pull request
Apr 22, 2026
…ontroller (sgl-project#20457) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com> Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: ispobock <ispobaoke@gmail.com>


TODO:
Startup
Accuracy Tests
AIME 2025, repeat 16 test:
first round:
second round (with flush_cache)
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci