Here is the roadmap for refactoring SGLang's memory caching system (mem cache v2).
Design Goals
Support arbitrary combinations of the following features in the KV cache management subsystem:
- Prefix caching
- Hierarchical caching
- PD disaggregation
- Request retraction/abortion
- Hybrid and Sparse attention
- Spec Decode
Road Map
- Release Initial Code
- Existing code cleaning
- New interface + chunk cache + triton backend
- Prefix Cache Reusing
- Radix Cache
- SWA Radix Cache
- Mamba Radix Cache
- KV cache transfer
- PD disaggregation
- Hierarchical KV Cache
- KV cache offloading for Retraction
- Integration with more attention backends
- (overlap, non-overlap) x (page >= 1) x (spec, non-spec, spec v2) x (retract, finished) #12224
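
To make the size of that combination matrix concrete, here is a minimal sketch that enumerates it with a Cartesian product. The constant and function names are hypothetical and not part of SGLang, and the page sizes are an assumed sample, since the line above only states page >= 1.

```python
from itertools import product

# Axes copied from the combination line above.
# PAGE_SIZES is an assumed sample: the source only says page >= 1.
SCHEDULES = ("overlap", "non-overlap")
PAGE_SIZES = (1, 16)
SPEC_MODES = ("spec", "non-spec", "spec v2")
OUTCOMES = ("retract", "finished")


def enumerate_combinations():
    """Return every (schedule, page_size, spec_mode, outcome) tuple."""
    return list(product(SCHEDULES, PAGE_SIZES, SPEC_MODES, OUTCOMES))


if __name__ == "__main__":
    # Print one line per combination that would need coverage.
    for combo in enumerate_combinations():
        print(combo)
```

With two sample page sizes this yields 2 x 2 x 3 x 2 = 24 combinations; each additional page size adds 12 more.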