Motivation
The current radix cache implementations (RadixCache, MambaRadixCache, SWARadixCache, etc.) share a large amount of logic but are maintained as separate copies that have diverged. This leads to code duplication, inconsistent behavior, and a high maintenance burden when extending cache functionality to new model types (e.g., hybrid linear models, sliding window attention models).
The goal of this refactor is to unify the hybrid radix cache hierarchy around a common base interface, making it easier to:
- Add new cache variants without duplicating logic
- Ensure consistency across all cache implementations
- Enable future extensions (e.g., HiCache, PD disaggregation, new model architectures)
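To make the idea of a common base interface concrete, here is a minimal toy sketch, not the actual design. Every name in it (`BaseRadixTree`, `make_payload`, `FullAttentionTree`, `SlidingWindowTree`, the `kv_tokens` payload) is hypothetical and chosen only for illustration. It assumes the shared logic is the tree traversal (insert, node split, prefix match) and that variants differ only in what they store per node.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Sequence, Tuple


def _common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest shared prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class TreeNode:
    """One radix-tree edge: a run of token ids plus a variant-specific payload."""

    def __init__(self, key: Tuple[int, ...] = (), parent: Optional["TreeNode"] = None):
        self.key = key
        self.parent = parent
        self.children: Dict[int, "TreeNode"] = {}  # keyed by first token of child.key
        self.payload: object = None


class BaseRadixTree(ABC):
    """Hypothetical base class: traversal is shared, payload creation is a hook."""

    def __init__(self) -> None:
        self.root = TreeNode()

    @abstractmethod
    def make_payload(self, key: Tuple[int, ...]) -> object:
        """Variant-specific state for a node covering `key` (assumed hook)."""

    def _split(self, child: TreeNode, n: int) -> TreeNode:
        """Split `child` after its first n tokens and return the new upper node."""
        head = TreeNode(child.key[:n], child.parent)
        head.payload = self.make_payload(head.key)
        child.parent.children[head.key[0]] = head
        child.key = child.key[n:]
        child.parent = head
        child.payload = self.make_payload(child.key)
        head.children[child.key[0]] = child
        return head

    def insert(self, tokens: Sequence[int]) -> None:
        node, i = self.root, 0
        while i < len(tokens):
            child = node.children.get(tokens[i])
            if child is None:
                new = TreeNode(tuple(tokens[i:]), node)
                new.payload = self.make_payload(new.key)
                node.children[tokens[i]] = new
                return
            n = _common_prefix_len(child.key, tokens[i:])
            if n < len(child.key):
                child = self._split(child, n)
            node, i = child, i + n

    def match_prefix(self, tokens: Sequence[int]) -> int:
        """Return how many leading tokens are already cached."""
        node, i = self.root, 0
        while i < len(tokens):
            child = node.children.get(tokens[i])
            if child is None:
                break
            n = _common_prefix_len(child.key, tokens[i:])
            i += n
            if n < len(child.key):
                break
            node = child
        return i


class FullAttentionTree(BaseRadixTree):
    """Toy variant: every token in the node keeps its KV entry."""

    def make_payload(self, key: Tuple[int, ...]) -> object:
        return {"kv_tokens": len(key)}


class SlidingWindowTree(BaseRadixTree):
    """Toy variant: at most `window` tokens per node keep a KV entry."""

    def __init__(self, window: int) -> None:
        super().__init__()
        self.window = window

    def make_payload(self, key: Tuple[int, ...]) -> object:
        return {"kv_tokens": min(len(key), self.window)}
```

In this sketch a new variant only overrides `make_payload`; insertion, splitting, and prefix matching are written once. The real caches also handle eviction, locking, and memory pools, which are left out here.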
@hzh0425 @yizhang2077 @pansicheng @ispobock
Progress
Stage 0: Unify Radix Tree Interface
Stage 1: Support Unified HybridRadixTree V2
- StreamingSession in UnifiedRadixCache #23202
Stage 2: Tree Interface Cleanup and Optimization
Stage 3: Long-Term Rewrite
Related Issues
- swa_radix_cache.py #13742