Feature request
[2025-06-26] Edited
Option 1 – on-demand pages. When SGLang asks for a KV page, we call the KVBM adapter, wrap that page in a KVBM Block, and let KVBM persist it to NIXL. This is the same approach as vLLM's mega PR.

Option 2 – pre-alloc. Have KVBM hand SGLang a single 20 GB “free_pages” region up front.
[2025-05-24] First draft
Currently, SGLang manages its own KV pages, which live only on the GPU. There is not yet a spill-to-CPU or remote tier, which KVBM would enable.
Vikram Sharma: We need to create something for SGLang like the patch we have for vLLM.
kkranen: We'd eventually prefer a full PR to SGLang itself, but we want to prove value to them first!
Plan
The plan is to create a thin “connector” layer that turns RadixKVManager’s callbacks (alloc_block, free_block, “block-stored”, “block-evicted” …) into KVBM API calls such as:
- device_pool.get_mutable_block() → returns a MutableBlock handle
- block.commit(); block.register() → publish StoreEvent
- block.drop() or block.reset() → publish RemoveEvent
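As a rough illustration of the mapping above, here is a minimal sketch of such a connector. `FakeDevicePool`, `FakeBlock`, and `KvbmConnector` are stand-ins invented for this example so that it runs on its own. Only the method names `get_mutable_block`, `commit`, `register`, and `drop` come from the plan, and their real signatures and event behaviour in KVBM may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBlock:
    """Stand-in for a KVBM MutableBlock (hypothetical, not the real class)."""
    block_id: int
    events: list = field(default_factory=list, repr=False)
    committed: bool = False

    def commit(self):
        self.committed = True

    def register(self):
        # Assumption: registering a committed block publishes a StoreEvent.
        if not self.committed:
            raise RuntimeError("commit() must be called before register()")
        self.events.append(("StoreEvent", self.block_id))

    def drop(self):
        # Assumption: dropping a block publishes a RemoveEvent.
        self.events.append(("RemoveEvent", self.block_id))


class FakeDevicePool:
    """Stand-in for kvbm.device_pool (hypothetical)."""

    def __init__(self):
        self.events = []  # shared event log, standing in for the event bus
        self._next_id = 0

    def get_mutable_block(self):
        block = FakeBlock(self._next_id, self.events)
        self._next_id += 1
        return block


class KvbmConnector:
    """Translates allocator-style callbacks into KVBM-style calls."""

    def __init__(self, device_pool):
        self.pool = device_pool
        self.live = {}  # block_id -> block handle

    def alloc_block(self):
        block = self.pool.get_mutable_block()
        self.live[block.block_id] = block
        return block.block_id

    def block_stored(self, block_id):
        block = self.live[block_id]
        block.commit()
        block.register()

    def free_block(self, block_id):
        self.live.pop(block_id).drop()


pool = FakeDevicePool()
connector = KvbmConnector(pool)
bid = connector.alloc_block()
connector.block_stored(bid)
connector.free_block(bid)
print(pool.events)
```

The connector keeps its own `live` table so that SGLang only ever sees integer block ids, while the KVBM handles stay on the Dynamo side of the boundary.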
Detailed plan
1: Create a file "dynamo/llm/sglang_kv_events.py"
A tiny KVCacheEventManager (90 LOC) that wraps the existing dynamo_llm_init / dynamo_kv_event_publish_* C API with ctypes.
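A minimal sketch of what that wrapper could look like follows. The argument lists used here for `dynamo_llm_init`, `dynamo_kv_event_publish_stored`, and `dynamo_kv_event_publish_removed` are simplified assumptions and should be checked against the actual C header. The same goes for the assumption that a return code of 0 means success. `FakeLib` is a hypothetical recorder that lets the example run without the shared library; in real use `lib` would come from `ctypes.CDLL(...)`.

```python
import ctypes


class KVCacheEventManager:
    """Sketch of a ctypes wrapper. `lib` would normally be ctypes.CDLL(path)."""

    def __init__(self, lib, namespace, component, worker_id, page_size):
        self.lib = lib
        self.event_id = 0
        # Assumed argument order; with a real CDLL, also set argtypes/restype.
        rc = lib.dynamo_llm_init(
            namespace.encode(), component.encode(),
            ctypes.c_int64(worker_id), ctypes.c_uint32(page_size),
        )
        self._check(rc, "dynamo_llm_init")

    @staticmethod
    def _check(rc, name):
        # Assumption: the C API returns 0 on success.
        if rc != 0:
            raise RuntimeError(f"{name} failed with code {rc}")

    def publish_stored(self, token_ids, block_hash, parent_hash=None):
        # Marshal the Python list into a contiguous uint32 array.
        tokens = (ctypes.c_uint32 * len(token_ids))(*token_ids)
        # In this sketch a NULL parent pointer marks a root block.
        parent = None
        if parent_hash is not None:
            parent = ctypes.pointer(ctypes.c_uint64(parent_hash))
        self.event_id += 1
        rc = self.lib.dynamo_kv_event_publish_stored(
            ctypes.c_uint64(self.event_id), tokens,
            ctypes.c_size_t(len(token_ids)),
            ctypes.c_uint64(block_hash), parent,
        )
        self._check(rc, "dynamo_kv_event_publish_stored")

    def publish_removed(self, block_hash):
        self.event_id += 1
        rc = self.lib.dynamo_kv_event_publish_removed(
            ctypes.c_uint64(self.event_id), ctypes.c_uint64(block_hash),
        )
        self._check(rc, "dynamo_kv_event_publish_removed")


class FakeLib:
    """Hypothetical recorder standing in for the shared library."""

    def __init__(self):
        self.calls = []

    def dynamo_llm_init(self, ns, comp, worker_id, page_size):
        self.calls.append(("init", ns, comp, worker_id.value, page_size.value))
        return 0

    def dynamo_kv_event_publish_stored(self, event_id, tokens, n, block_hash, parent):
        parent_value = parent.contents.value if parent else None
        self.calls.append(
            ("stored", event_id.value, list(tokens)[: n.value],
             block_hash.value, parent_value)
        )
        return 0

    def dynamo_kv_event_publish_removed(self, event_id, block_hash):
        self.calls.append(("removed", event_id.value, block_hash.value))
        return 0


lib = FakeLib()
mgr = KVCacheEventManager(lib, "dynamo", "sglang", worker_id=0, page_size=16)
mgr.publish_stored([1, 2, 3, 4], block_hash=0xABC)
mgr.publish_removed(0xABC)
```

Injecting `lib` rather than loading it inside the constructor keeps the marshalling logic testable on machines where the Dynamo bindings are not built.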
2: Patch file container/deps/sglang/sglang_v0.4.x-dynamo-kv.patch
Initialise one KVCacheEventManager per engine (needs namespace, component name, rank, page_size).
Inside RadixKVManager:
- after allocate_block() → kv_pub.enqueue_stored_event(parent, block)
- inside evict_block() → kv_pub.enqueue_removed_event(block.hash)
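The two hook points can be sketched as follows. `ToyRadixKVManager`, `Block`, and `KvEventPublisher` are toy stand-ins written for this example, not SGLang or Dynamo code. The hashing scheme (parent hash chained with the block's tokens) is likewise an assumption made only so that the events carry something to identify a block by.

```python
from collections import deque


class KvEventPublisher:
    """Hypothetical stand-in for the kv_pub object in the plan."""

    def __init__(self):
        self.queue = deque()

    def enqueue_stored_event(self, parent, block):
        parent_hash = parent.hash if parent is not None else None
        self.queue.append(("stored", block.hash, parent_hash))

    def enqueue_removed_event(self, block_hash):
        self.queue.append(("removed", block_hash))


class Block:
    """Toy block whose hash chains the parent hash with its own tokens."""

    def __init__(self, tokens, parent=None):
        self.parent = parent
        parent_hash = parent.hash if parent is not None else 0
        self.hash = hash((parent_hash, tuple(tokens)))


class ToyRadixKVManager:
    """Toy manager showing only where the two hooks sit."""

    def __init__(self, kv_pub):
        self.kv_pub = kv_pub
        self.blocks = {}

    def allocate_block(self, tokens, parent=None):
        block = Block(tokens, parent)
        self.blocks[block.hash] = block
        # Hook 1: publish right after a successful allocation.
        self.kv_pub.enqueue_stored_event(parent, block)
        return block

    def evict_block(self, block):
        # Hook 2: publish from inside eviction, before the entry is removed.
        self.kv_pub.enqueue_removed_event(block.hash)
        del self.blocks[block.hash]


kv_pub = KvEventPublisher()
manager = ToyRadixKVManager(kv_pub)
root = manager.allocate_block([1, 2, 3, 4])
child = manager.allocate_block([5, 6, 7, 8], parent=root)
manager.evict_block(child)
print(list(kv_pub.queue))
```

Queuing events instead of publishing them inline keeps the allocation path free of any blocking call into the C API; a background thread could drain the queue.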
3: Optional allocator hook (behind env-flag DYNAMO_SGLANG_KVBM_ALLOC=1)
dynamo/llm/sglang_kv_allocator.py implements a subclass of SGLang’s block allocator that calls kvbm.device_pool.get_mutable_block() → commit() → register(). The patch swaps to this allocator when the flag is set.
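A minimal sketch of the flag-gated swap might look like the following. `DefaultAllocator`, `KvbmAllocator`, and `select_allocator` are hypothetical names for this example; the real SGLang allocator class and its methods are not shown here, and only the environment variable name comes from the plan.

```python
import os


class DefaultAllocator:
    """Stand-in for SGLang's block allocator (hypothetical base class)."""

    def alloc(self):
        return "gpu-only page"


class KvbmAllocator(DefaultAllocator):
    """Stand-in for the subclass in dynamo/llm/sglang_kv_allocator.py."""

    def alloc(self):
        # The real subclass would call
        # get_mutable_block() -> commit() -> register() here.
        return "kvbm-managed page"


def select_allocator(environ=os.environ):
    # Only the exact value "1" enables the KVBM path; anything else,
    # including an unset variable, keeps the default allocator.
    if environ.get("DYNAMO_SGLANG_KVBM_ALLOC") == "1":
        return KvbmAllocator()
    return DefaultAllocator()


print(select_allocator({"DYNAMO_SGLANG_KVBM_ALLOC": "1"}).alloc())
print(select_allocator({}).alloc())
```

Keeping the default path untouched when the flag is unset means the event-publishing patch from step 2 can ship and be evaluated on its own.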
Describe the problem you're encountering
There is no SGLang integration for KVBM yet.
Describe alternatives you've tried
No response