[FEATURE]: sglang kvbm integration #1199

### Feature request

#### [2025-06-26] Edited

**Option 1 – on-demand pages.** When SGLang asks for a KV page, we call the KVBM adapter, wrap that page in a KVBM Block, and let KVBM persist it to NIXL. This is the same approach as [vLLM's mega PR](https://github.com/ai-dynamo/dynamo/compare/main...ryan/vllm-mega-locality).

![Image](https://github.com/user-attachments/assets/5448e4ad-fea8-4569-a8c5-c48d7a844f76)



**Option 2 – pre-alloc.** Have KVBM hand SGLang a single 20 GB “free_pages” region up front.
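
To make Option 2 concrete, here is a minimal sketch of a pre-allocated region sliced into a free-page list. It is not KVBM or SGLang code: the `PreallocatedRegion` class, the 2 MiB page size, and reading "20 GB" as 20 GiB are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

GIB = 1024 ** 3
MIB = 1024 ** 2


@dataclass
class PreallocatedRegion:
    """Hypothetical stand-in for the single region handed over up front."""

    region_bytes: int
    page_bytes: int
    free_pages: list = field(init=False)

    def __post_init__(self):
        # Slice the region into fixed-size pages; page i starts at i * page_bytes.
        self.free_pages = list(range(self.region_bytes // self.page_bytes))

    def alloc_page(self) -> int:
        if not self.free_pages:
            raise MemoryError("pre-allocated region exhausted")
        return self.free_pages.pop()

    def free_page(self, page_id: int) -> None:
        self.free_pages.append(page_id)

    def offset_of(self, page_id: int) -> int:
        # Byte offset of a page inside the region.
        return page_id * self.page_bytes


# Assumed sizes: a 20 GiB region and 2 MiB pages.
region = PreallocatedRegion(region_bytes=20 * GIB, page_bytes=2 * MIB)
page = region.alloc_page()
region.free_page(page)
```

The trade-off against Option 1 is that the engine never calls into KVBM on the allocation hot path, but the region size has to be chosen before any request arrives.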



#### [2025-05-24] First draft

Currently, SGLang manages its own KV pages, which live only on the GPU. There is not yet a spill-to-CPU or remote tier, which is what KVBM enables.

Vikram Sharma: We need to create something like [the patch for vLLM](https://github.com/ai-dynamo/dynamo/blob/6d9aac776b715381d41586617e8bf34130f08611/container/deps/vllm/vllm_v0.8.4-dynamo-kv-disagg-patch.patch#L4) for SGLang.

kkranen: We'd eventually prefer a full PR to SGLang itself, but we want to prove value to them first!

### Plan
The plan is to create a thin “connector” layer that translates RadixKVManager’s callbacks (alloc_block, free_block, “block-stored”, “block-evicted”, …) into KVBM API calls such as:
- device_pool.get_mutable_block() → returns a MutableBlock handle
- block.commit(); block.register() → publish StoreEvent
- block.drop() or block.reset() → publish RemoveEvent
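
The mapping above can be sketched roughly as follows. This is a sketch under stated assumptions, not the real KVBM bindings: `StubDevicePool` and `StubBlock` are hypothetical stand-ins that only mimic the method names listed above (`get_mutable_block`, `commit`, `register`, `drop`), and `KvbmConnector` with its callback names is an illustrative shape for the connector.

```python
class StubBlock:
    """Hypothetical stand-in for a KVBM MutableBlock handle."""

    def __init__(self, block_id, events):
        self.block_id = block_id
        self.state = "mutable"
        self._events = events

    def commit(self):
        self.state = "committed"

    def register(self):
        if self.state != "committed":
            raise RuntimeError("register() called before commit()")
        self.state = "registered"
        self._events.append(("StoreEvent", self.block_id))

    def drop(self):
        # Only registered blocks were ever announced, so only they need a removal.
        if self.state == "registered":
            self._events.append(("RemoveEvent", self.block_id))
        self.state = "dropped"


class StubDevicePool:
    """Hypothetical stand-in for kvbm.device_pool; records published events."""

    def __init__(self):
        self.events = []
        self._next_id = 0

    def get_mutable_block(self):
        block = StubBlock(self._next_id, self.events)
        self._next_id += 1
        return block


class KvbmConnector:
    """Translates engine-side callbacks into calls on the block manager."""

    def __init__(self, device_pool):
        self._pool = device_pool
        self._live = {}

    def alloc_block(self):
        block = self._pool.get_mutable_block()
        self._live[block.block_id] = block
        return block.block_id

    def block_stored(self, block_id):
        block = self._live[block_id]
        block.commit()
        block.register()

    def free_block(self, block_id):
        self._live.pop(block_id).drop()


pool = StubDevicePool()
connector = KvbmConnector(pool)
bid = connector.alloc_block()
connector.block_stored(bid)
connector.free_block(bid)
```

Keeping the connector free of engine internals means the same class could sit behind either the out-of-tree patch or a later upstream PR.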


### Detailed plan

#### 1: Create a file "dynamo/llm/sglang_kv_events.py"
A tiny KVCacheEventManager (90 LOC) that wraps the existing dynamo_llm_init / dynamo_kv_event_publish_* C API with ctypes.
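
A rough sketch of what such a wrapper might look like is shown below. The argument lists are assumptions, not taken from the C header: `dynamo_llm_init` is assumed to take a namespace, component name, worker id, and block size, and `dynamo_kv_event_publish_removed` is an assumed member of the `dynamo_kv_event_publish_*` family taking an event id, a hash array, and a length. The library is injected so the example runs against a hypothetical `StubLib`; in real use it would be a `ctypes.CDLL` handle with `argtypes` set from the actual header.

```python
import ctypes


class KVCacheEventManager:
    """Sketch of a ctypes wrapper; signatures are assumptions, check the C header."""

    def __init__(self, lib, namespace, component, worker_id, kv_block_size):
        self._lib = lib
        self._event_id = 0
        rc = lib.dynamo_llm_init(
            namespace.encode(),
            component.encode(),
            ctypes.c_int64(worker_id),
            ctypes.c_uint32(kv_block_size),
        )
        if rc != 0:  # assumed convention: 0 means success
            raise RuntimeError(f"dynamo_llm_init failed with code {rc}")

    def publish_removed(self, block_hashes):
        # Marshal the Python list into a contiguous uint64 array for the C side.
        arr = (ctypes.c_uint64 * len(block_hashes))(*block_hashes)
        rc = self._lib.dynamo_kv_event_publish_removed(
            ctypes.c_uint64(self._event_id),
            arr,
            ctypes.c_size_t(len(block_hashes)),
        )
        self._event_id += 1
        return rc


class StubLib:
    """Hypothetical stand-in for the shared library; records calls instead of publishing."""

    def __init__(self):
        self.calls = []

    def dynamo_llm_init(self, namespace, component, worker_id, kv_block_size):
        self.calls.append(("init", namespace, component, worker_id.value, kv_block_size.value))
        return 0

    def dynamo_kv_event_publish_removed(self, event_id, hashes, count):
        self.calls.append(("removed", event_id.value, list(hashes)[: count.value]))
        return 0


lib = StubLib()
manager = KVCacheEventManager(lib, "dynamo", "sglang", worker_id=0, kv_block_size=16)
manager.publish_removed([11, 22])
```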


#### 2: Patch file container/deps/sglang/sglang_v0.4.x-dynamo-kv.patch
Initialise one KVCacheEventManager per engine (needs namespace, component name, rank, page_size).
Inside RadixKVManager:
- after allocate_block() → kv_pub.enqueue_stored_event(parent, block)
- inside evict_block() → kv_pub.enqueue_removed_event(block.hash)
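
The two hook points could look roughly like this. `PatchedRadixKVManager`, `Block`, and `RecordingPublisher` are hypothetical stand-ins written for this sketch, and the way a block hash is derived here is purely illustrative. Only the method names `allocate_block`, `evict_block`, `enqueue_stored_event`, and `enqueue_removed_event` come from the plan above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    hash: int
    token_ids: tuple


class RecordingPublisher:
    """Hypothetical kv_pub: queues events instead of sending them anywhere."""

    def __init__(self):
        self.queue = deque()

    def enqueue_stored_event(self, parent, block):
        parent_hash = parent.hash if parent is not None else None
        self.queue.append(("stored", parent_hash, block.hash))

    def enqueue_removed_event(self, block_hash):
        self.queue.append(("removed", block_hash))


class PatchedRadixKVManager:
    """Stand-in for the patched manager, showing only where the hooks sit."""

    def __init__(self, kv_pub):
        self.kv_pub = kv_pub
        self.blocks = {}

    def allocate_block(self, parent, token_ids):
        parent_hash = parent.hash if parent is not None else 0
        # Illustrative hash chaining: a block's identity depends on its prefix.
        block = Block(hash=hash((parent_hash, token_ids)), token_ids=token_ids)
        self.blocks[block.hash] = block
        # Hook 1: announce the block once allocation has succeeded.
        self.kv_pub.enqueue_stored_event(parent, block)
        return block

    def evict_block(self, block):
        del self.blocks[block.hash]
        # Hook 2: announce the removal as part of eviction.
        self.kv_pub.enqueue_removed_event(block.hash)


kv_pub = RecordingPublisher()
manager = PatchedRadixKVManager(kv_pub)
root = manager.allocate_block(None, (1, 2))
child = manager.allocate_block(root, (3, 4))
manager.evict_block(child)
```

Queueing events rather than publishing inline keeps the allocator path short; a background thread or the scheduler loop can drain the queue.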


#### 3: Optional allocator hook (behind env-flag DYNAMO_SGLANG_KVBM_ALLOC=1)
dynamo/llm/sglang_kv_allocator.py implements a subclass of SGLang’s block allocator that calls kvbm.device_pool.get_mutable_block() → commit() → register(). The patch swaps to this allocator when the flag is set.
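
A minimal sketch of the flag-gated swap follows. `BaseAllocator` stands in for SGLang's real block allocator, whose class name and interface are not given here, and `StubDevicePool` / `StubBlock` are hypothetical stand-ins for the KVBM objects. Only the environment variable name and the `get_mutable_block() → commit() → register()` sequence come from the plan above.

```python
import os


class StubBlock:
    """Hypothetical block handle that records the calls made on it."""

    def __init__(self, block_id):
        self.block_id = block_id
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def register(self):
        self.calls.append("register")


class StubDevicePool:
    """Hypothetical stand-in for kvbm.device_pool."""

    def __init__(self):
        self._next_id = 0

    def get_mutable_block(self):
        block = StubBlock(self._next_id)
        self._next_id += 1
        return block


class BaseAllocator:
    """Stand-in for the engine's default allocator."""

    def __init__(self):
        self._next_page = 0

    def alloc(self):
        page = self._next_page
        self._next_page += 1
        return page


class KvbmAllocator(BaseAllocator):
    """Subclass that sources every block from the KVBM device pool."""

    def __init__(self, device_pool):
        super().__init__()
        self._pool = device_pool
        self.blocks = {}

    def alloc(self):
        block = self._pool.get_mutable_block()
        block.commit()
        block.register()
        self.blocks[block.block_id] = block
        return block.block_id


def select_allocator(device_pool, env=None):
    """Return the KVBM-backed allocator only when the flag is exactly "1"."""
    env = os.environ if env is None else env
    if env.get("DYNAMO_SGLANG_KVBM_ALLOC") == "1":
        return KvbmAllocator(device_pool)
    return BaseAllocator()


allocator = select_allocator(StubDevicePool(), env={"DYNAMO_SGLANG_KVBM_ALLOC": "1"})
page_id = allocator.alloc()
```

With the flag unset, the engine keeps its default allocator, so the event-publishing patch can be tested independently of the allocator swap.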

### Describe the problem you're encountering

There is no SGLang integration for KVBM yet.

### Describe alternatives you've tried

_No response_
