[FEATURE]: sglang kvbm integration #1199

@faradawn

Feature request

[2025-06-26] Edited

Option 1 – on-demand pages. When SGLang asks for a KV page, we call the KVBM adapter, wrap that page in a KVBM Block, and let KVBM persist it to NIXL. This is the same approach as vLLM's mega PR.

Option 2 – pre-alloc. Have KVBM hand SGLang a single 20 GB “free_pages” region up front.
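
To make Option 2 concrete, here is a minimal sketch of a free-page list over one pre-allocated region. Everything in it is an assumption for illustration: the class name `PreallocatedPageRegion`, the 2 MiB page size, and reading "20 GB" as 20 GiB are not taken from KVBM or SGLang.

```python
class PreallocatedPageRegion:
    """Hypothetical free list over one contiguous region; hands out byte offsets."""

    def __init__(self, region_bytes: int, page_bytes: int):
        if page_bytes <= 0 or region_bytes < page_bytes:
            raise ValueError("region must hold at least one page")
        self.page_bytes = page_bytes
        self.num_pages = region_bytes // page_bytes
        # Stored in reverse so that pop() hands out page 0 first.
        self._free_pages = list(range(self.num_pages - 1, -1, -1))

    @property
    def free_count(self) -> int:
        return len(self._free_pages)

    def alloc_page(self) -> int:
        """Return the byte offset of a free page inside the region."""
        if not self._free_pages:
            raise MemoryError("free_pages region exhausted")
        return self._free_pages.pop() * self.page_bytes

    def free_page(self, offset: int) -> None:
        """Return a page to the free list, identified by its byte offset."""
        if offset % self.page_bytes != 0:
            raise ValueError("offset is not page-aligned")
        self._free_pages.append(offset // self.page_bytes)


# Assumed sizes: a 20 GiB region split into 2 MiB pages.
region = PreallocatedPageRegion(region_bytes=20 * 1024**3, page_bytes=2 * 1024**2)
first_offset = region.alloc_page()
```

With this layout SGLang would only ever see offsets into the region, while KVBM keeps ownership of the backing memory.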

[2025-05-24] First draft

Currently, SGLang manages its own KV pages, which live only on the GPU. There is not yet any spill-to-CPU or remote tier, which is what KVBM would enable.

Vikram Sharma: We need to create something like the patch for vLLM, but for SGLang.

kkranen: We'd eventually prefer a full PR to SGLang itself, but we want to prove value to them first!

Plan

The plan is to create a thin “connector” layer that turns RadixKVManager’s callbacks (alloc_block, free_block, “block-stored”, “block-evicted”, …) into KVBM API calls such as:

  • device_pool.get_mutable_block() → returns a MutableBlock handle
  • block.commit(); block.register() → publish StoreEvent
  • block.drop() or block.reset() → publish RemoveEvent
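
The mapping above could be sketched roughly as follows. The method names `get_mutable_block`, `commit`, `register`, and `drop` come from the plan; `FakeDevicePool`, `FakeMutableBlock`, `SGLangKVBMConnector`, and the `on_block_*` callback names are hypothetical stand-ins so the sketch runs without KVBM or SGLang installed, and the real types may behave differently.

```python
from dataclasses import dataclass
from enum import Enum, auto


class BlockState(Enum):
    MUTABLE = auto()
    COMMITTED = auto()
    REGISTERED = auto()
    DROPPED = auto()


@dataclass
class FakeMutableBlock:
    """Stand-in for a KVBM MutableBlock handle (not the real KVBM type)."""
    block_id: int
    events: list
    state: BlockState = BlockState.MUTABLE

    def commit(self):
        self.state = BlockState.COMMITTED

    def register(self):
        if self.state is not BlockState.COMMITTED:
            raise RuntimeError("register() requires a committed block")
        self.state = BlockState.REGISTERED
        self.events.append(("store", self.block_id))  # models a StoreEvent

    def drop(self):
        if self.state is BlockState.REGISTERED:
            self.events.append(("remove", self.block_id))  # models a RemoveEvent
        self.state = BlockState.DROPPED


class FakeDevicePool:
    """Stand-in for the KVBM device pool."""

    def __init__(self, num_blocks):
        self.events = []
        self._free_ids = list(range(num_blocks))

    def get_mutable_block(self):
        if not self._free_ids:
            raise MemoryError("device pool exhausted")
        return FakeMutableBlock(self._free_ids.pop(0), self.events)

    def release(self, block_id):
        self._free_ids.append(block_id)


class SGLangKVBMConnector:
    """Translates cache-manager callbacks into pool and block calls."""

    def __init__(self, device_pool):
        self._pool = device_pool
        self._blocks = {}

    def alloc_block(self):
        block = self._pool.get_mutable_block()
        self._blocks[block.block_id] = block
        return block.block_id

    def on_block_stored(self, block_id):
        block = self._blocks[block_id]
        block.commit()
        block.register()

    def on_block_evicted(self, block_id):
        block = self._blocks.pop(block_id)
        block.drop()
        self._pool.release(block_id)
```

A block that is allocated, stored, and then evicted leaves one store event followed by one remove event in `pool.events`.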

Detailed plan

1: Create a file "dynamo/llm/sglang_kv_events.py"

A tiny KVCacheEventManager (90 LOC) that wraps the existing dynamo_llm_init / dynamo_kv_event_publish_* C API with ctypes.

2: Patch file container/deps/sglang/sglang_v0.4.x-dynamo-kv.patch

Initialise one KVCacheEventManager per engine (needs namespace, component name, rank, page_size).
Inside RadixKVManager:

  • after allocate_block() → kv_pub.enqueue_stored_event(parent, block)
  • inside evict_block() → kv_pub.enqueue_removed_event(block.hash)
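
A rough sketch of the event manager these hooks would call is shown below. The constructor arguments and the two `enqueue_*` method names follow the plan; the `Block` dataclass, the `publish_fn` callable, the event dictionary layout, and `flush()` are assumptions. In particular, `publish_fn` stands in for the ctypes binding to the `dynamo_kv_event_publish_*` C API, whose signatures are not reproduced here.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical view of a KV block: its hash plus the tokens it covers."""
    block_hash: int
    token_ids: Tuple[int, ...]


class KVCacheEventManager:
    def __init__(self, namespace: str, component: str, rank: int,
                 page_size: int, publish_fn: Callable[[dict], None]):
        self.namespace = namespace
        self.component = component
        self.rank = rank
        self.page_size = page_size
        self._publish = publish_fn  # placeholder for the ctypes-wrapped C API
        self._queue = queue.SimpleQueue()
        self._next_event_id = 0

    def enqueue_stored_event(self, parent: Optional[Block], block: Block) -> None:
        if len(block.token_ids) != self.page_size:
            raise ValueError("block must cover exactly one page of tokens")
        self._put({
            "type": "stored",
            "parent_hash": parent.block_hash if parent is not None else None,
            "block_hash": block.block_hash,
            "token_ids": list(block.token_ids),
        })

    def enqueue_removed_event(self, block_hash: int) -> None:
        self._put({"type": "removed", "block_hash": block_hash})

    def _put(self, event: dict) -> None:
        # Stamp each event with a monotonically increasing id and the rank.
        event["event_id"] = self._next_event_id
        event["rank"] = self.rank
        self._next_event_id += 1
        self._queue.put(event)

    def flush(self) -> int:
        """Publish all queued events in order; return how many were sent."""
        sent = 0
        while True:
            try:
                event = self._queue.get_nowait()
            except queue.Empty:
                return sent
            self._publish(event)
            sent += 1
```

Queueing events and publishing them in `flush()` keeps the allocate and evict paths cheap; whether the real patch publishes synchronously or from a background thread is left open here.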

3: Optional allocator hook (behind env-flag DYNAMO_SGLANG_KVBM_ALLOC=1)

dynamo/llm/sglang_kv_allocator.py implements a subclass of SGLang’s block allocator that calls kvbm.device_pool.get_mutable_block() → commit() → register(). The patch swaps to this allocator when the flag is set.
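
The flag-gated swap could look something like the sketch below. Only the environment variable name and the `get_mutable_block()` → `commit()` → `register()` sequence come from the plan; `BaseBlockAllocator` is a hypothetical stand-in for SGLang's allocator, and `RecordingPool` / `RecordingBlock` are stubs that simply record which calls were made.

```python
import os


class RecordingBlock:
    """Stub block that logs commit/register calls (not a real KVBM block)."""

    def __init__(self, calls):
        self._calls = calls

    def commit(self):
        self._calls.append("commit")

    def register(self):
        self._calls.append("register")


class RecordingPool:
    """Stub for kvbm.device_pool that logs every call in order."""

    def __init__(self):
        self.calls = []

    def get_mutable_block(self):
        self.calls.append("get_mutable_block")
        return RecordingBlock(self.calls)


class BaseBlockAllocator:
    """Hypothetical stand-in for SGLang's own block allocator."""

    def __init__(self, num_blocks):
        self._free_ids = list(range(num_blocks))

    def allocate_block(self):
        return self._free_ids.pop(0)


class KVBMBlockAllocator(BaseBlockAllocator):
    """Allocator variant that also obtains and registers a KVBM block."""

    def __init__(self, num_blocks, device_pool):
        super().__init__(num_blocks)
        self._pool = device_pool
        self.kvbm_blocks = {}

    def allocate_block(self):
        block_id = super().allocate_block()
        block = self._pool.get_mutable_block()
        block.commit()
        block.register()
        self.kvbm_blocks[block_id] = block
        return block_id


def make_allocator(num_blocks, device_pool, environ=None):
    """Pick the allocator based on DYNAMO_SGLANG_KVBM_ALLOC."""
    environ = os.environ if environ is None else environ
    if environ.get("DYNAMO_SGLANG_KVBM_ALLOC") == "1":
        return KVBMBlockAllocator(num_blocks, device_pool)
    return BaseBlockAllocator(num_blocks)
```

Passing `environ` explicitly makes the switch easy to exercise in tests; with the flag unset, behaviour stays identical to the unpatched allocator.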

Describe the problem you're encountering

There is no SGLang integration for KVBM yet.

Describe alternatives you've tried

No response

Labels: enhancement (New feature or request)