[Feature] KV Cache offloading to remote devices -- NVIDIA KVBM #7576

### Checklist

- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- [x] 2. Please use English, otherwise it will be closed.

### Motivation

As far as I know, SGLang manages its own KV pages and has no way to spill them to a remote tier such as cloud storage, which the NVIDIA KV Block Manager (KVBM) and NIXL enable. I am leaning towards the first option below. Would love to hear what the SGLang team thinks!

**Option 1 – on-demand pages.** When SGLang asks for a KV page, we call the KVBM adapter, wrap that page in a KVBM Block, and let KVBM persist it via NIXL. This is the same approach as [vLLM's mega PR](https://github.com/ai-dynamo/dynamo/compare/main...ryan/vllm-mega-locality).

![Image](https://github.com/user-attachments/assets/0fab2065-22e1-4598-818c-2a61ab8d4207)
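
To make Option 1 concrete, here is a rough sketch of what the adapter could look like. Every name in it (`OnDemandAdapter`, `Block`, `InMemoryStore`, `offload`, `onboard`) is hypothetical and made up for illustration; none of it is the real KVBM, NIXL, or SGLang API, and the in-memory store only stands in for the remote tier so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import Dict


class InMemoryStore:
    """Toy stand-in for the remote tier (hypothetical, not KVBM/NIXL)."""

    def __init__(self) -> None:
        self._blobs: Dict[int, bytes] = {}

    def put(self, page_id: int, data: bytes) -> None:
        self._blobs[page_id] = data

    def get(self, page_id: int) -> bytes:
        return self._blobs[page_id]


@dataclass
class Block:
    """Hypothetical wrapper pairing a page index with its payload."""
    page_id: int
    data: bytearray


class OnDemandAdapter:
    """Hypothetical adapter: one Block is created per page request."""

    def __init__(self, store: InMemoryStore, page_bytes: int) -> None:
        self.store = store
        self.page_bytes = page_bytes
        self.blocks: Dict[int, Block] = {}  # pages currently held locally
        self._next_id = 0

    def alloc_page(self) -> Block:
        # Called whenever the engine asks for a new KV page.
        block = Block(self._next_id, bytearray(self.page_bytes))
        self.blocks[block.page_id] = block
        self._next_id += 1
        return block

    def offload(self, page_id: int) -> None:
        # Persist the page to the remote tier and drop the local copy.
        block = self.blocks.pop(page_id)
        self.store.put(page_id, bytes(block.data))

    def onboard(self, page_id: int) -> Block:
        # Bring a previously offloaded page back into local memory.
        block = Block(page_id, bytearray(self.store.get(page_id)))
        self.blocks[page_id] = block
        return block
```

The point of the sketch is only the call shape: the engine keeps asking for pages one at a time, and the adapter decides when a page leaves or re-enters local memory.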

**Option 2 – pre-alloc.** Have KVBM hand SGLang a single 20 GB “free_pages” region up front.
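
For comparison, Option 2 could be sketched roughly as below. `PreallocRegion` and its methods are hypothetical names, not an existing KVBM or SGLang interface, and a small `bytearray` stands in for the 20 GB region so the snippet is runnable.

```python
from typing import List


class PreallocRegion:
    """Hypothetical single region handed over up front, carved into pages."""

    def __init__(self, region_bytes: int, page_bytes: int) -> None:
        if page_bytes <= 0 or region_bytes < page_bytes:
            raise ValueError("region must hold at least one page")
        self.page_bytes = page_bytes
        self.num_pages = region_bytes // page_bytes
        # One contiguous buffer; a real region would live on the device.
        self.buffer = bytearray(self.num_pages * page_bytes)
        # Reversed so that pop() hands out page 0 first.
        self.free_pages: List[int] = list(range(self.num_pages - 1, -1, -1))

    def alloc(self) -> int:
        # The engine takes page indices from the free list, no per-page call out.
        if not self.free_pages:
            raise MemoryError("region exhausted")
        return self.free_pages.pop()

    def free(self, page_id: int) -> None:
        self.free_pages.append(page_id)

    def view(self, page_id: int) -> memoryview:
        # Writable window onto one page inside the shared buffer.
        start = page_id * self.page_bytes
        return memoryview(self.buffer)[start:start + self.page_bytes]
```

The trade-off this illustrates: allocation becomes a cheap local free-list operation, but the region size is fixed up front and the block manager no longer sees individual page requests.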



### Related resources

Implementation on the NVIDIA side: https://github.com/ai-dynamo/dynamo/issues/1199
