[Feature] KV Cache offloading to remote devices -- NVIDIA KVBM #7576

@faradawn

Description

Motivation

As far as I know, SGLang manages its own KV pages and has no way to spill them to a remote tier such as cloud storage, which NVIDIA's KV Block Manager (KVBM) and NIXL enable. I am leaning towards the first option below. Would love to hear what the SGLang team thinks!

Option 1 – on-demand pages. When SGLang asks for a KV page, we call the KVBM adapter, wrap that page in a KVBM Block, and let KVBM persist it through NIXL. This is the same approach as vLLM's mega PR.
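To make Option 1 concrete, here is a minimal sketch of the adapter flow. Everything in it is hypothetical: `Block`, `InMemoryRemoteStore`, and `OnDemandAdapter` are illustrative stand-ins and not the real KVBM, NIXL, or SGLang APIs. It only shows the shape of "allocate a page, wrap it in a block, offload it, fetch it back".

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for a KVBM block wrapping one KV page."""
    page_id: int
    data: bytes


@dataclass
class InMemoryRemoteStore:
    """Hypothetical stand-in for a remote tier reached through NIXL."""
    blobs: dict = field(default_factory=dict)

    def put(self, key: int, data: bytes) -> None:
        self.blobs[key] = data

    def get(self, key: int) -> bytes:
        return self.blobs[key]


class OnDemandAdapter:
    """Wraps each page in a Block at the moment the engine requests it."""

    def __init__(self, store: InMemoryRemoteStore) -> None:
        self.store = store
        self.live: dict[int, Block] = {}  # pages currently held locally
        self.next_id = 0

    def alloc_page(self, page_bytes: int) -> Block:
        # Called whenever the engine asks for a new KV page.
        block = Block(self.next_id, bytes(page_bytes))
        self.live[block.page_id] = block
        self.next_id += 1
        return block

    def offload(self, page_id: int) -> None:
        # Persist the page to the remote tier and drop the local copy.
        block = self.live.pop(page_id)
        self.store.put(page_id, block.data)

    def fetch(self, page_id: int) -> Block:
        # Bring a previously offloaded page back into local memory.
        block = Block(page_id, self.store.get(page_id))
        self.live[page_id] = block
        return block


adapter = OnDemandAdapter(InMemoryRemoteStore())
page = adapter.alloc_page(16)
adapter.offload(page.page_id)
restored = adapter.fetch(page.page_id)
```

In this shape the block manager sees every allocation, so it can decide per page when to offload.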

Option 2 – pre-alloc. Have KVBM hand SGLang a single 20 GB “free_pages” region up front.
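For comparison, Option 2 could look roughly like the sketch below: one region is reserved up front and pages are handed out from a free list. `PreallocatedRegion` and its methods are hypothetical names for illustration, not KVBM or SGLang APIs, and the sizes are tiny so the example runs anywhere (a 20 GB region would only change `region_bytes`).

```python
class PreallocatedRegion:
    """Hypothetical fixed-size region carved into equal pages up front."""

    def __init__(self, region_bytes: int, page_bytes: int) -> None:
        if page_bytes <= 0 or region_bytes < page_bytes:
            raise ValueError("region must hold at least one page")
        self.page_bytes = page_bytes
        self.num_pages = region_bytes // page_bytes
        # Reverse order so that pop() hands out the lowest index first.
        self.free_pages = list(range(self.num_pages - 1, -1, -1))

    def alloc(self) -> int:
        # Return the index of a free page inside the region.
        if not self.free_pages:
            raise MemoryError("pre-allocated region exhausted")
        return self.free_pages.pop()

    def free(self, index: int) -> None:
        # Return a page to the free list, rejecting bad or repeated frees.
        if not 0 <= index < self.num_pages or index in self.free_pages:
            raise ValueError(f"invalid free of page {index}")
        self.free_pages.append(index)

    def offset(self, index: int) -> int:
        # Byte offset of a page from the start of the region.
        return index * self.page_bytes


region = PreallocatedRegion(region_bytes=64, page_bytes=16)
first = region.alloc()
second = region.alloc()
region.free(first)
```

The trade-off in this sketch is that the engine keeps its own bookkeeping inside the region, so the block manager only sees one large allocation and not individual page requests.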

Related resources

Implementation on the NVIDIA side: ai-dynamo/dynamo#1199
