Checklist
Motivation
As far as I know, SGLang manages its own KV pages and has no path to spill them to a remote tier (e.g., cloud storage), which the NVIDIA KV Block Manager (KVBM) and NIXL enable. I am leaning toward the first implementation below. Would love to hear what the SGLang team thinks!
Option 1 – on-demand pages. When SGLang asks for a KV page, we call the KVBM adapter, wrap that page in a KVBM Block, and let KVBM persist it via NIXL. This mirrors vLLM's mega PR.

Option 2 – pre-alloc. Have KVBM hand SGLang a single 20 GB “free_pages” region up front.
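To make Option 1 concrete, here is a minimal sketch of the on-demand flow, assuming a hypothetical `KVBMAdapter` interface and an in-memory stand-in for the NIXL-backed remote tier. None of these names are the real dynamo/KVBM API; they only illustrate where the hook into SGLang's page allocation would sit.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Illustrative stand-in for a KVBM Block wrapping one KV page."""
    page_id: int
    data: bytes


class FakeRemoteTier:
    """Toy key-value store standing in for NIXL-backed remote storage."""
    def __init__(self):
        self._store = {}

    def put(self, key: int, payload: bytes) -> None:
        self._store[key] = payload

    def get(self, key: int) -> bytes:
        return self._store[key]


class KVBMAdapter:
    """Hypothetical adapter: wrap each allocated page in a Block and
    persist it to the remote tier as it is handed to SGLang."""
    def __init__(self, tier: FakeRemoteTier):
        self.tier = tier

    def register_page(self, page_id: int, data: bytes) -> Block:
        block = Block(page_id=page_id, data=data)
        self.tier.put(page_id, block.data)  # spill to remote tier
        return block

    def restore_page(self, page_id: int) -> bytes:
        return self.tier.get(page_id)


# Where SGLang's allocator hands out a page, it would call the adapter:
adapter = KVBMAdapter(FakeRemoteTier())
block = adapter.register_page(0, b"kv-page-bytes")
restored = adapter.restore_page(0)
```

Option 2 would instead hand SGLang one large pre-allocated region and skip the per-page round trip, trading flexibility for fewer adapter calls.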
Related resources
Implementation on the NVIDIA side: ai-dynamo/dynamo#1199