[Roadmap] Prefill-Decode Disaggregation Roadmap (2026 Q2)

## Motivation
Prefill-Decode (PD) Disaggregation has become an important and widely adopted feature for production LLM inference.
It is now nearly fully compatible with different model architectures, all parallel strategies, and almost all advanced features, but new models and features keep arriving, and we want to support them in PD mode as well.

This issue serves as a roadmap and tracker for anyone interested in the latest progress of this feature. At a minimum, it will cover the related experimental features we plan to work on in 2026 Q2.

## Interface refactoring
- [x] Support querying the DP rank from the bootstrap server, which allows better DP load balancing https://github.com/sgl-project/sglang/pull/19168
- [x] Refactor the PD decode KV receiver bootstrap for better readability and bug fixes https://github.com/sgl-project/sglang/issues/21680 https://github.com/sgl-project/sglang/pull/21299 @weireweire


## Advanced feature and usage
 - [x] Context parallel compatibility after refactor https://github.com/sgl-project/sglang/pull/19504 https://github.com/sgl-project/sglang/pull/19765
 - [x] GPU staging buffer for accelerating heterogeneous TP KV transfer https://github.com/sgl-project/sglang/pull/19890 @YAMY1234
   - For MHA/GQA models, a heterogeneous TP setup between prefill and decode splits the KVCache transfer into many small pieces, which slows the transfer down. This feature gathers the KV heads on the prefill side, then sends them to decode as one large piece through a ring-based staging buffer (especially helpful on GB platforms).
 - [x] Compatibility with Hisparse https://github.com/sgl-project/sglang/pull/21591 @hzh0425
   - This feature lets us send the KVCache from prefill HBM to decode host memory when Hisparse is enabled for NSA models (DeepSeek V3.2/GLM 5). It allows a much larger batch size on the decode side, since only the top-k sparse KVCache stays in HBM and the rest is stored in host memory.
 - [ ] Prefill sends the delta KVCache while decode fetches the prefix from Hicache for agentic use cases @hzh0425
 - [x] Decode RadixCache https://github.com/sgl-project/sglang/pull/19746 @ishandhanani 
   - For agentic scenarios, decode can now use radix cache to reuse shared prefixes and request only the delta KV from prefill instead of transferring the full prefix on every turn.
 - [ ] Better compatibility with speculative decoding: support heterogeneous target/draft model architectures https://github.com/sgl-project/sglang/pull/20698
   - This feature will allow the draft model to use a different architecture from the target model when speculative decoding or MTP is enabled (specforge/torchspec might need it).
 - [ ] Layerwise KVCache transfer https://github.com/sgl-project/sglang/pull/19931 @zhangxiaolei123456 @UNIDY2002 https://github.com/sgl-project/sglang/pull/23515
 - [ ] Support Prefill PP + MTP for PD Disaggregation mode https://github.com/sgl-project/sglang/issues/23162
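
To illustrate why the GPU staging buffer item above helps, here is a minimal sketch that counts how many contiguous pieces a transfer needs. It is not the SGLang implementation: the `[token, head, head_dim]` buffer layout, the toy sizes, and the helper names `selected_offsets` and `count_contiguous_runs` are all assumptions made for illustration. The scenario assumed is a prefill rank that holds 8 KV heads while the target decode rank only needs 4 of them.

```python
from typing import List


def selected_offsets(num_tokens: int, num_heads: int, head_dim: int,
                     head_start: int, head_end: int) -> List[int]:
    """Flat element offsets of heads [head_start, head_end) for every token,
    assuming a row-major [token, head, head_dim] layout (hypothetical)."""
    return [
        (t * num_heads + h) * head_dim + d
        for t in range(num_tokens)
        for h in range(head_start, head_end)
        for d in range(head_dim)
    ]


def count_contiguous_runs(offsets: List[int]) -> int:
    """Count maximal runs of consecutive offsets; each run is one transfer piece."""
    runs = 0
    prev = None
    for off in offsets:
        if prev is None or off != prev + 1:
            runs += 1
        prev = off
    return runs


# Toy sizes, chosen only for illustration.
src = selected_offsets(num_tokens=16, num_heads=8, head_dim=4,
                       head_start=0, head_end=4)

# Sending straight from the source buffer: the head slice is interrupted
# at every token boundary, so there is one piece per token.
direct_pieces = count_contiguous_runs(src)

# After gathering the same elements into a packed staging buffer,
# they occupy offsets 0..len(src)-1 and form a single piece.
staged_pieces = count_contiguous_runs(list(range(len(src))))

print(direct_pieces, staged_pieces)  # 16 1
```

In this toy model the number of pieces grows with the number of tokens when sending directly, while the gathered buffer stays at one piece regardless of sequence length.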

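The delta-KV idea behind the Decode RadixCache item above can be sketched as follows. This is a simplified illustration, not the SGLang radix cache: the helper names `longest_cached_prefix` and `delta_range` are hypothetical, the cache is modeled as a plain list of token sequences rather than a radix tree, and details such as page-granular matching are ignored.

```python
from typing import Iterable, Sequence, Tuple


def longest_cached_prefix(tokens: Sequence[int],
                          cached: Iterable[Sequence[int]]) -> int:
    """Length of the longest prefix of `tokens` shared with any cached sequence."""
    best = 0
    for seq in cached:
        n = 0
        for a, b in zip(tokens, seq):
            if a != b:
                break
            n += 1
        best = max(best, n)
    return best


def delta_range(tokens: Sequence[int],
                cached: Iterable[Sequence[int]]) -> Tuple[int, int]:
    """Half-open token range [start, end) whose KV decode still needs from prefill."""
    hit = longest_cached_prefix(tokens, cached)
    return hit, len(tokens)


# Turn 1 of a session left these sequences in the (hypothetical) decode cache.
decode_cache = [[1, 2, 3, 4], [1, 2, 9]]

# Turn 2 extends the first sequence by two tokens.
turn2 = [1, 2, 3, 4, 5, 6]
print(delta_range(turn2, decode_cache))  # (4, 6)
```

With a shared prefix of 4 tokens, only the KV for positions 4 and 5 would need to be transferred in this model, instead of all 6 positions.
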
## Routing strategy enhancement for agentic serving
 - [x] Support heterogeneous instances in the same (Prefill/Decode) pool while simultaneously enabling seamless pairing with heterogeneous (Decode/Prefill) instances. (We have supported this from the very beginning.)
 - [ ] Support extended roles in RBG (long-context prefill instance, short-context prefill instance, long-context decode instance, short-context decode instance). @cheyang Do we support this now?
 - [ ] Support a session-based, load-balance-aware routing strategy in the sglang router @whybeyoung

## New model support:
 - [x] Qwen3.5: hybrid linear attention support for PP+PD https://github.com/sgl-project/sglang/pull/21448 https://github.com/sgl-project/sglang/pull/19254 @zhangxiaolei123456 @sufeng-buaa

More will be added. Feel free to comment with any feature requests below.
