[Roadmap] Prefill-Decode Disaggregation Roadmap (2026 Q2) #21703

@ShangmingCai

Motivation

Prefill-Decode (PD) Disaggregation has become an important and widely adopted feature in production LLM inference.
Although we have made it compatible with most model architectures, all parallel strategies, and almost all advanced features, new models and features keep arriving, and we want to support them in PD mode as well.

This issue serves as a roadmap and tracker for anyone following the latest progress of this feature. At a minimum, it will cover the related experimental features we plan to work on in 2026 Q2.

Interface refactoring

Advanced feature and usage

Routing strategy enhancement for agentic serving

  • Support heterogeneous instances within the same (Prefill/Decode) pool while still allowing seamless pairing with heterogeneous (Decode/Prefill) instances. (We have supported this from the very beginning.)
  • Support extended roles in RBG (long-context prefill, short-context prefill, long-context decode, and short-context decode instances). @cheyang, do we support this now?
  • Support a session-based, load-balance-aware routing strategy in the sglang router. @whybeyoung
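
To make the last item concrete, here is a minimal sketch of what a session-based, load-aware routing policy could look like. It is not the sglang router's actual API or implementation: `Worker`, `SessionAwareRouter`, `max_imbalance`, and the in-flight request count used as the load signal are all hypothetical names and choices made for illustration. The assumption is that follow-up requests in an agentic session benefit from returning to the same instance (for example, to reuse cached context), unless that instance has become much busier than the least-loaded one.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical prefill or decode instance tracked by the router."""
    name: str
    in_flight: int = 0  # requests currently assigned to this worker


@dataclass
class SessionAwareRouter:
    """Sticky-session routing with a load-based fallback (illustrative only)."""
    workers: list[Worker]
    # Hypothetical threshold: how many more in-flight requests the sticky
    # worker may have than the least-loaded worker before we re-route.
    max_imbalance: int = 4
    _sessions: dict[str, Worker] = field(default_factory=dict)

    def route(self, session_id: str) -> Worker:
        least_loaded = min(self.workers, key=lambda w: w.in_flight)
        sticky = self._sessions.get(session_id)
        # Keep the session on its previous worker unless that worker is
        # overloaded relative to the least-loaded one.
        if sticky is not None and sticky.in_flight - least_loaded.in_flight <= self.max_imbalance:
            target = sticky
        else:
            target = least_loaded
            self._sessions[session_id] = target
        target.in_flight += 1
        return target

    def finish(self, worker: Worker) -> None:
        """Call when a request completes so the load counter stays accurate."""
        worker.in_flight = max(0, worker.in_flight - 1)
```

A real policy would likely use richer load signals (queue depth, cache usage, context length) and would need to expire idle sessions; the sketch only shows how session affinity and load balancing can be combined in a single routing decision.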

New model support:

More will be added. Feel free to leave feature requests in the comments below.
