Motivation
Prefill-Decode (PD) Disaggregation has become an important and widely adopted feature for production LLM inference.
Although we have made it largely compatible with different model architectures, all parallelism strategies, and almost all advanced features, new models and features keep emerging, and we want to support them in PD mode as well.
This issue serves as a roadmap and tracker for anyone interested in the latest progress of this feature. At a minimum, it will cover the related experimental features we plan to work on in 2026 Q2.
Interface refactoring
Advanced feature and usage
Routing strategy enhancement for agentic serving
New model support:
More items will be added. Feel free to leave feature requests in the comments below.