[PD] Support P/D role conversion of running sglang server #9325
Open
LLLL114 wants to merge 62 commits into sgl-project:main from
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
Contributor
I am interested in this feature. Is there an expected time by which it will be merged? Thanks
Collaborator
@kartikx This is a good feature, but it currently cannot support PD with different TP/DP, so it can only serve limited scenarios such as small models with the same TP. In the future we might consider integrating the checkpoint engine to test whether we can redistribute the weights quickly, but that might be no different from shutting down and restarting another instance with the checkpoint engine.
Motivation
parent issue: #8210
Currently, users can only adjust the P/D node ratio by killing and restarting the server.
This PR provides an `http` interface that reuses the loaded model weights and converts the P/D role without killing the server.
Modifications
- Modify `http_server.py` and `mini_lb.py` to support passing a convert request to a running server.
- Use `DecodeReqToTokenPool` in both prefill and decode nodes. This allows reusing the `cuda graph runner` and `attn_backend` when converting between P and D.
- Use a `threading event` to break the event loop in `scheduler`, `MooncakeKVManager` and `hicache`.
Accuracy Tests
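As a side note on the last item under Modifications, the general idea of breaking a long-running event loop with a `threading.Event` can be sketched as below. This is only an illustration of the pattern, not the code in this PR: `RoleWorker`, `event_loop`, `convert` and the `poll_interval` parameter are hypothetical names, and the real `scheduler`, `MooncakeKVManager` and `hicache` loops are not modeled here.

```python
import threading


def event_loop(stop_event, handle_step, poll_interval=0.01):
    """Run handle_step repeatedly until stop_event is set."""
    while not stop_event.is_set():
        handle_step()
        # wait() returns early if the event is set, so the loop exits promptly.
        stop_event.wait(poll_interval)


class RoleWorker:
    """Hypothetical worker whose loop can be stopped and restarted in a new role."""

    def __init__(self, role):
        self.role = role
        self.processed = 0
        self._stop = threading.Event()
        self._thread = None

    def _step(self):
        # Placeholder for one iteration of prefill or decode work.
        self.processed += 1

    def start(self):
        self._stop.clear()
        self._thread = threading.Thread(
            target=event_loop, args=(self._stop, self._step), daemon=True
        )
        self._thread.start()

    def stop(self):
        if self._thread is not None:
            self._stop.set()      # ask the loop to exit
            self._thread.join()   # wait until it has actually exited

    def convert(self, new_role):
        # Break the current loop, switch role, then start a fresh loop.
        # Long-lived state (in the PR: the loaded weights) is left untouched.
        self.stop()
        self.role = new_role
        self.start()
```

The point of the pattern is that the process stays alive across the role switch: only the loop is interrupted and restarted, so anything held by the worker object survives the conversion.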
Launch 2P2D of Qwen3-1.7B with `--enable-pd-convert`:
default:
Use the request to convert P/D:
Benchmarking and Profiling
Checklist