feat(examples/sglang): add Intel XPU deployment configs #1
Closed
yao531441 wants to merge 1521 commits into
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
…ynamo#9818) Signed-off-by: VincyZhang <wenxin.zhang@intel.com>
…dynamo#9974) Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com> Co-authored-by: Neelay Shah <neelays@nvidia.com>
…i-dynamo#10089) Signed-off-by: Anant Sharma <anants@nvidia.com>
…ker (DIS-2107) (ai-dynamo#9830) Signed-off-by: Tzu-Ling <tzulingk@nvidia.com>
ai-dynamo#10074) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… divergences (ai-dynamo#9971) Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
…10070) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhang, Wenxin <wenxin.zhang@intel.com> Signed-off-by: VincyZhang <wenxin.zhang@intel.com> Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Signed-off-by: Dillon Cullinan <dcullinan@nvidia.com>
…ocs (ai-dynamo#9982) Signed-off-by: Shaoting-Feng <shaotingf@tensormesh.ai> Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
Signed-off-by: dynamo-ops <170655669+dynamo-ops@users.noreply.github.com> Signed-off-by: Anant Sharma <anants@nvidia.com> Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com> Co-authored-by: Anant Sharma <anants@nvidia.com> Co-authored-by: Indrajit Bhosale <iamindrajitb@gmail.com>
…i-dynamo#10053) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
…o#9952) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
…as ✗ in parity tables (ai-dynamo#10504) Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com> Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
… fields (ai-dynamo#10255) Signed-off-by: Ishan Dhanani <ishandhanani@gmail.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com> Co-authored-by: Dr. Stefan Schimanski <sschimanski@nvidia.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: carlory <baofa.fan@daocloud.io>
Add demand-driven Python KV router response consumption and lifecycle E2E coverage for slot tracking, replication, and disaggregated roles.
…ling (ai-dynamo#10422) Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: Dan Gil <dagil@nvidia.com>
…cking (ai-dynamo#10508) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…harness (ai-dynamo#10535) Signed-off-by: nnshah1 <neelays@nvidia.com>
Signed-off-by: carlory <baofa.fan@daocloud.io>
…ew works (ai-dynamo#10073) (ai-dynamo#10288) Signed-off-by: zhongdaor <zhongdaor@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-dynamo#10092) Signed-off-by: jooe0824 <jooe0824@sk.com> Co-authored-by: jooe0824 <jooe0824@sk.com>
…0502) Signed-off-by: Anant Sharma <anants@nvidia.com>
…i-dynamo#10518) Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com> Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
…mo#10551) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ynamogh-8749) (ai-dynamo#10037) Signed-off-by: nnshah1 <neelays@nvidia.com>
…worker set (ai-dynamo#10503) Signed-off-by: Jie Hao <jihao@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Compress single-threaded radix recovery dumps and suppress mocker KV events when prefix caching is disabled.

Closes ai-dynamo#10524
Overview:
Add Kubernetes deployment templates for SGLang on Intel XPU GPUs using Dynamic Resource Allocation (DRA).
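As a rough, non-authoritative illustration of what "using DRA" means at the Kubernetes API level, the Python sketch below builds a ResourceClaimTemplate and a worker pod spec as plain dicts. It applies the disaggregated KV-transfer settings this PR lists (hostIPC: true, the UCX_TLS value, no ZE_AFFINITY_MASK under DRA) and checks a spec against them. Assumptions not taken from the PR's YAML files: the resource.k8s.io/v1beta1 API shape, the DeviceClass name "gpu.intel.com" (as published by Intel's DRA GPU driver), and every object, container, and image name, which are hypothetical placeholders. The helpers claim_template, worker_pod_spec, and dra_problems are also hypothetical.

```python
"""Hypothetical sketch: request one Intel XPU through Kubernetes DRA.

Assumptions (not from the PR's YAML files):
  * resource.k8s.io/v1beta1 API shape; newer DRA API versions nest
    deviceClassName under an "exactly" field instead.
  * DeviceClass "gpu.intel.com", as published by Intel's DRA GPU driver.
  * All object, container, and image names are placeholders.
"""
import json

# UCX transport list taken from the PR description.
UCX_TLS = "ze_ipc,ze_copy,tcp,cma,sysv,posix,self"


def claim_template(name: str) -> dict:
    """ResourceClaimTemplate asking for a single device of the Intel GPU class."""
    return {
        "apiVersion": "resource.k8s.io/v1beta1",
        "kind": "ResourceClaimTemplate",
        "metadata": {"name": name},
        "spec": {
            "spec": {
                "devices": {
                    "requests": [{"name": "gpu", "deviceClassName": "gpu.intel.com"}]
                }
            }
        },
    }


def worker_pod_spec(template_name: str, image: str) -> dict:
    """Pod spec for a disaggregated worker with the PR's KV-transfer settings."""
    return {
        # ZE_IPC shared-memory communication needs the host IPC namespace.
        "hostIPC": True,
        # Kubernetes creates one ResourceClaim per pod from the template.
        "resourceClaims": [{"name": "xpu", "resourceClaimTemplateName": template_name}],
        "containers": [
            {
                "name": "worker",
                "image": image,
                # ZE_AFFINITY_MASK is deliberately absent: per the PR,
                # setting it together with DRA causes SIGSEGV.
                "env": [{"name": "UCX_TLS", "value": UCX_TLS}],
                # The container consumes the pod-level claim by name.
                "resources": {"claims": [{"name": "xpu"}]},
            }
        ],
    }


def dra_problems(pod_spec: dict) -> list:
    """Return rule violations for a DRA-based XPU worker (empty list = OK)."""
    problems = []
    if not pod_spec.get("hostIPC"):
        problems.append("hostIPC must be true for ZE_IPC")
    claim_names = {c["name"] for c in pod_spec.get("resourceClaims", [])}
    if not claim_names:
        problems.append("no resourceClaims: device is not allocated via DRA")
    for ctr in pod_spec.get("containers", []):
        env = {e["name"]: e.get("value") for e in ctr.get("env", [])}
        if "ZE_AFFINITY_MASK" in env:
            problems.append(f"{ctr['name']}: unset ZE_AFFINITY_MASK under DRA")
        if env.get("UCX_TLS") != UCX_TLS:
            problems.append(f"{ctr['name']}: unexpected UCX_TLS")
        # Every container-level claim must reference a pod-level claim.
        for claim in ctr.get("resources", {}).get("claims", []):
            if claim["name"] not in claim_names:
                problems.append(f"{ctr['name']}: unknown claim {claim['name']}")
    return problems


if __name__ == "__main__":
    spec = worker_pod_spec("xpu-claim", "example/sglang-xpu:latest")
    print(json.dumps(claim_template("xpu-claim"), indent=2))
    print(dra_problems(spec))  # [] for the spec built above
```

The template dict could be serialized and applied directly (kubectl accepts JSON as well as YAML manifests), while the pod spec would be embedded in whatever workload object wraps the worker. The deploy/xpu/*.yaml files in this PR remain the authoritative configurations.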
Details:
This PR adds XPU-specific deployment configurations for the SGLang backend, aligning with the structure of the vLLM XPU configs from PR ai-dynamo#9253.
New files:
- deploy/xpu/agg_xpu_dra.yaml: Aggregated deployment with a single worker
- deploy/xpu/disagg_xpu_dra.yaml: Disaggregated deployment with DRA
- deploy/xpu/disagg_planner_xpu_dra.yaml: Disaggregated deployment with the Global Planner
- deploy/xpu/disagg_xpu.yaml: Traditional device plugin (gpu.intel.com/xe)
- deploy/xpu/README.md: XPU-specific deployment guide

Key configurations for XPU disaggregated KV transfer:
- hostIPC: true for ZE_IPC shared-memory communication
- UCX_TLS=ze_ipc,ze_copy,tcp,cma,sysv,posix,self for the UCX transport layer
- ResourceClaimTemplate for GPU allocation via Kubernetes DRA
- Do not set ZE_AFFINITY_MASK when using DRA; the two conflict and cause a SIGSEGV

Where should the reviewer start?
- examples/backends/sglang/deploy/xpu/README.md: Overview and prerequisites
- examples/backends/sglang/deploy/xpu/disagg_xpu_dra.yaml: Core disaggregated config

Related Issues: