
feat(examples/sglang): add Intel XPU deployment configs #1

Closed
yao531441 wants to merge 1521 commits into main from feat/sglang-xpu-deploy-configs


@yao531441

Owner

Overview:

Add Kubernetes deployment templates for SGLang on Intel XPU GPUs using Dynamic Resource Allocation (DRA).

Details:

This PR adds XPU-specific deployment configurations for the SGLang backend, following the structure of the vLLM XPU configs from PR ai-dynamo#9253.

New files:

  • deploy/xpu/agg_xpu_dra.yaml: Aggregated deployment with single worker
  • deploy/xpu/disagg_xpu_dra.yaml: Disaggregated deployment with DRA
  • deploy/xpu/disagg_planner_xpu_dra.yaml: Disaggregated deployment with Global Planner
  • deploy/xpu/disagg_xpu.yaml: Disaggregated deployment with the traditional device plugin (gpu.intel.com/xe)
  • deploy/xpu/README.md: XPU-specific deployment guide

Key configurations for XPU disaggregated KV transfer:

  • hostIPC: true for ZE_IPC shared memory communication
  • UCX_TLS=ze_ipc,ze_copy,tcp,cma,sysv,posix,self for UCX transport layer
  • ResourceClaimTemplate for GPU allocation via Kubernetes DRA
  • Note: Do not set ZE_AFFINITY_MASK when using DRA; the two conflict and the process crashes with SIGSEGV
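
The constraints in the list above can be checked mechanically before applying a manifest. The following is a minimal sketch, not part of this PR: it assumes the pod spec has already been loaded into a plain Python dict shaped like a standard Kubernetes pod spec (the actual files in this PR may nest it differently), and the function name `check_xpu_pod_spec`, the claim name `xpu`, and the template name `xpu-claim-template` are hypothetical.

```python
from typing import Any

# UCX transport list quoted from the PR description.
EXPECTED_UCX_TLS = "ze_ipc,ze_copy,tcp,cma,sysv,posix,self"


def check_xpu_pod_spec(pod_spec: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a plain Kubernetes pod spec dict."""
    problems: list[str] = []

    # hostIPC must be enabled for ZE_IPC shared memory communication.
    if pod_spec.get("hostIPC") is not True:
        problems.append("hostIPC should be true for ZE_IPC shared memory")

    # A non-empty resourceClaims list means the pod allocates devices via DRA.
    uses_dra = bool(pod_spec.get("resourceClaims"))

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        env = {e["name"]: e.get("value", "") for e in container.get("env", [])}

        # UCX_TLS is a comma-separated transport list; ze_ipc must be present.
        transports = [t for t in env.get("UCX_TLS", "").split(",") if t]
        if "ze_ipc" not in transports:
            problems.append(f"{name}: UCX_TLS does not include ze_ipc")

        # Per the note above, ZE_AFFINITY_MASK must not be combined with DRA.
        if uses_dra and "ZE_AFFINITY_MASK" in env:
            problems.append(f"{name}: ZE_AFFINITY_MASK is set together with DRA")

    return problems


# Hypothetical pod spec that satisfies all three checks.
example_spec = {
    "hostIPC": True,
    "resourceClaims": [
        {"name": "xpu", "resourceClaimTemplateName": "xpu-claim-template"}
    ],
    "containers": [
        {
            "name": "worker",
            "env": [{"name": "UCX_TLS", "value": EXPECTED_UCX_TLS}],
            "resources": {"claims": [{"name": "xpu"}]},
        }
    ],
}

print(check_xpu_pod_spec(example_spec))  # prints []
```

Returning a list of messages rather than raising on the first failure lets a reviewer see every violation in a manifest at once.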

Where should the reviewer start?

  • examples/backends/sglang/deploy/xpu/README.md - Overview and prerequisites
  • examples/backends/sglang/deploy/xpu/disagg_xpu_dra.yaml - Core disaggregated config

Related Issues:

@github-advanced-security left a comment


CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

VincyZhang and others added 17 commits May 28, 2026 11:06
…dynamo#9974)

Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
…ker (DIS-2107) (ai-dynamo#9830)

Signed-off-by: Tzu-Ling <tzulingk@nvidia.com>
ai-dynamo#10074)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… divergences (ai-dynamo#9971)

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
…10070)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhang, Wenxin <wenxin.zhang@intel.com>
Signed-off-by: VincyZhang <wenxin.zhang@intel.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Signed-off-by: Dillon Cullinan <dcullinan@nvidia.com>
…ocs (ai-dynamo#9982)

Signed-off-by: Shaoting-Feng <shaotingf@tensormesh.ai>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
Signed-off-by: dynamo-ops <170655669+dynamo-ops@users.noreply.github.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Indrajit Bhosale <iamindrajitb@gmail.com>
…i-dynamo#10053)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
…o#9952)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tedzhouhk and others added 29 commits June 9, 2026 14:46
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
…as ✗ in parity tables (ai-dynamo#10504)

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
… fields (ai-dynamo#10255)

Signed-off-by: Ishan Dhanani <ishandhanani@gmail.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com>
Co-authored-by: Dr. Stefan Schimanski <sschimanski@nvidia.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: carlory <baofa.fan@daocloud.io>
Add demand-driven Python KV router response consumption and lifecycle E2E coverage for slot tracking, replication, and disaggregated roles.
…ling (ai-dynamo#10422)

Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
…cking (ai-dynamo#10508)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…harness (ai-dynamo#10535)

Signed-off-by: nnshah1 <neelays@nvidia.com>
Signed-off-by: carlory <baofa.fan@daocloud.io>
…ew works (ai-dynamo#10073) (ai-dynamo#10288)

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-dynamo#10092)

Signed-off-by: jooe0824 <jooe0824@sk.com>
Co-authored-by: jooe0824 <jooe0824@sk.com>
)

Signed-off-by: Anant Sharma <anants@nvidia.com>
…i-dynamo#10518)

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
…mo#10551)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…worker set (ai-dynamo#10503)

Signed-off-by: Jie Hao <jihao@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Compress single-threaded radix recovery dumps and suppress mocker KV events when prefix caching is disabled.

Closes ai-dynamo#10524
@yao531441 yao531441 closed this Jun 11, 2026