feat(examples/sglang): add Intel XPU deployment configs by yao531441 · Pull Request #1 · yao531441/dynamo

yao531441 · 2026-05-27T06:14:58Z

Overview:

Add Kubernetes deployment templates for SGLang on Intel XPU GPUs using Dynamic Resource Allocation (DRA).

Details:

This PR adds XPU-specific deployment configurations for the SGLang backend, following the structure of the vLLM XPU configs from PR ai-dynamo#9253.

New files:

deploy/xpu/agg_xpu_dra.yaml: Aggregated deployment with single worker
deploy/xpu/disagg_xpu_dra.yaml: Disaggregated deployment with DRA
deploy/xpu/disagg_planner_xpu_dra.yaml: Disaggregated deployment with DRA and the Global Planner
deploy/xpu/disagg_xpu.yaml: Disaggregated deployment using the traditional device plugin (gpu.intel.com/xe) instead of DRA
deploy/xpu/README.md: XPU-specific deployment guide
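
For contrast with the DRA variants, the traditional device plugin path requests the GPU as an extended resource. The fragment below is a minimal sketch, assuming the Intel GPU device plugin is installed and exposes `gpu.intel.com/xe`. The pod name and image are hypothetical placeholders, and this is a plain Pod for illustration, not the content of `disagg_xpu.yaml`.

```yaml
# Illustrative only: a plain Pod requesting one GPU through the device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: sglang-worker-device-plugin   # hypothetical name
spec:
  containers:
    - name: worker
      image: "<sglang-xpu-image>"     # placeholder, not a real image reference
      resources:
        limits:
          gpu.intel.com/xe: 1         # extended resource; no ResourceClaimTemplate involved
```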

Key configurations for XPU disaggregated KV transfer:

hostIPC: true for ZE_IPC shared memory communication
UCX_TLS=ze_ipc,ze_copy,tcp,cma,sysv,posix,self for UCX transport layer
ResourceClaimTemplate for GPU allocation via Kubernetes DRA
Note: Do not set ZE_AFFINITY_MASK when using DRA; it conflicts with DRA and causes a SIGSEGV.
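
The settings listed above might combine in a worker pod roughly as follows. This is a minimal sketch, not the contents of the files in this PR: the names `xpu-gpu-claim-template` and `sglang-worker` and the image placeholder are hypothetical, a plain Pod stands in for whatever resource kind the real configs use, and the `resource.k8s.io/v1beta1` API version and `gpu.intel.com` device class are assumptions that depend on the cluster's Kubernetes version and the installed Intel DRA driver.

```yaml
# Hypothetical ResourceClaimTemplate requesting one Intel GPU through DRA.
# The apiVersion and deviceClassName are assumptions; check your cluster.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: xpu-gpu-claim-template          # hypothetical name
spec:
  spec:
    devices:
      requests:
        - name: gpu
          deviceClassName: gpu.intel.com   # assumed Intel DRA device class
---
apiVersion: v1
kind: Pod
metadata:
  name: sglang-worker                   # hypothetical name
spec:
  hostIPC: true                         # for ZE_IPC shared memory communication
  resourceClaims:
    - name: gpu
      resourceClaimTemplateName: xpu-gpu-claim-template
  containers:
    - name: worker
      image: "<sglang-xpu-image>"       # placeholder, not a real image reference
      env:
        - name: UCX_TLS                 # UCX transport list from the PR description
          value: "ze_ipc,ze_copy,tcp,cma,sysv,posix,self"
        # ZE_AFFINITY_MASK is deliberately left unset: with DRA it conflicts
        # and causes SIGSEGV, per the note above.
      resources:
        claims:
          - name: gpu                   # binds the container to the claim declared above
```

The container references the claim by name under `resources.claims`, so the GPU is allocated by DRA and no device mask needs to be set in the environment.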

Where should the reviewer start?

examples/backends/sglang/deploy/xpu/README.md - Overview and prerequisites
examples/backends/sglang/deploy/xpu/disagg_xpu_dra.yaml - Core disaggregated config

Related Issues:

Relates to PR ai-dynamo/dynamo#9253, "feat(examples): add XPU DRA deployment examples (aggregation, disaggregation, tracing)" (vLLM XPU deployment configs).

github-advanced-security

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

github-actions bot added the labels feat, documentation, backend::vllm, backend::sglang, backend::trtllm, router, frontend, planner, deployment::k8s, container, actions, and multimodal on May 27, 2026

github-advanced-security AI found potential problems May 27, 2026

VincyZhang and others added 17 commits May 28, 2026 11:06

fix: use shared conftest for selective module skip in GMS tests (ai-d…

8294ddd

…ynamo#9818) Signed-off-by: VincyZhang <wenxin.zhang@intel.com>

fix: add catch and unwind stuff for indexer so we have a log msg (ai-…

faf0f51

…dynamo#9974) Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>

feat: context propagation metadata http side (ai-dynamo#9726)

42bef4b

Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com> Co-authored-by: Neelay Shah <neelays@nvidia.com>

feat(router): propagate topology constraints to decode (ai-dynamo#9893)

a3f3c1c

ci: enable EFA builds in PR on framework or aws dockerfile changes (a…

eb98d8b

…i-dynamo#10089) Signed-off-by: Anant Sharma <anants@nvidia.com>

feat(sglang): gate chat-shaped Prometheus collectors on embedding wor…

fa1b570

…ker (DIS-2107) (ai-dynamo#9830) Signed-off-by: Tzu-Ling <tzulingk@nvidia.com>

docs(trtllm): align with upstream-container migration (ai-dynamo#9654) (

7d0be16

ai-dynamo#10074) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix: Fix/document kimi_k2 streaming reasoning and tool-calling parser…

aae9b95

… divergences (ai-dynamo#9971) Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

ci: add Datadog Test Optimization to shared-test workflow (ai-dynamo#…

6824b82

…10070) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

test: Enable additional tests on Intel XPU CI (ai-dynamo#8262)

e45f037

Signed-off-by: Zhang, Wenxin <wenxin.zhang@intel.com> Signed-off-by: VincyZhang <wenxin.zhang@intel.com> Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>

ci: Shift vllm deploy tests earlier in the PR pipeline (ai-dynamo#10090)

c54e4a6

Signed-off-by: Dillon Cullinan <dcullinan@nvidia.com>

feat(vllm): add aggregated LMCache MP-mode launch script and align d…

c226999

…ocs (ai-dynamo#9982) Signed-off-by: Shaoting-Feng <shaotingf@tensormesh.ai> Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>

Fix: Issue-7900 (ai-dynamo#9715)

7af22dc

Signed-off-by: Anna Tchernych <atchernych@nvidia.com>

fix(tests): remove unbounded xdist startup stagger from serve tests (a…

e7eb1c5

…i-dynamo#10053) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix: improve DeepSeek V3 parser parity (ai-dynamo#9813)

ae9816b

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

refactor(container): install NIXL wheel libs through helper (ai-dynam…

a0a2cab

…o#9952) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tedzhouhk and others added 29 commits June 9, 2026 14:46

feat(planner): support MTP accept length in replay (ai-dynamo#10501)

90b99d7

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

chore(parser): visualization update, render Python parser exceptions …

047f892

…as ✗ in parity tables (ai-dynamo#10504) Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com> Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

refactor: move framework metadata off disaggregated_params onto typed…

fdefcfa

… fields (ai-dynamo#10255) Signed-off-by: Ishan Dhanani <ishandhanani@gmail.com>

perf(llm): reduce Responses stream overhead (ai-dynamo#10498)

d202863

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

perf(llm): reduce Anthropic messages stream overhead (ai-dynamo#10499)

14ab394

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

fix(renderer): flatten mixed text+image content for pass-through chat…

94c3c22

… templates (ai-dynamo#10380)

fix: remove failover requirement for inter-pod GMS (ai-dynamo#10378)

54318f2

Signed-off-by: Schwinn Saereesitthipitak <17022745+galletas1712@users.noreply.github.com> Co-authored-by: Dr. Stefan Schimanski <sschimanski@nvidia.com>

perf(kv-router): bucket prune expiries (ai-dynamo#10521)

5581852

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

fix: support Python 3.10 Required typing import (ai-dynamo#10396)

fde3d55

Signed-off-by: carlory <baofa.fan@daocloud.io>

test(router): add slot tracker lifecycle coverage

112406b

Add demand-driven Python KV router response consumption and lifecycle E2E coverage for slot tracking, replication, and disaggregated roles.

fix(frontend): chat_template.json fallback for vllm + sglang tool cal…

a1ea8a0

…ling (ai-dynamo#10422) Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>

docs: clarify heading-case rule in the style guide (ai-dynamo#10534)

d625943

Signed-off-by: Dan Gil <dagil@nvidia.com>

test(serve): shrink heavy CI model footprints to speed up GPU test pa…

b64cd0a

…cking (ai-dynamo#10508) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(replay): offline+concurrency to queue multi-turn reqs depth first (…

ff56406

…ai-dynamo#10516)

docs(fault-tolerance): add README + warnings on fault-injection test …

943018f

…harness (ai-dynamo#10535) Signed-off-by: nnshah1 <neelays@nvidia.com>

fix: start resource counter after cache sync (ai-dynamo#10309)

2202bb2

Signed-off-by: carlory <baofa.fan@daocloud.io>

docs(fern): inject NVIDIA global-theme at publish time so local previ…

9e44d9a

…ew works (ai-dynamo#10073) (ai-dynamo#10288) Signed-off-by: zhongdaor <zhongdaor@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(profiler): pin prefill/decode replicas to 1 in autoscale mode (ai…

cb36a74

…-dynamo#10092) Signed-off-by: jooe0824 <jooe0824@sk.com> Co-authored-by: jooe0824 <jooe0824@sk.com>

chore(deps): update python-multipart and pytest versions (ai-dynamo#1…

a95e0b1

…0502) Signed-off-by: Anant Sharma <anants@nvidia.com>

fix(vllm): serialize routed_experts as base64 with start offset (ai-d…

1898840

…ynamo#10529)

fix: invoke pytest via python -m in pytest-local action (ai-dynamo#10558

265f13c

) Signed-off-by: Anant Sharma <anants@nvidia.com>

fix(deepseek): recover 3 more cases of unterminated DSML tool calls (a…

fdbc531

…i-dynamo#10518) Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com> Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

ci(datadog): scope test-optimization per workflow via DD_ENV (ai-dyna…

309c919

…mo#10551) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(self-host): native preprocessor consumes resolved slug_dir (ai-d…

7ff7cff

…ynamogh-8749) (ai-dynamo#10037) Signed-off-by: nnshah1 <neelays@nvidia.com>

fix(frontend): confine request routing to namespaces with a complete …

b95f153

…worker set (ai-dynamo#10503) Signed-off-by: Jie Hao <jihao@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix(renderer): default missing tool description to empty string (ai-d…

8e7c8c6

…ynamo#10510)

perf(kv-router): reduce lookup hot-path overhead (ai-dynamo#10540)

2fecc35

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

perf(router): compress single-threaded radix recovery dumps

722e720

Compress single-threaded radix recovery dumps and suppress mocker KV events when prefix caching is disabled. Closes ai-dynamo#10524

Merge branch 'main' into feat/sglang-xpu-deploy-configs

788220d

yao531441 closed this Jun 11, 2026

yao531441 wants to merge 1521 commits into main from feat/sglang-xpu-deploy-configs

20 participants
