[Dependency] Upgrade to Torch 2.11.0 #21247
Conversation
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
/tag-and-rerun-ci
Force-pushed 346a818 to ca7a304
/rerun-failed-ci AGAIN
@b8zhong What is the ETA of this upgrade? Thanks!
Force-pushed df89633 to adb2ead
The sgl_kernel path filter's extglob `sgl-kernel/**/*.!(md|txt)` puts the negation inside the extension, which requires a literal dot in the filename. Extensionless files like `sgl-kernel/Dockerfile`, `sgl-kernel/Makefile`, and `sgl-kernel/LICENSE` therefore never trip the filter, so editing only `sgl-kernel/Dockerfile` skips the `sgl-kernel-build-wheels` job and CI falls back to the pre-built PyPI wheel (recently hit by sgl-project#21247 when bumping torch to 2.11 via sgl-kernel/Dockerfile only).

Move the negation to the basename level: `sgl-kernel/**/!(*.md|*.txt)` matches any file under sgl-kernel/ whose basename does not end in `.md` or `.txt`, including extensionless files. Keeping it to a single extglob also steers clear of the dorny/paths-filter multi-`!` ordering bug (dorny/paths-filter#113, sgl-project#260).

Applied to all five workflows that shared the pattern: pr-test, pr-test-amd, pr-test-amd-rocm720, pr-test-xeon, pr-test-xpu.

Verified locally with picomatch@2.3.1 (the version dorny/paths-filter uses, matched with {dot: true} as in dorny/paths-filter/src/filter.ts):

- Dockerfile: old skip → new match
- Makefile: old skip → new match
- LICENSE: old skip → new match
- README.md: old skip → new skip (preserved)
- CMakeLists.txt: old skip → new skip (preserved)
- build.sh / *.py / *.cu / *.toml: match (unchanged)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
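For anyone who wants to re-check the filter semantics locally without Node, here is a rough equivalent of that verification. It uses Python's wcmatch library as a stand-in for picomatch; that substitution is an assumption (the extglob/globstar semantics are comparable for these patterns), and the commit's actual picomatch@2.3.1 run remains the source of truth for dorny/paths-filter behavior.

```python
# Sketch: compare old vs. new sgl_kernel path-filter patterns, using wcmatch
# as a stand-in for picomatch (an assumption; not what dorny/paths-filter runs).
from wcmatch import glob

OLD = "sgl-kernel/**/*.!(md|txt)"    # negation inside the extension (needs a dot)
NEW = "sgl-kernel/**/!(*.md|*.txt)"  # negation at the basename level
# DOTGLOB approximates picomatch's {dot: true}; GLOBSTAR lets ** match zero dirs.
FLAGS = glob.EXTGLOB | glob.GLOBSTAR | glob.DOTGLOB

for path in [
    "sgl-kernel/Dockerfile",      # extensionless: must now match
    "sgl-kernel/Makefile",
    "sgl-kernel/LICENSE",
    "sgl-kernel/README.md",       # docs: must stay skipped
    "sgl-kernel/CMakeLists.txt",
    "sgl-kernel/build.sh",        # regular sources: unchanged
]:
    old = glob.globmatch(path, OLD, flags=FLAGS)
    new = glob.globmatch(path, NEW, flags=FLAGS)
    print(f"{path}: old={'match' if old else 'skip'} -> new={'match' if new else 'skip'}")
```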
/rerun-stage multimodal-gen-component-accuracy
/rerun-stage multimodal-gen-component-accuracy-1-gpu
/rerun-stage multimodal-gen-component-accuracy-2-gpu
/rerun-stage multimodal-gen-test-1-b200
✅ Triggered
✅ Triggered
✅ Triggered
✅ Triggered
✅ Triggered
✅ Triggered
/rerun-stage multimodal-gen-test-1-gpu
✅ Triggered
/rerun-stage multimodal-gen-test-1-gpu
✅ Triggered
…(sgl-project#24093) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…alidator

Torch 2.11 ships cu130 wheels as PyPI's default, which broke two install paths in the cu12x Dockerfile branch:

1. The sgl-kernel install on cu128/cu129 (Dockerfile:205) was missing --force-reinstall --no-deps, so pip resolved sglang-kernel's torch dependency and pulled a cu130 torch from PyPI into a cu129 image. Made consistent with the cu126/cu130 branches.
2. The main sglang dep install relied on --extra-index-url, which isn't strong enough to force cu12x resolution when both indexes publish the same version string. Pre-install torch/torchvision/torchaudio from the cu12x index with --index-url before the main install.

Also adds docker/validate_image.py, a post-build validator invoked from release-docker-dev.yml after push-by-digest. It asserts that torch.version.cuda matches the matrix CUDA_VERSION, cross-checks torch's compiled-in cudnn/nccl against the installed PyPI wheel (catches silent downgrades), hard-pins cuda-python and nvidia-cublas, and smoke-imports critical packages. Modeled after pytorch/pytorch's .ci/pytorch/smoke_test pattern.

Additional changes:
- Default ARG CUDA_VERSION bumped to 13.0.1 (only affects ad-hoc local builds; the release workflow always passes --build-arg explicitly)
- nvidia-cutlass-dsl tightened from >=4.4.1 to ==4.4.2
- docker/diffusion.Dockerfile removed (no remaining references)

Companion to #21247 (torch 2.11 upgrade).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
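For context, a minimal sketch of the kind of checks described above, assuming it runs inside the freshly built image. The helper names, the EXPECTED_CUDA variable, and the cudnn version-encoding arithmetic are illustrative assumptions, not the actual contents of docker/validate_image.py.

```python
# Sketch of post-build image validation (hypothetical; mirrors the described
# checks, not the real docker/validate_image.py).
import importlib.metadata
import os

import torch

def check_cuda_version() -> None:
    # torch.version.cuda is e.g. "12.9" for a cu129 build; the workflow is
    # assumed to pass the matrix CUDA_VERSION (e.g. "12.9.1") into the image.
    expected = os.environ["EXPECTED_CUDA"]
    assert expected.startswith(torch.version.cuda), (
        f"torch built for CUDA {torch.version.cuda}, image expects {expected}"
    )

def check_cudnn_matches_wheel(wheel: str = "nvidia-cudnn-cu12") -> None:
    # Cross-check torch's compiled-in cudnn against the installed PyPI wheel
    # to catch a silent downgrade. Encoding assumed: 9.17.1 -> 91701
    # (major*10000 + minor*100 + patch).
    compiled = torch.backends.cudnn.version()
    major, minor, patch = importlib.metadata.version(wheel).split(".")[:3]
    expected = int(major) * 10000 + int(minor) * 100 + int(patch)
    assert compiled == expected, f"cudnn mismatch: {compiled} vs {expected}"

def smoke_import(*modules: str) -> None:
    for name in modules:
        __import__(name)  # fail fast if a critical package cannot import

if __name__ == "__main__":
    check_cuda_version()
    check_cudnn_matches_wheel()
    smoke_import("torchvision", "torchaudio", "sglang")
    print("image validation passed")
```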
Torch 2.11's wheel metadata already pins the NVIDIA libs we were force-reinstalling, and PR #21247 applied the same cleanup in scripts/ci/cuda/ci_install_dependency.sh. Align the Dockerfile by dropping:

- nvidia-nccl-cu12/cu13==2.28.3: torch 2.11 ships 2.28.9 (pinning the older version was a silent downgrade)
- nvidia-cudnn-cu12==9.16.0.29: torch 2.11 ships 9.17.1.4 (downgrade)
- nvidia-cudnn-cu13==9.16.0.29: torch 2.11 ships 9.19.0.56 (downgrade)
- nvidia-cublas==13.1.0.3: already pulled transitively by cuda-toolkit[cublas]==13.0.2 at the exact same version
- nvidia-cutlass-dsl==4.4.2 force-reinstall: already pinned in python/pyproject.toml and resolved by the main sglang dep install

Also fix a nixl duplication bug on cu13 images: the `nixl` stub package has an unconditional requires_dist on nixl-cu12>=1.0.1, so installing plain `nixl` in the essential-packages block pulled nixl-cu12 (~49 MB) onto cu13 images on top of the subsequent nixl-cu13 install. Install nixl-cu12 / nixl-cu13 directly in the per-CUDA-major block instead; the stub's metadata can be confirmed with the sketch below.

Validator (docker/validate_image.py): drop the nvidia-cublas hard-pin assertion; it is now transitively pinned by torch, and the Dockerfile no longer force-reinstalls it. The torch-internal cross-check for cudnn and nccl still runs and will assert against whatever torch 2.11 ships.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
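The requires_dist claim is easy to verify locally in any environment where the `nixl` stub is installed:

```python
# Inspect the nixl stub's declared dependencies; per the commit message, the
# dependency on nixl-cu12 carries no environment marker, so it is pulled
# unconditionally, even on cu13 images.
from importlib.metadata import requires

deps = requires("nixl") or []
print(deps)  # expected to include an entry like 'nixl-cu12>=1.0.1'
assert any(d.startswith("nixl-cu12") for d in deps)
```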
…u13 refs

Address findings from the code review:

1. release-docker.yml: the tag_config JSON was malformed (missing comma after the cu130 entry, trailing comma after cu129). fromJson would fail and break every tag-pushed release. The header comment also said latest-cu139.
2. docker/Dockerfile: restore the cu12 --index-url torch pre-install that the prior 'upd' commit dropped. With #21247 landed, torch 2.11 is the PyPI default at cu130, and --extra-index-url alone won't override it when both indexes publish the same version; cu126/cu128/cu129 images would silently ship cu130 torch.
3. Update consumers of the dropped dev-cu13 / latest-cu130-runtime tags to the new naming (dev = cu13, dev-cu12 = cu12): trivy-scan-dev.yml, nightly-72-gpu-gb200.yml, the release-docker-dev.yml description, _docker-cleanup-nightly.yml examples, and MOVING_TAGS in scripts/ci/utils/docker_build_metadata_args.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
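To see why point 1 is release-breaking, note that strict JSON parsers (including the one behind GitHub Actions' fromJson) reject both a missing comma between entries and a trailing comma. The keys and values below are hypothetical; only the two syntax errors mirror the actual tag_config bug.

```python
# Demonstrate that either comma error makes strict JSON parsing fail outright.
import json

malformed = '{"cu130": "latest"  "cu129": "latest-cu129",}'  # missing comma, then trailing comma
try:
    json.loads(malformed)
except json.JSONDecodeError as e:
    print(f"fromJson would fail the same way: {e}")

fixed = '{"cu130": "latest", "cu129": "latest-cu129"}'
print(json.loads(fixed))  # parses cleanly once both commas are corrected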
Motivation
github.com/pytorch/pytorch/releases/tag/v2.11.0
Modifications
github.com/sgl-project/sglang/pull/18862