[sgl-kernel] Prep for torch 2.11 upgrade and switch PyPI default to cu130 #24162
Kangyan-Zhou merged 3 commits into main.
Bump TORCH_VER from 2.9.1 to 2.11.0 across the CUDA build matrix in sgl-kernel/Dockerfile, and update the README install requirement to match. Torch 2.11 ships cu129 wheels (torch 2.9 did not), so the 12.9 CUDA row now correctly resolves to the cu129 PyTorch wheel index instead of falling back to cu128. This is a prerequisite for the wider torch 2.11 upgrade in #21247: sgl-kernel must produce a torch-2.11-compatible wheel before any downstream consumer can install against torch 2.11. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
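As a rough illustration of what changes for the CUDA 12.9 row, the sketch below prints the PyTorch wheel index the row points at before and after the bump. It assumes the Dockerfile installs torch from PyTorch's per-CUDA index layout (`https://download.pytorch.org/whl/<CU_TAG>`); `torch_index_url` is a hypothetical helper, not something taken from the Dockerfile.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes torch is installed from PyTorch's
# per-CUDA wheel index, https://download.pytorch.org/whl/<CU_TAG>.
torch_index_url() {
  echo "https://download.pytorch.org/whl/$1"
}

# CUDA 12.9 row before this change: torch 2.9.1 had no cu129 wheels,
# so the row pointed at the cu128 index.
echo "before: torch==2.9.1  -> $(torch_index_url cu128)"
# CUDA 12.9 row after this change: torch 2.11.0 ships cu129 wheels.
echo "after:  torch==2.11.0 -> $(torch_index_url cu129)"
```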
Move the "Strip +cu local version" + "Upload to PyPI" steps from build-cu129-matrix to build-cu130-matrix, and update the surrounding comment to point at cu130. Both build matrices still produce wheels that flow through upload-artifact → sgl-project/whl index; only the PyPI-uploaded variant changes. Rationale: torch 2.11 (PR #21247) targets cu13 by default, so the sgl-kernel wheel that pip installs from PyPI by name should match. cu129 wheels remain available via the sgl-project/whl index for the legacy CUDA 12.9 install path (`+cu129` local version label). The strip script's regex (`+cu[0-9]\+$`) is already generic, so the behavior on the cu130 side is identical to the prior cu129 path; only the comment is rewritten to reference the new variant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
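A minimal sketch of why the pattern is variant-agnostic, assuming the strip step is applied to the `Version:` line of the wheel metadata (as the test plan suggests). `strip_cu_local_version` is a hypothetical name; `sed -E 's/\+cu[0-9]+$//'` is the extended-regex spelling of the basic-regex pattern `+cu[0-9]\+$` quoted above, and the version numbers are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the strip step: drop a trailing PEP 440
# local-version label of the form +cuNNN (ERE form of the BRE +cu[0-9]\+$).
strip_cu_local_version() {
  printf '%s\n' "$1" | sed -E 's/\+cu[0-9]+$//'
}

# Any +cuNNN label is removed, so cu129 and cu130 behave identically;
# labels that do not match +cuNNN are left untouched.
for v in "Version: 1.2.3+cu129" "Version: 1.2.3+cu130" "Version: 1.2.3+cpu"; do
  strip_cu_local_version "$v"
done
# prints:
#   Version: 1.2.3
#   Version: 1.2.3
#   Version: 1.2.3+cpu
```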
Code Review
This pull request updates the PyTorch version to 2.11.0 in the Dockerfile and README. Feedback suggests refactoring the Dockerfile to eliminate redundant version assignments and ensuring CUDA tag mappings are robust. Additionally, it is recommended to update the minimum version in pyproject.toml to maintain consistency with the new requirement.

    13.0) TORCH_VER=2.11.0; CU_TAG=cu130 ;; \
    12.9) TORCH_VER=2.11.0; CU_TAG=cu129 ;; \
    12.8) TORCH_VER=2.11.0; CU_TAG=cu128 ;; \
    *) TORCH_VER=2.11.0; CU_TAG=cu126 ;; \

The TORCH_VER=2.11.0 assignment is repeated in every branch of the case statement. Additionally, the default case (line 85) maps to cu126, which may not be appropriate for all CUDA versions (e.g., 12.1 or 12.4, which are referenced elsewhere in the repository). Consider refactoring this block to reduce redundancy and ensure the CU_TAG mapping is robust for all supported CUDA versions.
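One way to act on this suggestion, sketched in plain bash rather than inside the Dockerfile `RUN` step: assign `TORCH_VER` once and fail loudly on an unrecognized CUDA version instead of silently mapping it to `cu126`. The helper name `cu_tag_for` and the `CUDA_VERSION` variable are hypothetical, and the explicit `12.6` branch is an assumption inferred from the current default mapping, not a confirmed row of the build matrix.

```shell
#!/usr/bin/env bash
TORCH_VER=2.11.0   # single assignment shared by every CUDA row

# Hypothetical helper: map a CUDA "major.minor" string to a PyTorch wheel tag.
cu_tag_for() {
  case "$1" in
    13.0) echo cu130 ;;
    12.9) echo cu129 ;;
    12.8) echo cu128 ;;
    12.6) echo cu126 ;;   # assumed row, inferred from the old default branch
    *) echo "unsupported CUDA version: $1" >&2; return 1 ;;
  esac
}

CUDA_VERSION=12.9   # hypothetical build arg; example value only
CU_TAG="$(cu_tag_for "$CUDA_VERSION")" || exit 1
echo "torch==${TORCH_VER} with ${CU_TAG}"
```

Failing on unknown versions trades convenience for safety: a new CUDA row must be added to the mapping explicitly instead of quietly building against a possibly mismatched wheel index.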

     ## Installation
    -Requires torch == 2.9.1
    +Requires torch == 2.11.0

The required torch version is updated to 2.11.0 here, but sgl-kernel/pyproject.toml still specifies torch>=2.8.0 in its build-system.requires. To ensure consistency and prevent potential build-time issues with incompatible torch versions, consider updating the minimum version in pyproject.toml to match this new requirement.
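To illustrate the gap the review points out: a `torch>=2.8.0` floor accepts 2.9.1, while the README now requires 2.11.0. The hypothetical `torch_at_least` helper below compares release numbers with GNU `sort -V` after dropping any `+cuNNN` local label. `sort -V` only approximates PEP 440 ordering (fine for plain `X.Y.Z` releases, not for pre-releases), and in a normal isolated build pip itself resolves whatever floor `build-system.requires` declares.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if an installed torch version is >= a minimum.
torch_at_least() {
  local installed="${1%%+*}"   # drop a "+cu130"-style local version label
  local minimum="$2"
  # Version-sort both; if the minimum sorts first (or they are equal),
  # then installed >= minimum.
  [ "$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n1)" = "$minimum" ]
}

for v in 2.9.1 2.11.0+cu130; do
  if torch_at_least "$v" 2.11.0; then
    echo "$v: satisfies 2.11.0"
  else
    echo "$v: older than 2.11.0"
  fi
done
```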
install_sglang pulls sglang-kernel from PyPI, whose default wheel tracks one CUDA version (currently cu130 after #24162). On runners targeting a different CUDA (e.g. h20 / cu129), importing the cu130 wheel fails with `libnvrtc.so.13: cannot open shared object file`. The previous reinstall guard only triggered when the installed version carried a `+cuXYZ` local-version tag, which the PyPI default does not, so the mismatch on cu129 runners went unnoticed. Drop the comparison and force-reinstall from `https://docs.sglang.ai/whl/${CU_VERSION}/` unconditionally; the index URL alone determines which wheel we want. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
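The failure mode can be reproduced in a few lines. This is a hypothetical reconstruction, not the actual `install_sglang` code: it assumes the old guard compared a `+cuXYZ` label on the installed version against `CU_VERSION`, and that the new path passes the sglang index to pip via `--index-url` (the real script may use different flags). Version numbers are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the reinstall decision; not install_sglang itself.

# Old guard (assumed shape): reinstall only if the installed version has a
# +cuXYZ local label that differs from the runner's CU_VERSION.
old_guard_reinstalls() {
  local installed="$1" cu_version="$2"
  [[ "$installed" == *+cu* && "${installed##*+}" != "$cu_version" ]]
}

# New behaviour: always force-reinstall from the per-CUDA index.
# Printed as a dry run; the --index-url flag choice is an assumption.
new_reinstall_cmd() {
  local cu_version="$1"
  echo "pip install --force-reinstall sglang-kernel --index-url https://docs.sglang.ai/whl/${cu_version}/"
}

# The PyPI default wheel has no local label, so on a cu129 runner the old
# guard never fires and the cu130 build stays installed.
if old_guard_reinstalls "1.2.3" "cu129"; then echo "old guard: reinstall"; else echo "old guard: skip"; fi
new_reinstall_cmd cu129
```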
…u130 (sgl-project#24162) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Summary
Slim subset of #21247 covering only the sgl-kernel side of the torch 2.11 upgrade.

- Bump `sgl-kernel/Dockerfile` and `sgl-kernel/README.md` to torch 2.11.0. Torch 2.11 ships cu129 wheels (torch 2.9 didn't), so the 12.9 row of the build matrix now resolves to `cu129` instead of falling back to `cu128`.
- In `.github/workflows/release-whl-kernel.yml`, move the "Strip `+cu` local version" + "Upload to PyPI" steps from `build-cu129-matrix` to `build-cu130-matrix`. cu130 becomes the PyPI-released variant; cu129 wheels remain available via the sgl-project/whl index (with the `+cu129` PEP 440 local-version label) but no longer reach PyPI. The strip-script regex (`+cu[0-9]\+$`) is generic, so the cu130 side is identical to the prior cu129 path.
Test plan
- Run `Release SGLang Kernels` via workflow_dispatch with `target=cu130`; confirm the wheel lands on PyPI with no `+cu130` suffix on the `Version:` line.
- Run with `target=cu129`; confirm no PyPI upload happens, and the wheel still lands on sgl-project/whl with `+cu129`.
- After a `version.py` bump, confirm the auto-push flow publishes cu130 to PyPI as expected.
🤖 Generated with Claude Code