
[sgl-kernel] Prep for torch 2.11 upgrade and switch PyPI default to cu130 #24162

Merged
Kangyan-Zhou merged 3 commits into main from sgl-kernel-torch-211
Apr 30, 2026

@Kangyan-Zhou
Collaborator

Summary

Slim subset of #21247 covering only the sgl-kernel side of the torch
2.11 upgrade.

  • Bump sgl-kernel/Dockerfile and sgl-kernel/README.md to torch 2.11.0.
    Torch 2.11 ships cu129 wheels (torch 2.9 didn't), so the 12.9 row of
    the build matrix now resolves to cu129 instead of falling back to
    cu128.
  • In .github/workflows/release-whl-kernel.yml, move the "Strip +cu
    local version" + "Upload to PyPI" steps from build-cu129-matrix
    to build-cu130-matrix. cu130 becomes the PyPI-released variant;
    cu129 wheels remain available via the sgl-project/whl index (with
    the +cu129 PEP 440 local-version label) but no longer reach PyPI.
    The strip-script regex (+cu[0-9]\+$) is generic, so the cu130
    side is identical to the prior cu129 path.

Test plan

  • Trigger Release SGLang Kernels via workflow_dispatch with
    target=cu130; confirm the wheel lands on PyPI with no +cu130
    suffix on the Version: line.
  • Same for target=cu129; confirm no PyPI upload happens, and the
    wheel still lands on sgl-project/whl with +cu129.
  • On the next sgl-kernel version.py bump, confirm the auto-push
    flow publishes cu130 to PyPI as expected.
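
One way to automate the `Version:` checks in the test plan is to read the field straight out of the built wheel. The sketch below is an assumption about how that could be done, not part of this PR: the helper names and the in-memory stand-in wheel are hypothetical, and the version numbers are made up.

```python
import io
import re
import zipfile


def wheel_version(wheel):
    """Return the Version: field from a wheel's *.dist-info/METADATA.

    `wheel` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as whl:
        meta_name = next(
            name for name in whl.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = whl.read(meta_name).decode("utf-8")
    for line in text.splitlines():
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Version: field in METADATA")


def has_cu_suffix(version):
    """True if the version ends in a +cuNNN local-version label."""
    return re.search(r"\+cu[0-9]+$", version) is not None


def fake_wheel(version):
    """Build a tiny in-memory stand-in for a wheel (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as whl:
        whl.writestr(
            f"pkg-{version}.dist-info/METADATA",
            f"Metadata-Version: 2.1\nName: pkg\nVersion: {version}\n",
        )
    buf.seek(0)
    return buf


# Expected shape of the two test-plan checks, with made-up version numbers.
pypi_version = wheel_version(fake_wheel("0.1.0"))
index_version = wheel_version(fake_wheel("0.1.0+cu129"))
print(has_cu_suffix(pypi_version))   # False
print(has_cu_suffix(index_version))  # True
```

Against real artifacts, `wheel_version` would be pointed at the downloaded `.whl` path instead of `fake_wheel(...)`.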

🤖 Generated with Claude Code

Kangyan-Zhou and others added 2 commits April 30, 2026 11:19
Bump TORCH_VER from 2.9.1 to 2.11.0 across the CUDA build matrix in
sgl-kernel/Dockerfile, and update the README install requirement to
match. Torch 2.11 ships cu129 wheels (torch 2.9 did not), so the
12.9 CUDA row now correctly resolves to the cu129 PyTorch wheel index
instead of falling back to cu128.

This is a prerequisite for the wider torch 2.11 upgrade in #21247:
sgl-kernel must produce a torch-2.11-compatible wheel before any
downstream consumer can install against torch 2.11.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Move the "Strip +cu local version" + "Upload to PyPI" steps from
build-cu129-matrix to build-cu130-matrix, and update the surrounding
comment to point at cu130. Both build matrices still produce wheels
that flow through upload-artifact → sgl-project/whl index; only the
PyPI-uploaded variant changes.

Rationale: torch 2.11 (PR #21247) targets cu13 by default, so the
sgl-kernel wheel that pip installs from PyPI by name should match.
cu129 wheels remain available via the sgl-project/whl index for the
legacy cuda 12.9 install path (`+cu129` local version label).

The strip script's regex (+cu[0-9]\+$) is already generic, so the
behavior on the cu130 side is identical to the prior cu129 path —
only the comment is rewritten to reference the new variant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the PyTorch version to 2.11.0 in the Dockerfile and README. Feedback suggests refactoring the Dockerfile to eliminate redundant version assignments and ensuring CUDA tag mappings are robust. Additionally, it is recommended to update the minimum version in pyproject.toml to maintain consistency with the new requirement.

Comment thread sgl-kernel/Dockerfile
Comment on lines +82 to +85
13.0) TORCH_VER=2.11.0; CU_TAG=cu130 ;; \
12.9) TORCH_VER=2.11.0; CU_TAG=cu129 ;; \
12.8) TORCH_VER=2.11.0; CU_TAG=cu128 ;; \
*) TORCH_VER=2.11.0; CU_TAG=cu126 ;; \
Contributor

medium

The TORCH_VER=2.11.0 assignment is repeated in every branch of the case statement. Additionally, the default case (line 85) maps to cu126, which may not be appropriate for all CUDA versions (e.g., 12.1 or 12.4, which are referenced elsewhere in the repository). Consider refactoring this block to reduce redundancy and ensure the CU_TAG mapping is robust for all supported CUDA versions.
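
The shape of the suggested refactor can be modeled in Python. This is only a sketch of the idea (one version constant plus an explicit lookup table), not Dockerfile code and not what the PR merged: the names `CU_TAGS`, `DEFAULT_CU_TAG`, and `resolve` and the `strict` flag are hypothetical, while the mapping values are copied from the four branches quoted above.

```python
TORCH_VER = "2.11.0"  # set once instead of in every branch

# CUDA version -> PyTorch wheel-index tag, copied from the case statement.
CU_TAGS = {"13.0": "cu130", "12.9": "cu129", "12.8": "cu128"}
DEFAULT_CU_TAG = "cu126"  # the current catch-all branch


def resolve(cuda_version, strict=False):
    """Return (TORCH_VER, CU_TAG) for a CUDA version string.

    With strict=True an unlisted CUDA version raises instead of silently
    falling back to the default tag.
    """
    tag = CU_TAGS.get(cuda_version)
    if tag is None:
        if strict:
            raise ValueError(f"unsupported CUDA version: {cuda_version}")
        tag = DEFAULT_CU_TAG
    return TORCH_VER, tag


print(resolve("12.9"))  # ('2.11.0', 'cu129')
print(resolve("12.4"))  # ('2.11.0', 'cu126'), the fallback the review flags
```

Whether an unlisted CUDA version should fall back or fail is a policy choice; the `strict` flag just makes the two options visible side by side.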

Comment thread sgl-kernel/README.md

## Installation
- Requires torch == 2.9.1
+ Requires torch == 2.11.0
Contributor

medium

The required torch version is updated to 2.11.0 here, but sgl-kernel/pyproject.toml still specifies torch>=2.8.0 in its build-system.requires. To ensure consistency and prevent potential build-time issues with incompatible torch versions, consider updating the minimum version in pyproject.toml to match this new requirement.
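
The mismatch can be made concrete with a small comparison. This is a minimal sketch that handles only plain `X.Y.Z` versions; the helper names are hypothetical, and real tooling would use a full PEP 440 parser rather than this tuple comparison.

```python
def parse(version):
    """Turn 'X.Y.Z' into a tuple of ints so comparison is numeric."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(candidate, minimum):
    """True if candidate >= minimum, i.e. it passes a 'torch>=minimum' bound."""
    return parse(candidate) >= parse(minimum)


BUILD_MINIMUM = "2.8.0"   # from build-system.requires: torch>=2.8.0
README_PIN = "2.11.0"     # from the updated README: torch == 2.11.0

# The old pin still passes the build-time bound, so the bound alone does
# not enforce the README's new requirement.
print(satisfies_minimum("2.9.1", BUILD_MINIMUM))     # True
print(satisfies_minimum(README_PIN, BUILD_MINIMUM))  # True
print(satisfies_minimum("2.9.1", README_PIN))        # False
```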

@Kangyan-Zhou Kangyan-Zhou merged commit 340efca into main Apr 30, 2026
57 of 63 checks passed
@Kangyan-Zhou Kangyan-Zhou deleted the sgl-kernel-torch-211 branch April 30, 2026 21:54
Kangyan-Zhou added a commit that referenced this pull request May 1, 2026
install_sglang pulls sglang-kernel from PyPI, whose default wheel tracks
one CUDA version (currently cu130 after #24162). On runners targeting a
different CUDA (e.g. h20 / cu129) the cu130 wheel imports fail with
`libnvrtc.so.13: cannot open shared object file`.

The previous reinstall guard only triggered when the installed version
carried a `+cuXYZ` local-version tag, which the PyPI default does not,
so the mismatch on cu129 runners went unnoticed. Drop the comparison and
just force-reinstall from `https://docs.sglang.ai/whl/${CU_VERSION}/`
unconditionally — the index URL alone determines what wheel we want.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
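
Why the old guard missed the PyPI wheel can be modeled in a few lines of Python. This is a hypothetical reconstruction based only on the commit message above: the real install script is not shown here, the function names and sample version numbers are invented, and the assumption that the old guard compared the installed tag against the target is labeled in the code.

```python
import re


def cu_tag(version):
    """Return the cuNNN local-version tag of a version string, or None."""
    match = re.search(r"\+(cu[0-9]+)$", version)
    return match.group(1) if match else None


def old_guard_reinstalls(installed_version, target_cu):
    """Assumed old behavior: reinstall only if a +cuNNN tag exists and differs."""
    tag = cu_tag(installed_version)
    return tag is not None and tag != target_cu


def new_guard_reinstalls(installed_version, target_cu):
    """New behavior per the commit message: always force-reinstall."""
    return True


def wheel_index_url(target_cu):
    """Index URL pattern quoted in the commit message."""
    return f"https://docs.sglang.ai/whl/{target_cu}/"


# A PyPI-installed wheel has no +cu tag, so the old guard stays silent on a
# cu129 runner even though the wheel was built for cu130.
print(old_guard_reinstalls("0.1.0", "cu129"))        # False
print(old_guard_reinstalls("0.1.0+cu130", "cu129"))  # True
print(new_guard_reinstalls("0.1.0", "cu129"))        # True
print(wheel_index_url("cu129"))  # https://docs.sglang.ai/whl/cu129/
```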
Kangyan-Zhou added a commit that referenced this pull request May 2, 2026
install_sglang pulls sglang-kernel from PyPI, whose default wheel tracks
one CUDA version (currently cu130 after #24162). On runners targeting a
different CUDA (e.g. h20 / cu129) the cu130 wheel imports fail with
`libnvrtc.so.13: cannot open shared object file`.

The previous reinstall guard only triggered when the installed version
carried a `+cuXYZ` local-version tag, which the PyPI default does not,
so the mismatch on cu129 runners went unnoticed. Drop the comparison and
just force-reinstall from `https://docs.sglang.ai/whl/${CU_VERSION}/`
unconditionally — the index URL alone determines what wheel we want.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026
…u130 (sgl-project#24162)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>

Labels

documentation Improvements or additions to documentation sgl-kernel


2 participants