
[CI] Add per-job uv venv isolation and upgrade CI version to CUDA 13 #23119

Merged
Fridge003 merged 105 commits into main from kangyan/ci-uv-venv-migration on Apr 19, 2026

Conversation

@Fridge003 (Collaborator)

Motivation

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

Kangyan-Zhou and others added 15 commits April 16, 2026 15:07
Phase 0 scaffolding for the uv venv migration: runners can opt into a fresh
per-job venv via SGLANG_CI_USE_VENV=1, which eliminates the stale CUDA .so
accumulation in the runner's writable layer across toolkit bumps (e.g.
cu129 -> cu130 -> cu129 revert). Toggle is off by default; legacy path is
behaviorally unchanged.

The install script now auto-detects CU_VERSION from nvcc in venv mode,
validates host-driver >= container toolkit, guards against unsupported
CUDA versions, discovers nvidia/torch .so directories for LD_LIBRARY_PATH,
and runs a smoke test that asserts loaded NVIDIA libs resolve under
$VIRTUAL_ENV (catching runtime shadowing that plain ldd misses).
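
A minimal sketch of such a smoke test, assuming a Linux runner with torch installed into the venv (the real script's names and exclusions may differ):

    #!/usr/bin/env bash
    # Hypothetical smoke test: import torch inside the venv, then assert every
    # NVIDIA .so mapped into the process resolves under $VIRTUAL_ENV. Unlike a
    # static ldd pass, this inspects what the dynamic loader actually mapped,
    # so it catches runtime shadowing by stale container-layer copies.
    set -euo pipefail
    "$VIRTUAL_ENV/bin/python" - <<'PY'
    import os
    import torch  # importing torch loads libcudart/cuDNN/NCCL at runtime

    venv = os.path.realpath(os.environ["VIRTUAL_ENV"])
    leaked = []
    with open(f"/proc/{os.getpid()}/maps") as maps:
        for line in maps:
            parts = line.split()
            path = parts[-1] if len(parts) >= 6 else ""
            base = os.path.basename(path)
            if base.startswith("libcuda.so"):
                continue  # the driver library legitimately comes from the host
            if base.startswith("libcu") and ".so" in base:
                if not os.path.realpath(path).startswith(venv):
                    leaked.append(path)
    if leaked:
        raise SystemExit(f"FAIL: NVIDIA libs loaded outside venv: {sorted(set(leaked))}")
    print("OK: all NVIDIA libraries resolve under", venv)
    PY
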

ci_install_deepep.sh now sources ci_install_dependency.sh so venv
activation propagates, and replaces nvidia-smi CUDA detection with the
inherited $NVCC_VER. All bare `pip` calls converted to $PIP_CMD.

Adds ci_cleanup_venv.sh (best-effort post-job cleanup) and a canary
workflow that forces the venv path on 1-gpu-5090; install/sanity fail
loudly while the test run is continue-on-error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

If a CI job is cancelled mid-run, the per-job uv venv persists in /tmp.
Add a sweep of /tmp/sglang-ci-* dirs older than 4 hours to
ci_cleanup_venv.sh, complementing the per-job targeted removal.
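
In the spirit of that sweep, a one-liner sketch (-mmin +240 means older than 4 hours; the actual script may differ):

    # Best-effort: remove per-job venv dirs that a cancelled job left behind;
    # anything younger than 4 hours is assumed to belong to a live job and kept.
    find /tmp -maxdepth 1 -type d -name 'sglang-ci-*' -mmin +240 -exec rm -rf {} + || true
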
@github-actions github-actions bot added the `dependencies` (Pull requests that update a dependency file) and `sgl-kernel` labels on Apr 18, 2026
@Fridge003 (Collaborator, Author)

/tag-and-rerun-ci

@Fridge003 Fridge003 merged commit 6ecd6f8 into main Apr 19, 2026
13 of 24 checks passed
@Fridge003 Fridge003 deleted the kangyan/ci-uv-venv-migration branch April 19, 2026 12:32
alisonshao added a commit that referenced this pull request Apr 19, 2026
Re-enables per-job uv venv (disabled in #23119) by pairing it with two
changes that make deep_gemm's bf16 JIT cache work across the per-job venv:

- UV_VENV="/tmp/sglang-ci-venv" — stable path so library_root (and the
  resulting cache-key hash) is identical across every job and container.
- export DG_JIT_CACHE_DIR=/root/.cache/deep_gemm/bf16_jit_cache —
  redirects deep_gemm's bf16 cache out of /root/.deep_gemm/ (container
  writable layer) into the already-host-mounted deep_gemm subdir, so all
  containers on a host share compiled kernels.

Both are independently required: neither alone makes cross-container
cache hits work. Verified on an H200 host with two separate containers:
first compile 2.0s, cross-container read 0.010s (~220x speedup).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
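
The two settings, as named above (a sketch; surrounding install-script context assumed):

    # Stable venv path: keeps deep_gemm's library_root, and therefore the
    # cache-key hash derived from it, identical across jobs and containers.
    UV_VENV="/tmp/sglang-ci-venv"
    # Move the bf16 JIT cache out of the container's writable layer into the
    # host-mounted deep_gemm dir so all containers on a host share kernels.
    export DG_JIT_CACHE_DIR=/root/.cache/deep_gemm/bf16_jit_cache
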
alisonshao added a commit that referenced this pull request Apr 19, 2026
Re-enables per-job uv venv (disabled in #23119) by using a stable venv
path so deep_gemm's NVCC file cache remains reusable across jobs.

- USE_VENV default flipped to 1 (install script + pr-test.yml env).
- UV_VENV="/tmp/sglang-ci-venv" — stable across every job and container
  on a host. deep_gemm hashes library_root (the abspath of the deep_gemm
  package) into its cache key, so varying the venv path per job breaks
  cache reuse. Holding the path constant keeps the hash constant.
- rm -rf before `uv venv` handles a stale dir from a crashed prior job.
- Cleanup script fallback targets the stable path (glob no longer needed).

deep_gemm's cache dir itself is already host-mounted on every runner
(/root/.cache/deep_gemm/), so no additional mount or env-var change is
needed — the stable venv path alone restores cross-container cache
sharing. Verified on an H200 host: compile ~2s in one container,
cross-container read ~0.01s in a second container at the same stable
path.
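
A sketch of the venv-creation sequence this describes (assuming uv is on PATH; exact script lines may differ):

    UV_VENV="/tmp/sglang-ci-venv"   # stable path -> stable deep_gemm cache key
    rm -rf "$UV_VENV"               # clear a stale dir from a crashed prior job
    uv venv "$UV_VENV"              # fresh per-job environment
    source "$UV_VENV/bin/activate"  # later $PIP_CMD installs target the venv
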
Kangyan-Zhou added a commit to Kangyan-Zhou/sglang that referenced this pull request Apr 19, 2026
Port PR sgl-project#23136 (Yuhao Yang): cudaMemcpyBatchAsync lost its failIdx
parameter in CUDA 13, so the dlsym-based call was passing the stream
handle at the wrong slot and segfaulting inside cuMemcpyBatchAsync_v2.
Use driver_version at runtime to dispatch to either the CUDA 12 or
CUDA 13 signature.

With the segfault fixed, move the 7 hicache tests that were parked
under test/manual in PR sgl-project#23119 and subsequent cu13 flake sweeps back
into test/registered so they run in CI again:

- hicache/test_hicache_storage.py
- hicache/test_hicache_storage_3fs_backend.py
- hicache/test_hicache_storage_file_backend.py
- hicache/test_hicache_storage_mooncake_backend.py
- hicache/test_hicache_storage_runtime_attach_detach.py
- hicache/test_hicache_variants.py
- 4-gpu-models/test_qwen35_hicache.py

The TODO "move back after fixed" docstrings are stripped, and the
register_cuda_ci call that was dropped from the mooncake backend test
on its way to manual is restored.

Co-Authored-By: Yuhao Yang <yhyang201@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kangyan-Zhou added a commit to Kangyan-Zhou/sglang that referenced this pull request Apr 19, 2026
After the main `uv pip install -e python[...]` step, runners that carried
state from the pre-sgl-project#23119 (cu129) era keep `nvidia-cuda-runtime-cu12`
installed as an orphan (Required-by: empty) alongside the cu13 runtime.
Its libcudart.so.12 sits under `nvidia/cuda_runtime/lib/` while cu13's
lives under `nvidia/cu13/lib/`. Both dirs end up on LD_LIBRARY_PATH, so
cudnn_frontend_shim.h's probe

    for lib in ["libcudart.so.12", "libcudart.so.13"]:
        dlopen(lib)

loads both and throws:

    RuntimeError: Multiple libcudart libraries found:
    libcudart.so.12 and libcudart.so.13

Tests hit this during server setUpClass → CUDA graph capture (e.g.
test_nvfp4_gemm_sm120.py on stage-b-test-1-gpu-small). The same failure
reproduces on main, so this is not PR-specific — it's a leftover cleanup
step the cu13 migration missed.

Fix: uninstall nvidia-cuda-runtime-cu12 right after the main install.
Its install dir is disjoint from cu13's so the uninstall doesn't touch
any files shared with cu13 packages (a blunter sweep of all
`nvidia-*-cu12` breaks torch because several pairs share dirs under
`nvidia/<name>/lib/` and uninstalling one deletes files that the cu13
variant still references through its RECORD).

Reproduced and verified on 5090-novita-ci-runner-d (runner-1 container):

    before: libcudart.so.12 + libcudart.so.13 both loadable
    after : only libcudart.so.13 loadable, torch.cuda.randn works

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
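
A sketch of that targeted uninstall, run right after the main install (plain pip syntax; with uv the equivalent is `uv pip uninstall nvidia-cuda-runtime-cu12`):

    # Remove only the orphaned cu12 runtime. Its install dir is disjoint from
    # the cu13 packages, so no shared files are deleted; a blanket
    # nvidia-*-cu12 sweep would break torch, as noted above.
    if pip show nvidia-cuda-runtime-cu12 >/dev/null 2>&1; then
        pip uninstall -y nvidia-cuda-runtime-cu12
    fi
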
Kangyan-Zhou added a commit to Kangyan-Zhou/sglang that referenced this pull request Apr 20, 2026
`modelopt_quant` and `modelopt_export_path` were removed from
ModelConfig.__init__ in sgl-project#10154 (replaced by unified `quantization`
flag and LoadConfig.modelopt_export_path), but the test was never
updated. It stayed latent because the class is skipped when
nvidia-modelopt isn't installed; sgl-project#23119 added the dep to the CI
image, which exposed the failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jasperjiaguo added a commit to jasperjiaguo/sglang that referenced this pull request Apr 22, 2026
Since sgl-project#23119 flipped the `sgl-kernel-build-wheels` matrix to
`cuda-version: "13.0"` only (in preparation for the CI torch upgrade),
any PR touching sgl-kernel is silently reverted on H-series (SM90)
runners whose test env torch is still cu129.

The failure mode is invisible on the surface:

  1. PR's `sgl-kernel-build-wheels` produces a cu130 wheel (artifact
     `wheel-python3.10-cuda13.0`).
  2. H-series test jobs download that wheel into `sgl-kernel/dist/`
     and `ci_install_dependency.sh` installs it.
  3. The script's "sgl-kernel +cuXYZ ≠ CU_VERSION" guard (correct in
     its intent -- a cu130 wheel is genuinely ABI-incompat with cu129
     torch) then reinstalls `sglang-kernel==<ver>` from the public
     Artifactory index -- replacing the PR's built wheel with the
     main-branch wheel that the runner is compatible with.
  4. Any sgl-kernel change in the PR (new kernel signatures, schema
     tweaks, etc.) is silently dropped. Python-side editable code keeps
     the PR's expectations -> `TypeError: unexpected keyword argument`
     at first call.

Example: PR sgl-project#21985 adds `out=` to `flash_attn_with_kvcache`. The
Python wrapper (editable) passes `out=`, but the reinstalled main
wheel's C++ op doesn't accept it -> TypeError on
`stage-c-test-8-gpu-h20`.

Fix:

1. Restore `cuda-version: "12.9"` as a second matrix entry in both the
   x86_64 and aarch64 `sgl-kernel-build-wheels` jobs, so every PR
   produces BOTH cu129 and cu130 wheels.
2. Change all test-job `download-artifact` patterns from
   `wheel-python3.10-cuda13.0` to `wheel-python3.10-cuda*` so both
   wheels land in `sgl-kernel/dist/` (`merge-multiple: true` already
   set).
3. In `ci_install_dependency.sh`, select the wheel matching
   `$CU_VERSION` by name (`+${CU_VERSION}`), falling back to the
   previous "any matching wheel" glob if a single-CUDA wheel is all
   that's present -- preserves pre-sgl-project#23119 behavior for branches that
   haven't picked up this change.

After this patch:
  - B200 (cu130) tests install the cu130 wheel, no reinstall.
  - H-series (cu129) tests install the cu129 wheel, no reinstall.
  - The public-index fallback only fires when the PR didn't build its
    own wheel (e.g. `/rerun-stage` without kernel rebuild), matching
    its original purpose.

Cost: one extra matrix job per PR that touches sgl-kernel (~10 min on
x86_64, ~10 min on aarch64). Net per-PR CI runtime change is positive
for sgl-kernel PRs (no more silently-passing tests that were really
running main's wheel) and zero for other PRs (matrix doesn't run when
there's nothing to build).
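
A sketch of the wheel-selection logic in fix step 3 (variable names assumed; the script's actual code may differ):

    # Prefer the wheel whose local version tag matches the runner's toolkit,
    # e.g. ...+cu129... on an H-series runner whose torch is cu129.
    WHEEL=$(ls sgl-kernel/dist/*"+${CU_VERSION}"*.whl 2>/dev/null | head -n1)
    # Fallback: any built wheel, preserving pre-#23119 behavior for branches
    # that only produce a single-CUDA artifact.
    if [ -z "$WHEEL" ]; then
        WHEEL=$(ls sgl-kernel/dist/*.whl 2>/dev/null | head -n1)
    fi
    [ -n "$WHEEL" ] && $PIP_CMD install "$WHEEL"
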
jasperjiaguo added a commit to jasperjiaguo/sglang that referenced this pull request Apr 22, 2026
Since sgl-project#23119 flipped the `sgl-kernel-build-wheels` matrix to
`cuda-version: "13.0"` only (in preparation for the CI torch upgrade),
any PR touching sgl-kernel is silently reverted on H-series (SM90)
runners whose test env torch is still cu129.

The failure mode is invisible on the surface:

  1. PR's `sgl-kernel-build-wheels` produces a cu130 wheel (artifact
     `wheel-python3.10-cuda13.0`).
  2. H-series test jobs download that wheel into `sgl-kernel/dist/`
     and `ci_install_dependency.sh` installs it.
  3. The script's "sgl-kernel +cuXYZ ≠ CU_VERSION" guard (correct in
     its intent -- a cu130 wheel is genuinely ABI-incompat with cu129
     torch) then reinstalls `sglang-kernel==<ver>` from the public
     Artifactory index -- replacing the PR's built wheel with the
     main-branch wheel that the runner is compatible with.
  4. Any sgl-kernel change in the PR (new kernel signatures, schema
     tweaks, etc.) is silently dropped. Python-side editable code keeps
     the PR's expectations -> `TypeError: unexpected keyword argument`
     at first call.

Example: PR sgl-project#21985 adds `out=` to `flash_attn_with_kvcache`. The
Python wrapper (editable) passes `out=`, but the reinstalled main
wheel's C++ op doesn't accept it -> TypeError on
`stage-c-test-8-gpu-h20`.

Fix:

1. Restore `cuda-version: "12.9"` as a second matrix entry in both the
   x86_64 and aarch64 `sgl-kernel-build-wheels` jobs, so every PR
   produces BOTH cu129 and cu130 wheels.
2. Change all test-job `download-artifact` patterns from
   `wheel-python3.10-cuda13.0` to `wheel-python3.10-cuda*` so both
   wheels land in `sgl-kernel/dist/` (`merge-multiple: true` already
   set).
3. In `ci_install_dependency.sh`, select the wheel matching
   `$CU_VERSION` by name (`+${CU_VERSION}`), falling back to the
   previous "any matching wheel" glob if a single-CUDA wheel is all
   that's present -- preserves pre-sgl-project#23119 behavior for branches that
   haven't picked up this change.

After this patch:
  - B200 (cu130) tests install the cu130 wheel, no reinstall.
  - H-series (cu129) tests install the cu129 wheel, no reinstall.
  - The public-index fallback only fires when the PR didn't build its
    own wheel (e.g. `/rerun-stage` without kernel rebuild), matching
    its original purpose.

Cost: one extra matrix job per PR that touches sgl-kernel (~10 min on
x86_64, ~10 min on aarch64). Net per-PR CI runtime change is positive
for sgl-kernel PRs (no more silently-passing tests that were really
running main's wheel) and zero for other PRs (matrix doesn't run when
there's nothing to build).
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
…gl-project#23119)

Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>
Kangyan-Zhou added a commit that referenced this pull request Apr 26, 2026
The 7 hicache tests below were moved from test/registered to test/manual
in PR #23119 (cu13 upgrade) and follow-up flake sweeps because they hit
the cudaMemcpyBatchAsync segfault on CUDA 13. That segfault is fixed in
sglang-kernel 0.4.1.post1 (this PR), so move the tests back into
test/registered:

- hicache/test_hicache_storage.py
- hicache/test_hicache_storage_3fs_backend.py
- hicache/test_hicache_storage_file_backend.py
- hicache/test_hicache_storage_mooncake_backend.py
- hicache/test_hicache_storage_runtime_attach_detach.py
- hicache/test_hicache_variants.py
- 4-gpu-models/test_qwen35_hicache.py

The TODO "move back after fixed" docstrings are stripped, and the
register_cuda_ci call that was dropped from the mooncake backend test
on its way to manual is restored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
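
A hypothetical sketch of the move-back (directory layout inferred from the paths above):

    # Move the seven hicache tests from test/manual back to test/registered.
    for t in hicache/test_hicache_storage.py \
             hicache/test_hicache_storage_3fs_backend.py \
             hicache/test_hicache_storage_file_backend.py \
             hicache/test_hicache_storage_mooncake_backend.py \
             hicache/test_hicache_storage_runtime_attach_detach.py \
             hicache/test_hicache_variants.py \
             4-gpu-models/test_qwen35_hicache.py; do
        git mv "test/manual/$t" "test/registered/$t"
    done
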
kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request Apr 27, 2026
…gl-project#23119)

Co-authored-by: Kangyan Zhou <zky314343421@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alison Shao <a.shao@wustl.edu>
Co-authored-by: Mick <mickjagger19@icloud.com>

Labels

bypass-maintenance, dependencies (Pull requests that update a dependency file), diffusion (SGLang Diffusion), hicache (Hierarchical Caching for SGLang), high priority, jit-kernel, lora, multi-modal (multi-modal language model), quant (LLM Quantization), run-ci, sgl-kernel

Projects

None yet


4 participants