[CI] Fix cutlass import error: restore nvidia-cutlass-dsl force-reinstall #21182

Merged
Kangyan-Zhou merged 1 commit into main from fix/cutlass-import-ci on Mar 23, 2026

@alisonshao (Collaborator) commented Mar 23, 2026

Summary

  • Temporary fix: PR #21017 (ci: refactor CUDA dependency install script) refactored the CI install script but accidentally dropped the nvidia-cutlass-dsl force-reinstall. This PR restores it.
  • Without the force-reinstall, stage-b/c jobs fail with ModuleNotFoundError: No module named 'cutlass' across all runner types (5090, H100, B200).
  • Root cause: Docker images ship nvidia-cutlass-dsl-libs-base 4.3.5, but pyproject.toml requires >=4.4.1, so pip upgrades the package at CI runtime. During that upgrade, the .pth file (which makes import cutlass work) gets deleted but is not reliably recreated (a pip race condition). The force-reinstall ensures the .pth file is always present.
  • Long-term fix: rebuild Docker images with nvidia-cutlass-dsl-libs-base 4.4.2 baked in, so the upgrade doesn't happen at CI runtime.
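
To make the root cause above easier to check on a runner, here is a minimal diagnostic sketch in Python. It is not part of this PR. The distribution name and the minimum version come from the summary. The helper names (`parse_version`, `needs_upgrade`, `find_pth_entries`, `diagnose`) are hypothetical. Matching `.pth` lines on the substring "cutlass" is an assumption, since the actual `.pth` filename is not stated here.

```python
import importlib.util
import re
import site
import sysconfig
from importlib import metadata
from pathlib import Path


def parse_version(text):
    """Simplified parser: '4.3.5' -> (4, 3, 5). Not a full PEP 440 parser."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def needs_upgrade(installed, minimum):
    """True when the installed version is below the required minimum."""
    return parse_version(installed) < parse_version(minimum)


def find_pth_entries(keyword, directories):
    """Return (pth_file, line) pairs for .pth lines that mention keyword."""
    hits = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue
        for pth in sorted(root.glob("*.pth")):
            try:
                lines = pth.read_text(errors="replace").splitlines()
            except OSError:
                continue
            hits.extend((str(pth), line.strip()) for line in lines if keyword in line)
    return hits


def diagnose(dist="nvidia-cutlass-dsl-libs-base", minimum="4.4.1", module="cutlass"):
    """Collect the facts relevant to the failure described in the summary."""
    try:
        installed = metadata.version(dist)
    except metadata.PackageNotFoundError:
        installed = None
    # Assumption: the .pth file, if present, lives in one of these directories.
    dirs = set(getattr(site, "getsitepackages", lambda: [])())
    dirs |= {site.getusersitepackages(), sysconfig.get_paths()["purelib"]}
    return {
        "installed_version": installed,
        "needs_upgrade": installed is None or needs_upgrade(installed, minimum),
        "pth_entries": find_pth_entries(module, sorted(dirs)),
        "importable": importlib.util.find_spec(module) is not None,
    }
```

Calling `diagnose()` on an affected runner would be expected to show an empty `pth_entries` list together with `importable` set to False, if the explanation in the summary holds.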

Failing jobs: https://github.com/sgl-project/sglang/actions/runs/23418883303/job/68120071432, https://github.com/sgl-project/sglang/actions/runs/23421614705/job/68127702256

Test plan

  • Stage-b/c CI jobs pass the python3 -c 'import cutlass; import cutlass.cute;' verification step
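
The verification step above can also be written as a small Python helper that reports which import failed. This is a sketch only, not the CI script from this PR. `missing_modules` and `force_reinstall_command` are hypothetical names, and the exact pip arguments used by the restored install step are not shown in this conversation, so the command below is an assumption built from pip's standard `--force-reinstall` flag.

```python
import importlib
import sys

# Modules named in the test plan's verification step.
REQUIRED_MODULES = ("cutlass", "cutlass.cute")


def missing_modules(names=REQUIRED_MODULES):
    """Try each import and return (name, error message) for every failure."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            missing.append((name, str(exc)))
    return missing


def force_reinstall_command(package="nvidia-cutlass-dsl"):
    """Build (but do not run) a pip force-reinstall command for the package.

    Assumption: the real CI step may pin a version or pass extra flags.
    """
    return [sys.executable, "-m", "pip", "install", "--force-reinstall", package]
```

A CI step could fail the job when `missing_modules()` returns a non-empty list, and print the result of `force_reinstall_command()` as a hint for whoever reads the log.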

…tall

The CUDA dependency install refactor (#21017) accidentally removed the
force-reinstall of nvidia-cutlass-dsl. Docker images ship
nvidia-cutlass-dsl-libs-base 4.3.5; upgrading to 4.4.2 can delete the
.pth file without reliably recreating it (pip race condition), causing
stage-b/c jobs to fail with ModuleNotFoundError: No module named 'cutlass'.

@alisonshao (Collaborator, Author)

/tag-and-rerun-ci

@alisonshao force-pushed the fix/cutlass-import-ci branch from 8b17c0c to 1bf3b33 on March 23, 2026 04:54
@Kangyan-Zhou merged commit 44db0c5 into main on Mar 23, 2026
34 of 109 checks passed
@Kangyan-Zhou deleted the fix/cutlass-import-ci branch on March 23, 2026 05:41
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
…tall (sgl-project#21182)

Co-authored-by: Alison Shao <alison.shao@mac.lan>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
…tall (sgl-project#21182)

Co-authored-by: Alison Shao <alison.shao@mac.lan>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
…tall (sgl-project#21182)

Co-authored-by: Alison Shao <alison.shao@mac.lan>
3 participants