[CI] Add cu13.0 wheel + container builds and nightly wheel releases #3069

Merged: ApostaC merged 3 commits into LMCache:dev from deng451e:wheel_variants on Apr 20, 2026

@deng451e deng451e commented Apr 17, 2026

Summary

  • Nightly wheels: daily nightly-wheels job publishes cu12.9 → nightly and cu13.0 → nightly-cu13 rolling GitHub Releases; install with --pre --find-links,
    no version pinning
  • Stable cu13.0: published to dedicated v{tag}-cu13 GitHub Release (not PyPI); no +cu130 suffix, installs cleanly as lmcache=={VERSION}
  • Dockerfiles: BASE_IMAGE ARG overrides NVIDIA base without file edits; BUILD_TRITON=0 makes triton opt-in; removed redundant cuda.txt pre-install
  • Docs: "Pre-release" → "Nightly" tab; cu13 install simplified; ROCm steps merged
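
As a rough illustration of the nightly install path in the first bullet, the sketch below assembles the pip command for each CUDA variant. The release tags (nightly, nightly-cu13) and the --pre --find-links flags come from this PR; the find-links URL pattern and the helper name nightly_install_cmd are assumptions for illustration and should be checked against the updated docs.

```python
# Release tags are taken from the PR summary.
RELEASE_TAGS = {"cu12.9": "nightly", "cu13.0": "nightly-cu13"}

# ASSUMPTION: illustrative URL pattern for a GitHub Release asset listing;
# the PR text does not state the exact --find-links URL.
ASSUMED_BASE = "https://github.com/LMCache/LMCache/releases/expanded_assets"


def nightly_install_cmd(cuda: str) -> list[str]:
    """Return the pip argv for the rolling nightly wheel of one CUDA variant."""
    tag = RELEASE_TAGS[cuda]
    return [
        "pip", "install",
        "--pre",                                  # nightly wheels carry dev versions
        "--find-links", f"{ASSUMED_BASE}/{tag}",  # look for wheels on the release page
        "lmcache",                                # no version pin
    ]


print(" ".join(nightly_install_cmd("cu13.0")))
```

pip ignores pre-release and dev versions unless --pre is given, which is why the flag matters here even without a version pin.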

Release workflow

       push tag / release published          cron 07:30 UTC daily
                  │                                   │
      ┌───────────┴───────────┐           ┌───────────┴───────────┐
      │      publish.yml      │           │   nightly_build.yml   │
      └───────────┬───────────┘           └───────────┬───────────┘
                  │                                   │
      ┌───────────┼───────────┐           ┌───────────┼───────────┐
      ▼           ▼           ▼           ▼           ▼           ▼
   cu12.9      cu13.0      Docker      cu12.9      cu13.0      Docker
    wheel       wheel      images       wheel       wheel      images
      │           │           │           │           │           │
      ▼           ▼           ▼           ▼           ▼           ▼
    PyPI      v{tag}-cu13  DockerHub   nightly    nightly-cu13 DockerHub
 test.pypi   GitHub Rel.  lmcache/    GitHub Rel. GitHub Rel.  :latest-nightly
(dev push)               vllm-openai             (rolling)    :latest-nightly-cu13
                         :latest
                         :{tag}

Note

Medium Risk
Medium risk because it expands release automation (new nightly wheel publishing, new cu13 release artifacts, and additional Docker image variants), which could break packaging or deployment if misconfigured.

Overview
Adds automated nightly wheel publishing: the nightly workflow now builds CUDA 12.9 and CUDA 13.0 wheels via cibuildwheel and publishes them as rolling prerelease GitHub Releases (nightly and nightly-cu13).

Extends the release pipeline to build a dedicated CUDA 13.0 (cu130) wheel artifact, publish it to a separate ${tag}-cu13 GitHub Release (not PyPI), and gate additional Docker publishing steps on that job.

Updates Docker builds to default to CUDA 12.9, allow overriding the NVIDIA base via BASE_IMAGE, pull vLLM/torch from CUDA-specific indexes, and add new CUDA 13 image targets/tags for both nightly and release; docs and pyproject.toml are updated accordingly (incl. manylinux image bump to cuda12.9).

Reviewed by Cursor Bugbot for commit ecfecf4.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the project to support CUDA 12.9 and 13.0, involving significant updates to the Dockerfiles, build configurations, and a major restructuring of the installation documentation. The changes include conditional Triton kernel installation and explicit Torch versioning in the build stages. Feedback identifies a shell word-splitting bug in the Dockerfile's package installation logic and an outdated comment in the pyproject.toml file.

Comment thread docker/Dockerfile Outdated
Comment thread pyproject.toml Outdated
Comment thread docker/Dockerfile Outdated
Comment thread .github/workflows/publish.yml Outdated
Comment thread .github/workflows/nightly_build.yml Outdated
Comment thread .github/workflows/publish.yml
Comment thread .github/workflows/nightly_build.yml
@deng451e force-pushed the wheel_variants branch 2 times, most recently from 59cdbef to 9f44377, on April 17, 2026 20:37
Comment thread .github/workflows/publish.yml
Comment thread docker/Dockerfile.standalone
Comment thread .github/workflows/nightly_build.yml Outdated
Comment thread .github/workflows/nightly_build.yml Outdated

@sammshen sammshen left a comment

LGTM!

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 47cda1e.

Comment thread .github/workflows/publish.yml
deng451e and others added 2 commits April 18, 2026 01:27
publish.yml:
- Add build-cu130-artifacts job: builds cu13 wheel via cibuildwheel
  using pytorch/manylinux2_28-builder:cuda13.0, TORCH_CUDA_ARCH_LIST
  for sm 8.0–12.0, and PIP_EXTRA_INDEX_URL for cu130 torch
- Add publish-cu130-github-release job: creates v{tag}-cu13 GitHub
  Release with --latest=false so pip can resolve via --find-links
- publish-image: add cu13 stable Docker steps (image-release-cu13
  target); guard with needs.publish-cu130-github-release.result==success
  so cu130 failures skip cu13 images without blocking cu12.9 images;
  use needs.publish-pypi.result==success to decouple from cu130 failures
- Add download-r2.pytorch.org and pypi.nvidia.com to egress allowlist
  (torch cu130 downloads via CDN; cuda-bindings on NVIDIA PyPI)
- Add index.docker.io to allowlist for DockerHub pulls inside containers
- Make DockerHub login conditional on vars.DOCKERHUB_USERNAME for fork PRs

nightly_build.yml:
- Add nightly-wheels job: builds cu12.9 and cu13.0 wheels daily,
  publishes to separate rolling GitHub Releases (nightly / nightly-cu13);
  interleave build→publish per variant so a cu13 failure cannot block
  cu12.9 publishing; SETUPTOOLS_SCM_PRETEND_VERSION for dev versioning
- nightly-build: add cu13 container builds (vllm-openai + standalone);
  docker system/builder prune before each build to prevent disk exhaustion
- Add download-r2.pytorch.org and pypi.nvidia.com to egress allowlist
- Make DockerHub login conditional on vars.DOCKERHUB_USERNAME
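
The interleaved build→publish ordering described for nightly-wheels can be modelled in a few lines of Python. This is a simplified sketch of the control flow only, not the workflow YAML; run_interleaved, fake_build, and fake_publish are hypothetical names standing in for the real cibuildwheel and release steps.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_interleaved(
    variants: Iterable[str],
    build: Callable[[str], str],
    publish: Callable[[str, str], None],
) -> Dict[str, str]:
    """Build and publish each variant in turn, so one failure stays local."""
    results: Dict[str, str] = {}
    for variant in variants:
        try:
            wheel = build(variant)
        except Exception as exc:
            # A failed build skips only this variant's publish step.
            results[variant] = f"build failed: {exc}"
            continue
        publish(variant, wheel)
        results[variant] = "published"
    return results


# Stand-in steps: the cu13.0 build is made to fail on purpose.
published: List[Tuple[str, str]] = []


def fake_build(variant: str) -> str:
    if variant == "cu13.0":
        raise RuntimeError("simulated cu13 build failure")
    return f"lmcache-{variant}.whl"


def fake_publish(variant: str, wheel: str) -> None:
    published.append((variant, wheel))


outcome = run_interleaved(["cu12.9", "cu13.0"], fake_build, fake_publish)
print(outcome)
print(published)
```

Because cu12.9 is published before the cu13.0 build starts, the simulated cu13.0 failure leaves the cu12.9 release untouched.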

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: deng451e <838677410@qq.com>
nightly_build.yml: interleave build→publish per CUDA variant so the
cu13 wheel build step runs after cu12.9 is already published; a cu13
build failure now skips only the cu13 publish, leaving cu12.9 unaffected

publish.yml: change publish-image guard from !failure() to
needs.publish-pypi.result=='success' so a failing (not just skipped)
publish-cu130-github-release does not block cu12.9 Docker publishing
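
The difference between the two publish-image guards can be shown with a small Python model. It is a simplified sketch of the two conditions named above, not the GitHub Actions expression evaluator; old_guard and new_guard are hypothetical names, and the real workflow expression may also need a status check function that is not shown in this commit message.

```python
from typing import Dict


def old_guard(needs: Dict[str, str]) -> bool:
    """Model of `!failure()`: false as soon as any dependency has failed."""
    return not any(result == "failure" for result in needs.values())


def new_guard(needs: Dict[str, str]) -> bool:
    """Model of `needs.publish-pypi.result == 'success'`."""
    return needs["publish-pypi"] == "success"


# PyPI publish succeeded, but the cu13 GitHub Release job failed.
cu130_failed = {
    "publish-pypi": "success",
    "publish-cu130-github-release": "failure",
}
# PyPI publish succeeded and the cu13 job was merely skipped.
cu130_skipped = {
    "publish-pypi": "success",
    "publish-cu130-github-release": "skipped",
}

print(old_guard(cu130_failed), new_guard(cu130_failed))
print(old_guard(cu130_skipped), new_guard(cu130_skipped))
```

Under this model the old guard tolerates a skipped cu13 job but not a failed one, while the new guard depends only on the PyPI result, which matches the intent of keeping cu12.9 Docker publishing independent of cu13 failures.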

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: deng451e <838677410@qq.com>
Signed-off-by: deng451e <838677410@qq.com>

@ApostaC ApostaC left a comment

LGTM! Excited to see the 0.4.4 release!

@ApostaC ApostaC enabled auto-merge (squash) April 20, 2026 19:49
@github-actions (bot) added the "full" label (Run comprehensive tests on this PR) on Apr 20, 2026
@ApostaC ApostaC merged commit 9a5cc88 into LMCache:dev Apr 20, 2026
42 checks passed