[CI][NIXL] Fix PD CI breakage: pin nixl-cu{12,13} versions#39851

Merged: vllm-bot merged 1 commit into vllm-project:main from ZhanqiuHu:fix/pin-nixl-backends on Apr 15, 2026

@ZhanqiuHu ZhanqiuHu commented Apr 15, 2026

Contributor

nixl-cu12==1.0.1 was published on PyPI today (19:38 UTC) and ships nixl_ep compiled against libcudart.so.12, which crashes on CUDA 13 CI runners. Our < 0.10.0 constraint pins only the nixl meta-package, not the backend packages, so CI resolved the following:

[2026-04-14T21:21:43Z]  + nixl==0.9.0
[2026-04-14T21:21:43Z]  + nixl-cu12==1.0.1
[2026-04-14T21:21:43Z]  + nixl-cu13==1.0.1

Temp fix: pin nixl-cu12 and nixl-cu13 to < 0.10.0. @NickLucche is working on the proper version bump in #39797 (tracking #39521).
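
To illustrate why the backends slipped through, here is a minimal sketch that checks the versions from the CI log above against the `>= 0.7.1, < 0.10.0` range. The helpers `parse` and `in_range` are hypothetical and handle only plain dotted releases; real tooling would more likely rely on a full PEP 440 implementation such as the `packaging` library.

```python
# Simplified range check for plain dotted versions (not full PEP 440).

def parse(version: str) -> tuple[int, ...]:
    """Turn a release string like '0.9.0' into (0, 9, 0) for comparison."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str = "0.7.1", upper: str = "0.10.0") -> bool:
    """Return True when lower <= version < upper, mirroring '>= 0.7.1, < 0.10.0'."""
    return parse(lower) <= parse(version) < parse(upper)


# Versions copied from the CI log in the PR description.
resolved = {"nixl": "0.9.0", "nixl-cu12": "1.0.1", "nixl-cu13": "1.0.1"}

# Packages whose resolved version falls outside the intended range.
violations = sorted(name for name, ver in resolved.items() if not in_range(ver))
print(violations)  # ['nixl-cu12', 'nixl-cu13']
```

Only the meta-package lands inside the range, which matches the observation that the existing constraint did not reach the backends.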

@ZhanqiuHu ZhanqiuHu requested a review from NickLucche as a code owner April 15, 2026 01:48
@ZhanqiuHu ZhanqiuHu changed the title from "[NIXL] Pin nixl-cu{12,13} to fix PD CI breakage" to "[CI][NIXL] Fix PD CI breakage: pin nixl-cu{12,13} versions" Apr 15, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the requirements/kv_connectors.txt file to include explicit dependencies for nixl-cu12 and nixl-cu13. A review comment points out that adding nixl-cu12 as a global requirement is problematic as it forces unnecessary installation and bloat on CUDA 13 systems; it is suggested to move this dependency to a CI-specific configuration instead.

Comment on lines +3 to +4
nixl-cu12 >= 0.7.1, < 0.10.0
nixl-cu13 >= 0.7.1, < 0.10.0

Contributor

high

Adding nixl-cu12 as a direct requirement forces its installation on all systems using this file, including CUDA 13 environments where it is unnecessary and adds significant bloat (100MB+). If the crash on CUDA 13 CI is caused by a pre-installed version of nixl-cu12 in the environment, this pin should be moved to a CI-specific constraints file or the package should be uninstalled during CI setup. Forcing a backend for a different CUDA version on all users is a significant regression in environment hygiene.
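
One way to picture the alternative raised here, installing only the backend that matches the runner, is the sketch below. It is an assumption-laden illustration, not something from this PR: `backend_requirement` is a hypothetical helper, the CUDA version string is assumed to be supplied by the caller, and the version range is simply the one used in the diff.

```python
# Hypothetical helper: pick a single pinned NIXL backend for a given CUDA version.

PIN = ">= 0.7.1, < 0.10.0"  # range taken from the diff in this PR
BACKENDS = {12: "nixl-cu12", 13: "nixl-cu13"}


def backend_requirement(cuda_version: str) -> str:
    """Return the requirement line for the backend matching the CUDA major version."""
    major = int(cuda_version.split(".")[0])
    try:
        package = BACKENDS[major]
    except KeyError:
        raise ValueError(f"no NIXL backend known for CUDA {cuda_version}") from None
    return f"{package} {PIN}"


print(backend_requirement("13.0"))  # nixl-cu13 >= 0.7.1, < 0.10.0
```

A CI setup step could write the returned line into a constraints file, so that a shared requirements file would not need to name both backends.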

nixl-cu13 >= 0.7.1, < 0.10.0

Member

I'm also not super happy with having to install both like this. Do you see any other option with this requirements.txt installation method, @cjackal?

Contributor

One quick-and-dirty idea is to install only the nixl_cu1x variant you need, pinned to an exact version, and then force-install the nixl meta-package with the --no-deps option. For example:

# requirements/kv_connectors.txt
# list only the required `nixl_cu1x` variant
...
nixl_cu12==0.9.0

# In the container build stage
...

RUN uv pip install -r requirements/kv_connectors.txt && uv pip install nixl==0.9.0 --no-deps

...
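
If the `--no-deps` approach above were adopted, a small post-install check could confirm that exactly one backend ended up in the environment. The following is a sketch under stated assumptions: `installed_versions` and `only_backend` are hypothetical helpers, and the distribution names are the ones mentioned in this thread.

```python
# Hypothetical post-install sanity check using only the standard library.
from importlib.metadata import PackageNotFoundError, version

BACKENDS = ("nixl-cu12", "nixl-cu13")


def installed_versions(names=("nixl",) + BACKENDS, lookup=version):
    """Map each installed distribution in `names` to its version; skip missing ones."""
    found = {}
    for name in names:
        try:
            found[name] = lookup(name)
        except PackageNotFoundError:
            continue  # distribution is not installed
    return found


def only_backend(found, expected_backend):
    """Return True when `expected_backend` is the sole NIXL backend present."""
    present = [name for name in BACKENDS if name in found]
    return present == [expected_backend]


if __name__ == "__main__":
    found = installed_versions()
    print(found, only_backend(found, "nixl-cu12"))
```

The `lookup` parameter exists only so the check can be exercised without a real installation; in a container build it would stay at its default.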

nixl-cu12==1.0.1 published today ships nixl_ep compiled against
libcudart.so.12, crashing on CUDA 13 CI runners. The existing
< 0.10.0 constraint only pins the meta-package, not the backends.

Signed-off-by: ZhanqiuHu <zhu@redhat.com>
@ZhanqiuHu ZhanqiuHu force-pushed the fix/pin-nixl-backends branch from ed2b8ba to 5b47c4e Compare April 15, 2026 01:50
@ProExpertProg ProExpertProg added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Apr 15, 2026

cjackal commented Apr 15, 2026

Contributor

This PR also fixes #36676

ehfd commented Apr 15, 2026

Contributor

Fixes #39872

@vllm-bot vllm-bot merged commit 799973a into vllm-project:main Apr 15, 2026
17 of 24 checks passed

@NickLucche NickLucche left a comment

Member

Let's unblock

ehfd commented Apr 15, 2026

Contributor

Can we rebuild the nightly?

khluu pushed a commit that referenced this pull request Apr 18, 2026
Signed-off-by: ZhanqiuHu <zhu@redhat.com>
(cherry picked from commit 799973a)

Signed-off-by: khluu <khluu000@gmail.com>
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…ect#39851)

Signed-off-by: ZhanqiuHu <zhu@redhat.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…ect#39851)

Signed-off-by: ZhanqiuHu <zhu@redhat.com>
(cherry picked from commit 4809252)

Signed-off-by: khluu <khluu000@gmail.com>
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
my-other-github-account pushed a commit to my-other-github-account/vllm that referenced this pull request May 15, 2026
…ect#39851)

Signed-off-by: ZhanqiuHu <zhu@redhat.com>
(cherry picked from commit 4ff86d1)

Signed-off-by: khluu <khluu000@gmail.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
Rukhaiya2004 pushed a commit to Rukhaiya2004/vllm that referenced this pull request May 23, 2026
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…ect#39851)

Signed-off-by: ZhanqiuHu <zhu@redhat.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
@ZhanqiuHu ZhanqiuHu deleted the fix/pin-nixl-backends branch June 4, 2026 17:44

Labels: ci/build, kv-connector, ready (ONLY add when PR is ready to merge/full CI is needed)

6 participants