
Bump flashinfer version to 0.6.7 #38188

Closed
wzhao18 wants to merge 4 commits into vllm-project:main from wzhao18:wzhao/bump-fi-0.6.7

Conversation

@wzhao18 (Contributor) commented Mar 26, 2026

Purpose

Bump flashinfer version to 0.6.7

Test Plan

Test Result



@claude (Bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@gemini-code-assist (Bot) left a comment


Code Review

This pull request updates the FlashInfer library version from 0.6.6 to 0.6.7 across Dockerfiles, version configuration, and Python requirements. A review comment suggests re-evaluating the version constraint for the transitive dependency nvidia-cudnn-frontend to ensure compatibility with FlashInfer 0.6.7 and prevent potential build or runtime issues.

Comment thread on requirements/cuda.txt, lines +12 to +13:

flashinfer-python==0.6.7
flashinfer-cubin==0.6.7

critical

This change updates flashinfer to 0.6.7, but does not update the version constraint for its transitive dependency nvidia-cudnn-frontend on line 16. The existing cap <1.19.0 was likely added for a previous version of flashinfer and may be incompatible with 0.6.7, potentially causing build failures or runtime errors. This constraint should be re-evaluated based on the requirements of flashinfer==0.6.7.
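
If it helps, a quick sanity check of that constraint is possible from Python: a minimal sketch, assuming the 0.6.7 wheel is installed and ships standard dependency metadata.

# Sanity-check sketch: print what flashinfer-python 0.6.7 itself
# declares for nvidia-cudnn-frontend.
from importlib.metadata import requires, version

print(version("flashinfer-python"))  # expect 0.6.7
for req in requires("flashinfer-python") or []:
    if "cudnn" in req.lower():
        print(req)  # compare against the <1.19.0 cap in requirements/cuda.txt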

@zyongye added the ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs) label on Mar 26, 2026
@yewentao256 (Member) left a comment


Are the CI failures related?

@wzhao18 (Contributor, Author) commented Mar 26, 2026

@yewentao256 checking right now.

@wzhao18 (Contributor, Author) commented Mar 26, 2026

@yewentao256 There seem to be issues with the new version.

I tried Nemotron locally on GB300 and it produces repetitive output:

vllm serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
    --enforce-eager \
    --max-model-len 4096 \
    --trust-remote-code \
    --tensor-parallel-size 2 \
    --enable-expert-parallel \
    --speculative-config '{"method":"mtp","num_speculative_tokens":5}'
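
For anyone reproducing this, a minimal client sketch against the server started above; port 8000 is vLLM's default for its OpenAI-compatible API, and the prompt here is illustrative:

# Send one greedy completion request and inspect the output for
# degenerate repetition.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
    prompt="Explain speculative decoding in one paragraph.",
    max_tokens=256,
    temperature=0,
)
print(resp.choices[0].text)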

@yewentao256 (Member)

> @yewentao256 There seem to be issues with the new version.

Yeah, please take a further look; we need to solve these before merging this PR.

@wzhao18 (Contributor, Author) commented Mar 26, 2026

@yewentao256 Yes of course.

@wzhao18 force-pushed the wzhao/bump-fi-0.6.7 branch from 894a10e to 456be52 on March 29, 2026 00:04
@wzhao18 (Contributor, Author) commented Mar 29, 2026

@yewentao256 Fixed the LM eval CI failures; the problem was related to routing bias in the trtllm-gen MoE kernels.

The remaining CI failures also occur on the main branch; buildkite/ci/pr/model-runner-v2-distributed-2-gpus seems unrelated/flaky.
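
For context, a minimal sketch of the kind of cast being described; the function and tensor names are illustrative assumptions, not the actual vLLM code:

import torch

def apply_routing_bias(router_logits: torch.Tensor,
                       routing_bias: torch.Tensor) -> torch.Tensor:
    # Illustrative only: upcast the routing bias to the logits' dtype
    # before it reaches the fused trtllm-gen routing kernel, so the
    # kernel never sees a mismatched-dtype bias.
    return router_logits + routing_bias.to(router_logits.dtype)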

@yewentao256 (Member) left a comment


LGTM, thanks for the work!
Also CC @mgoin

@github-project-automation (Bot) moved this to Ready in NVIDIA on Mar 29, 2026
@wzhao18 (Contributor, Author) commented Mar 29, 2026

@robertgshaw2-redhat It seems we need to add the cast for the routing bias back.

@wzhao18 (Contributor, Author) commented Mar 30, 2026

The routing bias cast was added because CI showed a GSM8k accuracy collapse with nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 and nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 under the new flashinfer release.
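
For anyone re-running that check locally, a sketch via lm-evaluation-harness' Python API; the vllm backend and gsm8k task are real, but the exact model_args are assumptions mirroring the smaller of the two models:

# Evaluate GSM8k through lm-eval's vLLM backend and compare the score
# before and after the flashinfer bump.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8,"
        "trust_remote_code=True"
    ),
    tasks=["gsm8k"],
    num_fewshot=5,
)
print(results["results"]["gsm8k"])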

@mgoin (Member) commented Mar 30, 2026

I'm planning to get this change in #38423

@wzhao18 (Contributor, Author) commented Mar 30, 2026

@mgoin Sounds good. Will close this one.

@wzhao18 closed this on Mar 30, 2026
@github-project-automation (Bot) moved this from Ready to Done in NVIDIA on Mar 30, 2026

Labels

ci/build, nvidia, ready-run-all-tests (Trigger CI with all tests for wide-ranging PRs)

Projects

Status: Done


4 participants