
Bump Flashinfer to v0.6.1 #30993

Merged
vllm-bot merged 3 commits into vllm-project:main from elvischenv:elvischenv/update-flashinfer
Jan 21, 2026

Conversation

@elvischenv
Contributor

@elvischenv elvischenv commented Dec 18, 2025

Purpose

Bump Flashinfer to v0.6.1 when it is released.
API change: argument tile_tokens_dim has been removed from all TRTLLM MoE kernels.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


Note

Upgrade FlashInfer to v0.6.0

  • Bumps FlashInfer to 0.6.0 in docker/Dockerfile, Dockerfile.nightly_torch (source build pinned to v0.6.0), and requirements/cuda.txt.
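For reference, a pin of this shape would appear in the requirements file (the package name and exact line are illustrative assumptions, not copied from the diff):

```text
# requirements/cuda.txt (illustrative fragment)
flashinfer-python==0.6.1
```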

API updates for MoE kernels

  • Removes tile_tokens_dim from all TRTLLM MoE call sites and related helpers; deletes calculate_tile_tokens_dim and associated imports/usages across flashinfer_trtllm_moe.py, trtllm_moe.py, mxfp4.py, flashinfer_fp4_moe.py, tests.
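The call-site change can be sketched with a small compatibility shim. Everything here is hypothetical illustration, not the real FlashInfer API: in v0.6.x the kernels select tiling internally, so old call sites simply stop passing the argument.

```python
def trtllm_moe_kernel(hidden_states, weights):
    # Stand-in for a FlashInfer v0.6.x TRTLLM MoE entry point:
    # tile_tokens_dim is gone; the kernel picks its tiling internally.
    return [h * w for h, w in zip(hidden_states, weights)]

def call_moe(kernel, *args, **kwargs):
    # Drop the removed argument so pre-0.6.x call sites keep working.
    kwargs.pop("tile_tokens_dim", None)
    return kernel(*args, **kwargs)

out = call_moe(trtllm_moe_kernel, [1.0, 2.0], [3.0, 4.0], tile_tokens_dim=8)
print(out)  # [3.0, 8.0]
```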

Attention backend adjustments

  • Passes o_data_type through FlashInfer prefill/decode wrappers; updates fast-plan call to handle backend-specific arg lists (adds conditional args for fa2).
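The backend-specific argument handling follows the pattern quoted later in the review thread: FA2 takes three extra trailing plan arguments that FA3 does not accept. A minimal standalone sketch (function and argument names are illustrative):

```python
def build_plan_args(backend, base_args, fixed_split_size=0, disable_split_kv=False):
    # FA2 accepts three extra trailing arguments (used for batch
    # invariance) that the FA3 plan signature does not take.
    args = list(base_args)
    if backend == "fa2":
        args.append(fixed_split_size)
        args.append(disable_split_kv)
        args.append(0)  # num_colocated_ctas
    return args

print(build_plan_args("fa2", ["qo_indptr", "kv_indptr"]))
# ['qo_indptr', 'kv_indptr', 0, False, 0]
print(build_plan_args("fa3", ["qo_indptr", "kv_indptr"]))
# ['qo_indptr', 'kv_indptr']
```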

Tests

  • Updates MXFP4 MoE tests to align with new FlashInfer interfaces (removes tile sizing logic).

Written by Cursor Bugbot for commit 100b3744ddd31ce849b0bae40a87e2dbe53107e9.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request bumps the Flashinfer version to v0.6.0rc1. The changes are consistent across the Dockerfiles, requirements, and source code. The main code change is the removal of the tile_tokens_dim argument from all TRTLLM MoE kernel calls, which is in line with the API changes in the new Flashinfer version as stated in the pull request description. The related helper functions for calculating this dimension have also been correctly removed. The changes appear correct and complete for this version bump. I have not found any issues of high or critical severity.

Member

@yewentao256 yewentao256 left a comment


Thanks for the work!
I am wondering if we could wait a bit until 0.6.0 is formally out.

Contributor Author

@elvischenv elvischenv left a comment


@yewentao256 Will update to 0.6.0 when it is released. Thanks.

@jiahanc
Contributor

jiahanc commented Dec 22, 2025

Just FYI: there is a compilation error with GCC 11. If you update the version, please use at least v0.6.0rc2.

@pavanimajety
Collaborator

I am in favor of adding the ready label to see if there are other failures in the CI before we switch to 0.6.0.
@elvischenv Could we please update to rc2?

@vadiklyutiy
Collaborator

I think it's worth running CI early because there might be some failures.
Any objections to setting the ready tag so we can check CI?

@njhill njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Dec 23, 2025
Member

@njhill njhill left a comment


Just adding this to block merging until we update to 0.6.0

@github-project-automation github-project-automation bot moved this to In review in NVIDIA Dec 23, 2025
@elvischenv elvischenv changed the title from "Bump Flashinfer to v0.6.0rc1" to "Bump Flashinfer to v0.6.0" Dec 23, 2025
@nvpohanh
Contributor

Blocked by FlashInfer TopK functional issue: flashinfer-ai/flashinfer#2320

FlashInfer already has a fix: flashinfer-ai/flashinfer#2325

@njhill
Member

njhill commented Jan 14, 2026

Blocked by FlashInfer TopK functional issue: flashinfer-ai/flashinfer#2320

FlashInfer already has a fix: flashinfer-ai/flashinfer#2325

This does not need to block us, we don't use the flashinfer sampler. We can just disable that test.

@elvischenv elvischenv changed the title from "Bump Flashinfer to v0.6.0" to "Bump Flashinfer to v0.6.1" Jan 14, 2026
@elvischenv
Contributor Author

elvischenv commented Jan 15, 2026

This does not need to block us, we don't use the flashinfer sampler. We can just disable that test.

Hi @njhill, we found that 0.6.1 only fixes the sampler issue on B200, not on L4, which is what vLLM CI uses. We'd like to skip the test to move forward; skipped it with @pytest.mark.skip.
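Skipping a test this way is a one-line marker. The test name and reason string below are hypothetical stand-ins for the actual vLLM sampler test:

```python
import pytest

# Hypothetical test name; the real vLLM test exercises the FlashInfer sampler.
@pytest.mark.skip(reason="FlashInfer TopK sampler issue on L4 GPUs "
                         "(flashinfer-ai/flashinfer#2320); 0.6.1 fixes B200 only")
def test_flashinfer_sampler():
    ...
```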

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

Update docker/Dockerfile

Signed-off-by: Pavani Majety <pavanimajety@gmail.com>

Update to v0.6.0rc2

Co-authored-by: Pavani Majety <pavanimajety@gmail.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

Update to v0.6.0rc2

Co-authored-by: Pavani Majety <pavanimajety@gmail.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

Update to v0.6.0rc2

Co-authored-by: Pavani Majety <pavanimajety@gmail.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

update to 0.6.0

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

update to 0.6.1

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

remove tile_tokens_dim

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

fix lack of o_data_type of plan()

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

fix fa2/fa3 API breakage

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
@elvischenv elvischenv force-pushed the elvischenv/update-flashinfer branch from 3af5dad to 61cef9d on January 19, 2026 01:39
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
@elvischenv elvischenv force-pushed the elvischenv/update-flashinfer branch from 61cef9d to c4a5b24 on January 19, 2026 02:52
Member

@yewentao256 yewentao256 left a comment


LGTM, just a question.

Also may need an approval from @njhill

Comment on lines +1688 to +1691

    if self._backend == "fa2":
        args.append(fixed_split_size)
        args.append(disable_split_kv)
        args.append(0)  # num_colocated_ctas
Member


So FA3 doesn't support fixed_split_size?

Contributor


@yzh119 do you happen to know?

Contributor Author


This is from flashinfer decode.py#L1065-L1089.
@nvpohanh Do you know why FA3 does not need these arguments?



Yes, FA3 doesn't need them; those arguments are designed for batch invariance.

Member


Will this break the current batch invariance test?

Contributor


@yewentao256 I don't think this PR will break the batch invariance test because:

  • If the test was originally using the FA2 backend, it still uses FA2 and nothing changes.
  • The FA3 backend is enabled to support FP8 kv-cache on Hopper GPUs. Previously, we could not run FP8 kv-cache on Hopper GPUs at all.

Member

@mgoin mgoin left a comment


LGTM, triggering more blackwell CI

@vllm-bot vllm-bot merged commit 808d6fd into vllm-project:main Jan 21, 2026
97 of 98 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Jan 21, 2026
monajafi-amd pushed a commit to monajafi-amd/vllm that referenced this pull request Jan 23, 2026
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
cwazai pushed a commit to cwazai/vllm that referenced this pull request Jan 25, 2026
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: 陈建华 <1647430658@qq.com>
lapy pushed a commit to lapy/vllm that referenced this pull request Jan 27, 2026
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
@elvischenv elvischenv deleted the elvischenv/update-flashinfer branch February 7, 2026 16:26
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

Labels

ci/build, nvidia, ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

Status: Done


10 participants