
Fix autotuner crash when input tensor is None #2756

Merged
samuellees merged 5 commits into flashinfer-ai:main from he-yufeng:fix/autotuner-none-tensor
Mar 30, 2026

Conversation

@he-yufeng
Contributor

@he-yufeng he-yufeng commented Mar 11, 2026

Fixes #2749.

trtllm_fp8_block_scale_routed_moe passes routing_logits=None for non-routed calls, but _prepare_input_tensors assumes all inputs are tensors and crashes in _create_tensor_like trying to access .dtype on None.

Fix: skip None inputs and preserve them as-is. This matches the existing pattern in _prepare_input_tensors_with_batches which already handles non-tensor inputs with isinstance(t, torch.Tensor) checks.
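A minimal sketch of the guard, assuming the loop shape inside _prepare_input_tensors (prepare_inputs is an illustrative stand-in, not the real helper, which also consults the tuning profile):

    import torch

    def prepare_inputs(inputs):
        tensors = []
        for inp in inputs:
            if inp is None:
                # Preserve optional inputs (e.g. routing_logits in
                # non-routed MoE) instead of reading .dtype on None.
                tensors.append(None)
            else:
                # The real code builds a tuning tensor here, e.g. via
                # _create_tensor_like; cloning stands in for that step.
                tensors.append(inp.clone())
        return tensors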

Summary by CodeRabbit

  • Bug Fixes

    • Preserve missing inputs during input preparation so None entries are retained and not treated as tensors, preventing errors when some inputs are absent.
    • Relax and align routing and token-count validations to allow empty routing data when appropriate and ensure checks use actual token counts for consistency.
  • Tests

    • Add regression tests verifying None input handling and safe fallback behavior when routing/tuning data is absent.

@coderabbitai
Contributor

coderabbitai Bot commented Mar 11, 2026

📝 Walkthrough

Preserve None-valued optional inputs in autotuner input preparation and relax fused MoE routing validations by deriving token counts from hidden_states and allowing absent or empty routing-related tensors.

Changes

  • Autotuner input handling (flashinfer/autotuner.py): _prepare_input_tensors now preserves None entries in inputs (appending None when encountered) and only calls _create_tensor_like when the input is non-None and the profile uses DynamicDim, preventing AttributeError on optional inputs like routing_logits.
  • Fused MoE routing & validation (flashinfer/fused_moe/core.py): derives num_tokens from hidden_states.shape[0] instead of routing_logits.shape[0]; makes shape/assertion checks for routing_logits, topk_ids, and expert_weights conditional so absent or empty routing tensors are accepted.
  • Tests, regression (tests/autotuner/test_autotuner_core.py): adds tests validating that _prepare_input_tensors handles None entries and that choose_one(...) in the no-tuning/inference path tolerates None inputs and returns the provided runner with tactic -1 when applicable.

Sequence Diagram(s)

(Skipped — changes are bug fixes and validation relaxations that do not introduce a new multi-component control flow requiring visualization.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

run-ci

Suggested reviewers

  • sricketts
  • aleozlx
  • yzh119
  • cyx-6
  • bkryu
  • jimmyzho
  • nv-yunzheq

Poem

🐇 I nudged a None beneath a log,

no crash, just hush beneath the bog,
counted tokens from hidden light,
routing gaps tucked out of sight,
autotune hops home through the fog.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): the title accurately and concisely describes the main fix, handling None input tensors in the autotuner to prevent crashes.
  • Description check (✅ Passed): the PR description clearly explains the issue, the root cause, and the fix, matching the template structure with a linked issue reference.
  • Linked Issues check (✅ Passed): the code changes fully address issue #2749: the autotuner now skips None inputs [autotuner.py], MoERunner derives num_tokens from hidden_states instead of routing_logits [fused_moe/core.py], and regression tests validate both code paths [test_autotuner_core.py].
  • Out of Scope Changes check (✅ Passed): all changes are directly scoped to fixing the None tensor handling issue: autotuner input preparation, MoERunner shape assertions, and corresponding test coverage.



Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 792a36d6-79c6-4bd0-944d-642934e658c4

📥 Commits

Reviewing files that changed from the base of the PR and between fe06b91 and af6cb712bf09e609ae590e65672a290eff2b2e42.

📒 Files selected for processing (1)
  • flashinfer/autotuner.py

Comment thread flashinfer/autotuner.py
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the autotuner's input preparation logic that caused a crash when optional tensor inputs were None. By introducing a check to gracefully handle and preserve None values, the change enhances the robustness of the autotuner, preventing failures in scenarios where certain inputs are intentionally omitted.

Highlights

  • Bug Fix: Resolved a crash in the autotuner's _prepare_input_tensors function that occurred when None was passed as an input tensor, specifically for optional tensors like routing_logits in non-routed MoE calls.
  • Input Handling: Modified the _prepare_input_tensors function to explicitly check for and preserve None inputs, aligning its behavior with _prepare_input_tensors_with_batches which already handles non-tensor inputs.


Changelog
  • flashinfer/autotuner.py
    • Added a check in _prepare_input_tensors to handle None inputs gracefully, preventing crashes.


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

The pull request effectively addresses a critical crash by correctly handling None inputs in the _prepare_input_tensors function. The change ensures that optional tensors passed as None are preserved as-is, preventing _create_tensor_like from attempting to access attributes on a None object. This improves the robustness of the autotuner's input preparation process.

Comment thread flashinfer/autotuner.py

            # Some callers pass None for optional tensors (e.g. routing_logits
            # in non-routed MoE). Preserve None as-is.
            tensors.append(None)
        elif any(isinstance(d, DynamicDim) for d in p):
Contributor


Severity: high

The current fix correctly handles None inputs. However, the _create_tensor_like function expects origin_tensor to be a torch.Tensor. If inputs[i] is not None but also not a torch.Tensor (e.g., a Python scalar like an int or float), and p contains DynamicDim, calling _create_tensor_like with a non-tensor object will still lead to a crash (e.g., when trying to access .dtype). To ensure robustness and align with the pattern in _prepare_input_tensors_with_batches that uses isinstance(t, torch.Tensor) checks for non-tensor inputs, the elif condition should explicitly check if inputs[i] is a torch.Tensor before attempting to create a tensor-like object.

Suggested change:

    - elif any(isinstance(d, DynamicDim) for d in p):
    + elif isinstance(inputs[i], torch.Tensor) and any(isinstance(d, DynamicDim) for d in p):
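With the suggestion applied, the branch chain would presumably read along these lines (a fragment sketch reusing the names from the snippet above; the exact _create_tensor_like call is assumed):

    if inputs[i] is None:
        # Preserve optional tensors as-is (the fix in this PR).
        tensors.append(None)
    elif isinstance(inputs[i], torch.Tensor) and any(
        isinstance(d, DynamicDim) for d in p
    ):
        # Only build a tuning tensor for real tensors whose profile
        # has at least one dynamic dimension.
        tensors.append(_create_tensor_like(inputs[i], p))
    else:
        # Static tensors and non-tensor inputs pass through unchanged.
        tensors.append(inputs[i])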

@trevor-m
Contributor

Hi @he-yufeng I tried your PR, but now I get this error:

  File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/moe_runner/flashinfer_trtllm.py", line 333, in fused_experts_none_to_flashinfer_trtllm_fp8
    output = trtllm_fp8_block_scale_routed_moe(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/flashinfer/fused_moe/core.py", line 2478, in trtllm_fp8_block_scale_routed_moe
    return get_trtllm_moe_sm100_module().trtllm_fp8_block_scale_moe(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/flashinfer/fused_moe/core.py", line 1683, in trtllm_fp8_block_scale_moe_op
    _, tactic = tuner.choose_one(
                ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/flashinfer/autotuner.py", line 480, in choose_one
    valid_tactics = r.get_valid_tactics(tensors, p)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/flashinfer/fused_moe/core.py", line 1043, in get_valid_tactics
    num_tokens = routing_logits.shape[0]
                 ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'shape'

@he-yufeng he-yufeng requested a review from IwakuraRein as a code owner March 12, 2026 06:34
@he-yufeng
Contributor Author

Thanks for testing @trevor-m! The crash in get_valid_tactics was caused by the same root issue — MoERunner.get_valid_tactics() and MoERunner.forward() both used routing_logits.shape[0] to get num_tokens, which fails when routing_logits is None.

Fixed in 9043934:

  • Changed both methods to use hidden_states.shape[0] (always available)
  • Guarded the shape assertions for topk_ids/expert_weights, which can be empty (0-element) tensors in pre-computed routing mode (see the sketch below)
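A minimal sketch of that pattern (hedged: the real MoERunner methods take more arguments, and the exact assertion bodies are assumed here):

    import torch

    def derive_and_check_num_tokens(hidden_states, routing_logits,
                                    topk_ids, expert_weights):
        # hidden_states is always a real tensor, so it is a safe source
        # for num_tokens; routing_logits is None in the pre-computed
        # routing path (trtllm_fp8_block_scale_routed_moe).
        num_tokens = hidden_states.shape[0]
        if routing_logits is not None:
            assert routing_logits.shape[0] == num_tokens
        # topk_ids / expert_weights can be empty (0-element) tensors
        # depending on the routing mode, so only check non-empty ones.
        if topk_ids is not None and topk_ids.numel() > 0:
            assert topk_ids.shape[0] == num_tokens
        if expert_weights is not None and expert_weights.numel() > 0:
            assert expert_weights.shape[0] == num_tokens
        return num_tokens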

Could you try again with the latest commit?

@trevor-m
Contributor

@he-yufeng Thanks, it's working now

Collaborator

@samuellees samuellees left a comment


Could you please add a smoke test for the fix?

Comment thread flashinfer/fused_moe/core.py (Outdated)

        )
        # topk_ids/expert_weights can be empty(0) when routing_logits is provided,
        # or real tensors when pre-computed routing is used.
        if topk_ids.numel() > 0:
Collaborator


expert_weights is checked like this:

    if expert_weights is not None and expert_weights.numel() > 0:

Could you keep a similar check style for topk_ids, please?

@he-yufeng
Contributor Author

Good catch, updated topk_ids check to match the expert_weights style.
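Presumably that leaves the check reading along these lines (the exact diff isn't shown in this thread):

    if topk_ids is not None and topk_ids.numel() > 0: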

@samuellees
Collaborator

Good catch, updated topk_ids check to match the expert_weights style.

Thanks @he-yufeng ! Could you add a smoke test for your code path, please? I believe the PR will move forward very quickly once the test is ready ^ ^

@he-yufeng
Contributor Author

Added two smoke tests in test_autotuner_core.py — one for _prepare_input_tensors and one for choose_one, both with a None optional tensor. Thanks for the nudge!
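A hedged sketch of what the _prepare_input_tensors smoke test can look like (the actual tests in tests/autotuner/test_autotuner_core.py may wire things up differently; prepare_inputs below is a local stand-in, not the flashinfer API):

    import torch

    def prepare_inputs(inputs):
        # Local stand-in mirroring the fixed None-preserving behavior.
        return [t.clone() if isinstance(t, torch.Tensor) else t
                for t in inputs]

    def test_prepare_inputs_preserves_none():
        inputs = [torch.randn(4, 8), None, torch.randn(4)]
        prepared = prepare_inputs(inputs)
        assert prepared[1] is None
        assert isinstance(prepared[0], torch.Tensor)
        assert isinstance(prepared[2], torch.Tensor)

Per the walkthrough above, the choose_one test similarly passes a None optional input on the no-tuning/inference path and asserts that the provided runner comes back with tactic -1.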

Collaborator

@samuellees samuellees left a comment


LGTM. @he-yufeng Could you please resolve the conflict with main branch? Thanks

@samuellees
Collaborator

/bot run

@flashinfer-bot
Collaborator

GitLab MR !461 has been created, and the CI pipeline #46957861 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #46957861: 1/20 passed

@samuellees
Collaborator

/bot run

@flashinfer-bot
Collaborator

GitLab MR !461 has been created, and the CI pipeline #47022144 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47022144: 1/20 passed

trtllm_fp8_block_scale_routed_moe passes routing_logits=None for
non-routed calls, but _prepare_input_tensors assumes all inputs are
tensors and crashes with AttributeError: 'NoneType' has no attribute
'dtype' in _create_tensor_like.

Skip None inputs and preserve them as-is, matching the existing
pattern in _prepare_input_tensors_with_batches which already handles
non-tensor inputs gracefully.

Fixes flashinfer-ai#2749

get_valid_tactics() and forward() both accessed routing_logits.shape[0]
to get num_tokens, but routing_logits is None when pre-computed routing
is used (trtllm_fp8_block_scale_routed_moe passes routing_logits=None).

Use hidden_states.shape[0] instead, which is always available.
Also guard the shape assertions for topk_ids/expert_weights that can be
empty(0) tensors depending on the routing mode.

Cover the _prepare_input_tensors and choose_one paths when an optional
tensor (e.g. routing_logits in non-routed MoE) is None, which previously
caused AttributeError on .dtype/.shape.
@he-yufeng he-yufeng force-pushed the fix/autotuner-none-tensor branch from 1f2473b to 260ee5e on March 26, 2026 10:25
@samuellees
Collaborator

/bot run

@flashinfer-bot
Collaborator

GitLab MR !461 has been updated with latest changes, and the CI pipeline #47045960 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47045960: 12/20 passed

@samuellees
Collaborator

/bot run

@flashinfer-bot
Collaborator

GitLab MR !461 has been created, and the CI pipeline #47074819 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47074819: 13/20 passed

@samuellees
Collaborator

/bot run

@flashinfer-bot
Collaborator

GitLab MR !461 has been created, and the CI pipeline #47092245 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47092245: 11/20 passed

@samuellees
Collaborator

samuellees commented Mar 28, 2026

Hi @he-yufeng , the CI seems to have passed; some errors are unrelated to this PR.
But could you take a look at the pre-commit check failure?
https://github.com/flashinfer-ai/flashinfer/actions/runs/23589405658/job/68979382814?pr=2756

This blocks some other test cases. You can run pre-commit like this:

    pre-commit run -a

Please let me know if you run into any questions~

@samuellees
Collaborator

/bot run

@flashinfer-bot
Collaborator

GitLab MR !461 has been updated with latest changes, and the CI pipeline #47225408 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47225408: 11/20 passed

@samuellees samuellees enabled auto-merge (squash) March 30, 2026 12:53
@samuellees samuellees merged commit a6796a4 into flashinfer-ai:main Mar 30, 2026
41 of 42 checks passed


Development

Successfully merging this pull request may close these issues.

[Bug] Autotuning fails with trtllm_fp8_block_scale_routed_moe

5 participants