Change warning logs to debug for unimplemented MXFP4 Linear/Attention by mgoin · Pull Request #29441 · vllm-project/vllm

mgoin · 2025-11-25T20:08:45Z

Purpose

These warnings are too loud and confusing for users, since gpt-oss doesn't have these layers quantized:

(Worker_TP0 pid=3982331) INFO 11-25 20:04:34 [gpu_model_runner.py:3377] Starting to load model openai/gpt-oss-120b...
(Worker_TP1 pid=3982332) WARNING 11-25 20:04:34 [mxfp4.py:199] MXFP4 linear layer is not implemented - falling back to UnquantizedLinearMethod.
(Worker_TP1 pid=3982332) WARNING 11-25 20:04:34 [mxfp4.py:211] MXFP4 attention layer is not implemented. Skipping quantization for this layer.
(Worker_TP0 pid=3982331) WARNING 11-25 20:04:34 [mxfp4.py:199] MXFP4 linear layer is not implemented - falling back to UnquantizedLinearMethod.
(Worker_TP0 pid=3982331) WARNING 11-25 20:04:34 [mxfp4.py:211] MXFP4 attention layer is not implemented. Skipping quantization for this layer.

Not sure why they were added in #27334 in an unfinished state.
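For illustration, here is a minimal sketch of the kind of change described above, using only Python's standard `logging` module. It is not the vLLM code: the logger name `mxfp4_sketch`, the `get_linear_method` function, the stand-in `UnquantizedLinearMethod` class, and the `ListHandler` helper are all hypothetical. Only the message text is taken from the log output above. The sketch shows that a fallback message emitted at DEBUG level is dropped when the logger threshold is INFO and appears only when the threshold is lowered to DEBUG.

```python
import logging


class UnquantizedLinearMethod:
    """Hypothetical stand-in for the unquantized fallback method."""


class ListHandler(logging.Handler):
    """Collects (level name, message) pairs so the output can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[tuple[str, str]] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append((record.levelname, record.getMessage()))


def get_linear_method(logger: logging.Logger) -> UnquantizedLinearMethod:
    # Assumption: before the change this message was logged with logger.warning().
    logger.debug(
        "MXFP4 linear layer is not implemented - "
        "falling back to UnquantizedLinearMethod."
    )
    return UnquantizedLinearMethod()


def run_with_level(level: int) -> list[tuple[str, str]]:
    """Run the fallback once with the given logger threshold and return what was logged."""
    logger = logging.getLogger("mxfp4_sketch")  # hypothetical logger name
    logger.propagate = False  # keep the sketch's output out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        get_linear_method(logger)
    finally:
        logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    print(run_with_level(logging.INFO))   # nothing is recorded at the INFO threshold
    print(run_with_level(logging.DEBUG))  # the fallback message is recorded
```

The fallback behaviour itself is unchanged in this sketch; only the visibility of the message differs between the two thresholds.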

Test Plan

Test Result

Signed-off-by: Michael Goin <mgoin64@gmail.com>

gemini-code-assist

Code Review

This pull request aims to reduce log noise by changing warning messages about unimplemented MXFP4 layers to debug messages. While reducing log spam is a good goal, changing these specific warnings to debug messages could hide critical information from users. If a user expects a layer to be quantized and it silently falls back to an unquantized version, it could lead to unexpected behavior and difficult-to-debug performance or memory issues. I've suggested changing the log level to info instead of debug. This would make the messages less alarming than warnings but still ensure they are visible to users by default, providing a good balance between reducing noise and keeping users informed.
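The trade-off raised in this review can be sketched with the standard `logging` module. The example below assumes the effective threshold is INFO, which the review implies when it says info messages are visible by default. The logger name `level_visibility_sketch` and the helper `visible_levels` are hypothetical and not part of vLLM. It reports which of the three candidate levels would be emitted at a given threshold.

```python
import logging


def visible_levels(threshold: int) -> dict[str, bool]:
    """Report whether DEBUG, INFO and WARNING records pass the given threshold."""
    logger = logging.getLogger("level_visibility_sketch")  # hypothetical name
    logger.setLevel(threshold)
    return {
        name: logger.isEnabledFor(getattr(logging, name))
        for name in ("DEBUG", "INFO", "WARNING")
    }


if __name__ == "__main__":
    # Assumed default threshold: INFO.
    print(visible_levels(logging.INFO))
    # Threshold lowered for troubleshooting.
    print(visible_levels(logging.DEBUG))
```

Under that assumption, a message moved to INFO would still be shown by default, while a message moved to DEBUG would require lowering the threshold to see it.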

vllm/model_executor/layers/quantization/mxfp4.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

vllm/model_executor/layers/quantization/mxfp4.py

yewentao256

Nice find, a small comment

vllm/model_executor/layers/quantization/mxfp4.py

Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>

yewentao256

LGTM, thanks for the work!

Signed-off-by: Michael Goin <mgoin64@gmail.com>

…vllm-project#29441) Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

…vllm-project#29441) Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Change warning logs to debug for unimplemented MXFP4 Linear/Attention

4b1a282

Signed-off-by: Michael Goin <mgoin64@gmail.com>

mgoin requested review from pavanimajety, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners November 25, 2025 20:08

gemini-code-assist bot reviewed Nov 25, 2025

chatgpt-codex-connector bot reviewed Nov 25, 2025

yewentao256 reviewed Nov 25, 2025

mgoin and others added 2 commits November 25, 2025 15:25

Update vllm/model_executor/layers/quantization/mxfp4.py

ba4d1fc

Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>

Update vllm/model_executor/layers/quantization/mxfp4.py

ad53d26

Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>

mgoin added the ready and startup-ux labels Nov 25, 2025

yewentao256 approved these changes Nov 25, 2025

Fix ,

01e1316

Signed-off-by: Michael Goin <mgoin64@gmail.com>

yewentao256 enabled auto-merge (squash) November 25, 2025 20:46

yewentao256 merged commit 7df0289 into main Nov 25, 2025
56 checks passed

yewentao256 deleted the mxfp4-warn-debug branch November 25, 2025 22:52

yewentao256 merged 4 commits into main from mxfp4-warn-debug
