[amd][gptoss] Perf gain because of block alignment #28024
heheda12345 merged 3 commits into vllm-project:main
Conversation
Code Review
This pull request introduces a performance optimization for the fused_moe kernel on AMD GPUs by dynamically setting the padding alignment based on the GPU architecture. The changes replace a hardcoded padding value with a function that queries the hardware, which should improve performance as described. My review identifies a critical issue where the new utility function could cause a runtime crash if the optional triton package is not installed. I've provided a suggestion to make the code more robust by adding a check for Triton's availability.
Summary: The following patch is from Aliasger Zaidy (azaid) and Shucai Xiao (scxiao) from AMD; the overall integration effort was guided by Xiaozhu Meng (mxz297) from Meta. It boosts the performance of the fused_moe kernel. We pad to 128 for MI300 to avoid masked loads. We pad to 256 for MI355 because we use scale preshuffling on MI355, and padding to 256 is needed to enable the correct preshuffle arrangement. A 10% performance boost is achieved for gptoss120b on an AMD MI300 machine.

Test Plan: No eval regression is observed. Eval on aime25:

With patch:

| Effort Level | Score | Characters | Chars Std | Score Std |
|--------------|-------|------------|-----------|-----------|
| Low | 0.51 | 1577.26 | 1001.32 | 0.49 |
| Medium | 0.79 | 1991.975 | 785.68 | 0.40 |
| High | 0.916 | 2568 | 1029.9 | 0.28 |

Without patch:

| Effort Level | Score | Characters | Chars Std | Score Std |
|--------------|-------|------------|-----------|-----------|
| Low | 0.51 | 1570.26 | 1001.32 | 0.49 |
| Medium | 0.79 | 1990.975 | 780.68 | 0.40 |
| High | 0.916 | 2508 | 1020.9 | 0.28 |

Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com>
```python
def get_padding_alignment():
    return (
        256
        if triton.runtime.driver.active.get_current_target().arch in ("gfx950",)
```
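Following the review note that this could crash when the optional triton package is missing, a more defensive version might look like the sketch below. This is only an illustration, not the PR's final code; the `arch` parameter and the 128 fallback default are assumptions added here for testability.

```python
def get_padding_alignment(arch=None):
    """Pick the fused_moe block padding for the current GPU.

    256 enables the scale-preshuffle arrangement used on MI355 (gfx950);
    128 avoids masked loads on MI300 and other architectures.
    """
    if arch is None:
        try:
            import triton
            arch = triton.runtime.driver.active.get_current_target().arch
        except ImportError:
            # Triton not installed: fall back to the common alignment.
            return 128
    return 256 if arch in ("gfx950",) else 128
```

With this shape, `get_padding_alignment("gfx950")` yields 256 and any other architecture string yields 128, while an environment without Triton no longer raises at import-resolution time.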
The default value was 256. Would it be safer to only update MI300's alignment to 128? Or do you think 128 will be faster on other architectures too?
256 is needed to enable the correct preshuffle arrangement, and scale pre-shuffling is only used on MI355; that's why I believe 128 will be faster on other architectures.
HAIAI left a comment:
@smitkadvani LGTM, thanks!
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com> Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>
Summary:
The following patch is from Aliasger Zaidy (azaid) and Shucai Xiao (scxiao) from AMD; the overall integration effort was guided by Xiaozhu Meng (mxz297) from Meta. It boosts the performance of the fused_moe kernel.
We pad to 128 for MI300 to avoid masked loads.
We pad to 256 for MI355 because we use scale preshuffling on MI355, and padding to 256 is needed to enable the correct preshuffle arrangement.
A 10% performance boost is achieved for gptoss120b on an AMD MI300 machine.
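The padding described above is the usual round-up to an alignment boundary; a minimal sketch of the arithmetic (the helper name is illustrative, not the PR's code):

```python
def pad_to_multiple(n, align):
    """Round n up to the next multiple of align, e.g. an intermediate
    dimension padded out to the 128- or 256-element block boundary."""
    return ((n + align - 1) // align) * align
```

For example, a dimension of 2880 pads to 2944 under the 128 alignment but to 3072 under the 256 alignment, which is the trade-off the reviewers discuss below.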
Test Plan:
No eval regression is observed.
Eval on aime25 was run with and without the patch; the score tables in the commit message above show no regression.