[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark by seungrokj · Pull Request #1626 · SemiAnalysisAI/InferenceX

seungrokj · 2026-05-31T03:30:50Z

Summary

Add new benchmark config dsv4-fp4-mi355x-atom-dp for DeepSeek-V4-Pro with DP-attention on MI355X using ATOM
Add new script benchmarks/single_node/dsv4_fp4_mi355x_atom_dp.sh with --enable-dp-attention --gpu-memory-utilization 0.85
Image: rocm/atom-dev:nightly_202605301523 (ATOM upstream run 26690241645, 2026-05-30)
Concurrency range: 64–1024 for both ISL 1024 and 8192

Performance vs current InferenceX (dsv4-fp4-mi355x-atom, nightly_202605130853)

ISL	OSL	Conc	InferenceX (tok/s/GPU)	ATOM DP (tok/s/GPU)	Δ%
1024	1024	64	389.30	443.01	+13.8%
1024	1024	128	601.21	774.50	+28.8%
1024	1024	256	880.78	1322.72	+50.2%
1024	1024	512	—	2028.30	—
1024	1024	1024	—	2984.23	—
8192	1024	64	1162.87	1505.66	+29.5%
8192	1024	128	1469.89	2366.74	+61.0%
8192	1024	256	704.73	3404.86	+383.1%
8192	1024	512	—	4196.99	—
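The Δ% column is the relative throughput change of ATOM DP over the current InferenceX entry, (new − old) / old × 100, rounded to one decimal place. This short Python check reproduces a few rows from the table. The function name is illustrative; the numbers are copied from the table.

```python
def delta_pct(baseline: float, new: float) -> float:
    """Relative change of the ATOM DP result over the InferenceX baseline, in percent."""
    return (new - baseline) / baseline * 100.0

# (ISL, OSL, conc) -> (InferenceX tok/s/GPU, ATOM DP tok/s/GPU), copied from the table.
ROWS = {
    (1024, 1024, 64): (389.30, 443.01),
    (1024, 1024, 256): (880.78, 1322.72),
    (8192, 1024, 128): (1469.89, 2366.74),
    (8192, 1024, 256): (704.73, 3404.86),
}

if __name__ == "__main__":
    for key, (old, new) in ROWS.items():
        print(key, f"{delta_pct(old, new):+.1f}%")
```

These four rows print +13.8%, +50.2%, +61.0% and +383.1%, which match the Δ% column.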

Test plan

Verify dsv4_fp4_mi355x_atom_dp.sh starts atom server with --enable-dp-attention --gpu-memory-utilization 0.85
Confirm dsv4-fp4-mi355x-atom-dp config picks up the new script
Run benchmark at conc=64 and conc=256 to confirm throughput matches upstream numbers
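Here is a minimal sketch of the first and last test-plan steps. It assumes the server command line and the measured per-GPU throughput are available to a Python harness. The function names, the 5% tolerance and the sample command strings are hypothetical. Only the two flags and the reference numbers come from this PR.

```python
import shlex

# Reference ATOM DP tok/s/GPU for the conc=64 and conc=256 runs, from the table in this PR.
REFERENCE = {
    (1024, 1024, 64): 443.01,
    (1024, 1024, 256): 1322.72,
    (8192, 1024, 64): 1505.66,
    (8192, 1024, 256): 3404.86,
}

def has_dp_flags(cmd: str) -> bool:
    """True if the command enables DP-attention with --gpu-memory-utilization 0.85."""
    argv = shlex.split(cmd)
    if "--enable-dp-attention" not in argv:
        return False
    # Accept both "--gpu-memory-utilization 0.85" and "--gpu-memory-utilization=0.85".
    for i, tok in enumerate(argv):
        if tok == "--gpu-memory-utilization" and i + 1 < len(argv):
            return float(argv[i + 1]) == 0.85
        if tok.startswith("--gpu-memory-utilization="):
            return float(tok.split("=", 1)[1]) == 0.85
    return False

def within_tolerance(key: tuple, measured: float, rel_tol: float = 0.05) -> bool:
    """True if a measured tok/s/GPU is within rel_tol of the reference value."""
    ref = REFERENCE[key]
    return abs(measured - ref) <= rel_tol * ref
```

For example, `has_dp_flags("serve -tp 8 --enable-dp-attention --gpu-memory-utilization 0.85")` returns True. The leading `serve` is only a placeholder for the real launcher.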

🤖 Generated with Claude Code

Note

Low Risk
Benchmark and image-tag changes only; no application auth or production serving paths affected.

Overview
Extends the existing dsv4-fp4-mi355x-atom MI355X ATOM benchmark for DeepSeek-V4-Pro FP4 with data-parallel attention, using the same dsv4_fp4_mi355x_atom.sh launcher rather than a separate config key.

The container image moves from rocm/atom-dev:nightly_202605130853 to rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3. The search space is split into two bands per ISL: TP8, EP1, conc 1–64 without DP-attn, then dp-attn: true from conc 64–1024 (1k/1k) or 64–512 (8k/1k).
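The two-band split per ISL can be enumerated as shown below. This is a sketch and does not follow the actual amd-master.yaml schema: the field names are hypothetical, and doubling concurrency steps are assumed because they match the 64 to 1024 points in the table. TP/EP for the DP-attention band are not stated, so TP8/EP1 is assumed there. Concurrency 64 appears in both bands, once without DP-attention and once with it.

```python
def doubling(lo: int, hi: int) -> list:
    """Concurrency points lo, 2*lo, 4*lo, ... up to and including hi (doubling assumed)."""
    points = []
    c = lo
    while c <= hi:
        points.append(c)
        c *= 2
    return points

def search_space() -> list:
    """Hypothetical flat list of sweep points for the two bands per ISL (OSL fixed at 1024)."""
    points = []
    for isl, dp_max in ((1024, 1024), (8192, 512)):
        # Band 1: TP8, EP1, no DP-attention, conc 1-64.
        for conc in doubling(1, 64):
            points.append({"isl": isl, "osl": 1024, "tp": 8, "ep": 1, "dp_attn": False, "conc": conc})
        # Band 2: DP-attention on, conc 64 up to 1024 (1k/1k) or 512 (8k/1k).
        # TP8/EP1 here is an assumption; the PR only states dp-attn: true for this band.
        for conc in doubling(64, dp_max):
            points.append({"isl": isl, "osl": 1024, "tp": 8, "ep": 1, "dp_attn": True, "conc": conc})
    return points
```

With doubling steps this gives 7 + 5 points for 1k/1k and 7 + 4 points for 8k/1k, or 23 in total.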

The benchmark script now builds PARALLEL_ARGS from matrix DP_ATTENTION and EP_SIZE (--enable-dp-attention, and expert parallel when EP>1), and starts the server with --gpu-memory-utilization 0.85. perf-changelog.yaml records the update.
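The PARALLEL_ARGS logic described above is restated in Python below for illustration. The real script is bash and builds a shell array. The following pieces come from this PR: `DP_ATTENTION`, `EP_SIZE`, `-tp`, `--enable-dp-attention` and `--gpu-memory-utilization 0.85`. Several details are assumptions:

- the `TP` variable name;
- the `"true"`/`"false"` encoding of `DP_ATTENTION`;
- placing `-tp` inside PARALLEL_ARGS;
- the expert-parallel flag `--enable-expert-parallel`, a placeholder because the PR does not name it.

```python
import os

# Placeholder: the PR does not name ATOM's expert-parallel flag.
EP_FLAG = "--enable-expert-parallel"

def parallel_args(tp: int, dp_attention: bool, ep_size: int) -> list:
    """Build the parallelism arguments: DP-attention when requested, expert parallel when EP>1."""
    args = ["-tp", str(tp)]
    if dp_attention:
        args.append("--enable-dp-attention")
    if ep_size > 1:
        args.append(EP_FLAG)
    return args

def server_args(env) -> list:
    """Read matrix-style variables (TP is an assumed name) and append the memory setting."""
    tp = int(env.get("TP", "8"))
    dp = env.get("DP_ATTENTION", "false").lower() == "true"
    ep = int(env.get("EP_SIZE", "1"))
    return parallel_args(tp, dp, ep) + ["--gpu-memory-utilization", "0.85"]

if __name__ == "__main__":
    print(server_args(os.environ))
```

Keeping the arguments in a list rather than a single string mirrors the array pattern described in the commit messages. Flags are appended only when needed, and no word-splitting issues arise.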

Reviewed by Cursor Bugbot for commit bc53139. Bugbot is set up for automated code reviews on this repo.

Add new benchmark config for DeepSeek-V4-Pro with DP-attention enabled on MI355X using ATOM. Uses image rocm/atom-dev:nightly_202605301523 with --enable-dp-attention and --gpu-memory-utilization 0.85. Concurrency range 64-1024. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-05-31T03:30:57Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Consolidate the DP-attention and non-DP search spaces under a single dsv4-fp4-mi355x-atom config key using the stable atom0.1.3 image - Delete the standalone dsv4_fp4_mi355x_atom_dp.sh benchmark script (DP-attention now handled by the shared glm5 script pattern) - Update glm5_fp8_mi355x_atom.sh to support DP_ATTENTION flag via PARALLEL_ARGS, enabling dp-attn and expert-parallel combinations - Update perf-changelog.yaml config-key and image reference accordingly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ript - dsv4_fp4_mi355x_atom.sh: replace EP string construction with PARALLEL_ARGS array pattern supporting DP_ATTENTION + EP_SIZE combos - glm5_fp8_mi355x_atom.sh: revert PARALLEL_ARGS back to simple -tp/$EP (glm5 does not use dp-attention) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-05-31T09:03:44Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26703436206
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26703436206

github-actions · 2026-06-01T01:27:51Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26729909193
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26729909193

… for prefix caching Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit ff26684.

github-actions · 2026-06-01T12:56:17Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26744570641
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26744570641

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…m/SemiAnalysisAI/InferenceX into seungrokj/dsv4-fp4-mi355x-atom-dp

github-actions · 2026-06-01T15:35:23Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26756406305
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26756406305

seungrokj · 2026-06-01T16:19:24Z

@functionstackx can you approve this?

functionstackx · 2026-06-01T19:49:52Z

/reuse-sweep-run

#26383 (the DSv4 MTP graph fix) is on sglang main, not the amd/deepseek_v4 branch the rocm/sgl-dev:*-DSv4 images are cut from, so switch the MTP entry onto the mainline ROCm nightly lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260601 which carries it. Mainline omits deep_gemm; the recipe now detects that and routes the DSv4 fp8 wo_a / topk paths to their torch fallbacks (SGLANG_OPT_FP8_WO_A_GEMM=0, SGLANG_TOPK_TRANSFORM_512_TORCH=1, SGLANG_ENABLE_JIT_DEEPGEMM=0). No-op on a deep_gemm-bearing image. Resolve perf-changelog conflict: keep atom (#1626) and vllm-mtp (#1630) from main, update the sglang-mtp entry for the mainline image.
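The deep_gemm fallback described in this commit message can be sketched as follows. Detection is assumed to work by checking whether a `deep_gemm` Python module is importable; the recipe's actual detection method is not shown here. The helper names are hypothetical. The three environment variables and their values come from the message.

```python
import importlib.util
import os

# Torch-fallback settings for images without deep_gemm, as listed in the commit message.
DEEP_GEMM_FALLBACK_ENV = {
    "SGLANG_OPT_FP8_WO_A_GEMM": "0",
    "SGLANG_TOPK_TRANSFORM_512_TORCH": "1",
    "SGLANG_ENABLE_JIT_DEEPGEMM": "0",
}

def has_deep_gemm() -> bool:
    """True if a deep_gemm module can be found (assumed detection method)."""
    return importlib.util.find_spec("deep_gemm") is not None

def apply_fallback(env: dict, deep_gemm_present: bool) -> dict:
    """Return a copy of env with the torch fallbacks set when deep_gemm is absent."""
    merged = dict(env)
    if not deep_gemm_present:
        merged.update(DEEP_GEMM_FALLBACK_ENV)
    return merged  # unchanged copy on a deep_gemm-bearing image (the no-op case)

if __name__ == "__main__":
    os.environ.update(apply_fallback(dict(os.environ), has_deep_gemm()))
```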

seungrokj requested a review from a team May 31, 2026 03:30

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners May 31, 2026 03:30

github-project-automation Bot added this to InferenceMAX Board May 31, 2026

perf-changelog: add pr-link for dsv4-fp4-mi355x-atom-dp

8b0bdf7

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor Bot reviewed May 31, 2026

Comment thread .github/configs/amd-master.yaml Outdated

claude Bot reviewed May 31, 2026

Comment thread perf-changelog.yaml Outdated

seungrokj changed the title ~~Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark~~ ]AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark May 31, 2026

seungrokj changed the title ~~]AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark~~ [AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark May 31, 2026

cursor Bot reviewed May 31, 2026

Comment thread .github/configs/amd-master.yaml Outdated

seungrokj and others added 3 commits May 31, 2026 12:56

glm5_fp8_mi355x_atom.sh: remove trailing whitespace

030ede7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

glm5_fp8_mi355x_atom.sh: use -tp instead of --tp

51f749f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed May 31, 2026

Comment thread benchmarks/single_node/dsv4_fp4_mi355x_atom.sh

seungrokj added the AMD label May 31, 2026

functionstackx added the full-sweep-enabled label May 31, 2026

Merge branch 'main' into seungrokj/dsv4-fp4-mi355x-atom-dp

dad70c1

[AMD] fix dsv4-fp4-mi355x-atom: reduce concurrency limit and add TODO…

ff26684

… for prefix caching Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Jun 1, 2026

Comment thread benchmarks/single_node/dsv4_fp4_mi355x_atom.sh

Merge branch 'main' into seungrokj/dsv4-fp4-mi355x-atom-dp

9dac269

seungrokj and others added 2 commits June 1, 2026 21:57

[AMD] dsv4-fp4-mi355x-atom: set gpu-memory-utilization to 0.85

4d5218c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'seungrokj/dsv4-fp4-mi355x-atom-dp' of https://github.co…

bc53139

…m/SemiAnalysisAI/InferenceX into seungrokj/dsv4-fp4-mi355x-atom-dp

functionstackx merged commit 99008ef into main Jun 1, 2026
89 of 91 checks passed

functionstackx deleted the seungrokj/dsv4-fp4-mi355x-atom-dp branch June 1, 2026 19:49

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 1, 2026

seungrokj commented May 31, 2026 • edited by cursor Bot

Uh oh!

Summary

Performance vs current InferenceX (dsv4-fp4-mi355x-atom, nightly_202605130853)

Test plan

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

seungrokj commented Jun 1, 2026

Uh oh!

functionstackx commented Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

seungrokj commented May 31, 2026 •

edited by cursor Bot

Loading