[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark #1626
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you. PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Often, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review and get PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Consolidate the DP-attention and non-DP search spaces under a single dsv4-fp4-mi355x-atom config key using the stable atom0.1.3 image
- Delete the standalone dsv4_fp4_mi355x_atom_dp.sh benchmark script (DP-attention now handled by the shared glm5 script pattern)
- Update glm5_fp8_mi355x_atom.sh to support the DP_ATTENTION flag via PARALLEL_ARGS, enabling dp-attn and expert-parallel combinations
- Update the perf-changelog.yaml config-key and image reference accordingly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ript
- dsv4_fp4_mi355x_atom.sh: replace EP string construction with PARALLEL_ARGS array pattern supporting DP_ATTENTION + EP_SIZE combos
- glm5_fp8_mi355x_atom.sh: revert PARALLEL_ARGS back to simple -tp/$EP (glm5 does not use dp-attention)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
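The PARALLEL_ARGS array pattern from this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual contents of dsv4_fp4_mi355x_atom.sh: the function name build_parallel_args and the TP default of 8 are hypothetical, and --enable-expert-parallel is an assumed flag name that may not match ATOM's real expert-parallel option. Only -tp and --enable-dp-attention appear in this PR.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: assemble server parallelism flags as a bash array
# from the matrix variables DP_ATTENTION and EP_SIZE.
set -euo pipefail

build_parallel_args() {
  local tp="${TP:-8}"                    # assumed default tensor-parallel size
  local ep="${EP_SIZE:-1}"
  local dp_attn="${DP_ATTENTION:-false}"

  # Tensor parallelism is always passed.
  PARALLEL_ARGS=(-tp "$tp")

  # Data-parallel attention only when the matrix enables it.
  if [[ "$dp_attn" == "true" ]]; then
    PARALLEL_ARGS+=(--enable-dp-attention)
  fi

  # Expert parallelism only when EP > 1.
  # ASSUMPTION: this flag name is illustrative, not confirmed for ATOM.
  if (( ep > 1 )); then
    PARALLEL_ARGS+=(--enable-expert-parallel)
  fi
}

# Example: show the flags for the current environment, one per line.
build_parallel_args
printf '%s\n' "${PARALLEL_ARGS[@]}"
```

Using a bash array rather than a concatenated string keeps each flag a separate word when expanded as "${PARALLEL_ARGS[@]}" on the server command line, so optional flags can be appended without quoting or word-splitting surprises.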
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26703436206
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26729909193
… for prefix caching
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit ff26684.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26744570641
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…m/SemiAnalysisAI/InferenceX into seungrokj/dsv4-fp4-mi355x-atom-dp
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26756406305
@functionstackx can you approve this?
/reuse-sweep-run
#26383 (the DSv4 MTP graph fix) is on sglang main, not the amd/deepseek_v4 branch the rocm/sgl-dev:*-DSv4 images are cut from, so switch the MTP entry onto the mainline ROCm nightly lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260601 which carries it. Mainline omits deep_gemm; the recipe now detects that and routes the DSv4 fp8 wo_a / topk paths to their torch fallbacks (SGLANG_OPT_FP8_WO_A_GEMM=0, SGLANG_TOPK_TRANSFORM_512_TORCH=1, SGLANG_ENABLE_JIT_DEEPGEMM=0). No-op on a deep_gemm-bearing image. Resolve perf-changelog conflict: keep atom (#1626) and vllm-mtp (#1630) from main, update the sglang-mtp entry for the mainline image.
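The deep_gemm detection described in this commit might look something like the sketch below. The three environment variables and their values come from the commit message; the detection method (probing `import deep_gemm` with Python), the PYTHON override, and the function names are assumptions for illustration, not necessarily how the recipe does it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: route DSv4 fp8 wo_a / top-k paths to torch fallbacks
# when the image does not ship deep_gemm.
set -euo pipefail

has_deep_gemm() {
  # ASSUMPTION: detection by import probe; succeeds only if deep_gemm imports.
  "${PYTHON:-python3}" -c 'import deep_gemm' >/dev/null 2>&1
}

configure_deep_gemm_fallbacks() {
  if has_deep_gemm; then
    # deep_gemm present: leave the defaults untouched (no-op).
    return 0
  fi
  # Mainline image without deep_gemm: use the torch fallbacks.
  export SGLANG_OPT_FP8_WO_A_GEMM=0
  export SGLANG_TOPK_TRANSFORM_512_TORCH=1
  export SGLANG_ENABLE_JIT_DEEPGEMM=0
}

configure_deep_gemm_fallbacks
```

On an image that ships deep_gemm the probe succeeds and nothing is exported, which matches the "no-op on a deep_gemm-bearing image" behavior the commit describes.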

Summary
- dsv4-fp4-mi355x-atom-dp for DeepSeek-V4-Pro with DP-attention on MI355X using ATOM
- benchmarks/single_node/dsv4_fp4_mi355x_atom_dp.sh with --enable-dp-attention --gpu-memory-utilization 0.85
- rocm/atom-dev:nightly_202605301523 (ATOM upstream run 26690241645, 2026-05-30)

Performance vs current InferenceX (dsv4-fp4-mi355x-atom, nightly_202605130853)
Test plan
- dsv4_fp4_mi355x_atom_dp.sh starts the ATOM server with --enable-dp-attention --gpu-memory-utilization 0.85
- dsv4-fp4-mi355x-atom-dp config picks up the new script

🤖 Generated with Claude Code
Note
Low Risk
Benchmark and image-tag changes only; no application auth or production serving paths affected.
Overview
Extends the existing dsv4-fp4-mi355x-atom MI355X ATOM benchmark for DeepSeek-V4-Pro FP4 with data-parallel attention, using the same dsv4_fp4_mi355x_atom.sh launcher rather than a separate config key.

The container image moves from rocm/atom-dev:nightly_202605130853 to rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3. The search space is split into two bands per ISL: TP8, EP1, conc 1–64 without DP-attn, then dp-attn: true from conc 64–1024 (1k/1k) or 64–512 (8k/1k).

The benchmark script now builds PARALLEL_ARGS from the matrix DP_ATTENTION and EP_SIZE values (--enable-dp-attention, plus expert parallel when EP > 1) and starts the server with --gpu-memory-utilization 0.85. perf-changelog.yaml records the update.

Reviewed by Cursor Bugbot for commit bc53139.
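To make the two search-space bands from the overview concrete, the sketch below enumerates one row per (ISL/OSL, dp-attn, concurrency) point. It assumes concurrency doubles at each step (1, 2, 4, ...), which this PR does not state; the real matrix may use a different step list. TP/EP for the DP-attn band is not printed because the overview does not specify it. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the two-band search space per ISL/OSL pair.
# ASSUMPTION: concurrency doubles between points; the real config may differ.
set -euo pipefail

# Print "<isl/osl> <dp_attn> <conc>" for each concurrency in [lo, hi].
emit_band() {
  local label=$1 dp_attn=$2 lo=$3 hi=$4
  local conc=$lo
  while (( conc <= hi )); do
    echo "$label $dp_attn $conc"
    conc=$(( conc * 2 ))
  done
}

search_space() {
  # 1k/1k: TP8, EP1 without DP-attn for conc 1-64, then DP-attn for 64-1024.
  emit_band 1k/1k false 1 64
  emit_band 1k/1k true 64 1024
  # 8k/1k: same first band; the DP-attn band stops at 512.
  emit_band 8k/1k false 1 64
  emit_band 8k/1k true 64 512
}

search_space
```

Concurrency 64 appears in both bands, so under this reading that point is measured both with and without DP-attention.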