[ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends #34599
Conversation
…hroughout Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…ference backends Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
This pull request addresses test flakiness on ROCm by disabling skinny GEMM in logprob tests and parametrizing tree attention tests over available backends. The changes include converting VllmRunner to use context managers for better resource management and adding KV cache layout adaptation. While the overall direction is correct, there are a few issues in the test setup that could lead to crashes or unexpected failures depending on the environment.
for backend in AttentionBackendEnum:
    if backend.value is not None and backend.get_path() == backend_path:
        return backend
This loop will raise a ValueError if it encounters an AttentionBackendEnum member with an empty string value (such as TORCH_SDPA). This is because get_path() explicitly checks for non-empty paths and raises an error otherwise. This could crash test collection on platforms where such backends are present.
for backend in AttentionBackendEnum:
    try:
        if backend.get_path() == backend_path:
            return backend
    except ValueError:
        continue

backends: list[AttentionBackendEnum] = []

# 1. Whatever the platform would auto-select at runtime.
backends.append(_get_platform_default_backend())
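The failure mode the bot describes can be reproduced in isolation. The sketch below uses a hypothetical miniature `Backend` enum and `resolve` helper (stand-ins for vLLM's real `AttentionBackendEnum` and the test's lookup loop, not the actual API) to show why the plain loop raises and how the `try`/`except` pattern skips members with no registered path:

```python
# Minimal sketch, assuming a simplified stand-in for AttentionBackendEnum
# whose get_path() raises ValueError for members with an empty value.
from enum import Enum


class Backend(Enum):
    FLASH_ATTN = "vllm.attn.FlashAttn"
    TORCH_SDPA = ""  # no backend path registered

    def get_path(self) -> str:
        if not self.value:
            # Mirrors the behavior described in the review comment.
            raise ValueError(f"{self.name} has no backend path")
        return self.value


def resolve(backend_path: str) -> "Backend | None":
    for backend in Backend:
        try:
            if backend.get_path() == backend_path:
                return backend
        except ValueError:
            continue  # skip members without a registered path
    return None


# Matching path resolves; an unknown path returns None instead of crashing
# on TORCH_SDPA's empty value.
assert resolve("vllm.attn.FlashAttn") is Backend.FLASH_ATTN
assert resolve("missing") is None
```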
The platform default backend should be filtered against known incompatible backends (as documented in the TODOs below) before being added to the reference backends list. If ROCM_AITER_FA or ROCM_ATTN is selected as the default by the platform (e.g., via environment variables like VLLM_ROCM_USE_AITER), the test will fail due to the documented incompatibilities.
default_backend = _get_platform_default_backend()
if default_backend not in (AttentionBackendEnum.ROCM_AITER_FA,
AttentionBackendEnum.ROCM_ATTN):
    backends.append(default_backend)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
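The suggested filtering can be sketched standalone. Here `AttnBackend`, `INCOMPATIBLE`, and `reference_backends` are hypothetical names for illustration, not the test's real identifiers:

```python
# Hedged sketch: build the reference-backend list while excluding backends
# documented as incompatible with the tree-attention test.
from enum import Enum, auto


class AttnBackend(Enum):
    FLASH_ATTN = auto()
    TRITON_ATTN = auto()
    ROCM_AITER_FA = auto()
    ROCM_ATTN = auto()


# Known-incompatible reference backends per the review comment.
INCOMPATIBLE = {AttnBackend.ROCM_AITER_FA, AttnBackend.ROCM_ATTN}


def reference_backends(platform_default: AttnBackend) -> list[AttnBackend]:
    backends: list[AttnBackend] = []
    # Add the platform default only if it is not a known-incompatible
    # backend (e.g. one selected via an env var like VLLM_ROCM_USE_AITER).
    if platform_default not in INCOMPATIBLE:
        backends.append(platform_default)
    return backends


assert reference_backends(AttnBackend.TRITON_ATTN) == [AttnBackend.TRITON_ATTN]
assert reference_backends(AttnBackend.ROCM_AITER_FA) == []
```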
SageMoore
left a comment
Looks reasonable. One minor nit.
tests/v1/sample/test_logprobs.py
Outdated
    contention. Both use identical chunked prefill settings and eager
    mode to control for infrastructure differences.

    On ROCm, the custom skinny GEMM kernels are non-deterministic
Nit: I think one block comment describing why we are disabling skinny gemms is sufficient :).
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…tention backends (vllm-project#34599) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
This PR fixes `V1 Test others` test flakiness on ROCm.

`test_logprobs.py`

`test_spec_decode_logprobs` was intermittently failing on ROCm due to logprob differences between the base and speculative-decode LLM that were misattributed to spec decode itself. The real cause is ROCm skinny GEMM non-determinism: the `wvSplitK` kernels in `gemm_kernels.cu` use persistent workgroup scheduling and wave-level shuffle reductions that produce different results across LLM instantiations, even with identical configs and seeds.

The fix disables the skinny GEMM via `VLLM_ROCM_USE_SKINNY_GEMM=0` for this test. Descriptive assertion messages are added for easier future triage.

Additional cleanup: converted standalone `VllmRunner` instantiations throughout the file to use context managers for proper resource cleanup.

`test_tree_attention.py`

Parametrizes `test_tree_attn_correctness` over all reference attention backends available on the current platform rather than hardcoding `FLASH_ATTN`. On ROCm this includes `TRITON_ATTN` and the platform default. Adds KV cache layout adaptation (flash <-> block) so backends with different cache layouts can be used as references. Documents known incompatibilities with `ROCM_ATTN` (paged layout) and `ROCM_AITER_FA` (head count mismatch) as TODOs.
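The flash <-> block layout adaptation mentioned above amounts to reordering KV-cache axes. The shapes below are illustrative assumptions, not vLLM's actual cache layouts: here "flash" is `(2, num_blocks, block_size, num_heads, head_dim)` and "block" swaps the first two axes; `flash_to_block`/`block_to_flash` are hypothetical helper names.

```python
# Hedged sketch of a KV-cache layout adapter between two assumed layouts:
#   flash: (2, num_blocks, block_size, num_heads, head_dim)
#   block: (num_blocks, 2, block_size, num_heads, head_dim)
import numpy as np


def flash_to_block(kv: np.ndarray) -> np.ndarray:
    # Swap the K/V axis and the block axis; values are unchanged,
    # only the memory layout differs.
    return np.ascontiguousarray(kv.transpose(1, 0, 2, 3, 4))


def block_to_flash(kv: np.ndarray) -> np.ndarray:
    # The inverse permutation happens to be the same axis swap.
    return np.ascontiguousarray(kv.transpose(1, 0, 2, 3, 4))


rng = np.random.default_rng(0)
kv_flash = rng.random((2, 4, 16, 8, 64))  # (2, blocks, block_size, heads, dim)
kv_block = flash_to_block(kv_flash)

assert kv_block.shape == (4, 2, 16, 8, 64)
# Round-tripping recovers the original cache exactly.
assert np.array_equal(block_to_flash(kv_block), kv_flash)
```

With an adapter like this, a reference backend that expects the other layout can still consume the same cache contents, which is what makes cross-backend parametrization possible.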