Add a workaround for compilation with ROCWMMA_FATTN and gfx9 #19461

Merged
JohannesGaessler merged 1 commit into ggml-org:master from superm1:superm1/rocwmma-workaround on Feb 12, 2026

Conversation

@superm1
Contributor

@superm1 superm1 commented Feb 9, 2026

There is an upstream problem [1] with AMD's LLVM 22 fork and rocWMMA 2.2.0 causing compilation issues on devices without native fp16 support (CDNA devices).

The specialized types aren't resolved properly:

```
/opt/rocm/include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2549 |             using ARegsT = typename Impl::ARegsT;
```

Add a workaround that explicitly declares the types and casts when compiling with HIP and ROCWMMA_FATTN [2]. Once this is actually fixed upstream, version guards can be added so that the workaround applies only to the affected releases.

Link: ROCm/rocm-libraries#4398 [1]
Link: #19269 [2]

CC @IMbackK

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Collaborator

@IMbackK IMbackK left a comment


Looks good. While I don't really like it, this is the best solution available to us.

@github-actions github-actions bot added the labels Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Feb 9, 2026
Contributor

@JohannesGaessler JohannesGaessler left a comment


FYI in terms of my current priorities I'll take a crack at better AMD WMMA/MFMA support in the MMA kernel once I'm done with tensor parallelism. So hopefully rocWMMA can soon be removed as a dependency anyways.

@superm1
Contributor Author

superm1 commented Feb 11, 2026

It looks like CI passed. Can this be merged so my other PRs can run test builds now?

@JohannesGaessler JohannesGaessler merged commit 6845f7f into ggml-org:master Feb 12, 2026
78 checks passed
superm1 added a commit to superm1/llama.cpp that referenced this pull request Feb 13, 2026
Avoids issues with ROCm 6.4.4.

Closes: ggml-org#19580
Fixes: 6845f7f ("Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (ggml-org#19461)")
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
superm1 added a commit to superm1/llama.cpp that referenced this pull request Feb 13, 2026
JohannesGaessler pushed a commit that referenced this pull request Feb 16, 2026
…#19591)

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
…g#19461)

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
…ggml-org#19591)

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
…g#19461)

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
…ggml-org#19591)

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026
…g#19461)

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026
…ggml-org#19591)

Labels

ggml (changes relating to the ggml tensor library for machine learning), Nvidia GPU (issues specific to Nvidia GPUs)

3 participants