Add a workaround for compilation with ROCWMMA_FATTN and gfx9 #19461
Merged
JohannesGaessler merged 1 commit into ggml-org:master on Feb 12, 2026
There is an upstream problem [1] with AMD's LLVM 22 fork and rocWMMA 2.2.0 causing compilation issues on devices without native fp16 support (CDNA devices).

The specialized types aren't resolved properly:

```
/opt/rocm/include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2549 |     using ARegsT = typename Impl::ARegsT;
```

Add a workaround to explicitly declare the types and cast when compiling with HIP and ROCWMMA_FATTN [2]. When this is actually fixed upstream, some guards can be used to detect the version that has the fix, so that the workaround is only applied when necessary.

Link: ROCm/rocm-libraries#4398 [1]
Link: ggml-org#19269 [2]
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
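The version guard mentioned in the description could look roughly like the sketch below. This is not the code from this PR. Every `DEMO_*` macro is a hypothetical stand-in: in a real build the version numbers would come from the installed rocWMMA headers, and the "fixed" version is a placeholder, since no release containing the upstream fix is named in this thread. `needs_fattn_type_workaround` is likewise a made-up name.

```cpp
// Hypothetical stand-ins for the rocWMMA version reported by the toolchain.
// In a real build these values would come from the library's version header.
#ifndef DEMO_ROCWMMA_VERSION_MAJOR
#define DEMO_ROCWMMA_VERSION_MAJOR 2
#define DEMO_ROCWMMA_VERSION_MINOR 2
#define DEMO_ROCWMMA_VERSION_PATCH 0
#endif

// Placeholder for the first release that contains the upstream fix.
// This value is an assumption made only so the sketch compiles.
#define DEMO_FIXED_MAJOR 2
#define DEMO_FIXED_MINOR 3
#define DEMO_FIXED_PATCH 0

// Pack a version triple into one integer so it can be compared in #if.
#define DEMO_VERSION(maj, min, pat) ((maj) * 10000 + (min) * 100 + (pat))

// Enable the workaround only for versions older than the (assumed) fix.
#if DEMO_VERSION(DEMO_ROCWMMA_VERSION_MAJOR, DEMO_ROCWMMA_VERSION_MINOR, DEMO_ROCWMMA_VERSION_PATCH) < \
    DEMO_VERSION(DEMO_FIXED_MAJOR, DEMO_FIXED_MINOR, DEMO_FIXED_PATCH)
#define DEMO_NEEDS_WORKAROUND 1
#else
#define DEMO_NEEDS_WORKAROUND 0
#endif

// Compile-time query that other code can branch on.
constexpr bool needs_fattn_type_workaround() {
    return DEMO_NEEDS_WORKAROUND != 0;
}

// With the stand-in values above (2.2.0 < 2.3.0) the workaround is enabled.
static_assert(needs_fattn_type_workaround(), "workaround expected for 2.2.0");
```

In a real build this condition would additionally be combined with the HIP and ROCWMMA_FATTN checks that the description mentions, so that other backends are unaffected.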
This was referenced Feb 9, 2026
IMbackK approved these changes on Feb 9, 2026
JohannesGaessler approved these changes on Feb 9, 2026
JohannesGaessler (Contributor) left a comment:
FYI, in terms of my current priorities, I'll take a crack at better AMD WMMA/MFMA support in the MMA kernel once I'm done with tensor parallelism. So hopefully rocWMMA can soon be removed as a dependency anyway.
Author (Contributor) commented:
It looks like CI passed. Can this be merged so my other PRs can do test builds now?
superm1 added a commit to superm1/llama.cpp that referenced this pull request on Feb 13, 2026:

Avoids issues with ROCm 6.4.4.

Closes: ggml-org#19580
Fixes: 6845f7f ("Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (ggml-org#19461)")
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
JohannesGaessler pushed a commit that referenced this pull request on Feb 16, 2026
liparetejas pushed two commits to liparetejas/llama.cpp that referenced this pull request on Feb 23, 2026: "…g#19461)" and "…ggml-org#19591)"
bartowski1182 pushed two commits to bartowski1182/llama.cpp that referenced this pull request on Mar 2, 2026: "…g#19461)" and "…ggml-org#19591)"
ArberSephirotheca pushed two commits to ArberSephirotheca/llama.cpp that referenced this pull request on Mar 3, 2026: "…g#19461)" and "…ggml-org#19591)"
CC @IMbackK