Add a workaround for compilation with ROCWMMA_FATTN and gfx9 #19461

Merged
JohannesGaessler merged 1 commit into ggml-org:master from superm1:superm1/rocwmma-workaround on Feb 12, 2026

Conversation

@superm1
Contributor

@superm1 superm1 commented Feb 9, 2026

There is an upstream problem [1] with AMD's LLVM 22 fork and rocWMMA 2.2.0 causing compilation issues on devices without native fp16 support (CDNA devices).

The specialized types aren't resolved properly:

```
/opt/rocm/include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2549 |             using ARegsT = typename Impl::ARegsT;
```

Add a workaround that explicitly declares the types and casts when compiling with HIP and ROCWMMA_FATTN [2]. Once this is actually fixed upstream, version guards can be added so that the workaround applies only to the affected releases.

Link: ROCm/rocm-libraries#4398 [1]
Link: #19269 [2]

CC @IMbackK

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Collaborator

@IMbackK IMbackK left a comment


Looks good. While I don't really like it, this is the best solution available to us.

@github-actions github-actions bot added the labels Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Feb 9, 2026
Contributor

@JohannesGaessler JohannesGaessler left a comment


FYI in terms of my current priorities I'll take a crack at better AMD WMMA/MFMA support in the MMA kernel once I'm done with tensor parallelism. So hopefully rocWMMA can soon be removed as a dependency anyways.

@superm1
Contributor Author

superm1 commented Feb 11, 2026

It looks like CI passed. Can this be merged so my other PRs can run test builds now?

@JohannesGaessler JohannesGaessler merged commit 6845f7f into ggml-org:master Feb 12, 2026
78 checks passed
superm1 added a commit to superm1/llama.cpp that referenced this pull request Feb 13, 2026
Avoids issues with ROCm 6.4.4.

Closes: ggml-org#19580
Fixes: 6845f7f ("Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (ggml-org#19461)")
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
superm1 added a commit to superm1/llama.cpp that referenced this pull request Feb 13, 2026
JohannesGaessler pushed a commit that referenced this pull request Feb 16, 2026
…#19591)

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
…g#19461)

liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
…ggml-org#19591)

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
…g#19461)

bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
…ggml-org#19591)

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026
…g#19461)

ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026
…ggml-org#19591)

Labels

ggml (changes relating to the ggml tensor library for machine learning), Nvidia GPU (issues specific to Nvidia GPUs)

3 participants