HIP: use v_dot2_f32_f16 instruction for FA by JohannesGaessler · Pull Request #15884 · ggml-org/llama.cpp

JohannesGaessler · 2025-09-08T23:07:25Z

See https://github.com/iacopPBK/llama.cpp-gfx906. That fork uses `v_dot2_f32_f16`, an instruction that multiplies pairs of FP16 values and accumulates the result in FP32. This PR adopts the same instruction for the tile FA kernel.
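To make "FP16 multiply-add with FP32 accumulation" concrete, here is a minimal CPU-side C++ sketch of what one `v_dot2_f32_f16` step computes: two FP16 products added to an FP32 accumulator, d = a.x*b.x + a.y*b.y + c. It is not the tile FA kernel's code. `Half2`, `round_to_half`, `dot2_f32_f16`, `dot2_f16_acc` and `accumulate_ones` are hypothetical names. FP16 values are modeled as floats rounded to binary16 precision, and the hardware's exact intermediate rounding is an assumption here, not taken from the ISA manual. The FP16-accumulator variant is only a comparison baseline, not a claim about what the kernel did before this PR.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Round a float to the nearest IEEE binary16 value (ties to even under the
// default rounding mode) and return it as a float. Sketch only: handles
// normals, subnormals and overflow to infinity.
float round_to_half(float x) {
    if (x == 0.0f || !std::isfinite(x)) return x;
    int e;
    std::frexp(x, &e);                    // x = m * 2^e with 0.5 <= |m| < 1
    int ulp_exp = std::max(e - 11, -24);  // binary16: 11 significant bits, smallest subnormal 2^-24
    float r = std::nearbyint(std::ldexp(x, -ulp_exp));  // round to a whole number of ulps
    float y = std::ldexp(r, ulp_exp);
    if (std::fabs(y) > 65504.0f)          // largest finite binary16 value
        return std::copysign(std::numeric_limits<float>::infinity(), x);
    return y;
}

// Hypothetical stand-in for a packed pair of FP16 values; both lanes are
// assumed to already hold binary16-representable values.
struct Half2 { float x, y; };

// Models v_dot2_f32_f16: d = a.x*b.x + a.y*b.y + c with an FP32 accumulator.
// Each FP16*FP16 product has at most 22 significant bits, so it is exact in
// FP32; only the additions round here (assumed, hardware may differ).
float dot2_f32_f16(Half2 a, Half2 b, float c) {
    return a.x * b.x + a.y * b.y + c;
}

// Comparison baseline: the same two multiply-adds, but the accumulator is
// rounded to binary16 after each step (via FP32, so double rounding is
// possible in corner cases).
float dot2_f16_acc(Half2 a, Half2 b, float c_half) {
    float acc = round_to_half(a.x * b.x + c_half);
    return round_to_half(a.y * b.y + acc);
}

struct AccResult { float fp32_acc; float fp16_acc; };

// Accumulates 2 * n_pairs products of 1.0 * 1.0 with both accumulator types.
AccResult accumulate_ones(int n_pairs) {
    const Half2 ones = {1.0f, 1.0f};
    AccResult r = {0.0f, 0.0f};
    for (int i = 0; i < n_pairs; ++i) {
        r.fp32_acc = dot2_f32_f16(ones, ones, r.fp32_acc);
        r.fp16_acc = dot2_f16_acc(ones, ones, r.fp16_acc);
    }
    return r;
}
```

With `accumulate_ones(2048)` (4096 products of 1.0 × 1.0), the FP32 accumulator reaches 4096 while the FP16 accumulator stalls at 2048, because 2048 + 1 rounds back to 2048 in binary16. Keeping the accumulator in FP32 avoids that loss while the inputs stay packed FP16 pairs.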

| GPU | Model | FlashAttention | Microbatch size | Test | t/s `fe1c92c` | t/s `d91e76574` | Speedup |
|---|---|---|---|---|---|---|---|
| MI60 / MI50 | gemma 2B Q4_0 | Yes | 16 | pp16384 | 329.07 | 629.31 | 1.91 |
| MI60 / MI50 | gemma 2B Q4_0 | Yes | 32 | pp16384 | 309.59 | 728.92 | 2.35 |
| MI60 / MI50 | gemma 2B Q4_0 | Yes | 512 | pp16384 | 397.50 | 1412.22 | 3.55 |
| MI60 / MI50 | llama 1B Q4_0 | Yes | 16 | pp16384 | 682.84 | 922.76 | 1.35 |
| MI60 / MI50 | llama 1B Q4_0 | Yes | 32 | pp16384 | 953.88 | 1187.68 | 1.25 |
| MI60 / MI50 | llama 1B Q4_0 | Yes | 512 | pp16384 | 1510.71 | 2278.62 | 1.51 |
| MI60 / MI50 | llama 8B Q4_0 | Yes | 16 | pp16384 | 193.82 | 278.56 | 1.44 |
| MI60 / MI50 | llama 8B Q4_0 | Yes | 32 | pp16384 | 163.34 | 334.36 | 2.05 |
| MI60 / MI50 | llama 8B Q4_0 | Yes | 512 | pp16384 | 204.03 | 504.45 | 2.47 |
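The Speedup column is the ratio of the two throughput columns: t/s at `d91e76574` divided by t/s at `fe1c92c`, rounded to two decimals. A small C++ sketch that recomputes it for a few rows copied from the table (the `BenchRow` struct and the function names are illustrative, not part of llama.cpp):

```cpp
#include <cmath>
#include <cstdio>

// One row of the benchmark table: t/s at fe1c92c (before) and d91e76574 (after).
struct BenchRow {
    const char* model;
    int ubatch;         // microbatch size
    double tps_before;  // t/s at fe1c92c
    double tps_after;   // t/s at d91e76574
};

// Speedup = t/s after / t/s before, rounded to two decimals as in the table.
double speedup(const BenchRow& r) {
    return std::round(r.tps_after / r.tps_before * 100.0) / 100.0;
}

// A few rows copied from the table (MI60 / MI50, pp16384, FlashAttention on).
const BenchRow kRows[] = {
    {"gemma 2B Q4_0", 512, 397.50, 1412.22},
    {"llama 1B Q4_0", 32, 953.88, 1187.68},
    {"llama 8B Q4_0", 512, 204.03, 504.45},
};

// Prints one line per row, e.g. "gemma 2B Q4_0 ub=512: 3.55x".
void print_speedups() {
    for (const BenchRow& r : kRows)
        std::printf("%s ub=%d: %.2fx\n", r.model, r.ubatch, speedup(r));
}
```

Applying the same ratio to the remaining rows reproduces the rest of the Speedup column.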

mudler · 2025-09-11T09:16:10Z

JFYI: according to my tests/CI, this seems to have broken hipblas compilation for gfx803 (at least, that is where the build stops): #15936

This reverts commit 17bc5a8.

HIP: use v_dot2_f32_f16 instruction for FA

2430b31

github-actions bot added the Nvidia GPU and ggml labels Sep 8, 2025

slaren approved these changes Sep 9, 2025


JohannesGaessler merged commit 17bc5a8 into ggml-org:master Sep 9, 2025
48 checks passed

mudler mentioned this pull request Sep 11, 2025

Compile bug: Failing to compile with hipblas and gfx803 #15936

Closed

LunNova mentioned this pull request Sep 15, 2025

rocmPackages: 6.3.3 -> 6.4.3 NixOS/nixpkgs#427944

Merged


Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Sep 27, 2025

Revert "HIP: use v_dot2_f32_f16 instruction for FA (ggml-org#15884)"

4557258

This reverts commit 17bc5a8.

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Sep 29, 2025

Revert "HIP: use v_dot2_f32_f16 instruction for FA (ggml-org#15884)"

c4588ac

This reverts commit 17bc5a8.

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 4, 2025

Revert "HIP: use v_dot2_f32_f16 instruction for FA (ggml-org#15884)"

8b3774b

This reverts commit 17bc5a8.

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 26, 2025

Revert "HIP: use v_dot2_f32_f16 instruction for FA (ggml-org#15884)"

fae9f89

This reverts commit 17bc5a8.

blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

HIP: use v_dot2_f32_f16 instruction for FA (#15884)

fc2610e

JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-fma
