HIP: use v_dot2_f32_f16 instruction for FA#15884

Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-fma on Sep 9, 2025


@JohannesGaessler

See https://github.com/iacopPBK/llama.cpp-gfx906. That fork uses an instruction for FP16 multiply-add with FP32 accumulation (v_dot2_f32_f16, per the PR title). This PR adopts the same instruction for the tile FA (FlashAttention) kernel.
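For readers unfamiliar with this kind of instruction, here is a minimal host-side sketch of the arithmetic described above: two FP16 products summed into an FP32 accumulator in one step. It is not the code from this PR. The names `Half2Model`, `dot2_f32_model` and `dot_f32_model` are hypothetical, and plain `float` stands in for the packed FP16 inputs, so the rounding behaviour of the real instruction is not modelled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a packed pair of FP16 values (half2 on the GPU).
// Plain floats are used so the sketch compiles on any host compiler.
struct Half2Model {
    float x;
    float y;
};

// Scalar model of a 2-element dot product with FP32 accumulation:
// acc + a.x*b.x + a.y*b.y, with the sum kept in FP32.
float dot2_f32_model(Half2Model a, Half2Model b, float acc) {
    return acc + a.x * b.x + a.y * b.y;
}

// Dot product of two equal-length vectors, consuming two elements per step,
// the way a kernel would consume packed half2 values.
float dot_f32_model(const float * a, const float * b, std::size_t n) {
    assert(n % 2 == 0); // this sketch only handles an even element count
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; i += 2) {
        acc = dot2_f32_model({a[i], a[i + 1]}, {b[i], b[i + 1]}, acc);
    }
    return acc;
}
```

As a usage example, `dot2_f32_model({1.0f, 2.0f}, {3.0f, 4.0f}, 0.5f)` evaluates 0.5 + 1*3 + 2*4. The appeal of a fused instruction is presumably that it replaces separate FP16-to-FP32 conversions and multiply-adds with a single operation per pair of elements.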

| GPU | Model | FlashAttention | Microbatch size | Test | t/s fe1c92c | t/s d91e76574 | Speedup |
| --- | --- | --- | ---: | --- | ---: | ---: | ---: |
| MI60 / MI50 | gemma 2B Q4_0 | Yes | 16 | pp16384 | 329.07 | 629.31 | 1.91 |
| MI60 / MI50 | gemma 2B Q4_0 | Yes | 32 | pp16384 | 309.59 | 728.92 | 2.35 |
| MI60 / MI50 | gemma 2B Q4_0 | Yes | 512 | pp16384 | 397.50 | 1412.22 | 3.55 |
| MI60 / MI50 | llama 1B Q4_0 | Yes | 16 | pp16384 | 682.84 | 922.76 | 1.35 |
| MI60 / MI50 | llama 1B Q4_0 | Yes | 32 | pp16384 | 953.88 | 1187.68 | 1.25 |
| MI60 / MI50 | llama 1B Q4_0 | Yes | 512 | pp16384 | 1510.71 | 2278.62 | 1.51 |
| MI60 / MI50 | llama 8B Q4_0 | Yes | 16 | pp16384 | 193.82 | 278.56 | 1.44 |
| MI60 / MI50 | llama 8B Q4_0 | Yes | 32 | pp16384 | 163.34 | 334.36 | 2.05 |
| MI60 / MI50 | llama 8B Q4_0 | Yes | 512 | pp16384 | 204.03 | 504.45 | 2.47 |

@github-actions github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels on Sep 8, 2025
@JohannesGaessler JohannesGaessler merged commit 17bc5a8 into ggml-org:master Sep 9, 2025
48 checks passed

mudler commented Sep 11, 2025

JFYI: according to my tests/CI, this seems to have broken hipBLAS compilation for gfx803 (at least, that is where the build stops); see #15936.

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Sep 27, 2025
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Sep 29, 2025
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 4, 2025
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 26, 2025
