
HIP: enable WMMA-MMQ INT kernels for RDNA 3#17576

Merged
JohannesGaessler merged 5 commits into ggml-org:master from
jiachengjason:feat/jiachengjason/enable_mmq_kernels_for_RDNA3
Dec 5, 2025

Conversation

@jiachengjason (Contributor) commented Nov 28, 2025

Enabled WMMA-MMQ INT kernels for the RDNA 3 architecture on AMD GPUs, following an approach similar to #17156.

The performance results below were collected with ./build/bin/llama-bench.

Build command used for these results:
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -S . -B build -DGGML_HIP=ON -DGGML_CUDA_FORCE_MMQ=OFF -DGGML_HIP_UMA=OFF -DGGML_HIP_ROCWMMA_FATTN=OFF -DGPU_TARGETS="gfx1100" -DGGML_HIP_GRAPHS=OFF -DLLAMA_CURL=OFF -DGGML_CUDA_FORCE_CUBLAS=OFF -DCMAKE_BUILD_TYPE=Release && cmake --build build --config Release -- -j 32

Popular models performance results for AMD Radeon AI PRO W7900 (gfx1100): [see attached charts]
Popular models performance results for AMD Strix Halo (gfx1151): [see attached charts]
All quantization performance result for AMD Radeon AI PRO W7900 (gfx1100)
GPU Model Microbatch size Test t/s 583cb83 t/s mmq_feature_branch Speedup
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 1 pp2048 119.44 119.96 1.00
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 2 pp2048 196.73 197.43 1.00
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 4 pp2048 303.89 304.92 1.00
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 8 pp2048 445.70 507.12 1.14
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 16 pp2048 634.63 804.82 1.27
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 32 pp2048 886.68 1027.74 1.16
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 64 pp2048 981.10 1445.47 1.47
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 128 pp2048 1510.05 1679.38 1.11
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 256 pp2048 2047.07 2016.22 0.98
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 512 pp2048 2012.02 2079.92 1.03
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 1024 pp2048 1945.28 2112.55 1.09
PRO W7900 Dual Slot llama 8B IQ1_S - 1.5625 bpw 2048 pp2048 1976.88 2072.28 1.05
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 1 pp2048 92.63 92.72 1.00
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 2 pp2048 156.95 156.95 1.00
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 4 pp2048 254.64 254.45 1.00
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 8 pp2048 383.33 426.76 1.11
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 16 pp2048 535.59 458.25 0.86
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 32 pp2048 783.94 906.32 1.16
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 64 pp2048 945.33 1373.64 1.45
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 128 pp2048 1490.37 1465.06 0.98
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 256 pp2048 2016.36 1728.85 0.86
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 512 pp2048 1986.67 1824.80 0.92
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 1024 pp2048 1929.90 1853.21 0.96
PRO W7900 Dual Slot llama 8B IQ2_S - 2.5 bpw 2048 pp2048 1953.60 1808.53 0.93
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 1 pp2048 95.74 95.98 1.00
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 2 pp2048 160.46 160.64 1.00
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 4 pp2048 255.90 256.54 1.00
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 8 pp2048 378.20 421.88 1.12
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 16 pp2048 547.54 453.34 0.83
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 32 pp2048 777.44 963.99 1.24
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 64 pp2048 948.91 1345.01 1.42
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 128 pp2048 1478.42 1429.39 0.97
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 256 pp2048 2027.36 1694.89 0.84
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 512 pp2048 1993.90 1788.80 0.90
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 1024 pp2048 1942.02 1827.91 0.94
PRO W7900 Dual Slot llama 8B IQ2_XS - 2.3125 bpw 2048 pp2048 1974.40 1799.10 0.91
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 1 pp2048 79.60 79.68 1.00
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 2 pp2048 139.88 139.72 1.00
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 4 pp2048 235.12 235.01 1.00
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 8 pp2048 350.25 385.55 1.10
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 16 pp2048 525.61 639.74 1.22
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 32 pp2048 761.19 671.28 0.88
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 64 pp2048 962.85 1338.24 1.39
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 128 pp2048 1492.21 1716.92 1.15
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 256 pp2048 2033.28 1989.56 0.98
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 512 pp2048 1997.97 2053.91 1.03
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 1024 pp2048 1940.79 2083.00 1.07
PRO W7900 Dual Slot llama 8B IQ2_XXS - 2.0625 bpw 2048 pp2048 1975.83 2040.61 1.03
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 1 pp2048 75.52 75.55 1.00
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 2 pp2048 134.05 134.03 1.00
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 4 pp2048 229.54 228.94 1.00
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 8 pp2048 366.06 404.29 1.10
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 16 pp2048 494.08 526.54 1.07
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 32 pp2048 757.72 587.11 0.77
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 64 pp2048 908.26 1357.06 1.49
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 128 pp2048 1452.44 1707.32 1.18
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 256 pp2048 2006.58 2000.21 1.00
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 512 pp2048 1974.82 2058.28 1.04
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 1024 pp2048 1924.67 2075.83 1.08
PRO W7900 Dual Slot llama 8B IQ3_S - 3.4375 bpw 2048 pp2048 1952.66 2011.09 1.03
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 1 pp2048 74.91 74.97 1.00
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 2 pp2048 131.88 131.83 1.00
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 4 pp2048 221.86 221.47 1.00
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 8 pp2048 344.87 379.84 1.10
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 16 pp2048 504.66 554.38 1.10
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 32 pp2048 762.33 615.27 0.81
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 64 pp2048 915.10 1367.04 1.49
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 128 pp2048 1439.75 1701.67 1.18
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 256 pp2048 1990.07 2002.38 1.01
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 512 pp2048 1963.91 2056.56 1.05
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 1024 pp2048 1917.01 2072.95 1.08
PRO W7900 Dual Slot llama 8B IQ3_S mix - 3.66 bpw 2048 pp2048 1947.19 2016.53 1.04
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 1 pp2048 82.83 82.38 0.99
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 2 pp2048 145.51 145.33 1.00
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 4 pp2048 244.41 243.89 1.00
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 8 pp2048 376.21 417.63 1.11
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 16 pp2048 525.57 593.25 1.13
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 32 pp2048 787.13 620.31 0.79
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 64 pp2048 909.63 1401.15 1.54
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 128 pp2048 1441.25 1717.14 1.19
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 256 pp2048 1992.22 1967.66 0.99
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 512 pp2048 1966.36 2002.64 1.02
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 1024 pp2048 1921.96 1950.32 1.01
PRO W7900 Dual Slot llama 8B IQ3_XS - 3.3 bpw 2048 pp2048 1952.51 1808.72 0.93
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 1 pp2048 89.59 89.49 1.00
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 2 pp2048 154.29 154.07 1.00
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 4 pp2048 253.94 253.34 1.00
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 8 pp2048 376.96 418.83 1.11
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 16 pp2048 534.30 608.85 1.14
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 32 pp2048 795.20 672.18 0.85
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 64 pp2048 907.68 1411.46 1.56
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 128 pp2048 1435.77 1727.08 1.20
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 256 pp2048 1980.00 2039.01 1.03
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 512 pp2048 1959.33 2103.84 1.07
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 1024 pp2048 1916.08 2120.41 1.11
PRO W7900 Dual Slot llama 8B IQ3_XXS - 3.0625 bpw 2048 pp2048 1952.05 2058.58 1.05
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 1 pp2048 89.84 90.14 1.00
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 2 pp2048 161.96 162.00 1.00
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 4 pp2048 283.82 284.01 1.00
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 8 pp2048 457.70 518.59 1.13
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 16 pp2048 621.81 795.53 1.28
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 32 pp2048 878.06 821.83 0.94
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 64 pp2048 878.89 1558.47 1.77
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 128 pp2048 1414.02 1858.45 1.31
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 256 pp2048 1975.53 2192.85 1.11
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 512 pp2048 1951.61 2243.64 1.15
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 1024 pp2048 1923.83 2260.91 1.18
PRO W7900 Dual Slot llama 8B IQ4_NL - 4.5 bpw 2048 pp2048 1977.10 2204.06 1.11
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 1 pp2048 94.01 94.10 1.00
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 2 pp2048 170.10 170.09 1.00
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 4 pp2048 298.65 298.65 1.00
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 8 pp2048 466.09 532.42 1.14
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 16 pp2048 634.03 824.04 1.30
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 32 pp2048 886.40 597.30 0.67
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 64 pp2048 888.15 1574.37 1.77
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 128 pp2048 1432.87 1865.24 1.30
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 256 pp2048 1983.10 2191.29 1.10
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 512 pp2048 1957.77 2237.67 1.14
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 1024 pp2048 1927.76 2258.80 1.17
PRO W7900 Dual Slot llama 8B IQ4_XS - 4.25 bpw 2048 pp2048 1966.60 2200.45 1.12
PRO W7900 Dual Slot llama 8B Q2_K_S 1 pp2048 104.03 103.86 1.00
PRO W7900 Dual Slot llama 8B Q2_K_S 2 pp2048 156.56 156.27 1.00
PRO W7900 Dual Slot llama 8B Q2_K_S 4 pp2048 204.91 204.39 1.00
PRO W7900 Dual Slot llama 8B Q2_K_S 8 pp2048 249.14 265.99 1.07
PRO W7900 Dual Slot llama 8B Q2_K_S 16 pp2048 557.05 409.44 0.74
PRO W7900 Dual Slot llama 8B Q2_K_S 32 pp2048 549.89 579.28 1.05
PRO W7900 Dual Slot llama 8B Q2_K_S 64 pp2048 921.02 818.31 0.89
PRO W7900 Dual Slot llama 8B Q2_K_S 128 pp2048 1452.22 1164.26 0.80
PRO W7900 Dual Slot llama 8B Q2_K_S 256 pp2048 1991.42 1221.65 0.61
PRO W7900 Dual Slot llama 8B Q2_K_S 512 pp2048 1985.17 1333.52 0.67
PRO W7900 Dual Slot llama 8B Q2_K_S 1024 pp2048 1942.76 1435.10 0.74
PRO W7900 Dual Slot llama 8B Q2_K_S 2048 pp2048 1975.16 1434.72 0.73
PRO W7900 Dual Slot llama 8B Q3_K_S 1 pp2048 80.44 80.66 1.00
PRO W7900 Dual Slot llama 8B Q3_K_S 2 pp2048 134.57 134.67 1.00
PRO W7900 Dual Slot llama 8B Q3_K_S 4 pp2048 197.14 197.57 1.00
PRO W7900 Dual Slot llama 8B Q3_K_S 8 pp2048 246.28 264.20 1.07
PRO W7900 Dual Slot llama 8B Q3_K_S 16 pp2048 539.32 469.68 0.87
PRO W7900 Dual Slot llama 8B Q3_K_S 32 pp2048 756.66 1011.50 1.34
PRO W7900 Dual Slot llama 8B Q3_K_S 64 pp2048 832.07 1378.54 1.66
PRO W7900 Dual Slot llama 8B Q3_K_S 128 pp2048 1331.93 1660.21 1.25
PRO W7900 Dual Slot llama 8B Q3_K_S 256 pp2048 1834.19 1937.16 1.06
PRO W7900 Dual Slot llama 8B Q3_K_S 512 pp2048 1932.29 1994.61 1.03
PRO W7900 Dual Slot llama 8B Q3_K_S 1024 pp2048 1930.39 2013.32 1.04
PRO W7900 Dual Slot llama 8B Q3_K_S 2048 pp2048 1971.61 1971.32 1.00
PRO W7900 Dual Slot llama 8B Q4_0 1 pp2048 92.59 92.85 1.00
PRO W7900 Dual Slot llama 8B Q4_0 2 pp2048 167.66 168.00 1.00
PRO W7900 Dual Slot llama 8B Q4_0 4 pp2048 293.21 293.11 1.00
PRO W7900 Dual Slot llama 8B Q4_0 8 pp2048 477.35 545.22 1.14
PRO W7900 Dual Slot llama 8B Q4_0 16 pp2048 623.04 764.53 1.23
PRO W7900 Dual Slot llama 8B Q4_0 32 pp2048 906.53 640.59 0.71
PRO W7900 Dual Slot llama 8B Q4_0 64 pp2048 895.68 1526.66 1.70
PRO W7900 Dual Slot llama 8B Q4_0 128 pp2048 1437.32 1862.50 1.30
PRO W7900 Dual Slot llama 8B Q4_0 256 pp2048 2009.97 2178.61 1.08
PRO W7900 Dual Slot llama 8B Q4_0 512 pp2048 1979.84 2221.04 1.12
PRO W7900 Dual Slot llama 8B Q4_0 1024 pp2048 1933.14 2240.34 1.16
PRO W7900 Dual Slot llama 8B Q4_0 2048 pp2048 1990.05 2178.16 1.09
PRO W7900 Dual Slot llama 8B Q4_1 1 pp2048 87.30 87.02 1.00
PRO W7900 Dual Slot llama 8B Q4_1 2 pp2048 159.37 158.97 1.00
PRO W7900 Dual Slot llama 8B Q4_1 4 pp2048 274.23 274.11 1.00
PRO W7900 Dual Slot llama 8B Q4_1 8 pp2048 478.54 545.67 1.14
PRO W7900 Dual Slot llama 8B Q4_1 16 pp2048 611.39 807.16 1.32
PRO W7900 Dual Slot llama 8B Q4_1 32 pp2048 927.59 1118.95 1.21
PRO W7900 Dual Slot llama 8B Q4_1 64 pp2048 877.06 1502.57 1.71
PRO W7900 Dual Slot llama 8B Q4_1 128 pp2048 1416.30 1653.30 1.17
PRO W7900 Dual Slot llama 8B Q4_1 256 pp2048 1989.90 2017.58 1.01
PRO W7900 Dual Slot llama 8B Q4_1 512 pp2048 1965.18 2079.99 1.06
PRO W7900 Dual Slot llama 8B Q4_1 1024 pp2048 1923.68 2116.16 1.10
PRO W7900 Dual Slot llama 8B Q4_1 2048 pp2048 1982.31 2079.10 1.05
PRO W7900 Dual Slot llama 8B Q4_K_S 1 pp2048 78.77 78.62 1.00
PRO W7900 Dual Slot llama 8B Q4_K_S 2 pp2048 126.64 126.48 1.00
PRO W7900 Dual Slot llama 8B Q4_K_S 4 pp2048 188.75 188.74 1.00
PRO W7900 Dual Slot llama 8B Q4_K_S 8 pp2048 252.05 270.04 1.07
PRO W7900 Dual Slot llama 8B Q4_K_S 16 pp2048 590.43 812.30 1.38
PRO W7900 Dual Slot llama 8B Q4_K_S 32 pp2048 807.23 1085.64 1.34
PRO W7900 Dual Slot llama 8B Q4_K_S 64 pp2048 870.47 1480.21 1.70
PRO W7900 Dual Slot llama 8B Q4_K_S 128 pp2048 1396.57 1673.81 1.20
PRO W7900 Dual Slot llama 8B Q4_K_S 256 pp2048 1931.75 2052.97 1.06
PRO W7900 Dual Slot llama 8B Q4_K_S 512 pp2048 1957.64 2109.49 1.08
PRO W7900 Dual Slot llama 8B Q4_K_S 1024 pp2048 1931.27 2142.37 1.11
PRO W7900 Dual Slot llama 8B Q4_K_S 2048 pp2048 1978.30 2106.29 1.06
PRO W7900 Dual Slot llama 8B Q5_1 1 pp2048 70.60 70.69 1.00
PRO W7900 Dual Slot llama 8B Q5_1 2 pp2048 128.57 128.67 1.00
PRO W7900 Dual Slot llama 8B Q5_1 4 pp2048 231.59 231.96 1.00
PRO W7900 Dual Slot llama 8B Q5_1 8 pp2048 428.61 482.33 1.13
PRO W7900 Dual Slot llama 8B Q5_1 16 pp2048 495.18 578.44 1.17
PRO W7900 Dual Slot llama 8B Q5_1 32 pp2048 768.89 874.56 1.14
PRO W7900 Dual Slot llama 8B Q5_1 64 pp2048 757.85 1286.75 1.70
PRO W7900 Dual Slot llama 8B Q5_1 128 pp2048 1237.01 1487.90 1.20
PRO W7900 Dual Slot llama 8B Q5_1 256 pp2048 1733.45 1885.56 1.09
PRO W7900 Dual Slot llama 8B Q5_1 512 pp2048 1873.29 1974.93 1.05
PRO W7900 Dual Slot llama 8B Q5_1 1024 pp2048 1907.48 2027.97 1.06
PRO W7900 Dual Slot llama 8B Q5_1 2048 pp2048 1960.31 1995.51 1.02
PRO W7900 Dual Slot llama 8B Q5_K_S 1 pp2048 73.36 73.33 1.00
PRO W7900 Dual Slot llama 8B Q5_K_S 2 pp2048 121.67 121.67 1.00
PRO W7900 Dual Slot llama 8B Q5_K_S 4 pp2048 184.13 183.70 1.00
PRO W7900 Dual Slot llama 8B Q5_K_S 8 pp2048 246.71 263.30 1.07
PRO W7900 Dual Slot llama 8B Q5_K_S 16 pp2048 554.75 829.08 1.49
PRO W7900 Dual Slot llama 8B Q5_K_S 32 pp2048 699.59 1116.24 1.60
PRO W7900 Dual Slot llama 8B Q5_K_S 64 pp2048 818.27 1498.33 1.83
PRO W7900 Dual Slot llama 8B Q5_K_S 128 pp2048 1332.58 1616.54 1.21
PRO W7900 Dual Slot llama 8B Q5_K_S 256 pp2048 1843.06 1968.65 1.07
PRO W7900 Dual Slot llama 8B Q5_K_S 512 pp2048 1928.32 2027.76 1.05
PRO W7900 Dual Slot llama 8B Q5_K_S 1024 pp2048 1934.12 2064.67 1.07
PRO W7900 Dual Slot llama 8B Q5_K_S 2048 pp2048 1971.71 2026.68 1.03
PRO W7900 Dual Slot llama 8B Q6_K 1 pp2048 69.29 69.18 1.00
PRO W7900 Dual Slot llama 8B Q6_K 2 pp2048 123.13 123.00 1.00
PRO W7900 Dual Slot llama 8B Q6_K 4 pp2048 197.96 197.34 1.00
PRO W7900 Dual Slot llama 8B Q6_K 8 pp2048 286.01 306.88 1.07
PRO W7900 Dual Slot llama 8B Q6_K 16 pp2048 498.59 623.24 1.25
PRO W7900 Dual Slot llama 8B Q6_K 32 pp2048 648.44 813.97 1.26
PRO W7900 Dual Slot llama 8B Q6_K 64 pp2048 834.21 1084.13 1.30
PRO W7900 Dual Slot llama 8B Q6_K 128 pp2048 1364.80 1094.80 0.80
PRO W7900 Dual Slot llama 8B Q6_K 256 pp2048 1897.44 1322.89 0.70
PRO W7900 Dual Slot llama 8B Q6_K 512 pp2048 1930.36 1450.17 0.75
PRO W7900 Dual Slot llama 8B Q6_K 1024 pp2048 1918.10 1470.46 0.77
PRO W7900 Dual Slot llama 8B Q6_K 2048 pp2048 1973.29 1488.04 0.75
PRO W7900 Dual Slot llama 8B Q8_0 1 pp2048 62.95 63.04 1.00
PRO W7900 Dual Slot llama 8B Q8_0 2 pp2048 115.78 115.89 1.00
PRO W7900 Dual Slot llama 8B Q8_0 4 pp2048 210.63 210.77 1.00
PRO W7900 Dual Slot llama 8B Q8_0 8 pp2048 383.93 422.00 1.10
PRO W7900 Dual Slot llama 8B Q8_0 16 pp2048 576.38 717.52 1.24
PRO W7900 Dual Slot llama 8B Q8_0 32 pp2048 864.31 619.38 0.72
PRO W7900 Dual Slot llama 8B Q8_0 64 pp2048 789.64 1476.07 1.87
PRO W7900 Dual Slot llama 8B Q8_0 128 pp2048 1295.11 1815.55 1.40
PRO W7900 Dual Slot llama 8B Q8_0 256 pp2048 1874.73 2173.67 1.16
PRO W7900 Dual Slot llama 8B Q8_0 512 pp2048 1891.10 2227.88 1.18
PRO W7900 Dual Slot llama 8B Q8_0 1024 pp2048 1902.79 2251.06 1.18
PRO W7900 Dual Slot llama 8B Q8_0 2048 pp2048 1964.10 2197.58 1.12
All quantization performance result for AMD Strix Halo (gfx1151)
GPU Model Microbatch size Test t/s master t/s mmq_feature_branch Speedup
Graphics llama 8B IQ1_S - 1.5625 bpw 1 pp2048 59.83 59.90 1.00
Graphics llama 8B IQ1_S - 1.5625 bpw 2 pp2048 104.71 104.66 1.00
Graphics llama 8B IQ1_S - 1.5625 bpw 4 pp2048 163.47 163.36 1.00
Graphics llama 8B IQ1_S - 1.5625 bpw 8 pp2048 217.52 269.92 1.24
Graphics llama 8B IQ1_S - 1.5625 bpw 16 pp2048 389.90 537.71 1.38
Graphics llama 8B IQ1_S - 1.5625 bpw 32 pp2048 532.61 600.36 1.13
Graphics llama 8B IQ1_S - 1.5625 bpw 64 pp2048 252.24 770.75 3.06
Graphics llama 8B IQ1_S - 1.5625 bpw 128 pp2048 436.51 854.21 1.96
Graphics llama 8B IQ1_S - 1.5625 bpw 256 pp2048 604.08 914.89 1.51
Graphics llama 8B IQ1_S - 1.5625 bpw 512 pp2048 730.25 891.78 1.22
Graphics llama 8B IQ1_S - 1.5625 bpw 1024 pp2048 808.51 888.27 1.10
Graphics llama 8B IQ1_S - 1.5625 bpw 2048 pp2048 784.05 807.68 1.03
Graphics llama 8B IQ2_XS - 2.3125 bpw 1 pp2048 49.70 49.49 1.00
Graphics llama 8B IQ2_XS - 2.3125 bpw 2 pp2048 88.14 87.73 1.00
Graphics llama 8B IQ2_XS - 2.3125 bpw 4 pp2048 139.02 138.65 1.00
Graphics llama 8B IQ2_XS - 2.3125 bpw 8 pp2048 178.29 212.21 1.19
Graphics llama 8B IQ2_XS - 2.3125 bpw 16 pp2048 324.40 304.76 0.94
Graphics llama 8B IQ2_XS - 2.3125 bpw 32 pp2048 456.98 550.06 1.20
Graphics llama 8B IQ2_XS - 2.3125 bpw 64 pp2048 250.13 703.03 2.81
Graphics llama 8B IQ2_XS - 2.3125 bpw 128 pp2048 433.34 737.41 1.70
Graphics llama 8B IQ2_XS - 2.3125 bpw 256 pp2048 602.11 789.88 1.31
Graphics llama 8B IQ2_XS - 2.3125 bpw 512 pp2048 727.09 781.97 1.08
Graphics llama 8B IQ2_XS - 2.3125 bpw 1024 pp2048 806.37 782.58 0.97
Graphics llama 8B IQ2_XS - 2.3125 bpw 2048 pp2048 782.47 724.99 0.93
Graphics llama 8B IQ2_XXS - 2.0625 bpw 1 pp2048 41.54 41.38 1.00
Graphics llama 8B IQ2_XXS - 2.0625 bpw 2 pp2048 75.11 74.86 1.00
Graphics llama 8B IQ2_XXS - 2.0625 bpw 4 pp2048 123.87 123.62 1.00
Graphics llama 8B IQ2_XXS - 2.0625 bpw 8 pp2048 164.17 191.86 1.17
Graphics llama 8B IQ2_XXS - 2.0625 bpw 16 pp2048 313.88 380.15 1.21
Graphics llama 8B IQ2_XXS - 2.0625 bpw 32 pp2048 457.55 419.77 0.92
Graphics llama 8B IQ2_XXS - 2.0625 bpw 64 pp2048 251.35 715.39 2.85
Graphics llama 8B IQ2_XXS - 2.0625 bpw 128 pp2048 435.15 832.72 1.91
Graphics llama 8B IQ2_XXS - 2.0625 bpw 256 pp2048 605.74 895.57 1.48
Graphics llama 8B IQ2_XXS - 2.0625 bpw 512 pp2048 714.66 876.64 1.23
Graphics llama 8B IQ2_XXS - 2.0625 bpw 1024 pp2048 807.75 871.60 1.08
Graphics llama 8B IQ2_XXS - 2.0625 bpw 2048 pp2048 781.91 798.25 1.02
Graphics llama 8B Q2_K_S 1 pp2048 51.50 51.49 1.00
Graphics llama 8B Q2_K_S 2 pp2048 82.63 82.75 1.00
Graphics llama 8B Q2_K_S 4 pp2048 106.98 107.28 1.00
Graphics llama 8B Q2_K_S 8 pp2048 115.64 129.42 1.12
Graphics llama 8B Q2_K_S 16 pp2048 336.98 257.13 0.76
Graphics llama 8B Q2_K_S 32 pp2048 323.61 360.17 1.11
Graphics llama 8B Q2_K_S 64 pp2048 249.79 494.88 1.98
Graphics llama 8B Q2_K_S 128 pp2048 432.75 544.20 1.26
Graphics llama 8B Q2_K_S 256 pp2048 598.52 585.13 0.98
Graphics llama 8B Q2_K_S 512 pp2048 729.62 602.31 0.83
Graphics llama 8B Q2_K_S 1024 pp2048 800.13 627.29 0.78
Graphics llama 8B Q2_K_S 2048 pp2048 778.95 591.66 0.76
Graphics llama 8B Q3_K_S 1 pp2048 42.14 42.14 1.00
Graphics llama 8B Q3_K_S 2 pp2048 72.14 71.93 1.00
Graphics llama 8B Q3_K_S 4 pp2048 103.72 103.01 0.99
Graphics llama 8B Q3_K_S 8 pp2048 115.52 128.04 1.11
Graphics llama 8B Q3_K_S 16 pp2048 326.11 313.91 0.96
Graphics llama 8B Q3_K_S 32 pp2048 442.67 589.17 1.33
Graphics llama 8B Q3_K_S 64 pp2048 243.59 736.42 3.02
Graphics llama 8B Q3_K_S 128 pp2048 423.89 864.64 2.04
Graphics llama 8B Q3_K_S 256 pp2048 590.88 923.04 1.56
Graphics llama 8B Q3_K_S 512 pp2048 717.73 912.07 1.27
Graphics llama 8B Q3_K_S 1024 pp2048 799.04 897.25 1.12
Graphics llama 8B Q3_K_S 2048 pp2048 775.37 812.76 1.05
Graphics llama 8B Q4_0 1 pp2048 39.70 39.79 1.00
Graphics llama 8B Q4_0 2 pp2048 76.18 76.37 1.00
Graphics llama 8B Q4_0 4 pp2048 139.13 139.35 1.00
Graphics llama 8B Q4_0 8 pp2048 226.14 283.05 1.25
Graphics llama 8B Q4_0 16 pp2048 374.19 494.95 1.32
Graphics llama 8B Q4_0 32 pp2048 509.56 379.15 0.74
Graphics llama 8B Q4_0 64 pp2048 243.16 808.46 3.32
Graphics llama 8B Q4_0 128 pp2048 423.11 916.38 2.17
Graphics llama 8B Q4_0 256 pp2048 590.42 979.33 1.66
Graphics llama 8B Q4_0 512 pp2048 726.45 955.42 1.32
Graphics llama 8B Q4_0 1024 pp2048 808.57 943.67 1.17
Graphics llama 8B Q4_0 2048 pp2048 779.26 855.14 1.10
Graphics llama 8B Q4_1 1 pp2048 36.48 36.52 1.00
Graphics llama 8B Q4_1 2 pp2048 71.08 71.24 1.00
Graphics llama 8B Q4_1 4 pp2048 131.92 132.17 1.00
Graphics llama 8B Q4_1 8 pp2048 219.34 274.38 1.25
Graphics llama 8B Q4_1 16 pp2048 341.72 486.38 1.42
Graphics llama 8B Q4_1 32 pp2048 491.51 623.27 1.27
Graphics llama 8B Q4_1 64 pp2048 241.80 791.80 3.27
Graphics llama 8B Q4_1 128 pp2048 420.49 815.73 1.94
Graphics llama 8B Q4_1 256 pp2048 589.14 880.82 1.50
Graphics llama 8B Q4_1 512 pp2048 722.50 872.68 1.21
Graphics llama 8B Q4_1 1024 pp2048 804.07 867.01 1.08
Graphics llama 8B Q4_1 2048 pp2048 776.24 799.71 1.03
Graphics llama 8B Q4_K_S 1 pp2048 36.85 36.88 1.00
Graphics llama 8B Q4_K_S 2 pp2048 62.35 62.45 1.00
Graphics llama 8B Q4_K_S 4 pp2048 94.11 94.07 1.00
Graphics llama 8B Q4_K_S 8 pp2048 118.60 133.25 1.12
Graphics llama 8B Q5_1 1 pp2048 30.45 30.47 1.00
Graphics llama 8B Q5_1 2 pp2048 59.52 59.56 1.00
Graphics llama 8B Q5_1 4 pp2048 112.25 112.25 1.00
Graphics llama 8B Q5_1 8 pp2048 191.34 232.75 1.22
Graphics llama 8B Q5_1 16 pp2048 267.52 342.23 1.28
Graphics llama 8B Q5_1 32 pp2048 447.40 506.85 1.13
Graphics llama 8B Q5_1 64 pp2048 228.36 696.09 3.05
Graphics llama 8B Q5_1 128 pp2048 400.89 813.09 2.03
Graphics llama 8B Q5_1 256 pp2048 566.88 884.65 1.56
Graphics llama 8B Q5_1 512 pp2048 705.10 880.81 1.25
Graphics llama 8B Q5_1 1024 pp2048 793.35 876.75 1.11
Graphics llama 8B Q5_1 2048 pp2048 769.48 803.44 1.04
Graphics llama 8B Q5_K_S 1 pp2048 33.36 33.34 1.00
Graphics llama 8B Q5_K_S 2 pp2048 58.02 57.86 1.00
Graphics llama 8B Q5_K_S 4 pp2048 90.37 90.15 1.00
Graphics llama 8B Q5_K_S 8 pp2048 115.17 128.41 1.11
Graphics llama 8B Q5_K_S 16 pp2048 328.82 508.01 1.54
Graphics llama 8B Q5_K_S 32 pp2048 431.21 615.53 1.43
Graphics llama 8B Q5_K_S 64 pp2048 235.29 792.11 3.37
Graphics llama 8B Q5_K_S 128 pp2048 411.76 864.48 2.10
Graphics llama 8B Q5_K_S 256 pp2048 578.32 919.98 1.59
Graphics llama 8B Q5_K_S 512 pp2048 717.19 908.41 1.27
Graphics llama 8B Q5_K_S 1024 pp2048 799.98 900.27 1.13
Graphics llama 8B Q5_K_S 2048 pp2048 778.09 820.54 1.05
Graphics llama 8B Q6_K 1 pp2048 29.58 29.51 1.00
Graphics llama 8B Q6_K 2 pp2048 56.47 56.37 1.00
Graphics llama 8B Q6_K 4 pp2048 102.88 102.60 1.00
Graphics llama 8B Q6_K 8 pp2048 143.89 163.08 1.13
Graphics llama 8B Q6_K 16 pp2048 294.77 383.68 1.30
Graphics llama 8B Q6_K 32 pp2048 392.14 469.49 1.20
Graphics llama 8B Q6_K 64 pp2048 238.05 592.30 2.49
Graphics llama 8B Q6_K 128 pp2048 415.60 620.97 1.49
Graphics llama 8B Q6_K 256 pp2048 582.53 658.80 1.13
Graphics llama 8B Q6_K 512 pp2048 719.82 654.72 0.91
Graphics llama 8B Q6_K 1024 pp2048 796.31 663.25 0.83
Graphics llama 8B Q6_K 2048 pp2048 773.72 624.57 0.81
Graphics llama 8B Q8_0 1 pp2048 25.02 25.00 1.00
Graphics llama 8B Q8_0 2 pp2048 48.78 48.77 1.00
Graphics llama 8B Q8_0 4 pp2048 92.37 92.37 1.00
Graphics llama 8B Q8_0 8 pp2048 166.97 198.40 1.19
Graphics llama 8B Q8_0 16 pp2048 303.03 364.41 1.20
Graphics llama 8B Q8_0 32 pp2048 484.44 374.38 0.77
Graphics llama 8B Q8_0 64 pp2048 227.11 750.26 3.30
Graphics llama 8B Q8_0 128 pp2048 398.25 873.07 2.19
Graphics llama 8B Q8_0 256 pp2048 564.91 941.85 1.67
Graphics llama 8B Q8_0 512 pp2048 705.53 925.05 1.31
Graphics llama 8B Q8_0 1024 pp2048 786.56 915.31 1.16
Graphics llama 8B Q8_0 2048 pp2048 773.68 833.31 1.08

The github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels on Nov 28, 2025.
@jiachengjason marked this pull request as ready for review on December 1, 2025 at 21:36.
static constexpr int ne = I * J / 32;
#elif defined(RDNA3)
static constexpr int ne = (I == 16 && J == 16) ? I * J / 32 : I * J / 16;
#endif
Suggested change
#endif
#endif // defined(RDNA4)

Please add comments indicating which #if/#ifdef each #endif is closing.
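For illustration, a self-contained sketch of the requested convention. RDNA3 and RDNA4 are defined locally here to stand in for the real HIP architecture flags; the tile shape logic mirrors the snippet quoted above:

```cpp
// Sketch of the commenting convention requested above: each #endif carries a
// comment naming the condition it closes, so nested arch-specific blocks
// stay readable. RDNA3/RDNA4 are illustrative macros, not the real ggml
// build configuration.
#define RDNA3 1

template <int I, int J>
struct tile {
#if defined(RDNA4)
    static constexpr int ne = I * J / 32;
#elif defined(RDNA3)
    static constexpr int ne = (I == 16 && J == 16) ? I * J / 32 : I * J / 16;
#endif // defined(RDNA4)
};

static_assert(tile<16, 16>::ne == 8,  "16x16 tiles use I*J/32 on RDNA3");
static_assert(tile<32, 16>::ne == 32, "other shapes use I*J/16 on RDNA3");
```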

Comment on lines 310 to 312
if (GGML_CUDA_CC_IS_RDNA4(cc) || GGML_CUDA_CC_IS_RDNA3(cc)) {
return true;
}

Suggested change
if (GGML_CUDA_CC_IS_RDNA4(cc) || GGML_CUDA_CC_IS_RDNA3(cc)) {
return true;
}
return true;

Comment on lines +1545 to +1548
A1.x[0] = 0x01010101;
A1.x[1] = 0x01010101;
A1.x[2] = 0x01010101;
A1.x[3] = 0x01010101;

Suggested change
A1.x[0] = 0x01010101;
A1.x[1] = 0x01010101;
A1.x[2] = 0x01010101;
A1.x[3] = 0x01010101;
#pragma unroll
for (int l = 0; l < tile_A::ne; ++l) {
A1.x[l] = 0x01010101;
}

To my understanding tile_A has 4 elements for RDNA3 but only 2 for RDNA4. So as written this would result in out-of-bounds writes and potential memory trampling on RDNA4.
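A host-side sketch of the suggested fix, under the reviewer's assumption about tile sizes. tile_A_like is a stand-in for the real mma tile type, whose element count ne differs by architecture; hardcoding four writes overflows the array when ne is 2, while looping to ne is correct for both:

```cpp
// Illustrative stand-in for the mma tile type: ne would be 4 on RDNA3 but
// only 2 on RDNA4, so hardcoding four writes to x[] runs past the array on
// RDNA4. Looping to the tile's own ne, as the review suggests, is safe.
template <int NE>
struct tile_A_like {
    static constexpr int ne = NE;
    unsigned x[ne];
};

template <typename T>
void fill_ones_bytes(T & A1) {
#pragma unroll
    for (int l = 0; l < T::ne; ++l) {
        A1.x[l] = 0x01010101; // every byte of every element set to 1
    }
}
```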

@JohannesGaessler (Contributor)

Performance
GPU Model Microbatch size Test t/s b7157 t/s 5fbd8f5 Speedup
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 1 pp2048 64.59 64.41 1.00
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 2 pp2048 110.14 109.52 0.99
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 4 pp2048 170.87 169.85 0.99
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 8 pp2048 222.56 221.16 0.99
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 16 pp2048 395.01 417.50 1.06
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 32 pp2048 522.67 599.04 1.15
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 64 pp2048 118.49 827.85 6.99
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 128 pp2048 181.98 946.77 5.20
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 256 pp2048 235.88 997.65 4.23
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 512 pp2048 242.24 1017.22 4.20
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 1024 pp2048 239.11 1059.32 4.43
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 2048 pp2048 231.05 1064.40 4.61
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 1 pp2048 50.21 49.96 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 2 pp2048 87.32 87.10 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 4 pp2048 141.93 141.21 0.99
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 8 pp2048 175.84 175.23 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 16 pp2048 315.11 259.71 0.82
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 32 pp2048 455.38 529.14 1.16
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 64 pp2048 92.41 758.04 8.20
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 128 pp2048 167.63 809.04 4.83
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 256 pp2048 219.98 847.20 3.85
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 512 pp2048 239.01 855.62 3.58
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 1024 pp2048 233.00 898.96 3.86
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 2048 pp2048 233.02 905.77 3.89
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 1 pp2048 52.65 52.08 0.99
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 2 pp2048 89.97 89.90 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 4 pp2048 143.33 142.96 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 8 pp2048 173.30 172.72 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 16 pp2048 322.15 255.08 0.79
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 32 pp2048 454.40 541.08 1.19
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 64 pp2048 104.02 748.76 7.20
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 128 pp2048 185.61 783.51 4.22
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 256 pp2048 236.47 824.43 3.49
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 512 pp2048 237.27 837.71 3.53
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 1024 pp2048 228.08 871.67 3.82
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 2048 pp2048 232.11 879.28 3.79
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 1 pp2048 44.19 44.41 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 2 pp2048 78.58 78.43 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 4 pp2048 131.48 131.00 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 8 pp2048 160.49 159.67 0.99
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 16 pp2048 314.37 306.76 0.98
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 32 pp2048 450.01 438.71 0.97
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 64 pp2048 92.97 771.88 8.30
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 128 pp2048 172.90 927.74 5.37
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 256 pp2048 233.91 975.01 4.17
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 512 pp2048 236.49 990.81 4.19
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 1024 pp2048 240.16 1046.67 4.36
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 2048 pp2048 232.69 1051.67 4.52
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 1 pp2048 40.70 40.85 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 2 pp2048 74.26 74.87 1.01
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 4 pp2048 127.92 128.68 1.01
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 8 pp2048 166.47 167.72 1.01
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 16 pp2048 303.98 285.38 0.94
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 32 pp2048 440.40 421.93 0.96
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 64 pp2048 85.71 757.60 8.84
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 128 pp2048 185.50 937.98 5.06
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 256 pp2048 239.80 987.76 4.12
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 512 pp2048 239.71 994.13 4.15
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 1024 pp2048 233.88 1046.17 4.47
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 2048 pp2048 228.90 1050.69 4.59
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 1 pp2048 40.51 40.34 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 2 pp2048 73.22 72.52 0.99
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 4 pp2048 122.21 121.83 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 8 pp2048 156.81 156.42 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 16 pp2048 307.89 296.61 0.96
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 32 pp2048 442.49 437.80 0.99
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 64 pp2048 96.38 765.61 7.94
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 128 pp2048 170.83 942.33 5.52
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 256 pp2048 233.54 982.29 4.21
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 512 pp2048 246.83 995.86 4.03
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 1024 pp2048 233.84 1047.68 4.48
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 2048 pp2048 235.51 1051.74 4.47
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 1 pp2048 44.42 44.29 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 2 pp2048 80.10 79.81 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 4 pp2048 135.35 135.02 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 8 pp2048 171.64 171.61 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 16 pp2048 315.72 310.39 0.98
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 32 pp2048 456.56 437.37 0.96
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 64 pp2048 103.47 782.05 7.56
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 128 pp2048 179.06 960.70 5.37
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 256 pp2048 233.07 1011.86 4.34
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 512 pp2048 238.73 1021.87 4.28
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 1024 pp2048 231.24 1071.48 4.63
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 2048 pp2048 232.21 1076.43 4.64
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 1 pp2048 47.47 47.65 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 2 pp2048 84.05 83.67 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 4 pp2048 138.70 138.05 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 8 pp2048 172.52 172.28 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 16 pp2048 322.95 330.15 1.02
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 32 pp2048 463.75 463.10 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 64 pp2048 90.30 788.59 8.73
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 128 pp2048 161.00 963.40 5.98
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 256 pp2048 233.66 1013.81 4.34
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 512 pp2048 228.25 1031.85 4.52
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 1024 pp2048 222.06 1077.84 4.85
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 2048 pp2048 234.88 1081.43 4.60
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 1 pp2048 43.04 42.85 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 2 pp2048 79.87 79.55 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 4 pp2048 146.87 146.48 1.00
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 8 pp2048 227.10 225.45 0.99
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 16 pp2048 376.57 388.85 1.03
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 32 pp2048 516.50 465.42 0.90
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 64 pp2048 100.32 897.32 8.94
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 128 pp2048 176.43 1022.42 5.79
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 256 pp2048 236.72 1085.21 4.58
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 512 pp2048 233.09 1102.73 4.73
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 1024 pp2048 239.59 1150.19 4.80
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 2048 pp2048 235.59 1152.76 4.89
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 1 pp2048 45.99 46.14 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 2 pp2048 84.66 84.34 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 4 pp2048 155.84 155.24 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 8 pp2048 238.08 237.66 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 16 pp2048 377.13 414.81 1.10
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 32 pp2048 514.71 357.72 0.69
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 64 pp2048 79.21 899.88 11.36
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 128 pp2048 160.59 1036.62 6.46
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 256 pp2048 229.90 1098.13 4.78
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 512 pp2048 224.19 1120.28 5.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 1024 pp2048 230.33 1171.51 5.09
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 2048 pp2048 234.13 1165.46 4.98
Radeon 8060S Graphics llama 8B Q2_K_S 1 pp2048 55.16 55.00 1.00
Radeon 8060S Graphics llama 8B Q2_K_S 2 pp2048 81.95 82.30 1.00
Radeon 8060S Graphics llama 8B Q2_K_S 4 pp2048 109.68 109.53 1.00
Radeon 8060S Graphics llama 8B Q2_K_S 8 pp2048 107.26 108.30 1.01
Radeon 8060S Graphics llama 8B Q2_K_S 16 pp2048 331.73 216.63 0.65
Radeon 8060S Graphics llama 8B Q2_K_S 32 pp2048 308.48 364.22 1.18
Radeon 8060S Graphics llama 8B Q2_K_S 64 pp2048 125.82 503.25 4.00
Radeon 8060S Graphics llama 8B Q2_K_S 128 pp2048 173.65 553.74 3.19
Radeon 8060S Graphics llama 8B Q2_K_S 256 pp2048 231.79 590.91 2.55
Radeon 8060S Graphics llama 8B Q2_K_S 512 pp2048 241.82 614.09 2.54
Radeon 8060S Graphics llama 8B Q2_K_S 1024 pp2048 233.63 666.47 2.85
Radeon 8060S Graphics llama 8B Q2_K_S 2048 pp2048 235.24 672.05 2.86
Radeon 8060S Graphics llama 8B Q3_K_S 1 pp2048 43.11 43.27 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 2 pp2048 69.03 69.17 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 4 pp2048 77.68 77.88 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 8 pp2048 104.11 104.41 1.00
Radeon 8060S Graphics llama 8B Q3_K_S 16 pp2048 319.78 269.45 0.84
Radeon 8060S Graphics llama 8B Q3_K_S 32 pp2048 432.20 580.94 1.34
Radeon 8060S Graphics llama 8B Q3_K_S 64 pp2048 89.66 779.82 8.70
Radeon 8060S Graphics llama 8B Q3_K_S 128 pp2048 180.65 948.71 5.25
Radeon 8060S Graphics llama 8B Q3_K_S 256 pp2048 232.26 997.47 4.29
Radeon 8060S Graphics llama 8B Q3_K_S 512 pp2048 229.18 1025.74 4.48
Radeon 8060S Graphics llama 8B Q3_K_S 1024 pp2048 241.87 1068.25 4.42
Radeon 8060S Graphics llama 8B Q3_K_S 2048 pp2048 235.83 1070.42 4.54
Radeon 8060S Graphics llama 8B Q4_0 1 pp2048 44.05 44.22 1.00
Radeon 8060S Graphics llama 8B Q4_0 2 pp2048 81.13 80.86 1.00
Radeon 8060S Graphics llama 8B Q4_0 4 pp2048 150.12 149.24 0.99
Radeon 8060S Graphics llama 8B Q4_0 8 pp2048 227.02 218.37 0.96
Radeon 8060S Graphics llama 8B Q4_0 16 pp2048 368.63 377.51 1.02
Radeon 8060S Graphics llama 8B Q4_0 32 pp2048 492.68 328.47 0.67
Radeon 8060S Graphics llama 8B Q4_0 64 pp2048 96.02 827.55 8.62
Radeon 8060S Graphics llama 8B Q4_0 128 pp2048 176.73 975.52 5.52
Radeon 8060S Graphics llama 8B Q4_0 256 pp2048 236.38 1027.45 4.35
Radeon 8060S Graphics llama 8B Q4_0 512 pp2048 241.32 1050.01 4.35
Radeon 8060S Graphics llama 8B Q4_0 1024 pp2048 232.06 1091.70 4.70
Radeon 8060S Graphics llama 8B Q4_0 2048 pp2048 219.58 1090.36 4.97
Radeon 8060S Graphics llama 8B Q4_1 1 pp2048 40.32 39.83 0.99
Radeon 8060S Graphics llama 8B Q4_1 2 pp2048 76.13 74.88 0.98
Radeon 8060S Graphics llama 8B Q4_1 4 pp2048 142.82 140.18 0.98
Radeon 8060S Graphics llama 8B Q4_1 8 pp2048 229.88 220.70 0.96
Radeon 8060S Graphics llama 8B Q4_1 16 pp2048 344.82 389.91 1.13
Radeon 8060S Graphics llama 8B Q4_1 32 pp2048 481.18 592.29 1.23
Radeon 8060S Graphics llama 8B Q4_1 64 pp2048 92.41 806.43 8.73
Radeon 8060S Graphics llama 8B Q4_1 128 pp2048 182.70 852.10 4.66
Radeon 8060S Graphics llama 8B Q4_1 256 pp2048 227.37 905.45 3.98
Radeon 8060S Graphics llama 8B Q4_1 512 pp2048 247.61 913.52 3.69
Radeon 8060S Graphics llama 8B Q4_1 1024 pp2048 239.45 977.86 4.08
Radeon 8060S Graphics llama 8B Q4_1 2048 pp2048 228.53 985.37 4.31
Radeon 8060S Graphics llama 8B Q4_K_S 1 pp2048 36.91 36.97 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 2 pp2048 59.71 59.64 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 4 pp2048 90.77 90.74 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 8 pp2048 111.25 111.11 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 16 pp2048 348.88 400.24 1.15
Radeon 8060S Graphics llama 8B Q4_K_S 32 pp2048 456.45 596.97 1.31
Radeon 8060S Graphics llama 8B Q4_K_S 64 pp2048 103.69 829.45 8.00
Radeon 8060S Graphics llama 8B Q4_K_S 128 pp2048 173.50 956.40 5.51
Radeon 8060S Graphics llama 8B Q4_K_S 256 pp2048 231.08 1003.89 4.34
Radeon 8060S Graphics llama 8B Q4_K_S 512 pp2048 240.33 1024.19 4.26
Radeon 8060S Graphics llama 8B Q4_K_S 1024 pp2048 237.83 1082.11 4.55
Radeon 8060S Graphics llama 8B Q4_K_S 2048 pp2048 234.01 1083.16 4.63
Radeon 8060S Graphics llama 8B Q5_0 1 pp2048 38.27 38.19 1.00
Radeon 8060S Graphics llama 8B Q5_0 2 pp2048 70.68 70.09 0.99
Radeon 8060S Graphics llama 8B Q5_0 4 pp2048 131.39 129.59 0.99
Radeon 8060S Graphics llama 8B Q5_0 8 pp2048 215.73 207.00 0.96
Radeon 8060S Graphics llama 8B Q5_0 16 pp2048 332.22 319.15 0.96
Radeon 8060S Graphics llama 8B Q5_0 32 pp2048 469.37 263.91 0.56
Radeon 8060S Graphics llama 8B Q5_0 64 pp2048 87.59 789.16 9.01
Radeon 8060S Graphics llama 8B Q5_0 128 pp2048 177.16 953.71 5.38
Radeon 8060S Graphics llama 8B Q5_0 256 pp2048 223.48 1017.42 4.55
Radeon 8060S Graphics llama 8B Q5_0 512 pp2048 237.09 1038.12 4.38
Radeon 8060S Graphics llama 8B Q5_0 1024 pp2048 225.83 1081.58 4.79
Radeon 8060S Graphics llama 8B Q5_0 2048 pp2048 235.68 1078.49 4.58
Radeon 8060S Graphics llama 8B Q5_1 1 pp2048 33.29 33.13 1.00
Radeon 8060S Graphics llama 8B Q5_1 2 pp2048 63.68 63.22 0.99
Radeon 8060S Graphics llama 8B Q5_1 4 pp2048 122.40 122.39 1.00
Radeon 8060S Graphics llama 8B Q5_1 8 pp2048 202.96 202.54 1.00
Radeon 8060S Graphics llama 8B Q5_1 16 pp2048 269.11 274.79 1.02
Radeon 8060S Graphics llama 8B Q5_1 32 pp2048 445.14 504.62 1.13
Radeon 8060S Graphics llama 8B Q5_1 64 pp2048 77.63 747.86 9.63
Radeon 8060S Graphics llama 8B Q5_1 128 pp2048 164.50 890.15 5.41
Radeon 8060S Graphics llama 8B Q5_1 256 pp2048 220.13 950.10 4.32
Radeon 8060S Graphics llama 8B Q5_1 512 pp2048 235.44 982.28 4.17
Radeon 8060S Graphics llama 8B Q5_1 1024 pp2048 239.13 1037.66 4.34
Radeon 8060S Graphics llama 8B Q5_1 2048 pp2048 236.47 1041.81 4.41
Radeon 8060S Graphics llama 8B Q5_K_S 1 pp2048 33.26 33.14 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 2 pp2048 55.50 55.48 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 4 pp2048 86.88 86.95 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 8 pp2048 108.10 108.16 1.00
Radeon 8060S Graphics llama 8B Q5_K_S 16 pp2048 322.77 398.07 1.23
Radeon 8060S Graphics llama 8B Q5_K_S 32 pp2048 399.04 597.20 1.50
Radeon 8060S Graphics llama 8B Q5_K_S 64 pp2048 107.29 848.10 7.90
Radeon 8060S Graphics llama 8B Q5_K_S 128 pp2048 167.89 933.85 5.56
Radeon 8060S Graphics llama 8B Q5_K_S 256 pp2048 228.17 989.01 4.33
Radeon 8060S Graphics llama 8B Q5_K_S 512 pp2048 229.98 1001.43 4.35
Radeon 8060S Graphics llama 8B Q5_K_S 1024 pp2048 233.47 1059.95 4.54
Radeon 8060S Graphics llama 8B Q5_K_S 2048 pp2048 236.34 1065.00 4.51
Radeon 8060S Graphics llama 8B Q6_K 1 pp2048 32.37 32.34 1.00
Radeon 8060S Graphics llama 8B Q6_K 2 pp2048 58.31 58.16 1.00
Radeon 8060S Graphics llama 8B Q6_K 4 pp2048 104.07 103.64 1.00
Radeon 8060S Graphics llama 8B Q6_K 8 pp2048 139.88 137.91 0.99
Radeon 8060S Graphics llama 8B Q6_K 16 pp2048 295.75 322.76 1.09
Radeon 8060S Graphics llama 8B Q6_K 32 pp2048 348.83 461.68 1.32
Radeon 8060S Graphics llama 8B Q6_K 64 pp2048 78.75 606.52 7.70
Radeon 8060S Graphics llama 8B Q6_K 128 pp2048 167.87 705.62 4.20
Radeon 8060S Graphics llama 8B Q6_K 256 pp2048 226.10 739.75 3.27
Radeon 8060S Graphics llama 8B Q6_K 512 pp2048 238.97 752.91 3.15
Radeon 8060S Graphics llama 8B Q6_K 1024 pp2048 227.70 790.67 3.47
Radeon 8060S Graphics llama 8B Q6_K 2048 pp2048 221.73 802.90 3.62
Radeon 8060S Graphics llama 8B Q8_0 1 pp2048 26.36 26.46 1.00
Radeon 8060S Graphics llama 8B Q8_0 2 pp2048 51.45 51.54 1.00
Radeon 8060S Graphics llama 8B Q8_0 4 pp2048 100.68 100.50 1.00
Radeon 8060S Graphics llama 8B Q8_0 8 pp2048 176.95 175.28 0.99
Radeon 8060S Graphics llama 8B Q8_0 16 pp2048 302.59 309.96 1.02
Radeon 8060S Graphics llama 8B Q8_0 32 pp2048 467.38 360.98 0.77
Radeon 8060S Graphics llama 8B Q8_0 64 pp2048 110.18 787.00 7.14
Radeon 8060S Graphics llama 8B Q8_0 128 pp2048 182.45 932.92 5.11
Radeon 8060S Graphics llama 8B Q8_0 256 pp2048 233.55 992.30 4.25
Radeon 8060S Graphics llama 8B Q8_0 512 pp2048 245.72 1022.11 4.16
Radeon 8060S Graphics llama 8B Q8_0 1024 pp2048 237.64 1077.98 4.54
Radeon 8060S Graphics llama 8B Q8_0 2048 pp2048 234.19 1082.87 4.62

In terms of performance I think this PR would be good to merge. There are some cases around batch size 32 that have suboptimal performance but that particular batch size is comparatively less important vs. the larger ones. So I think it would be fine to merge the PR as-is and to maybe optimize that use case in a follow-up PR. (Batch sizes 1-8 are using the same code for both tests so changes there are just random noise and can be ignored, I only included them to investigate the scaling.)
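For reference, the Speedup column in the tables above is just the ratio of the two throughput columns. A minimal sketch, using a few IQ4_XS rows copied from the table (values from this PR, nothing new measured):

```python
# Sketch: how the Speedup column above is derived. Sample rows taken
# from the IQ4_XS results: (microbatch size, t/s master, t/s this PR).
rows = [
    (64,    79.21,  899.88),
    (128,  160.59, 1036.62),
    (2048, 234.13, 1165.46),
]
for batch, before, after in rows:
    speedup = after / before
    print(f"ub={batch:4d}  speedup={speedup:.2f}")  # 11.36, 6.46, 4.98
```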

@JohannesGaessler
Contributor

(This PR still needs a rebase on top of master.)

@jiachengjason force-pushed the feat/jiachengjason/enable_mmq_kernels_for_RDNA3 branch 2 times, most recently from a34b76f to c9ec96c on December 4, 2025 05:42
@jiachengjason
Contributor Author

(This PR still needs a rebase on top of master.)

This has been done.

);
#endif // defined(RDNA4)

#elif defined(RDNA3)
Contributor


Suggested change
#elif defined(RDNA3)
#elif defined(RDNA3)

To fix the EditorConfig CI.
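For context, an EditorConfig failure on a suggested change that looks identical to the original line is usually trailing whitespace. A hypothetical sketch of that kind of check (not the actual CI script):

```python
# Hypothetical sketch of the kind of check the EditorConfig CI runs:
# flag lines that carry trailing whitespace, the usual cause of a
# suggested change that renders identically to the original line.
def trailing_ws_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines that end in whitespace."""
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if line != line.rstrip()]

src = "#elif defined(RDNA3) \n#endif // defined(RDNA4)\n"
print(trailing_ws_lines(src))  # [1]
```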

@jiachengjason force-pushed the feat/jiachengjason/enable_mmq_kernels_for_RDNA3 branch from 5941226 to 685be0e on December 4, 2025 15:11
@JohannesGaessler merged commit 668ed76 into ggml-org:master on Dec 5, 2025
78 checks passed
@CISC
Member

CISC commented Dec 5, 2025

@hjc4869
Contributor

hjc4869 commented Dec 5, 2025

Seems like some minor issues on RDNA4: hjc4869@df264e1

@arch-btw
Contributor

arch-btw commented Dec 5, 2025

@jiachengjason

Thank you for this! Those are great speed-up results.

I did get some errors and my build failed, could you please take a look?

log.log

@Beinsezii
Contributor

Beinsezii commented Dec 5, 2025

On GFX1100 + ROC641, it seems this commit is causing test-backend-ops -o MUL_MAT_ID to fail with inf overflows.
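A toy illustration (pure Python, not the actual kernel code) of why an FP16 accumulator can produce inf: float16 has a maximum finite value of 65504, so a running sum past that saturates.

```python
# Toy model of float16 overflow: the format's largest finite value is
# 65504, so any accumulated sum beyond that becomes +/-inf.
FP16_MAX = 65504.0

def f16_accumulate(values):
    """Accumulate in a model of float16; overflow goes to +/-inf."""
    acc = 0.0
    for v in values:
        acc += v
        if abs(acc) > FP16_MAX:
            return float("inf") if acc > 0 else float("-inf")
    return acc

print(f16_accumulate([60000.0, 60000.0]))  # inf
```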

@jiachengjason
Contributor Author

On GFX1100 + ROC641, it seems this commit is causing test-backend-ops -o MUL_MAT_ID to fail with inf overflows.

I believe this is due to the FP16/BF16 MMF kernels not being enabled yet; once #17495 is merged, this failure should no longer occur.

@Beinsezii
Contributor

I believe this is due to the FP16/BF16 MMF kernels not being enabled yet; once #17495 is merged, this failure should no longer occur.

I suppose ROCm builds will just be broken for RDNA 3 until someone finds the time to finish that PR then?

JayZenith pushed a commit to JayZenith/llama.cpp that referenced this pull request Dec 7, 2025
* enabled wmma instructions for most quantizations other than q2k

* fixed the last q2_k test case failure

* address comments: fix out of bound write for RDNA4, add comments after #endif

* clean up rebase: fix ne error in half2

* fix the EditorConfig CI
0Marble pushed a commit to 0Marble/llama.cpp that referenced this pull request Dec 18, 2025
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Dec 20, 2025
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

Labels

ggml: changes relating to the ggml tensor library for machine learning
Nvidia GPU: Issues specific to Nvidia GPUs


6 participants