HIP: enable WMMA-MMQ INT kernels for RDNA 3 (#17576)
Conversation
ggml/src/ggml-cuda/mma.cuh

```cpp
    static constexpr int ne = I * J / 32;
#elif defined(RDNA3)
    static constexpr int ne = (I == 16 && J == 16) ? I * J / 32 : I * J / 16;
#endif
```

Suggested change:

```cpp
#endif
#endif // defined(RDNA4)
```

Please add comments indicating which `#if`/`#ifdef` each `#endif` is closing.
ggml/src/ggml-cuda/mmq.cu

```cpp
    if (GGML_CUDA_CC_IS_RDNA4(cc) || GGML_CUDA_CC_IS_RDNA3(cc)) {
        return true;
    }
```

Suggested change (the conditional is redundant, presumably because the result is `true` for the remaining cases as well):

```cpp
    return true;
```
ggml/src/ggml-cuda/mmq.cuh

```cpp
    A1.x[0] = 0x01010101;
    A1.x[1] = 0x01010101;
    A1.x[2] = 0x01010101;
    A1.x[3] = 0x01010101;
```

Suggested change:

```cpp
#pragma unroll
    for (int l = 0; l < tile_A::ne; ++l) {
        A1.x[l] = 0x01010101;
    }
```

To my understanding, tile_A has 4 elements for RDNA3 but only 2 for RDNA4. So as it is, this would result in out-of-bounds writes and potential memory trampling on RDNA4.
Performance
In terms of performance, I think this PR would be good to merge. There are some cases around batch size 32 with suboptimal performance, but that batch size is less important than the larger ones, so I think it would be fine to merge the PR as-is and perhaps optimize that use case in a follow-up PR. (Batch sizes 1-8 use the same code in both tests, so changes there are just random noise and can be ignored; I only included them to investigate the scaling.)

(This PR still needs a rebase on top of master.)
Force-pushed from a34b76f to c9ec96c.
The rebase has been done.
ggml/src/ggml-cuda/mma.cuh

```cpp
    );
#endif // defined(RDNA4)

#elif defined(RDNA3)
```

Suggested change (presumably stripping trailing whitespace, which plain text cannot show here):

```cpp
#elif defined(RDNA3)
```

To fix the EditorConfig CI.
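For context, EditorConfig CI failures of this kind are usually driven by rules along these lines (a hypothetical excerpt, not necessarily llama.cpp's actual `.editorconfig`):

```ini
# applies to all files in the repository
[*]
trim_trailing_whitespace = true
insert_final_newline = true
```

With `trim_trailing_whitespace = true`, any line ending in spaces or tabs fails the check, which is consistent with a suggestion whose only visible change is the line itself.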
Force-pushed from 5941226 to 685be0e.
There seem to be some minor issues on RDNA4: hjc4869@df264e1

Thank you for this! Those are great speed-up results. I did get some errors and my build failed, though; could you please take a look?

On GFX1100 + ROCm 6.4.1 it seems like this commit is causing

I believe this is due to the FP16/BF16 MMF kernels not having been enabled yet; once that PR (#17495) gets merged, this failure should no longer occur.

I suppose ROCm builds will just be broken for RDNA 3 until someone finds the time to finish that PR, then?
This reverts commit 668ed76.
* enabled wmma instructions for most quantizations other than q2k
* fixed the last q2_k test case failure
* address comments: fix out of bound write for RDNA4, add comments after #endif
* clean up rebase: fix ne error in half2
* fix the EditorConfig CI
Enabled WMMA-MMQ INT kernels for the RDNA 3 architecture on AMD GPUs, following a similar approach to #17156.

The performance results below were collected with `./build/bin/llama-bench`.

Build command:

```shell
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
cmake -S . -B build \
    -DGGML_HIP=ON \
    -DGGML_CUDA_FORCE_MMQ=OFF \
    -DGGML_HIP_UMA=OFF \
    -DGGML_HIP_ROCWMMA_FATTN=OFF \
    -DGPU_TARGETS="gfx1100" \
    -DGGML_HIP_GRAPHS=OFF \
    -DLLAMA_CURL=OFF \
    -DGGML_CUDA_FORCE_CUBLAS=OFF \
    -DCMAKE_BUILD_TYPE=Release \
&& cmake --build build --config Release -- -j 32
```
Popular models performance results for AMD Radeon AI PRO W7900 (gfx1100)
Popular models performance results for AMD Strix Halo (gfx1151)
All quantization performance results for AMD Radeon AI PRO W7900 (gfx1100)
All quantization performance results for AMD Strix Halo (gfx1151)