forked from ggml-org/llama.cpp
Discussion / FYI : Smooth performance on long context #21
Since the early builds had massive performance degradation, I have rerun the test on the current code base with the CUDA implementation.
From a depth of 0 to about 260,000 tokens, performance declines smoothly and predictably when running Unsloth's Qwen3.5-4B-Q4_K_M.gguf model with -ctk turbo3 -ctv turbo3: prompt processing goes from 3357.96 t/s to 1227.05 t/s, and generation from 83.42 t/s to 11.78 t/s.
Great job :-)
| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:---------------|-----------------:|---------------:|-------------:|-----------------:|-----------------:|-----------------:|
| llamacpp-model | pp2048 | 3357.96 ± 0.00 | | 714.35 ± 0.00 | 578.33 ± 0.00 | 714.40 ± 0.00 |
| llamacpp-model | tg32 | 83.42 ± 0.00 | 86.26 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d4096 | 3426.76 ± 0.00 | | 1801.74 ± 0.00 | 1665.71 ± 0.00 | 1801.78 ± 0.00 |
| llamacpp-model | tg32 @ d4096 | 75.85 ± 0.00 | 78.53 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d8192 | 3446.56 ± 0.00 | | 2841.33 ± 0.00 | 2705.31 ± 0.00 | 2841.37 ± 0.00 |
| llamacpp-model | tg32 @ d8192 | 69.58 ± 0.00 | 72.12 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d12288 | 3382.78 ± 0.00 | | 4016.56 ± 0.00 | 3880.53 ± 0.00 | 4016.61 ± 0.00 |
| llamacpp-model | tg32 @ d12288 | 65.45 ± 0.00 | 67.88 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d16384 | 3353.15 ± 0.00 | | 5137.89 ± 0.00 | 5001.86 ± 0.00 | 5137.93 ± 0.00 |
| llamacpp-model | tg32 @ d16384 | 60.70 ± 0.00 | 63.03 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d20480 | 3272.50 ± 0.00 | | 6239.61 ± 0.00 | 6103.59 ± 0.00 | 6239.66 ± 0.00 |
| llamacpp-model | tg32 @ d20480 | 57.20 ± 0.00 | 59.40 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d24576 | 3206.25 ± 0.00 | | 7647.60 ± 0.00 | 7511.58 ± 0.00 | 7647.65 ± 0.00 |
| llamacpp-model | tg32 @ d24576 | 53.72 ± 0.00 | 55.82 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d28672 | 3137.46 ± 0.00 | | 8999.56 ± 0.00 | 8863.53 ± 0.00 | 8999.60 ± 0.00 |
| llamacpp-model | tg32 @ d28672 | 50.54 ± 0.00 | 52.52 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d32768 | 3065.26 ± 0.00 | | 10406.61 ± 0.00 | 10270.58 ± 0.00 | 10406.65 ± 0.00 |
| llamacpp-model | tg32 @ d32768 | 47.16 ± 0.00 | 49.03 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d36864 | 2995.70 ± 0.00 | | 11860.16 ± 0.00 | 11724.14 ± 0.00 | 11860.20 ± 0.00 |
| llamacpp-model | tg32 @ d36864 | 45.48 ± 0.00 | 47.32 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d40960 | 2936.61 ± 0.00 | | 13334.58 ± 0.00 | 13198.55 ± 0.00 | 13334.61 ± 0.00 |
| llamacpp-model | tg32 @ d40960 | 43.19 ± 0.00 | 44.95 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d45056 | 2860.09 ± 0.00 | | 15032.78 ± 0.00 | 14896.75 ± 0.00 | 15032.81 ± 0.00 |
| llamacpp-model | tg32 @ d45056 | 40.95 ± 0.00 | 42.60 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d49152 | 2803.67 ± 0.00 | | 16652.93 ± 0.00 | 16516.90 ± 0.00 | 16652.97 ± 0.00 |
| llamacpp-model | tg32 @ d49152 | 38.75 ± 0.00 | 40.33 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d53248 | 2733.08 ± 0.00 | | 18462.59 ± 0.00 | 18326.57 ± 0.00 | 18462.63 ± 0.00 |
| llamacpp-model | tg32 @ d53248 | 37.29 ± 0.00 | 38.79 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d57344 | 2668.39 ± 0.00 | | 20194.57 ± 0.00 | 20058.55 ± 0.00 | 20194.61 ± 0.00 |
| llamacpp-model | tg32 @ d57344 | 35.67 ± 0.00 | 37.12 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d61440 | 2604.57 ± 0.00 | | 22247.17 ± 0.00 | 22111.14 ± 0.00 | 22247.21 ± 0.00 |
| llamacpp-model | tg32 @ d61440 | 33.79 ± 0.00 | 35.21 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d65536 | 2532.17 ± 0.00 | | 24346.91 ± 0.00 | 24210.89 ± 0.00 | 24346.96 ± 0.00 |
| llamacpp-model | tg32 @ d65536 | 32.59 ± 0.00 | 33.92 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d69632 | 2457.23 ± 0.00 | | 26618.68 ± 0.00 | 26482.65 ± 0.00 | 26618.72 ± 0.00 |
| llamacpp-model | tg32 @ d69632 | 31.49 ± 0.00 | 32.78 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d73728 | 2399.10 ± 0.00 | | 28717.61 ± 0.00 | 28581.58 ± 0.00 | 28717.65 ± 0.00 |
| llamacpp-model | tg32 @ d73728 | 30.22 ± 0.00 | 31.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d77824 | 2350.01 ± 0.00 | | 30949.93 ± 0.00 | 30813.90 ± 0.00 | 30949.96 ± 0.00 |
| llamacpp-model | tg32 @ d77824 | 28.95 ± 0.00 | 30.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d81920 | 2294.49 ± 0.00 | | 33227.09 ± 0.00 | 33091.07 ± 0.00 | 33227.13 ± 0.00 |
| llamacpp-model | tg32 @ d81920 | 28.00 ± 0.00 | 29.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d86016 | 2242.76 ± 0.00 | | 35746.63 ± 0.00 | 35610.61 ± 0.00 | 35746.69 ± 0.00 |
| llamacpp-model | tg32 @ d86016 | 27.03 ± 0.00 | 28.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d90112 | 2204.01 ± 0.00 | | 38000.70 ± 0.00 | 37864.68 ± 0.00 | 38000.76 ± 0.00 |
| llamacpp-model | tg32 @ d90112 | 26.17 ± 0.00 | 27.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d94208 | 2158.79 ± 0.00 | | 40471.18 ± 0.00 | 40335.16 ± 0.00 | 40471.23 ± 0.00 |
| llamacpp-model | tg32 @ d94208 | 25.55 ± 0.00 | 27.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d98304 | 2121.21 ± 0.00 | | 42979.55 ± 0.00 | 42843.53 ± 0.00 | 42979.59 ± 0.00 |
| llamacpp-model | tg32 @ d98304 | 24.80 ± 0.00 | 26.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d102400 | 2080.53 ± 0.00 | | 45673.90 ± 0.00 | 45537.88 ± 0.00 | 45673.94 ± 0.00 |
| llamacpp-model | tg32 @ d102400 | 24.11 ± 0.00 | 25.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d106496 | 2047.42 ± 0.00 | | 48177.01 ± 0.00 | 48040.98 ± 0.00 | 48177.05 ± 0.00 |
| llamacpp-model | tg32 @ d106496 | 23.35 ± 0.00 | 24.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d110592 | 2019.84 ± 0.00 | | 50714.38 ± 0.00 | 50578.36 ± 0.00 | 50714.42 ± 0.00 |
| llamacpp-model | tg32 @ d110592 | 22.81 ± 0.00 | 24.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d114688 | 1985.74 ± 0.00 | | 53406.45 ± 0.00 | 53270.42 ± 0.00 | 53406.53 ± 0.00 |
| llamacpp-model | tg32 @ d114688 | 22.12 ± 0.00 | 23.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d118784 | 1947.66 ± 0.00 | | 56508.32 ± 0.00 | 56372.30 ± 0.00 | 56508.38 ± 0.00 |
| llamacpp-model | tg32 @ d118784 | 21.45 ± 0.00 | 22.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d122880 | 1921.86 ± 0.00 | | 58975.36 ± 0.00 | 58839.33 ± 0.00 | 58975.41 ± 0.00 |
| llamacpp-model | tg32 @ d122880 | 21.15 ± 0.00 | 22.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d126976 | 1885.92 ± 0.00 | | 62158.18 ± 0.00 | 62022.16 ± 0.00 | 62158.22 ± 0.00 |
| llamacpp-model | tg32 @ d126976 | 20.60 ± 0.00 | 21.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d131072 | 1861.36 ± 0.00 | | 64886.48 ± 0.00 | 64750.46 ± 0.00 | 64886.57 ± 0.00 |
| llamacpp-model | tg32 @ d131072 | 20.08 ± 0.00 | 21.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d135168 | 1831.51 ± 0.00 | | 67914.49 ± 0.00 | 67778.47 ± 0.00 | 67914.55 ± 0.00 |
| llamacpp-model | tg32 @ d135168 | 19.65 ± 0.00 | 21.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d139264 | 1799.84 ± 0.00 | | 71136.37 ± 0.00 | 71000.35 ± 0.00 | 71136.43 ± 0.00 |
| llamacpp-model | tg32 @ d139264 | 19.22 ± 0.00 | 20.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d143360 | 1771.14 ± 0.00 | | 74602.37 ± 0.00 | 74466.35 ± 0.00 | 74602.42 ± 0.00 |
| llamacpp-model | tg32 @ d143360 | 18.77 ± 0.00 | 20.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d147456 | 1744.12 ± 0.00 | | 77818.14 ± 0.00 | 77682.12 ± 0.00 | 77818.19 ± 0.00 |
| llamacpp-model | tg32 @ d147456 | 18.19 ± 0.00 | 19.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d151552 | 1723.22 ± 0.00 | | 80809.77 ± 0.00 | 80673.74 ± 0.00 | 80809.80 ± 0.00 |
| llamacpp-model | tg32 @ d151552 | 17.99 ± 0.00 | 19.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d155648 | 1696.27 ± 0.00 | | 84377.23 ± 0.00 | 84241.21 ± 0.00 | 84377.28 ± 0.00 |
| llamacpp-model | tg32 @ d155648 | 17.63 ± 0.00 | 19.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d159744 | 1673.22 ± 0.00 | | 87788.61 ± 0.00 | 87652.58 ± 0.00 | 87788.65 ± 0.00 |
| llamacpp-model | tg32 @ d159744 | 17.25 ± 0.00 | 18.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d163840 | 1646.63 ± 0.00 | | 91424.09 ± 0.00 | 91288.06 ± 0.00 | 91424.14 ± 0.00 |
| llamacpp-model | tg32 @ d163840 | 17.07 ± 0.00 | 18.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d167936 | 1622.49 ± 0.00 | | 95189.04 ± 0.00 | 95053.02 ± 0.00 | 95189.08 ± 0.00 |
| llamacpp-model | tg32 @ d167936 | 16.61 ± 0.00 | 18.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d172032 | 1593.15 ± 0.00 | | 99303.49 ± 0.00 | 99167.46 ± 0.00 | 99303.53 ± 0.00 |
| llamacpp-model | tg32 @ d172032 | 16.32 ± 0.00 | 17.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d176128 | 1575.59 ± 0.00 | | 102785.84 ± 0.00 | 102649.81 ± 0.00 | 102785.88 ± 0.00 |
| llamacpp-model | tg32 @ d176128 | 16.05 ± 0.00 | 17.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d180224 | 1552.26 ± 0.00 | | 106711.51 ± 0.00 | 106575.49 ± 0.00 | 106711.56 ± 0.00 |
| llamacpp-model | tg32 @ d180224 | 15.68 ± 0.00 | 17.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d184320 | 1534.39 ± 0.00 | | 110310.06 ± 0.00 | 110174.04 ± 0.00 | 110310.10 ± 0.00 |
| llamacpp-model | tg32 @ d184320 | 15.48 ± 0.00 | 16.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d188416 | 1516.98 ± 0.00 | | 113721.50 ± 0.00 | 113585.47 ± 0.00 | 113721.54 ± 0.00 |
| llamacpp-model | tg32 @ d188416 | 15.24 ± 0.00 | 16.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d192512 | 1491.30 ± 0.00 | | 118509.15 ± 0.00 | 118373.13 ± 0.00 | 118509.19 ± 0.00 |
| llamacpp-model | tg32 @ d192512 | 14.95 ± 0.00 | 16.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d196608 | 1471.26 ± 0.00 | | 122805.39 ± 0.00 | 122669.37 ± 0.00 | 122805.43 ± 0.00 |
| llamacpp-model | tg32 @ d196608 | 14.80 ± 0.00 | 16.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d200704 | 1458.37 ± 0.00 | | 125987.13 ± 0.00 | 125851.11 ± 0.00 | 125987.18 ± 0.00 |
| llamacpp-model | tg32 @ d200704 | 14.41 ± 0.00 | 15.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d204800 | 1437.43 ± 0.00 | | 130774.22 ± 0.00 | 130638.19 ± 0.00 | 130774.27 ± 0.00 |
| llamacpp-model | tg32 @ d204800 | 14.18 ± 0.00 | 15.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d208896 | 1416.41 ± 0.00 | | 135347.23 ± 0.00 | 135211.20 ± 0.00 | 135347.27 ± 0.00 |
| llamacpp-model | tg32 @ d208896 | 14.02 ± 0.00 | 15.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d212992 | 1400.21 ± 0.00 | | 139079.05 ± 0.00 | 138943.02 ± 0.00 | 139079.09 ± 0.00 |
| llamacpp-model | tg32 @ d212992 | 13.68 ± 0.00 | 14.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d217088 | 1382.22 ± 0.00 | | 143739.22 ± 0.00 | 143603.19 ± 0.00 | 143739.26 ± 0.00 |
| llamacpp-model | tg32 @ d217088 | 13.61 ± 0.00 | 15.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d221184 | 1360.31 ± 0.00 | | 149030.98 ± 0.00 | 148894.96 ± 0.00 | 149031.03 ± 0.00 |
| llamacpp-model | tg32 @ d221184 | 13.30 ± 0.00 | 14.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d225280 | 1344.43 ± 0.00 | | 153261.20 ± 0.00 | 153125.17 ± 0.00 | 153261.24 ± 0.00 |
| llamacpp-model | tg32 @ d225280 | 13.12 ± 0.00 | 14.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d229376 | 1325.67 ± 0.00 | | 158416.95 ± 0.00 | 158280.93 ± 0.00 | 158417.01 ± 0.00 |
| llamacpp-model | tg32 @ d229376 | 12.84 ± 0.00 | 14.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d233472 | 1313.42 ± 0.00 | | 162637.25 ± 0.00 | 162501.22 ± 0.00 | 162637.31 ± 0.00 |
| llamacpp-model | tg32 @ d233472 | 12.78 ± 0.00 | 14.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d237568 | 1297.20 ± 0.00 | | 167530.82 ± 0.00 | 167394.79 ± 0.00 | 167530.85 ± 0.00 |
| llamacpp-model | tg32 @ d237568 | 12.35 ± 0.00 | 13.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d241664 | 1281.79 ± 0.00 | | 172606.08 ± 0.00 | 172470.05 ± 0.00 | 172606.13 ± 0.00 |
| llamacpp-model | tg32 @ d241664 | 12.40 ± 0.00 | 13.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d245760 | 1266.15 ± 0.00 | | 177761.94 ± 0.00 | 177625.92 ± 0.00 | 177761.98 ± 0.00 |
| llamacpp-model | tg32 @ d245760 | 12.25 ± 0.00 | 13.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d249856 | 1252.94 ± 0.00 | | 182221.82 ± 0.00 | 182085.79 ± 0.00 | 182221.88 ± 0.00 |
| llamacpp-model | tg32 @ d249856 | 12.07 ± 0.00 | 13.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d253952 | 1239.44 ± 0.00 | | 187327.68 ± 0.00 | 187191.66 ± 0.00 | 187327.72 ± 0.00 |
| llamacpp-model | tg32 @ d253952 | 11.93 ± 0.00 | 13.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d258048 | 1227.05 ± 0.00 | | 192256.83 ± 0.00 | 192120.81 ± 0.00 | 192256.88 ± 0.00 |
| llamacpp-model | tg32 @ d258048 | 11.78 ± 0.00 | 13.00 ± 0.00 | | | |
llama-benchy (0.3.2.dev1+g17b42667a)
date: 2026-03-29 09:31:13 | latency mode: generation
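For anyone who wants to check "smooth" numerically rather than by eye, here is a minimal Python sketch that parses rows in the table format above and reports the percent change in t/s from one depth to the next. The names `SAMPLE`, `parse_rows` and `step_changes` are my own and not part of llama-benchy; only the first six rows are embedded, so paste the full table into `SAMPLE` to cover every depth.

```python
import re

# A few rows copied verbatim from the turbo3 table above.
# Paste the whole table here to analyze every depth.
SAMPLE = """\
| llamacpp-model | pp2048 | 3357.96 ± 0.00 | | 714.35 ± 0.00 | 578.33 ± 0.00 | 714.40 ± 0.00 |
| llamacpp-model | tg32 | 83.42 ± 0.00 | 86.26 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d4096 | 3426.76 ± 0.00 | | 1801.74 ± 0.00 | 1665.71 ± 0.00 | 1801.78 ± 0.00 |
| llamacpp-model | tg32 @ d4096 | 75.85 ± 0.00 | 78.53 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d8192 | 3446.56 ± 0.00 | | 2841.33 ± 0.00 | 2705.31 ± 0.00 | 2841.37 ± 0.00 |
| llamacpp-model | tg32 @ d8192 | 69.58 ± 0.00 | 72.12 ± 0.00 | | | |
"""

# Matches: | model | <test>[ @ d<depth>] | <t/s> ± ...
ROW = re.compile(
    r"^\|\s*[^|]+\|\s*(pp\d+|tg\d+)(?:\s*@\s*d(\d+))?\s*\|\s*([\d.]+)\s*±"
)


def parse_rows(text):
    """Return {test_name: [(depth, tokens_per_second), ...]} sorted by depth."""
    series = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header, separator, or non-table line
        test = m.group(1)
        depth = int(m.group(2) or 0)  # rows without "@ d..." are depth 0
        tps = float(m.group(3))
        series.setdefault(test, []).append((depth, tps))
    for points in series.values():
        points.sort()
    return series


def step_changes(points):
    """Percent change in t/s between each pair of consecutive depths."""
    return [
        (d2, 100.0 * (t2 - t1) / t1)
        for (_, t1), (d2, t2) in zip(points, points[1:])
    ]


for depth, pct in step_changes(parse_rows(SAMPLE)["tg32"]):
    print(f"tg32 @ d{depth}: {pct:+.1f}% vs previous depth")
```

A curve with no sudden cliffs should show step changes of similar size and the same sign from row to row; a single large outlier would point to the kind of degradation the early builds had.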
turbo2 also works fine (generation goes from 85.18 t/s at depth 0 to 17.69 t/s at d258048):
| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:---------------|-----------------:|---------------:|-------------:|-----------------:|-----------------:|-----------------:|
| llamacpp-model | pp2048 | 3363.06 ± 0.00 | | 700.56 ± 0.00 | 564.37 ± 0.00 | 700.61 ± 0.00 |
| llamacpp-model | tg32 | 85.18 ± 0.00 | 88.04 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d4096 | 3437.04 ± 0.00 | | 1719.24 ± 0.00 | 1583.05 ± 0.00 | 1719.28 ± 0.00 |
| llamacpp-model | tg32 @ d4096 | 80.56 ± 0.00 | 83.39 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d8192 | 3451.68 ± 0.00 | | 2871.38 ± 0.00 | 2735.19 ± 0.00 | 2871.42 ± 0.00 |
| llamacpp-model | tg32 @ d8192 | 76.03 ± 0.00 | 78.80 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d12288 | 3388.71 ± 0.00 | | 3982.50 ± 0.00 | 3846.30 ± 0.00 | 3982.55 ± 0.00 |
| llamacpp-model | tg32 @ d12288 | 71.38 ± 0.00 | 74.06 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d16384 | 3361.10 ± 0.00 | | 5150.32 ± 0.00 | 5014.13 ± 0.00 | 5150.36 ± 0.00 |
| llamacpp-model | tg32 @ d16384 | 68.72 ± 0.00 | 71.37 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d20480 | 3285.20 ± 0.00 | | 6390.61 ± 0.00 | 6254.42 ± 0.00 | 6390.65 ± 0.00 |
| llamacpp-model | tg32 @ d20480 | 65.54 ± 0.00 | 68.14 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d24576 | 3223.35 ± 0.00 | | 7587.13 ± 0.00 | 7450.94 ± 0.00 | 7587.17 ± 0.00 |
| llamacpp-model | tg32 @ d24576 | 63.20 ± 0.00 | 65.72 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d28672 | 3144.74 ± 0.00 | | 9011.01 ± 0.00 | 8874.81 ± 0.00 | 9011.05 ± 0.00 |
| llamacpp-model | tg32 @ d28672 | 60.18 ± 0.00 | 62.61 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d32768 | 3075.11 ± 0.00 | | 10489.95 ± 0.00 | 10353.76 ± 0.00 | 10489.99 ± 0.00 |
| llamacpp-model | tg32 @ d32768 | 57.62 ± 0.00 | 59.98 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d36864 | 3012.57 ± 0.00 | | 11863.07 ± 0.00 | 11726.88 ± 0.00 | 11863.12 ± 0.00 |
| llamacpp-model | tg32 @ d36864 | 55.69 ± 0.00 | 58.04 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d40960 | 2952.43 ± 0.00 | | 13386.63 ± 0.00 | 13250.43 ± 0.00 | 13386.66 ± 0.00 |
| llamacpp-model | tg32 @ d40960 | 53.80 ± 0.00 | 56.05 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d45056 | 2878.70 ± 0.00 | | 15032.48 ± 0.00 | 14896.29 ± 0.00 | 15032.52 ± 0.00 |
| llamacpp-model | tg32 @ d45056 | 51.63 ± 0.00 | 53.84 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d49152 | 2807.42 ± 0.00 | | 16613.96 ± 0.00 | 16477.76 ± 0.00 | 16614.00 ± 0.00 |
| llamacpp-model | tg32 @ d49152 | 49.93 ± 0.00 | 52.11 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d53248 | 2742.86 ± 0.00 | | 18394.85 ± 0.00 | 18258.65 ± 0.00 | 18394.90 ± 0.00 |
| llamacpp-model | tg32 @ d53248 | 47.99 ± 0.00 | 50.08 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d57344 | 2667.60 ± 0.00 | | 20288.00 ± 0.00 | 20151.81 ± 0.00 | 20288.05 ± 0.00 |
| llamacpp-model | tg32 @ d57344 | 45.92 ± 0.00 | 47.93 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d61440 | 2603.33 ± 0.00 | | 22152.22 ± 0.00 | 22016.03 ± 0.00 | 22152.26 ± 0.00 |
| llamacpp-model | tg32 @ d61440 | 44.33 ± 0.00 | 46.27 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d65536 | 2538.95 ± 0.00 | | 24205.15 ± 0.00 | 24068.96 ± 0.00 | 24205.19 ± 0.00 |
| llamacpp-model | tg32 @ d65536 | 43.12 ± 0.00 | 45.02 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d69632 | 2465.64 ± 0.00 | | 26411.76 ± 0.00 | 26275.57 ± 0.00 | 26411.80 ± 0.00 |
| llamacpp-model | tg32 @ d69632 | 41.87 ± 0.00 | 43.71 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d73728 | 2390.93 ± 0.00 | | 28804.07 ± 0.00 | 28667.88 ± 0.00 | 28804.11 ± 0.00 |
| llamacpp-model | tg32 @ d73728 | 40.38 ± 0.00 | 42.22 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d77824 | 2343.44 ± 0.00 | | 30971.25 ± 0.00 | 30835.06 ± 0.00 | 30971.31 ± 0.00 |
| llamacpp-model | tg32 @ d77824 | 39.29 ± 0.00 | 41.04 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d81920 | 2304.18 ± 0.00 | | 33270.39 ± 0.00 | 33134.20 ± 0.00 | 33270.44 ± 0.00 |
| llamacpp-model | tg32 @ d81920 | 38.07 ± 0.00 | 39.77 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d86016 | 2257.04 ± 0.00 | | 35594.20 ± 0.00 | 35458.01 ± 0.00 | 35594.24 ± 0.00 |
| llamacpp-model | tg32 @ d86016 | 37.10 ± 0.00 | 38.75 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d90112 | 2219.84 ± 0.00 | | 37789.44 ± 0.00 | 37653.25 ± 0.00 | 37789.48 ± 0.00 |
| llamacpp-model | tg32 @ d90112 | 36.15 ± 0.00 | 37.79 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d94208 | 2179.44 ± 0.00 | | 40145.48 ± 0.00 | 40009.29 ± 0.00 | 40145.59 ± 0.00 |
| llamacpp-model | tg32 @ d94208 | 35.18 ± 0.00 | 36.81 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d98304 | 2142.30 ± 0.00 | | 42581.17 ± 0.00 | 42444.97 ± 0.00 | 42581.21 ± 0.00 |
| llamacpp-model | tg32 @ d98304 | 34.37 ± 0.00 | 35.93 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d102400 | 2104.26 ± 0.00 | | 45206.73 ± 0.00 | 45070.54 ± 0.00 | 45206.78 ± 0.00 |
| llamacpp-model | tg32 @ d102400 | 33.29 ± 0.00 | 34.81 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d106496 | 2069.37 ± 0.00 | | 47754.02 ± 0.00 | 47617.83 ± 0.00 | 47754.07 ± 0.00 |
| llamacpp-model | tg32 @ d106496 | 32.35 ± 0.00 | 33.84 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d110592 | 2041.54 ± 0.00 | | 50176.25 ± 0.00 | 50040.05 ± 0.00 | 50176.28 ± 0.00 |
| llamacpp-model | tg32 @ d110592 | 31.99 ± 0.00 | 33.47 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d114688 | 2005.05 ± 0.00 | | 52968.85 ± 0.00 | 52832.66 ± 0.00 | 52968.90 ± 0.00 |
| llamacpp-model | tg32 @ d114688 | 31.17 ± 0.00 | 32.60 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d118784 | 1969.64 ± 0.00 | | 55882.88 ± 0.00 | 55746.68 ± 0.00 | 55882.93 ± 0.00 |
| llamacpp-model | tg32 @ d118784 | 30.25 ± 0.00 | 31.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d122880 | 1941.63 ± 0.00 | | 58532.56 ± 0.00 | 58396.36 ± 0.00 | 58532.59 ± 0.00 |
| llamacpp-model | tg32 @ d122880 | 29.72 ± 0.00 | 31.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d126976 | 1910.64 ± 0.00 | | 61508.30 ± 0.00 | 61372.11 ± 0.00 | 61508.34 ± 0.00 |
| llamacpp-model | tg32 @ d126976 | 29.22 ± 0.00 | 30.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d131072 | 1878.21 ± 0.00 | | 64561.27 ± 0.00 | 64425.08 ± 0.00 | 64561.31 ± 0.00 |
| llamacpp-model | tg32 @ d131072 | 28.46 ± 0.00 | 30.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d135168 | 1855.69 ± 0.00 | | 67228.33 ± 0.00 | 67092.14 ± 0.00 | 67228.37 ± 0.00 |
| llamacpp-model | tg32 @ d135168 | 28.11 ± 0.00 | 29.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d139264 | 1823.41 ± 0.00 | | 70526.92 ± 0.00 | 70390.73 ± 0.00 | 70526.97 ± 0.00 |
| llamacpp-model | tg32 @ d139264 | 27.65 ± 0.00 | 29.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d143360 | 1799.44 ± 0.00 | | 73399.98 ± 0.00 | 73263.79 ± 0.00 | 73400.02 ± 0.00 |
| llamacpp-model | tg32 @ d143360 | 26.94 ± 0.00 | 28.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d147456 | 1773.69 ± 0.00 | | 76430.21 ± 0.00 | 76294.01 ± 0.00 | 76430.24 ± 0.00 |
| llamacpp-model | tg32 @ d147456 | 26.64 ± 0.00 | 28.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d151552 | 1748.98 ± 0.00 | | 79640.03 ± 0.00 | 79503.84 ± 0.00 | 79640.07 ± 0.00 |
| llamacpp-model | tg32 @ d151552 | 26.04 ± 0.00 | 27.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d155648 | 1723.21 ± 0.00 | | 83051.93 ± 0.00 | 82915.74 ± 0.00 | 83051.97 ± 0.00 |
| llamacpp-model | tg32 @ d155648 | 25.44 ± 0.00 | 27.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d159744 | 1697.96 ± 0.00 | | 86511.71 ± 0.00 | 86375.51 ± 0.00 | 86511.75 ± 0.00 |
| llamacpp-model | tg32 @ d159744 | 24.99 ± 0.00 | 26.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d163840 | 1673.13 ± 0.00 | | 89972.86 ± 0.00 | 89836.67 ± 0.00 | 89972.91 ± 0.00 |
| llamacpp-model | tg32 @ d163840 | 24.62 ± 0.00 | 26.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d167936 | 1655.17 ± 0.00 | | 93144.33 ± 0.00 | 93008.13 ± 0.00 | 93144.37 ± 0.00 |
| llamacpp-model | tg32 @ d167936 | 24.27 ± 0.00 | 25.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d172032 | 1630.81 ± 0.00 | | 96750.41 ± 0.00 | 96614.22 ± 0.00 | 96750.45 ± 0.00 |
| llamacpp-model | tg32 @ d172032 | 23.87 ± 0.00 | 25.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d176128 | 1611.12 ± 0.00 | | 100318.37 ± 0.00 | 100182.17 ± 0.00 | 100318.41 ± 0.00 |
| llamacpp-model | tg32 @ d176128 | 23.40 ± 0.00 | 25.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d180224 | 1586.65 ± 0.00 | | 104271.75 ± 0.00 | 104135.56 ± 0.00 | 104271.79 ± 0.00 |
| llamacpp-model | tg32 @ d180224 | 22.77 ± 0.00 | 24.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d184320 | 1566.23 ± 0.00 | | 107966.12 ± 0.00 | 107829.93 ± 0.00 | 107966.16 ± 0.00 |
| llamacpp-model | tg32 @ d184320 | 22.76 ± 0.00 | 24.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d188416 | 1539.55 ± 0.00 | | 112443.18 ± 0.00 | 112306.98 ± 0.00 | 112443.22 ± 0.00 |
| llamacpp-model | tg32 @ d188416 | 22.09 ± 0.00 | 23.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d192512 | 1522.61 ± 0.00 | | 116000.03 ± 0.00 | 115863.84 ± 0.00 | 116000.08 ± 0.00 |
| llamacpp-model | tg32 @ d192512 | 21.98 ± 0.00 | 23.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d196608 | 1503.58 ± 0.00 | | 119889.05 ± 0.00 | 119752.86 ± 0.00 | 119889.09 ± 0.00 |
| llamacpp-model | tg32 @ d196608 | 21.67 ± 0.00 | 23.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d200704 | 1478.96 ± 0.00 | | 124346.05 ± 0.00 | 124209.86 ± 0.00 | 124346.09 ± 0.00 |
| llamacpp-model | tg32 @ d200704 | 21.31 ± 0.00 | 22.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d204800 | 1462.59 ± 0.00 | | 128285.16 ± 0.00 | 128148.97 ± 0.00 | 128285.20 ± 0.00 |
| llamacpp-model | tg32 @ d204800 | 21.08 ± 0.00 | 22.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d208896 | 1444.56 ± 0.00 | | 132489.65 ± 0.00 | 132353.46 ± 0.00 | 132489.69 ± 0.00 |
| llamacpp-model | tg32 @ d208896 | 20.70 ± 0.00 | 22.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d212992 | 1427.11 ± 0.00 | | 136684.45 ± 0.00 | 136548.26 ± 0.00 | 136684.49 ± 0.00 |
| llamacpp-model | tg32 @ d212992 | 20.21 ± 0.00 | 21.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d217088 | 1409.41 ± 0.00 | | 141056.49 ± 0.00 | 140920.30 ± 0.00 | 141056.54 ± 0.00 |
| llamacpp-model | tg32 @ d217088 | 20.06 ± 0.00 | 21.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d221184 | 1394.43 ± 0.00 | | 145122.07 ± 0.00 | 144985.88 ± 0.00 | 145122.12 ± 0.00 |
| llamacpp-model | tg32 @ d221184 | 19.91 ± 0.00 | 21.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d225280 | 1375.80 ± 0.00 | | 150017.28 ± 0.00 | 149881.09 ± 0.00 | 150017.32 ± 0.00 |
| llamacpp-model | tg32 @ d225280 | 19.43 ± 0.00 | 20.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d229376 | 1356.68 ± 0.00 | | 154843.64 ± 0.00 | 154707.45 ± 0.00 | 154843.68 ± 0.00 |
| llamacpp-model | tg32 @ d229376 | 19.37 ± 0.00 | 20.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d233472 | 1342.57 ± 0.00 | | 159033.12 ± 0.00 | 158896.93 ± 0.00 | 159033.16 ± 0.00 |
| llamacpp-model | tg32 @ d233472 | 19.13 ± 0.00 | 20.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d237568 | 1327.78 ± 0.00 | | 163738.17 ± 0.00 | 163601.98 ± 0.00 | 163738.23 ± 0.00 |
| llamacpp-model | tg32 @ d237568 | 18.84 ± 0.00 | 20.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d241664 | 1313.72 ± 0.00 | | 168279.85 ± 0.00 | 168143.66 ± 0.00 | 168279.89 ± 0.00 |
| llamacpp-model | tg32 @ d241664 | 18.76 ± 0.00 | 20.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d245760 | 1295.95 ± 0.00 | | 173362.60 ± 0.00 | 173226.41 ± 0.00 | 173362.65 ± 0.00 |
| llamacpp-model | tg32 @ d245760 | 18.36 ± 0.00 | 19.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d249856 | 1281.91 ± 0.00 | | 178170.67 ± 0.00 | 178034.47 ± 0.00 | 178170.70 ± 0.00 |
| llamacpp-model | tg32 @ d249856 | 18.10 ± 0.00 | 19.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d253952 | 1266.63 ± 0.00 | | 183321.25 ± 0.00 | 183185.06 ± 0.00 | 183321.29 ± 0.00 |
| llamacpp-model | tg32 @ d253952 | 17.89 ± 0.00 | 19.00 ± 0.00 | | | |
| llamacpp-model | pp2048 @ d258048 | 1252.60 ± 0.00 | | 188352.62 ± 0.00 | 188216.42 ± 0.00 | 188352.67 ± 0.00 |
| llamacpp-model | tg32 @ d258048 | 17.69 ± 0.00 | 19.00 ± 0.00 | | | |
llama-benchy (0.3.2.dev1+g17b42667a)
date: 2026-03-29 11:25:44 | latency mode: generation
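To put the two runs side by side, here is a small Python sketch comparing tg32 throughput at four depths. The t/s values are copied from the tables above; the `TG32` dictionary and the `speedup` helper are just illustrative names. Each configuration seems to have been measured once (every row shows ± 0.00), so treat the ratios as indicative only.

```python
# tg32 t/s at selected depths, copied from the turbo3 and turbo2 tables above.
TG32 = {
    0:      {"turbo3": 83.42, "turbo2": 85.18},
    65536:  {"turbo3": 32.59, "turbo2": 43.12},
    131072: {"turbo3": 20.08, "turbo2": 28.46},
    258048: {"turbo3": 11.78, "turbo2": 17.69},
}


def speedup(depth):
    """Ratio of turbo2 to turbo3 generation throughput at the given depth."""
    row = TG32[depth]
    return row["turbo2"] / row["turbo3"]


for depth in sorted(TG32):
    print(f"d{depth:<7} turbo2/turbo3 = {speedup(depth):.2f}x")
```

On these numbers the generation gap widens with depth, from roughly 1.02x at depth 0 to roughly 1.50x at d258048, while the pp2048 rows of the two tables stay within a few percent of each other.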