Name and Version
I narrowed this down to commit 7acb4e8cd2ce21f457d1298e75fad729520d263c: as of that commit, prefill performance with unsloth/GLM-5.1-GGUF:UD-Q3_K_XL drops by ~50% when using CUDA 13.3 and ZenDNN (Zen5).
Prefill drops from ~250-260 to ~120-130.
Decode appears to be affected as well, though less so.
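As a quick sanity check on the ~50% figure, here is a minimal sketch that computes the relative drop from the ranges reported above. The helper name `relative_drop` is hypothetical, and the numbers carry no unit because the report does not state one.

```python
def relative_drop(before: float, after: float) -> float:
    """Fractional slowdown when throughput goes from `before` to `after`."""
    return (before - after) / before


# Prefill ranges as reported in this issue (unit not stated in the report).
good = (250.0, 260.0)  # before the suspected commit
bad = (120.0, 130.0)   # at the suspected commit

smallest = relative_drop(good[0], bad[1])             # 250 -> 130
largest = relative_drop(good[1], bad[0])              # 260 -> 120
midpoint = relative_drop(sum(good) / 2, sum(bad) / 2)  # 255 -> 125

print(f"drop: {smallest:.1%} to {largest:.1%} (midpoint {midpoint:.1%})")
# prints "drop: 48.0% to 53.8% (midpoint 51.0%)"
```

The reported ranges therefore bracket the ~50% estimate on both sides.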
Operating systems
Linux
GGML backends
ZenDNN
Hardware
2 x EPYC 9115
1 x RTX Pro 6000
2 x RTX 5090
Models
unsloth/GLM-5.1-GGUF:UD-Q3_K_XL
Problem description & steps to reproduce
Build and run commit 7acb4e8cd2ce21f457d1298e75fad729520d263c, then compare its performance against commit 3ecfb150a4bd2d92b2a7974bb1af954c8a5e2985.
First Bad Commit
7acb4e8cd2ce21f457d1298e75fad729520d263c
Relevant log output
Logs
...