Eval bug: Major performance regression with commit 7acb4e8... when running hybrid CUDA + ZenDNN #24315

@skhameneh

Name and Version

I narrowed the regression down to commit 7acb4e8cd2ce21f457d1298e75fad729520d263c. As of that commit, prefill performance with unsloth/GLM-5.1-GGUF:UD-Q3_K_XL drops by ~50% when using CUDA 13.3 and ZenDNN (Zen5).

Prefill drops from ~250-260 to ~120-130.
Decode appears to be affected as well, though to a lesser extent.
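
As a quick sanity check on the "~50%" figure, the relative drop can be computed from the two reported prefill ranges. This is only an arithmetic sketch: the helper name `relative_drop` is made up for illustration, and the numbers are the approximate ranges quoted above (the report does not state their units).

```python
def relative_drop(before: float, after: float) -> float:
    """Return the fractional drop from `before` to `after` (0.5 means 50%)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Approximate prefill ranges quoted in this report (units not stated there).
before_range = (250.0, 260.0)
after_range = (120.0, 130.0)

# Smallest drop: lowest "before" against highest "after".
min_drop = relative_drop(before_range[0], after_range[1])
# Largest drop: highest "before" against lowest "after".
max_drop = relative_drop(before_range[1], after_range[0])

print(f"drop: {min_drop:.0%} to {max_drop:.0%}")  # prints "drop: 48% to 54%"
```

Both ends of the range bracket the ~50% estimate.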

Operating systems

Linux

GGML backends

ZenDNN

Hardware

2 x Epyc 9115
1 x RTX Pro 6000
2 x RTX 5090

Models

unsloth/GLM-5.1-GGUF:UD-Q3_K_XL

Problem description & steps to reproduce

Build and run commit 7acb4e8cd2ce21f457d1298e75fad729520d263c, then compare its performance against commit 3ecfb150a4bd2d92b2a7974bb1af954c8a5e2985 using the same model and settings.
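
The comparison could be scripted roughly as follows. This is a minimal sketch that only prints the command sequence and does not execute anything. The function `comparison_plan` is hypothetical, and the build and benchmark commands are deliberately left as placeholders, since the exact invocations used for this report are not given here.

```python
import shlex

# The two commits named in this report.
FIRST_BAD = "7acb4e8cd2ce21f457d1298e75fad729520d263c"
COMPARE = "3ecfb150a4bd2d92b2a7974bb1af954c8a5e2985"


def comparison_plan(commits, build_cmd, bench_cmd):
    """Return, in order, the shell commands to run for each commit."""
    plan = []
    for sha in commits:
        # Check out the exact commit without moving any branch.
        plan.append(f"git checkout --detach {shlex.quote(sha)}")
        plan.append(build_cmd)
        plan.append(bench_cmd)
    return plan


# Placeholders (assumptions): substitute the real build and benchmark commands.
steps = comparison_plan(
    [COMPARE, FIRST_BAD],
    "<your build command>",
    "<your benchmark command>",
)
for step in steps:
    print(step)
```

Keeping the build and benchmark commands identical across both checkouts means any difference in the measured prefill rate can be attributed to the commit itself.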

First Bad Commit

7acb4e8cd2ce21f457d1298e75fad729520d263c

Relevant log output

Logs
...

...
