Eval bug: Major performance regression with commit 7acb4e8... when running hybrid CUDA + ZenDNN #24315

### Name and Version

I narrowed the regression down to commit `7acb4e8cd2ce21f457d1298e75fad729520d263c`. As of that commit, prefill performance with `unsloth/GLM-5.1-GGUF:UD-Q3_K_XL` drops by ~50% when using CUDA 13.3 together with ZenDNN (Zen5).

Prefill throughput drops from ~250-260 to ~120-130.
Decode appears to be affected as well, though less so.
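
As a quick sanity check on the "~50%" figure, here is a minimal sketch that works out the relative drop from the ranges quoted above. The unit is assumed to be tokens per second (the report does not state it), and `relative_drop` is just an illustrative helper name.

```python
def relative_drop(before: float, after: float) -> float:
    """Fractional throughput loss when going from `before` to `after`."""
    return (before - after) / before


# Prefill ranges quoted in this report; the unit (tokens/s) is an assumption.
before_lo, before_hi = 250.0, 260.0
after_lo, after_hi = 120.0, 130.0

smallest = relative_drop(before_lo, after_hi)  # mildest case: 250 -> 130
largest = relative_drop(before_hi, after_lo)   # harshest case: 260 -> 120
midpoint = relative_drop(
    (before_lo + before_hi) / 2,  # 255
    (after_lo + after_hi) / 2,    # 125
)

print(f"drop: {smallest:.1%} to {largest:.1%} (midpoints: {midpoint:.1%})")
# prints: drop: 48.0% to 53.8% (midpoints: 51.0%)
```

So the quoted ranges bracket a loss of roughly 48% to 54%, consistent with the ~50% estimate.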

### Operating systems

Linux

### GGML backends

ZenDNN

### Hardware

2 x EPYC 9115
1 x RTX Pro 6000
2 x RTX 5090

### Models

`unsloth/GLM-5.1-GGUF:UD-Q3_K_XL`

### Problem description & steps to reproduce

Build and run commit `7acb4e8cd2ce21f457d1298e75fad729520d263c`, then compare its performance against commit `3ecfb150a4bd2d92b2a7974bb1af954c8a5e2985` using the same model and hardware configuration.
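
For anyone trying to reproduce this, the comparison can be scripted roughly as follows. This is only a sketch: the two commit hashes are the ones from this report, but `plan_comparison`, `run_plan`, and the build and benchmark commands are hypothetical placeholders. Substitute whatever build invocation and benchmark you actually use, and keep them identical for both commits.

```python
import subprocess
from typing import List, Sequence

# Commit hashes taken from this report.
BASELINE_COMMIT = "3ecfb150a4bd2d92b2a7974bb1af954c8a5e2985"
REGRESSED_COMMIT = "7acb4e8cd2ce21f457d1298e75fad729520d263c"


def plan_comparison(
    commits: Sequence[str],
    build_cmd: Sequence[str],
    bench_cmd: Sequence[str],
) -> List[List[str]]:
    """Return the commands to run, in order: checkout, build, benchmark per commit."""
    steps: List[List[str]] = []
    for commit in commits:
        steps.append(["git", "checkout", "--detach", commit])
        steps.append(list(build_cmd))  # same build for every commit
        steps.append(list(bench_cmd))  # same benchmark for every commit
    return steps


def run_plan(steps: Sequence[Sequence[str]], repo_dir: str) -> None:
    """Execute each step inside the repository, stopping at the first failure."""
    for argv in steps:
        print("+", " ".join(argv))
        subprocess.run(list(argv), cwd=repo_dir, check=True)


# Example usage (placeholders, not the commands used for this report):
# steps = plan_comparison(
#     [BASELINE_COMMIT, REGRESSED_COMMIT],
#     build_cmd=["cmake", "--build", "build", "-j"],
#     bench_cmd=["./my_benchmark.sh"],  # hypothetical script
# )
# run_plan(steps, repo_dir="/path/to/checkout")
```

Running the baseline first and the regressed commit second, with an identical build and benchmark command, keeps the two measurements directly comparable.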

### First Bad Commit

`7acb4e8cd2ce21f457d1298e75fad729520d263c`

### Relevant log output

<details>
<summary>Logs</summary>


```console
...
```
</details>

