Misc. bug: Vulkan performance degradation (TG) on A770 from b7194 and FA problem #17628

### Name and Version

llama-cli -v
load_backend: loaded RPC backend from C:\llm\llama-cpp\VULKAN\b7209\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(TM) A770 Graphics (Intel Corporation) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\llm\llama-cpp\VULKAN\b7209\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\llm\llama-cpp\VULKAN\b7209\ggml-cpu-haswell.dll
build: 7209 (7f8ef50cc) with clang version 19.1.5 for x86_64-pc-windows-msvc

### Operating systems

Windows

### Which llama.cpp modules do you know to be affected?

llama-bench
llama-server

### Command line

```shell
llama-bench -m T:\models\lmstudio-community\gpt-oss-20b-GGUF\gpt-oss-20b-MXFP4.gguf  -ngl 100 -fa 0,1


llama-bench -m T:\models\lmstudio-community\Meta-Llama-3.1-8B-Instruct-GGUF\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -ngl 100 -fa 0,1
```

### Problem description & steps to reproduce

Hello.

Token generation (TG) performance on the Intel Arc A770 dropped between b7189 and b7209. With flash attention enabled (`-fa 1`), TG also got slower, and the benchmark sometimes stops without printing an error.

Driver: 8250
CPU: 2x Xeon 2699v3
GPU: 1x A770

Models (tg128 with `-fa 0`, b7189 -> b7209):
53 -> 42 t/s lmstudio-community\Meta-Llama-3.1-8B-Instruct-GGUF\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
60 -> 54 t/s lmstudio-community\gpt-oss-20b-GGUF\gpt-oss-20b-MXFP4.gguf
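
To put the drop in relative terms, here is a small sketch that computes the percentage change from the tg128 figures reported in the benchmark tables in this issue. The `pct_change` helper is a hypothetical name, not part of llama.cpp, and it assumes a POSIX shell with `awk` (for example Git Bash on Windows).

```shell
# pct_change OLD NEW: print the relative change from OLD to NEW in percent.
# Hypothetical helper for illustration; not part of llama.cpp.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

pct_change 53.63 42.39   # Llama 3.1 8B Q4_K_M, fa=0 -> -21.0
pct_change 34.50 29.39   # Llama 3.1 8B Q4_K_M, fa=1 -> -14.8
pct_change 60.95 54.20   # gpt-oss 20B MXFP4,    fa=0 -> -11.1
```

The pp512 numbers are essentially unchanged between the two builds, so the regression appears limited to token generation.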




### B7209
```
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | Vulkan     | 100 |  0 |           pp512 |        921.00 ± 3.22 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | Vulkan     | 100 |  0 |           tg128 |         42.39 ± 0.07 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | Vulkan     | 100 |  1 |           pp512 |        280.12 ± 0.44 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | Vulkan     | 100 |  1 |           tg128 |         29.39 ± 0.02 |

build: 7f8ef50cc (7209)
```
**Sometimes the benchmark crashes with flash attention enabled (no error is printed; the bench just stops).**
<img width="1363" height="202" alt="Image" src="https://github.com/user-attachments/assets/1f327e42-a759-4f21-bd60-5eefc765c642" />

```
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | Vulkan     | 100 |  0 |           pp512 |        885.48 ± 5.59 |
| gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | Vulkan     | 100 |  0 |           tg128 |         54.20 ± 0.07 |
```

### B7189

```
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | Vulkan     | 100 |  0 |           pp512 |        918.06 ± 4.28 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | Vulkan     | 100 |  0 |           tg128 |         53.63 ± 0.10 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | Vulkan     | 100 |  1 |           pp512 |        280.26 ± 0.70 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | Vulkan     | 100 |  1 |           tg128 |         34.50 ± 0.04 |
```


```
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | Vulkan     | 100 |  0 |           pp512 |        884.45 ± 5.87 |
| gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | Vulkan     | 100 |  0 |           tg128 |         60.95 ± 0.06 |
```

### First Bad Commit

Between b7189 and b7209
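
The range could probably be narrowed with `git bisect`. The sketch below is one possible way to classify a build automatically from `llama-bench` output; it is not an official llama.cpp script. The `classify_tg` name, the 48 t/s threshold (roughly the midpoint between the good 53.63 and bad 42.39 tg128 results above), the `bisect-step.sh` script name, and the build and binary paths in the comments are all assumptions to adapt to the local setup.

```shell
# classify_tg MIN_TPS: read llama-bench markdown output on stdin and
# exit 0 (good) if the tg128 rate is >= MIN_TPS, 1 (bad) if it is lower,
# or 125 (tells `git bisect run` to skip) if no tg128 row was found.
# Hypothetical helper; assumes the 8-column table layout shown in this issue.
classify_tg() {
  awk -F'|' -v min="$1" '
    $8 ~ /tg128/ { split($9, a, " "); tps = a[1]; found = 1 }
    END {
      if (!found) exit 125
      if (tps + 0 >= min + 0) exit 0
      exit 1
    }
  '
}

# Possible usage (assumed paths, run from a llama.cpp checkout):
#   git bisect start b7209 b7189        # bad revision first, then good
#   git bisect run ./bisect-step.sh     # hypothetical script that does:
#     cmake -B build -DGGML_VULKAN=ON && cmake --build build --config Release
#     ./build/bin/llama-bench -m <model.gguf> -ngl 100 -fa 0 | classify_tg 48
```

Running with `-fa 0` only keeps a single tg128 row in the output, so the last matching row is the one being classified.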


