Misc. bug: Totally broken for agentic use on Intel dGPUs

### Name and Version

7719

### Operating systems

Windows, Linux

### Which llama.cpp modules do you know to be affected?

_No response_

### Command line

```shell

```

### Problem description & steps to reproduce

Sorry if the title sounds harsh, but this is my experience :( I don't want to be ungrateful in any way; I'm just hoping this can be resolved or improved somehow.

### Quick summary
Hardware specs:
- Ryzen 5600G
- 64 GB DDR4
- Intel Arc Pro B50 (Battlemage)

**Issue**:
When I try to use llama.cpp (tried both Vulkan and SYCL) for agentic coding, prompt processing drops to single-digit t/s and stalls within the agent's first couple of steps, regardless of configuration (fa, ub, b, different offloading configs, etc.).

Models I've tried: Devstral 2 small 24b, GPT-OSS 120b.

I realize that my hardware is limited and that these models are pretty hefty for it, and I do not expect 5090-level performance. The main problem is that I basically can't use it at all, even if I leave it running overnight, because the agent literally hangs.

### Details
Again, I'm not aiming for real-time performance, but it would be great if it at least somewhat worked. Once the context fills even a little (at least in agentic use), to around 4k-8k tokens, processing almost stops.

**What I tried already**:
- Windows & Linux
- Updating to the latest devel kernel and Mesa (26.0)
- Different llama.cpp builds, including the latest one, which contains Xe2 improvements
- FA on/off; ctk/ctv quantization off/on (q8_0)
- Different combinations of ub and b
- Different offloading strategies (none; only experts; fill as much as I can)
- Mainly Vulkan. I tried SYCL as well, but it is totally broken for me: when the context fills a bit, I get a black screen and OOM (leaving 8 GB free on the GPU didn't help much)
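For anyone who wants to retrace the combinations above, here is a minimal sketch that just prints `llama-bench` command lines for a small sweep. The model path `model.gguf` and the specific `-ub` values are placeholders, not the exact settings I used. It skips a quantized KV cache when FA is off, on the assumption that V-cache quantization needs FA.

```python
import itertools
import shlex


def bench_commands(model, fa=(0, 1), ubatch=(128, 512),
                   cache_types=("f16", "q8_0"), depths=(0, 4096)):
    """Yield llama-bench command lines for a small FA / ubatch / KV-cache sweep."""
    depth_arg = ",".join(str(d) for d in depths)
    for f, ub, ct in itertools.product(fa, ubatch, cache_types):
        # Assumption: a quantized V cache needs flash attention, so skip it with FA off.
        if f == 0 and ct != "f16":
            continue
        args = ["llama-bench", "-m", model,
                "-fa", str(f), "-ub", str(ub),
                "-ctk", ct, "-ctv", ct,
                "-d", depth_arg]
        yield shlex.join(args)


if __name__ == "__main__":
    # "model.gguf" is a hypothetical path; substitute the real GGUF file.
    for cmd in bench_commands("model.gguf"):
        print(cmd)
```

The script only builds the command strings; it does not run anything, so the output can be reviewed before pasting it into a shell.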

**A few observations**:
- When benchmarking without FA, Devstral 24B gives around 300 (pp); with FA enabled it drops to ~15-20. I thought I could simply run without FA, but that doesn't help: in real agentic use it still stalls, even though the benchmark says I should expect around 200 t/s prompt processing at 4096 context. In reality, it starts at around 30 and drops to single digits after just a couple of agent iterations. The context is not even close to full at that point.
- With GPT-OSS, the CPU is always used during PP, even with FA off. FA doesn't seem to have a measurable effect on GPT-OSS, but performance still drops to single digits quite early.
- Once the context goes past 4k-8k tokens, GPU utilization starts spiking between 0 and 100%; the further it goes, the wider the gaps at zero become. It seems like something is not working for Intel here at all.
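To put those rates in perspective, here is a rough back-of-the-envelope sketch of how long a single agent step would spend on prompt processing alone. It assumes an 8192-token prompt (the upper end of the 4k-8k range), assumes nothing is reused from a prompt cache, and uses 5 t/s as a stand-in for "single digits"; the other two rates come from the benchmark table below.

```python
def prompt_seconds(n_tokens: int, pp_rate: float) -> float:
    """Seconds needed to process n_tokens of prompt at pp_rate tokens/s."""
    if pp_rate <= 0:
        raise ValueError("pp_rate must be positive")
    return n_tokens / pp_rate


CONTEXT_TOKENS = 8192  # assumed prompt size, upper end of the 4k-8k range

RATES = {
    "bench, FA off @ d4096": 214.98,   # from the llama-bench table
    "bench, FA on @ d4096": 15.31,     # from the llama-bench table
    "agent use (assumed 5 t/s)": 5.0,  # stand-in for "single digits"
}

if __name__ == "__main__":
    for label, rate in RATES.items():
        secs = prompt_seconds(CONTEXT_TOKENS, rate)
        print(f"{label:<28} {rate:>7.2f} t/s -> {secs / 60:5.1f} min")
```

Under these assumptions, the gap is between well under a minute per step at the benchmarked FA-off rate and close to half an hour at 5 t/s, which is why the agent looks like it hangs.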

Unfortunately, I'm not familiar with shader development, so I won't be able to contribute a fix. I can do as much testing as needed, though, so if anyone has ideas on how to debug this, I'm all yours. Thank you!

llama-bench -m ~/LLMs/Models/Devstral-Small-2-24B-Instruct-2512-IQ4_XS.gguf -fa 0,1 -d 0,4096
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Intel(R) Arc(tm) Pro B50 Graphics (BMG G21) (Intel open-source Mesa driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  0 |           pp512 |        256.10 ± 2.01 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  0 |           tg128 |          5.69 ± 0.00 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  0 |   pp512 @ d4096 |        214.98 ± 0.75 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  0 |   tg128 @ d4096 |          5.46 ± 0.02 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  1 |           pp512 |        204.15 ± 1.81 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  1 |           tg128 |          5.63 ± 0.00 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  1 |   pp512 @ d4096 |         15.31 ± 0.00 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  1 |   tg128 @ d4096 |          3.68 ± 0.00 |
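To make the FA regression in the table easier to see, here is a small sketch that parses the rows above and prints the FA-off / FA-on throughput ratio per test. The parsing assumes the eight-column markdown layout that `llama-bench` printed here; the function names are my own.

```python
TABLE = """\
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  0 |           pp512 |        256.10 ± 2.01 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  0 |           tg128 |          5.69 ± 0.00 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  0 |   pp512 @ d4096 |        214.98 ± 0.75 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  0 |   tg128 @ d4096 |          5.46 ± 0.02 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  1 |           pp512 |        204.15 ± 1.81 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  1 |           tg128 |          5.63 ± 0.00 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  1 |   pp512 @ d4096 |         15.31 ± 0.00 |
| mistral3 14B IQ4_XS - 4.25 bpw |  11.89 GiB |    23.57 B | Vulkan     |  99 |  1 |   tg128 @ d4096 |          3.68 ± 0.00 |
"""


def parse_rows(table: str) -> dict[tuple[int, str], float]:
    """Map (fa, test) -> mean tokens/s from a llama-bench markdown table."""
    results = {}
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 8 or not cells[4].isdigit():
            continue  # skip the header and separator rows
        fa, test, rate = int(cells[5]), cells[6], cells[7]
        results[(fa, test)] = float(rate.split("±")[0])  # drop the ± spread
    return results


def fa_slowdown(results: dict[tuple[int, str], float]) -> dict[str, float]:
    """Ratio of FA-off to FA-on throughput per test (>1 means FA is slower)."""
    tests = sorted({test for _, test in results})
    return {t: results[(0, t)] / results[(1, t)] for t in tests}


if __name__ == "__main__":
    for test, ratio in fa_slowdown(parse_rows(TABLE)).items():
        print(f"{test:>14}: {ratio:.2f}x slower with FA")
```

On these numbers, `pp512 @ d4096` is roughly 14x slower with FA enabled, while the other three tests stay within about 1.5x.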

### First Bad Commit

b7064

### Relevant log output

<details>
<summary>Logs</summary>


```console

```
</details>
