Eval bug: Vulkan llama-server crashes with vk::DeviceLostError randomly (long contexts) #19955

@el95149

Description

Name and Version

./llama-server --version
version: 8173 (2e7e638)

Operating systems

Linux

GGML backends

Vulkan

Hardware

  • Ryzen 7900X
  • 64 GB DDR5@6000
  • X870e board
  • NVIDIA RTX 5080 16GB
  • AMD Radeon AI Pro R9700

Models

Has happened with at least the following models:

  • Unsloth Qwen3-Coder-Next-Q4_K_M.gguf
  • Unsloth Qwen3-Coder-Next-UD-Q6_K_XL
  • Unsloth Qwen3.5-35B-A3B-Q8_0

Problem description & steps to reproduce

I start the server with the following command, either using both cards or any single one (via GGML_VK_VISIBLE_DEVICES):

./llama-server \
  -m /home/<redacted>/.lmstudio/models/unsloth/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-Q4_K_M.gguf \
  --port 1234 --host 0.0.0.0 -fa on --no-mmap \
  -b 2048 -ub 2048 -c 100000 --cache-ram 0 \
  --temp 1 --top-k 40 --top-p 0.95 --repeat-penalty 1 --min-p 0.01
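
For reference, restricting the server to a single card is done by setting `GGML_VK_VISIBLE_DEVICES` before launch. A minimal sketch (the device index and model path are illustrative; the actual indices follow the Vulkan device enumeration printed at server startup):

```shell
# Run on the first Vulkan device only; check the device list the
# server prints at startup to pick the right index.
GGML_VK_VISIBLE_DEVICES=0 ./llama-server \
  -m <model.gguf> --port 1234 --host 0.0.0.0 -fa on
```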

After processing/generating a random number of tokens, typically a few thousand in one go and above ~40K in total, I get a momentary system freeze and then the server crashes; any connected client is unable to continue inference (see attached log).

First Bad Commit

I was only able to notice it from build b8143 onwards, since before that inference was too slow to make things really usable, so I can't pinpoint an exact first bad commit.

Relevant log output

[39895] srv  params_from_: Chat format: peg-constructed
[39895] slot get_availabl: id  2 | task -1 | selected slot by LRU, t_last = -1
[39895] slot launch_slot_: id  2 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
[39895] slot launch_slot_: id  2 | task 1791 | processing task, is_child = 0
[39895] slot update_slots: id  2 | task 1791 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 58741
[39895] slot update_slots: id  2 | task 1791 | n_tokens = 0, memory_seq_rm [0, end)
[39895] slot update_slots: id  2 | task 1791 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.034865
[39895] slot update_slots: id  2 | task 1791 | n_tokens = 2048, memory_seq_rm [2048, end)
[39895] slot update_slots: id  2 | task 1791 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.069730
[39895] [New LWP 67035]
[39895] [New LWP 67034]
[39895] [New LWP 67026]
[39895] [New LWP 67025]
[39895] [New LWP 67024]
[39895] [New LWP 67023]
[39895] [New LWP 67022]
[39895] [New LWP 67021]
[39895] [New LWP 67020]
[39895] [New LWP 67019]
[39895] [New LWP 67018]
[39895] [New LWP 67017]
[39895] [New LWP 67016]
[39895] [New LWP 66652]
[39895] [New LWP 66651]
[39895] [New LWP 66650]
[39895] [New LWP 66641]
[39895] [New LWP 66640]
[39895] [New LWP 66638]
[39895] [New LWP 66636]
[39895] [New LWP 66634]
[39895] [New LWP 66632]
[39895] [New LWP 66630]
[39895] [New LWP 66628]
[39895] [New LWP 66626]
[39895] [New LWP 66624]
[39895] [New LWP 66622]
[39895] [New LWP 66620]
[39895] [New LWP 66618]
[39895] [New LWP 66616]
[39895] [New LWP 66614]
[39895] [New LWP 66613]
[39895] [New LWP 66611]
[39895] [New LWP 66609]
[39895] [New LWP 66607]
[39895] [New LWP 66605]
[39895] [New LWP 66603]
[39895] [New LWP 66601]
[39895] [New LWP 66599]
[39895] [New LWP 66597]
[39895] [New LWP 66588]
[39895] [New LWP 66587]
[39895]
[39895] This GDB supports auto-downloading debuginfo from the following URLs:
[39895]   <https://debuginfod.ubuntu.com>
[39895] Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
[39895] Debuginfod has been disabled.
[39895] To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[39895] [Thread debugging using libthread_db enabled]
[39895] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[39895] __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
[39895] warning: 56     ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S: No such file or directory
[39895] #0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
[39895] 56      in ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S
[39895] #1  0x0000733182ea013c in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=0, a6=0, nr=61) at ./nptl/cancellation.c:49
[39895] warning: 49     ./nptl/cancellation.c: No such file or directory
[39895] #2  __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:75
[39895] 75      in ./nptl/cancellation.c
[39895] #3  0x0000733182f1ca0f in __GI___wait4 (pid=<optimized out>, stat_loc=<optimized out>, options=<optimized out>, usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait4.c:30
[39895] warning: 30     ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
[39895] #4  0x000073318402db3b in ggml_print_backtrace () from /home/aanagnostopoulos/llama-vulkan/libggml-base.so.0
[39895] #5  0x000073318404138f in ggml_uncaught_exception() () from /home/aanagnostopoulos/llama-vulkan/libggml-base.so.0
[39895] #6  0x00007331832c11ea in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
[39895] #7  0x00007331832aaa9c in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
[39895] #8  0x00007331832c14a1 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
[39895] #9  0x000073317f67e3a7 in ggml_vk_wait_for_fence(ggml_backend_vk_context*) [clone .cold] () from /home/aanagnostopoulos/llama-vulkan/libggml-vulkan.so
[39895] #10 0x000073317f6cbb1d in ggml_vk_synchronize(ggml_backend_vk_context*) () from /home/aanagnostopoulos/llama-vulkan/libggml-vulkan.so
[39895] #11 0x000073317f6cbca4 in ggml_backend_vk_synchronize(ggml_backend*) () from /home/aanagnostopoulos/llama-vulkan/libggml-vulkan.so
[39895] #12 0x000073318404a9c6 in ggml_backend_sched_graph_compute_async () from /home/aanagnostopoulos/llama-vulkan/libggml-base.so.0
[39895] #13 0x00007331836b9921 in llama_context::graph_compute(ggml_cgraph*, bool) () from /home/aanagnostopoulos/llama-vulkan/libllama.so.0
[39895] #14 0x00007331836b9d55 in llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) () from /home/aanagnostopoulos/llama-vulkan/libllama.so.0
[39895] #15 0x00007331836c1aa3 in llama_context::decode(llama_batch const&) () from /home/aanagnostopoulos/llama-vulkan/libllama.so.0
[39895] #16 0x00007331836c30e0 in llama_decode () from /home/aanagnostopoulos/llama-vulkan/libllama.so.0
[39895] #17 0x000058e28ce22b82 in server_context_impl::update_slots() ()
[39895] #18 0x000058e28ce5f226 in server_queue::start_loop(long) ()
[39895] #19 0x000058e28cd7adae in main ()
[39895] [Inferior 1 (process 66557) detached]
[39895] terminate called after throwing an instance of 'vk::DeviceLostError'
[39895]   what():  vk::Device::waitForFences: ErrorDeviceLost
srv    operator(): http client error: Failed to read connection
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 500
srv  proxy_reques: proxying request to model qwen3-coder-next on port 39895
