Eval bug: Vulkan llama-server crashes with vk::DeviceLostError randomly (long contexts) #19955

@el95149

Description

Name and Version

./llama-server --version
version: 8173 (2e7e638)

Operating systems

Linux

GGML backends

Vulkan

Hardware

  • Ryzen 7900X
  • 64 GB DDR5@6000
  • X870e board
  • NVIDIA RTX 5080 16GB
  • AMD Radeon AI Pro R9700

Models

Has happened with at least the following models:

  • Unsloth Qwen3-Coder-Next-Q4_K_M.gguf
  • Unsloth Qwen3-Coder-Next-UD-Q6_K_XL
  • Unsloth Qwen3.5-35B-A3B-Q8_0

Problem description & steps to reproduce

I start the server with the following command, either using both cards or any single one (via GGML_VK_VISIBLE_DEVICES):

./llama-server \
  -m /home/<redacted>/.lmstudio/models/unsloth/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-Q4_K_M.gguf \
  --port 1234 --host 0.0.0.0 -fa on --no-mmap \
  -b 2048 -ub 2048 -c 100000 --cache-ram 0 \
  --temp 1 --top-k 40 --top-p 0.95 --repeat-penalty 1 --min-p 0.01
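
For reference, restricting the server to a single card is done by setting `GGML_VK_VISIBLE_DEVICES` before launch. A minimal sketch (the device index and model path are illustrative; the actual indices follow the Vulkan device enumeration printed at server startup):

```shell
# Run on the first Vulkan device only; check the device list the
# server prints at startup to pick the right index.
GGML_VK_VISIBLE_DEVICES=0 ./llama-server \
  -m <model.gguf> --port 1234 --host 0.0.0.0 -fa on
```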

After processing/generating a random number of tokens, typically a few thousand in one go and above ~40K in total, I get a momentary system freeze and then the server crashes; any connected client is unable to continue inference (see attached log).

First Bad Commit

I was only able to notice it from build b8143 onwards, since before that inference was too slow to make things really usable, so I can't pinpoint an exact first bad commit.

Relevant log output

[39895] srv  params_from_: Chat format: peg-constructed
[39895] slot get_availabl: id  2 | task -1 | selected slot by LRU, t_last = -1
[39895] slot launch_slot_: id  2 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
[39895] slot launch_slot_: id  2 | task 1791 | processing task, is_child = 0
[39895] slot update_slots: id  2 | task 1791 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 58741
[39895] slot update_slots: id  2 | task 1791 | n_tokens = 0, memory_seq_rm [0, end)
[39895] slot update_slots: id  2 | task 1791 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.034865
[39895] slot update_slots: id  2 | task 1791 | n_tokens = 2048, memory_seq_rm [2048, end)
[39895] slot update_slots: id  2 | task 1791 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.069730
[39895] [New LWP 67035]
[39895] [New LWP 67034]
[39895] [New LWP 67026]
[39895] [New LWP 67025]
[39895] [New LWP 67024]
[39895] [New LWP 67023]
[39895] [New LWP 67022]
[39895] [New LWP 67021]
[39895] [New LWP 67020]
[39895] [New LWP 67019]
[39895] [New LWP 67018]
[39895] [New LWP 67017]
[39895] [New LWP 67016]
[39895] [New LWP 66652]
[39895] [New LWP 66651]
[39895] [New LWP 66650]
[39895] [New LWP 66641]
[39895] [New LWP 66640]
[39895] [New LWP 66638]
[39895] [New LWP 66636]
[39895] [New LWP 66634]
[39895] [New LWP 66632]
[39895] [New LWP 66630]
[39895] [New LWP 66628]
[39895] [New LWP 66626]
[39895] [New LWP 66624]
[39895] [New LWP 66622]
[39895] [New LWP 66620]
[39895] [New LWP 66618]
[39895] [New LWP 66616]
[39895] [New LWP 66614]
[39895] [New LWP 66613]
[39895] [New LWP 66611]
[39895] [New LWP 66609]
[39895] [New LWP 66607]
[39895] [New LWP 66605]
[39895] [New LWP 66603]
[39895] [New LWP 66601]
[39895] [New LWP 66599]
[39895] [New LWP 66597]
[39895] [New LWP 66588]
[39895] [New LWP 66587]
[39895]
[39895] This GDB supports auto-downloading debuginfo from the following URLs:
[39895]   <https://debuginfod.ubuntu.com>
[39895] Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
[39895] Debuginfod has been disabled.
[39895] To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[39895] [Thread debugging using libthread_db enabled]
[39895] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[39895] __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
[39895] warning: 56     ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S: No such file or directory
[39895] #0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
[39895] 56      in ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S
[39895] #1  0x0000733182ea013c in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=0, a6=0, nr=61) at ./nptl/cancellation.c:49
[39895] warning: 49     ./nptl/cancellation.c: No such file or directory
[39895] #2  __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:75
[39895] 75      in ./nptl/cancellation.c
[39895] #3  0x0000733182f1ca0f in __GI___wait4 (pid=<optimized out>, stat_loc=<optimized out>, options=<optimized out>, usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait4.c:30
[39895] warning: 30     ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
[39895] #4  0x000073318402db3b in ggml_print_backtrace () from /home/aanagnostopoulos/llama-vulkan/libggml-base.so.0
[39895] #5  0x000073318404138f in ggml_uncaught_exception() () from /home/aanagnostopoulos/llama-vulkan/libggml-base.so.0
[39895] #6  0x00007331832c11ea in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
[39895] #7  0x00007331832aaa9c in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
[39895] #8  0x00007331832c14a1 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
[39895] #9  0x000073317f67e3a7 in ggml_vk_wait_for_fence(ggml_backend_vk_context*) [clone .cold] () from /home/aanagnostopoulos/llama-vulkan/libggml-vulkan.so
[39895] #10 0x000073317f6cbb1d in ggml_vk_synchronize(ggml_backend_vk_context*) () from /home/aanagnostopoulos/llama-vulkan/libggml-vulkan.so
[39895] #11 0x000073317f6cbca4 in ggml_backend_vk_synchronize(ggml_backend*) () from /home/aanagnostopoulos/llama-vulkan/libggml-vulkan.so
[39895] #12 0x000073318404a9c6 in ggml_backend_sched_graph_compute_async () from /home/aanagnostopoulos/llama-vulkan/libggml-base.so.0
[39895] #13 0x00007331836b9921 in llama_context::graph_compute(ggml_cgraph*, bool) () from /home/aanagnostopoulos/llama-vulkan/libllama.so.0
[39895] #14 0x00007331836b9d55 in llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) () from /home/aanagnostopoulos/llama-vulkan/libllama.so.0
[39895] #15 0x00007331836c1aa3 in llama_context::decode(llama_batch const&) () from /home/aanagnostopoulos/llama-vulkan/libllama.so.0
[39895] #16 0x00007331836c30e0 in llama_decode () from /home/aanagnostopoulos/llama-vulkan/libllama.so.0
[39895] #17 0x000058e28ce22b82 in server_context_impl::update_slots() ()
[39895] #18 0x000058e28ce5f226 in server_queue::start_loop(long) ()
[39895] #19 0x000058e28cd7adae in main ()
[39895] [Inferior 1 (process 66557) detached]
[39895] terminate called after throwing an instance of 'vk::DeviceLostError'
[39895]   what():  vk::Device::waitForFences: ErrorDeviceLost
srv    operator(): http client error: Failed to read connection
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 500
srv  proxy_reques: proxying request to model qwen3-coder-next on port 39895
