Name and Version
$ llama-server --version
version: 7968 (9a5f577)
built with GNU 13.3.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
AMD EPYC 7532 + 7x 3090
Models
https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF
https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/blob/main/Qwen3-Coder-Next-Q4_K_M.gguf
Problem description & steps to reproduce
It seems that when the model makes multiple tool calls in a row, or emits both content and tool calls in the same response, llama-server crashes with a grammar-stack error.
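For anyone trying to reproduce: a request that encourages several tool calls in one turn can be sketched like this. This is a minimal sketch, assuming llama-server's OpenAI-compatible /v1/chat/completions endpoint on the default port; the tool name, prompt, and port are placeholders, not taken from the log above.

```python
import json

# Hypothetical payload: the prompt asks for two directory listings, which
# tends to make the model emit consecutive tool calls (placeholder values).
payload = {
    "model": "Qwen3-Coder-Next-Q4_K_M",
    "messages": [
        {
            "role": "user",
            "content": "List the files in /tmp and /var, then summarize both.",
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "list_files",
                "description": "List files in a directory",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }
    ],
}

print(json.dumps(payload))
# Send against a running server with e.g.:
#   curl http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d "$(python3 this_script.py)"
```

Whether the crash fires likely depends on the model actually chaining tool calls or mixing content with them, so it may take a few attempts.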
First Bad Commit
No response
Relevant log output
warning: 30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
terminate called after throwing an instance of 'std::runtime_error'
what(): Unexpected empty grammar stack after accepting piece: =list (40972)