Closed
Name and Version
./build/bin/llama-cli --version
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 7946 (7a4f97d19)
built with Clang 21.1.7 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
Ryzen 5950X + NVIDIA RTX 3090
Models
Qwen3-Coder-Next-80B-A3B (unsloth UD-Q6_K_XL)
Problem description & steps to reproduce
llama-server terminates with an uncaught C++ exception thrown from llama_grammar_accept_token while generating the response to a follow-up /v1/chat/completions request (backtrace in the log output below). Server launch command:

./build/bin/llama-server --jinja --model ~/llama/Qwen3-Coder-Next-GGUF/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf --threads -1 -fa 1 -ctv q8_0 -ctk q8_0 --host 0.0.0.0 --temp 1.0 --top-k 40 --top-p 0.95 --min-p 0.01 -c 131072 --no-direct-io -fitt 384
First Bad Commit
No response
Relevant log output
slot print_timing: id 3 | task 0 |
prompt eval time = 57251.21 ms / 10634 tokens ( 5.38 ms per token, 185.74 tokens per second)
eval time = 4467.24 ms / 33 tokens ( 135.37 ms per token, 7.39 tokens per second)
total time = 61718.45 ms / 10667 tokens
slot release: id 3 | task 0 | stop processing: n_tokens = 10666, truncated = 0
srv update_slots: all slots are idle
srv log_server_r: done request: POST /v1/chat/completions 192.168.2.218 200
srv params_from_: Chat format: Qwen3 Coder
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.914 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 3 | task 40 | processing task, is_child = 0
slot update_slots: id 3 | task 40 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 11675
slot update_slots: id 3 | task 40 | n_tokens = 10666, memory_seq_rm [10666, end)
slot update_slots: id 3 | task 40 | prompt processing progress, n_tokens = 11611, batch.n_tokens = 945, progress = 0.994518
slot update_slots: id 3 | task 40 | n_tokens = 11611, memory_seq_rm [11611, end)
slot update_slots: id 3 | task 40 | prompt processing progress, n_tokens = 11675, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id 3 | task 40 | prompt done, n_tokens = 11675, batch.n_tokens = 64
slot init_sampler: id 3 | task 40 | init sampler, took 1.06 ms, tokens: text = 11675, total = 11675
slot update_slots: id 3 | task 40 | created context checkpoint 2 of 8 (pos_min = 11610, pos_max = 11610, size = 75.376 MiB)
0x0000737214b10813 in __GI___wait4 (pid=630236, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0 0x0000737214b10813 in __GI___wait4 (pid=630236, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x00007372179a14f6 in ggml_print_backtrace () from /home/morbo/git/llama.cpp/build/bin/libggml-base.so.0
#2 0x00007372179b6de6 in ggml_uncaught_exception() () from /home/morbo/git/llama.cpp/build/bin/libggml-base.so.0
#3 0x0000737214ebb0da in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x0000737214ea5a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x0000737214ebb391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x0000737217cb5571 in llama_grammar_accept_token(llama_grammar&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /home/morbo/git/llama.cpp/build/bin/libllama.so.0
#7 0x0000737217cb4b0f in llama_grammar_accept_impl(llama_grammar&, int) () from /home/morbo/git/llama.cpp/build/bin/libllama.so.0
#8 0x00005e31b24060b9 in common_sampler_accept(common_sampler*, int, bool) ()
#9 0x00005e31b21b0a39 in server_context_impl::update_slots() ()
#10 0x00005e31b22645cc in server_queue::start_loop(long) ()
#11 0x00005e31b20e7498 in main ()