Misc. bug: Speculative decoding only works once with /v1/chat/completions

### Name and Version

version: latest commit 89f10baad5a1809055d71110dff60e55561b9c62

### Operating systems

_No response_

### Which llama.cpp modules do you know to be affected?

_No response_

### Command line

```shell
llama-server -m translategemma-27b-it.i1-IQ4_XS.gguf  --port 1234 --host 0.0.0.0 -c 2560  --jinja  -fit on  --temp 0.05 --top_p 1.0   --chat-template-kwargs '{"source_lang_code": "en","target_lang_code": "fr"}' --spec-type ngram-simple --draft-max 64 --draft-min 24  --spec-ngram-size-n 12 -ctk q8_0 --no-cache-prompt -cram 0
```

### Problem description & steps to reproduce

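Speculative decoding only works for the first request. On the second request it no longer seems to run: the `draft acceptance rate = ...` line is missing from the log, the `ngram_simple` draft counters (`#gen drafts`, `#acc drafts`) are unchanged from the first request, and generation speed drops from 88.90 to 46.55 tokens per second.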

```
srv          init: init: chat template, thinking = 0
main: model loaded
main: server is listening on http://0.0.0.0:1234
main: starting the main loop...
srv  update_slots: all slots are idle
srv  params_from_: Chat format: Generic
slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id  3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  3 | task 0 | processing task, is_child = 0
slot update_slots: id  3 | task 0 | new prompt, n_ctx_slot = 2304, n_keep = 0, task.n_tokens = 474
slot update_slots: id  3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  3 | task 0 | prompt processing progress, n_tokens = 410, batch.n_tokens = 410, progress = 0.864979
slot update_slots: id  3 | task 0 | n_tokens = 410, memory_seq_rm [410, end)
slot update_slots: id  3 | task 0 | prompt processing progress, n_tokens = 474, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  3 | task 0 | prompt done, n_tokens = 474, batch.n_tokens = 64
slot init_sampler: id  3 | task 0 | init sampler, took 0.06 ms, tokens: text = 474, total = 474
slot update_slots: id  3 | task 0 | created context checkpoint 1 of 8 (pos_min = 0, pos_max = 409, size = 127.530 MiB)
slot print_timing: id  3 | task 0 |
prompt eval time =     283.75 ms /   474 tokens (    0.60 ms per token,  1670.50 tokens per second)
       eval time =    4724.44 ms /   420 tokens (   11.25 ms per token,    88.90 tokens per second)
      total time =    5008.19 ms /   894 tokens
draft acceptance rate = 0.39054 (  223 accepted /   571 generated)
statistics ngram_simple: #calls = 196, #gen drafts = 12, #acc drafts = 8, #gen tokens = 571, #acc tokens = 223, dur = 0.118 ms
slot      release: id  3 | task 0 | stop processing: n_tokens = 893, truncated = 0
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/chat/completions 192.168.1.44 200
srv  params_from_: Chat format: Generic
slot get_availabl: id  3 | task -1 | selected slot by LCP similarity, sim_best = 0.225 (> 0.100 thold), f_keep = 0.113
slot launch_slot_: id  3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id  3 | task 199 | processing task, is_child = 0
slot update_slots: id  3 | task 199 | new prompt, n_ctx_slot = 2304, n_keep = 0, task.n_tokens = 449
slot update_slots: id  3 | task 199 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  3 | task 199 | prompt processing progress, n_tokens = 385, batch.n_tokens = 385, progress = 0.857461
slot update_slots: id  3 | task 199 | n_tokens = 385, memory_seq_rm [385, end)
slot update_slots: id  3 | task 199 | prompt processing progress, n_tokens = 449, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id  3 | task 199 | prompt done, n_tokens = 449, batch.n_tokens = 64
slot init_sampler: id  3 | task 199 | init sampler, took 0.05 ms, tokens: text = 449, total = 449
slot print_timing: id  3 | task 199 |
prompt eval time =     253.81 ms /   449 tokens (    0.57 ms per token,  1769.07 tokens per second)
       eval time =    7626.25 ms /   355 tokens (   21.48 ms per token,    46.55 tokens per second)
      total time =    7880.05 ms /   804 tokens
statistics ngram_simple: #calls = 550, #gen drafts = 12, #acc drafts = 8, #gen tokens = 571, #acc tokens = 223, dur = 0.149 ms
slot      release: id  3 | task 199 | stop processing: n_tokens = 803, truncated = 0
srv  update_slots: all slots are idle
```
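To make the stall easier to see, the `statistics ngram_simple` lines can be turned into per-request deltas. This is a minimal sketch, not part of llama.cpp: `spec_deltas` is a hypothetical helper name, and it assumes the counters in those lines are cumulative across requests, which is inferred from the log above (the second request repeats the first request's draft totals) rather than confirmed from the source.

```shell
# Hypothetical helper: reads a llama-server log on stdin and prints how much
# each ngram_simple counter grew since the previous statistics line.
# Assumption: the counters are cumulative across requests.
spec_deltas() {
    awk '
    /statistics ngram_simple:/ {
        # Drop everything up to and including the "statistics ngram_simple: " prefix.
        sub(/^.*statistics ngram_simple: /, "")
        # Split "key = value" pairs separated by ", ".
        n = split($0, kv, ", ")
        for (i = 1; i <= n; i++) {
            split(kv[i], p, " = ")
            cur[p[1]] = p[2] + 0
        }
        req++
        printf "request %d: calls +%d, gen drafts +%d, acc drafts +%d\n", req,
            cur["#calls"] - prev["#calls"],
            cur["#gen drafts"] - prev["#gen drafts"],
            cur["#acc drafts"] - prev["#acc drafts"]
        # Remember the current totals for the next statistics line.
        for (k in cur) prev[k] = cur[k]
    }'
}

# Usage (server.log is a placeholder for a saved server log):
#   spec_deltas < server.log
```

For the two statistics lines in the log above this prints `calls +196, gen drafts +12, acc drafts +8` for the first request and `calls +354, gen drafts +0, acc drafts +0` for the second, so the drafter is still being called but produces no drafts.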

### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>


```console

```
</details>

Misc. bug: Speculative decoding only works once with /v1/chat/completions #19231
