srv update_slots: all slots are idle
[INFO] Request 192.168.2.203 "GET /v1/models HTTP/1.1" 200 2389 "Python/3.11 aiohttp/3.13.5" 180.049µs
[INFO] Request 192.168.2.203 "GET /v1/models HTTP/1.1" 200 2389 "Python/3.11 aiohttp/3.13.5" 168.497µs
srv params_from_: Chat format: peg-native
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 883459233
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 34816, total state size = 1206.994 MiB (draft: 0.000 MiB)
srv load: - looking for better prompt, base f_keep = 0.000, sim = 0.000
srv update: - cache state: 1 prompts, 1952.310 MiB (limits: 8192.000 MiB, 262144 tokens, 262144 est)
srv update: - prompt 0x5b6c9f8f6740: 34816 tokens, checkpoints: 4, 1952.310 MiB
srv get_availabl: prompt cache update took 16682.15 ms
reasoning-budget: activated, budget=2147483647 tokens
common_sampler_init: backend sampling is not compatible with grammar, disabling
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 19 | processing task, is_child = 0
slot update_slots: id 0 | task 19 | new prompt, n_ctx_slot = 262144, n_keep = 0, task.n_tokens = 62689
slot update_slots: id 0 | task 19 | n_past = 1, slot.prompt.tokens.size() = 34816, seq_id = 0, pos_min = 34815, n_swa = 0
slot update_slots: id 0 | task 19 | Checking checkpoint with [32767, 32767] against 1...
slot update_slots: id 0 | task 19 | Checking checkpoint with [24575, 24575] against 1...
slot update_slots: id 0 | task 19 | Checking checkpoint with [16383, 16383] against 1...
slot update_slots: id 0 | task 19 | Checking checkpoint with [8191, 8191] against 1...
slot update_slots: id 0 | task 19 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
slot update_slots: id 0 | task 19 | erased invalidated context checkpoint (pos_min = 8191, pos_max = 8191, n_tokens = 8192, n_swa = 0, pos_next = 0, size = 186.329 MiB)
slot update_slots: id 0 | task 19 | erased invalidated context checkpoint (pos_min = 16383, pos_max = 16383, n_tokens = 16384, n_swa = 0, pos_next = 0, size = 186.329 MiB)
slot update_slots: id 0 | task 19 | erased invalidated context checkpoint (pos_min = 24575, pos_max = 24575, n_tokens = 24576, n_swa = 0, pos_next = 0, size = 186.329 MiB)
slot update_slots: id 0 | task 19 | erased invalidated context checkpoint (pos_min = 32767, pos_max = 32767, n_tokens = 32768, n_swa = 0, pos_next = 0, size = 186.329 MiB)
slot update_slots: id 0 | task 19 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.032669
slot update_slots: id 0 | task 19 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.065338
slot update_slots: id 0 | task 19 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.098008
slot update_slots: id 0 | task 19 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.130677
slot update_slots: id 0 | task 19 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id 0 | task 19 | 8192 tokens since last checkpoint at 0, creating new checkpoint during processing at position 10240
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 10240, batch.n_tokens = 2048, progress = 0.163346
slot create_check: id 0 | task 19 | created context checkpoint 1 of 32 (pos_min = 8191, pos_max = 8191, n_tokens = 8192, size = 186.329 MiB)
slot update_slots: id 0 | task 19 | n_tokens = 10240, memory_seq_rm [10240, end)
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.196015
slot update_slots: id 0 | task 19 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 14336, batch.n_tokens = 2048, progress = 0.228684
slot update_slots: id 0 | task 19 | n_tokens = 14336, memory_seq_rm [14336, end)
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 16384, batch.n_tokens = 2048, progress = 0.261354
slot update_slots: id 0 | task 19 | n_tokens = 16384, memory_seq_rm [16384, end)
slot update_slots: id 0 | task 19 | 8192 tokens since last checkpoint at 8192, creating new checkpoint during processing at position 18432
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 18432, batch.n_tokens = 2048, progress = 0.294023
slot create_check: id 0 | task 19 | created context checkpoint 2 of 32 (pos_min = 16383, pos_max = 16383, n_tokens = 16384, size = 186.329 MiB)
slot update_slots: id 0 | task 19 | n_tokens = 18432, memory_seq_rm [18432, end)
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 20480, batch.n_tokens = 2048, progress = 0.326692
slot update_slots: id 0 | task 19 | n_tokens = 20480, memory_seq_rm [20480, end)
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 22528, batch.n_tokens = 2048, progress = 0.359361
slot update_slots: id 0 | task 19 | n_tokens = 22528, memory_seq_rm [22528, end)
slot update_slots: id 0 | task 19 | prompt processing progress, n_tokens = 24576, batch.n_tokens = 2048, progress = 0.392031
2026/05/12 21:22:57 http: proxy error: net/http: timeout awaiting response headers
[WARN] metrics skipped, HTTP status=502, path=/v1/chat/completions
[INFO] Request 192.168.2.203 "POST /v1/chat/completions HTTP/1.1" 502 -1 "Python/3.11 aiohttp/3.13.5" 1m0.006186774s
srv next: stopping wait for next result due to should_stop condition (adjust the --timeout argument if needed)
srv next: ref: https://github.com/ggml-org/llama.cpp/pull/22907
srv stop: cancel task, id_task = 19
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
slot release: id 0 | task 19 | stop processing: n_tokens = 24576, truncated = 0
srv update_slots: all slots are idle
Name and Version
$ ./llama-cli --version
version: 9125 (dded58b)
built with GNU 13.3.0 for Linux x86_64
Operating systems
Linux
GGML backends
Vulkan
Hardware
5x Radeon Pro W7900
Models
No response
Problem description & steps to reproduce
When prompt processing takes more than 60 seconds, llama-server instantly terminates the prompt processing with a
stopping wait for next result due to should_stop condition (adjust the --timeout argument if needed)error, even when the --timeout flag is manually set to a higher value (600 seconds). This appears to be a regression, as this behavior did not occur in previous versions.Steps to Reproduce
Start llama-server
Send a prompt that takes > 60 seconds to process
Observe that after approximately 60 seconds, the server terminates the slot regardless of the timeout setting
Note: I am using llama-swap on top of llama.cpp, hence logs will include llama-swap info
First Bad Commit
No response
Relevant log output
Logs