Name and Version
llama-server: b8648
Operating systems
Linux
GGML backends
CUDA
Hardware
RTX 5090 32GB VRAM
Models
Gemma4-31B-it-Q5-K-M
Problem description & steps to reproduce
Context shift isn't working when using Gemma-4. KV cache quantization combined with context shift isn't working either.
I'm using the latest build of llama.cpp as of right now.
First Bad Commit
No response
Relevant log output
slot update_slots: id 3 | task 102 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)