Name and Version
llama-server.exe --version
version: 4764 (7ad0779)
built with MSVC 19.42.34436.0 for x64
Operating systems
Windows (inferred from the MSVC build string, the .exe binary, and the NTSTATUS exit code in the log)
Which llama.cpp modules do you know to be affected?
llama-server with the Vulkan backend (the crash is raised from ggml-backend.cpp for the Vulkan0 buffer)
Command line
llama-server.exe -m %file_path_16b% --no-mmap -fa -ctk q4_0 -c 8192 -np 2 -ngl 50 --temp 0.6 -t 10 -tb 8 -C FF000 --no-perf --host 0.0.0.0 --port 3000
Problem description & steps to reproduce
prompt eval time = 16975.44 ms / 282 tokens ( 60.20 ms per token, 16.61 tokens per second)
eval time = 2257.84 ms / 28 tokens ( 80.64 ms per token, 12.40 tokens per second)
total time = 19233.28 ms / 310 tokens
srv log_server_r: request: POST /v1/chat/completions 127.0.0.1 200
srv update_slots: all slots are idle
srv params_from_: Chat format: Content-only
slot launch_slot_: id 1 | task 1773 | processing task
slot update_slots: id 1 | task 1773 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 1426
slot update_slots: id 1 | task 1773 | kv cache rm [64, end)
slot update_slots: id 1 | task 1773 | prompt processing progress, n_past = 1426, n_tokens = 1362, progress = 0.955119
slot update_slots: id 1 | task 1773 | prompt done, n_past = 1426, n_tokens = 1362
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-backend.cpp:746: pre-allocated tensor (cache_k_l0 (view) (copy of cache_k_l0 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY)
[process exited with code 3221226505 (0xc0000409)]
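Based on the log above, the assertion fires when the backend scheduler tries to run a CPY on a view of the q4_0 K cache (cache_k_l0) in a Vulkan buffer. As a hedged sketch only, not a confirmed fix: two command-line variants that are worth trying are (a) dropping -ctk q4_0 so the K cache stays in the default f16 format, avoiding the copy of quantized cache data, and (b) keeping the quantized cache but disabling context shift, assuming the build provides the --no-context-shift option, so the cache is not copied in place. All other flags are kept exactly as in the original command.

```shell
:: Variant (a): default f16 K cache (no -ctk q4_0), so no CPY of quantized data
llama-server.exe -m %file_path_16b% --no-mmap -fa -c 8192 -np 2 -ngl 50 ^
  --temp 0.6 -t 10 -tb 8 -C FF000 --no-perf --host 0.0.0.0 --port 3000

:: Variant (b): keep the quantized K cache but disable context shift
:: (--no-context-shift is assumed to exist in this build; check --help)
llama-server.exe -m %file_path_16b% --no-mmap -fa -ctk q4_0 --no-context-shift ^
  -c 8192 -np 2 -ngl 50 --temp 0.6 -t 10 -tb 8 -C FF000 --no-perf ^
  --host 0.0.0.0 --port 3000
```

If variant (a) runs cleanly while the original command crashes, that would narrow the problem to CPY support for quantized K-cache tensors in the Vulkan backend.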
First Bad Commit
Unknown (not bisected). Please help resolve the error:
pre-allocated tensor (cache_k_l0 (view) (copy of cache_k_l0 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY)
Relevant log output
(see the server log in the problem description above)