Name and Version
716bd6d
Identified by git bisect.
Operating systems
Linux
GGML backends
Vulkan
Hardware
AMD GPU (amdgpu driver), 8 GB VRAM
Models
Qwen2.5-Coder-14B-Instruct-Q4_K_M, or any model of similar size
Problem description & steps to reproduce
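On c250ecb, the model weights fit entirely in VRAM, and only the context/KV cache is placed in GTT. Memory usage is 8166M VRAM + 2271M GTT.
On 716bd6d, memory usage is 6342M VRAM + 4107M GTT. The total is nearly unchanged (10437M vs. 10449M), but about 1824M has moved from VRAM to GTT. This suggests part of the weights no longer stay in VRAM, and token generation (tg) speed drops significantly.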
First Bad Commit
716bd6d
Relevant log output
No differences in the log output between the two commits.