llama : use n_swa + n_ubatch cells for SWA cache #13833
Conversation
Force-pushed from 1bce7e8 to 6468631.
I'll try testing.
Force-pushed from 6468631 to ef5bb61.
tools/server/server.cpp (outdated)

```cpp
const auto pos_min = llama_kv_self_seq_pos_min(ctx, slot.id);
if (pos_min > 0) {
    SLT_WRN(slot, "n_past = %d, cache_tokens.size() = %d, seq_id = %d, pos_min = %d\n", slot.n_past, (int) slot.cache_tokens.size(), slot.id, pos_min);
if (pos_min == -1 || pos_min > slot.n_past - n_swa) {
```
pos_min == -1 means the sequence is empty. In this case, I think setting n_past = 0 is the expected behavior, so we don't necessarily need to log the warning.
If the sequence is not present in the KV cache (i.e. pos_min == -1), but we somehow decided that slot.n_past > 0 (see the condition above), then this is still unexpected. I think we might even want to abort in such cases, because it means there is a bug somewhere.
There does indeed appear to be a bug somewhere: with prompt caching enabled, requests fail due to pos_min being -1, e.g. see mudler/LocalAI#553 (comment). It is tracked as llama.cpp bug #17118.
Force-pushed from ef5bb61 to 4a9253a.
Force-pushed from 8342295 to 855b397.
I got this error just now. Version: Not sure why Nix stable has such an ancient version; will upgrade.
target #13845