server: reset cache tokens after pp stops halfway by firecoperana · Pull Request #1787 · ikawrakow/ik_llama.cpp

firecoperana · 2026-05-12T13:30:16Z

When I was working on #1722, I noticed that when PP stops halfway, the current batch is treated as if it had been completed, which is wrong. I don't know how to find where PP stopped inside the batch, so the fix resets the status of the tokens and the KV cache to the start of the batch.

This also relaxes the condition for resetting the llama decode flag. It could fix #1673 (comment).

ikawrakow · 2026-05-12T13:47:57Z

How can PP stop halfway? Via the user sending a stop command?

firecoperana · 2026-05-12T13:57:58Z

yes

ikawrakow · 2026-05-12T14:22:18Z

The batch is processed in u-batches. Checks for cancellation are only done after each u-batch has been processed. If you call llama_kv_cache_seq_pos_max with the sequence id of the slot, it should give you the last position that has been processed. Or not?

Scratch that. I see that the check for cancellation is being done after the sequence has been added to the KV cache cells, but before the u-batch has actually been computed. This is kind of stupid. I guess we need to move the cancellation check either to the beginning or to the end of the loop over u-batches, so that the system is in a consistent state when a computation is cancelled. The same inconsistent state can also be reached during TG, but TG is much faster, so the probability of the cancellation arriving after the KV cache cells have been manipulated but before the token has been computed is much smaller.
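To make the inconsistency concrete, here is a toy model of the ordering described above. It is a hypothetical sketch, not the llama.cpp code: `Cell`, `decode_check_in_middle`, `pos_max` and `computed_max` are invented names, and `stop_after` simulates the stop request arriving while a given u-batch is being prepared. Cells get their position when reserved, and are marked computed only after the u-batch runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical KV cell: position assigned on reservation, `computed` set after compute.
struct Cell {
    int32_t pos;
    bool computed;
};

// Reserve cells, THEN check the stop flag, THEN compute. Returns -1 on cancellation.
int decode_check_in_middle(std::vector<Cell> & cells, int32_t n_tokens_all,
                           int32_t n_ubatch, int stop_after) {
    int ubatch_idx = 0;
    for (int32_t cur = 0; cur < n_tokens_all; cur += n_ubatch, ++ubatch_idx) {
        const int32_t n = std::min(n_ubatch, n_tokens_all - cur);
        const size_t first = cells.size();
        // 1. The sequence is added to the KV cache cells.
        for (int32_t i = 0; i < n; ++i) cells.push_back({cur + i, false});
        // 2. The cancellation check sits between reservation and compute.
        if (ubatch_idx == stop_after) return -1;
        // 3. The u-batch is computed.
        for (int32_t i = 0; i < n; ++i) cells[first + i].computed = true;
    }
    return 0;
}

// Largest position held by the cache (analogous in spirit to seq_pos_max).
int32_t pos_max(const std::vector<Cell> & cells) {
    int32_t m = -1;
    for (const Cell & c : cells) m = std::max(m, c.pos);
    return m;
}

// Largest position whose values were actually computed.
int32_t computed_max(const std::vector<Cell> & cells) {
    int32_t m = -1;
    for (const Cell & c : cells) {
        if (c.computed) m = std::max(m, c.pos);
    }
    return m;
}
```

With a 10-token prompt, u-batches of 4, and the stop arriving during the second u-batch, the cache reports position 7 as present while only positions 0 through 3 were computed. Under this ordering, the maximum cached position cannot be trusted after a cancellation.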

firecoperana · 2026-05-12T14:51:09Z

Yeah, I didn't realize that, since I'm not familiar with how llama_decode works internally. Can you show me where I should do the cancellation check?

ikawrakow · 2026-05-12T16:20:57Z

Either at the beginning of this loop (ik_llama.cpp/src/llama.cpp, line 4558 in f9a93c3):

for (uint32_t cur_token = 0; cur_token < n_tokens_all; ) {

or at the end of it, instead of checking here (ik_llama.cpp/src/llama.cpp, line 4691 in f9a93c3):

if (stop_internal_decode) {

firecoperana · 2026-05-13T00:32:13Z

Thanks! The stop signal works reliably if I put it at the end of the loop, but not at the beginning of the loop for some reason.
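Moving the check to the end of the loop body keeps the toy cache consistent: by the time the flag is looked at, every reserved cell has been computed, so the largest cached position matches the last computed one (the property the seq_pos_max suggestion relied on). This is a hypothetical sketch with invented names (`Cell`, `decode_check_at_end`, `stop_after`), not the ik_llama.cpp implementation, and it does not explain why a check at the beginning of the loop was less reliable in practice. The loop header mirrors the `for (... cur_token < n_tokens_all; )` form quoted above, with the increment done inside the body.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical KV cell: position assigned on reservation, `computed` set after compute.
struct Cell {
    int32_t pos;
    bool computed;
};

// `stop_after` simulates the stop request arriving while u-batch `stop_after`
// is in flight; it is only observed once that u-batch has finished.
int decode_check_at_end(std::vector<Cell> & cells, int32_t n_tokens_all,
                        int32_t n_ubatch, int stop_after) {
    int ubatch_idx = 0;
    for (int32_t cur = 0; cur < n_tokens_all; ++ubatch_idx) {
        const int32_t n = std::min(n_ubatch, n_tokens_all - cur);
        const size_t first = cells.size();
        for (int32_t i = 0; i < n; ++i) cells.push_back({cur + i, false});  // reserve cells
        for (int32_t i = 0; i < n; ++i) cells[first + i].computed = true;   // compute u-batch
        cur += n;
        // Cancellation is checked only here, after the u-batch is complete.
        if (ubatch_idx == stop_after) return -1;
    }
    return 0;
}
```

With the same 10-token prompt, u-batches of 4, and a stop during the second u-batch, the loop returns -1 with positions 0 through 7 present and all of them computed.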

firecoperana mentioned this pull request on May 12, 2026, in #1673 "server: fix ret=-3 on hybrid/recurrent prompt cache and clear sticky stop flag" (merged, 4 tasks).

firecoperana changed the title from "server: reset cache tokens after pp stops" to "server: reset cache tokens after pp stops halfway" on May 12, 2026.

Commit 7ff12d6: server: reset cache tokens after pp stops

firecoperana force-pushed the fcp/fix_pp_stop branch from abc2429 to 7ff12d6 on May 13, 2026 00:30.

ikawrakow approved these changes May 13, 2026


ikawrakow merged commit cdc288b into main May 13, 2026

firecoperana deleted the fcp/fix_pp_stop branch May 31, 2026 14:03

ikawrakow merged 1 commit into main from fcp/fix_pp_stop.