server : handle context overflow during decode #17267

Merged: ggerganov merged 2 commits into master from gg/server-fix-decode-error-handling on Nov 16, 2025

ggerganov (Member) commented Nov 14, 2025

fix #17260

  • If we overrun the total context size, clear the active slots
  • Rename purge -> clear
  • Remove obsolete kv_cache_clear()
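
The first bullet can be illustrated with a minimal, self-contained sketch. This is not the code from this pull request: `Slot`, `tokens_in_use`, `clear_active_slots`, and `decode_step` are hypothetical names, and the model assumes that all active slots share one context of `n_ctx` tokens. It only shows the general idea of clearing the active slots instead of failing hard when a decode step would overrun the total context size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a server slot (not the real llama.cpp type).
struct Slot {
    int         id       = 0;
    bool        active   = false;
    std::size_t n_tokens = 0;  // tokens this slot holds in the shared context
    std::string error;         // reason the slot was cleared, if any
};

// Sum of the tokens held by all active slots.
std::size_t tokens_in_use(const std::vector<Slot> & slots) {
    std::size_t total = 0;
    for (const Slot & s : slots) {
        if (s.active) {
            total += s.n_tokens;
        }
    }
    return total;
}

// Clear every active slot and record the reason. Returns how many were cleared.
std::size_t clear_active_slots(std::vector<Slot> & slots, const std::string & reason) {
    std::size_t n_cleared = 0;
    for (Slot & s : slots) {
        if (!s.active) {
            continue;
        }
        s.active   = false;
        s.n_tokens = 0;
        s.error    = reason;
        ++n_cleared;
    }
    return n_cleared;
}

// Try to add n_new tokens to slot idx. If that would overrun the total
// context size n_ctx, clear the active slots and report failure instead.
bool decode_step(std::vector<Slot> & slots, std::size_t idx, std::size_t n_new, std::size_t n_ctx) {
    assert(idx < slots.size() && slots[idx].active);
    if (tokens_in_use(slots) + n_new > n_ctx) {
        clear_active_slots(slots, "context size exceeded");
        return false;
    }
    slots[idx].n_tokens += n_new;
    return true;
}
```

In this sketch the caller sees `false` from `decode_step` and can report the stored `error` for each cleared slot, so one oversized request ends with an error response rather than taking the whole process down.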

ggerganov force-pushed the gg/server-fix-decode-error-handling branch from 82eb17b to 741baaf on November 14, 2025 12:04
ggerganov marked this pull request as ready for review on November 14, 2025 12:05
ggerganov requested a review from ngxson as a code owner on November 14, 2025 12:05
ggerganov merged commit 5b2093b into master on Nov 16, 2025
72 checks passed
ggerganov deleted the gg/server-fix-decode-error-handling branch on November 16, 2025 07:23
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request on Jan 15, 2026:
  * server : handle context overflow during decode
  * server : minor refactor
blime4 referenced this pull request in blime4/llama.cpp on Feb 5, 2026:
  * server : handle context overflow during decode
  * server : minor refactor

Development

Successfully merging this pull request may close these issues.

Eval bug: Recent updates lead to /infill requests on the Qwen2.5-Coder model failing and ultimately crashing.