server : fix swa-full logic by ggerganov · Pull Request #22288 · ggml-org/llama.cpp

ggerganov · 2026-04-23T12:50:58Z

Overview

Simplify the logic by complementing llama_model_n_swa with a server_context.n_swa member. When --swa-full is passed, n_swa is set to 0 so that the server logic treats the model as a non-SWA model.
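A minimal sketch of the override described above, not the code from this PR. The types server_params_sketch and server_context_sketch and the helper names are hypothetical stand-ins, and model_n_swa is a plain integer standing in for whatever llama_model_n_swa reports for the loaded model.

```cpp
#include <cstdint>

// Hypothetical stand-in for the server parameters (assumption, not the real struct).
struct server_params_sketch {
    bool swa_full = false; // true when --swa-full was passed
};

// Hypothetical stand-in for server_context; only the member named in the PR text is shown.
struct server_context_sketch {
    int32_t n_swa = 0; // effective window used by the server logic; 0 means "treat as non-SWA"
};

// model_n_swa: the window size reported for the model (assumed to come from llama_model_n_swa).
// With --swa-full the server-side value is forced to 0, so later checks see a non-SWA model.
inline int32_t effective_n_swa(int32_t model_n_swa, const server_params_sketch & params) {
    return params.swa_full ? 0 : model_n_swa;
}

// Store the effective value once, so the rest of the server reads ctx.n_swa
// instead of querying the model and re-checking the flag in several places.
inline void init_n_swa(server_context_sketch & ctx, int32_t model_n_swa, const server_params_sketch & params) {
    ctx.n_swa = effective_n_swa(model_n_swa, params);
}
```

Keeping a single server-side value means the flag is consulted in one place only, which is one plausible reading of "simplify the logic" here.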

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

shipped-it · 2026-04-23T13:16:43Z

I'm a bit unsure about the approach. Why would the model be reported as non-SWA? It is still SWA, just with a full-size cache.

Also, you've made the suggestion to use --swa-full --no-mmproj. Correct me if I'm wrong, but in practice I've seen repetition issues when using Gemma 4.

I will confirm shortly if this fixes the issue, or if I get the repetition problem.

ggerganov · 2026-04-23T13:23:42Z

> Also, you've made the suggestion to use --swa-full --no-mmproj.

Cache reuse does not work with mmproj.

> Correct me if I'm wrong, but in practice I've seen repetition issues when using Gemma 4.

I am not following. This fixes the cache reuse logic - I am not aware of any repetitions.

shipped-it · 2026-04-23T19:27:45Z

I confirm that it is fixed with either this PR or #21749 (tested on ROCm).

Without PR:
warm req: prompt_n=821, prompt_ms=982

With PR:
warm req: prompt_n=5, prompt_ms=71

So prompt processing is about 14x faster (982 ms vs. 71 ms).
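The drop in prompt_n from 821 to 5 is consistent with the server reusing cached tokens for the shared prefix of a warm request and evaluating only the remainder. The sketch below illustrates that idea under stated assumptions: token_id, common_prefix_len, tokens_to_process and the can_reuse flag are hypothetical names for illustration, not the llama.cpp server implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using token_id = int32_t; // hypothetical token type for this sketch

// Number of leading tokens that the cached sequence and the new prompt share.
inline size_t common_prefix_len(const std::vector<token_id> & cached,
                                const std::vector<token_id> & prompt) {
    size_t n = 0;
    while (n < cached.size() && n < prompt.size() && cached[n] == prompt[n]) {
        ++n;
    }
    return n;
}

// Tokens that still have to be evaluated for the new prompt.
// can_reuse is an assumed flag: false models the case where the cached
// prefix cannot be used, so the whole prompt is processed again.
inline size_t tokens_to_process(const std::vector<token_id> & cached,
                                const std::vector<token_id> & prompt,
                                bool can_reuse) {
    if (!can_reuse) {
        return prompt.size();
    }
    return prompt.size() - common_prefix_len(cached, prompt);
}
```

In these terms, the "Without PR" measurement corresponds to the full prompt being processed again, and the "With PR" measurement to only the non-shared tail being processed.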

server : fix swa-full logic

d7e88f7

ggerganov requested a review from a team as a code owner April 23, 2026 12:50

This was referenced Apr 23, 2026

cache reuse is not supported for Gemma 4 models despite -fa enabled and --swa-full #21468

Closed

server: ensure prompt caching for SWA models #21749

Closed

github-actions bot added the examples and server labels on Apr 23, 2026

ggerganov merged commit ffdd983 into master Apr 24, 2026
42 of 47 checks passed

ggerganov deleted the gg/server-fix-n-swa branch April 24, 2026 07:17

IntelNav pushed a commit to IntelNav/llama.cpp that referenced this pull request Apr 29, 2026

server : fix swa-full logic (ggml-org#22288)

a6b5769

IntelNav pushed a commit to IntelNav/llama.cpp that referenced this pull request Apr 29, 2026

server : fix swa-full logic (ggml-org#22288)

bf284e5

rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

server : fix swa-full logic (ggml-org#22288)

d84bcfb

samuraieng pushed a commit to samuraieng/llama.cpp that referenced this pull request May 6, 2026

server : fix swa-full logic (ggml-org#22288)

02180a7

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

server : fix swa-full logic (ggml-org#22288)

2c3102e

meh pushed a commit to meh/llama.cpp that referenced this pull request May 10, 2026

server : fix swa-full logic (ggml-org#22288)

a03408e

This was referenced May 12, 2026

Bug: Please merge upstream PR #22288 for Gemma4 SWA cache reuse fix TheTom/llama-cpp-turboquant#141

Closed

Feature Request: Please merge upstream PR #22288 for Gemma4 SWA cache reuse fix TheTom/llama-cpp-turboquant#142

Open

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

server : fix swa-full logic (ggml-org#22288)

e315856

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

server : fix swa-full logic (ggml-org#22288)

1825c79

baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026

server : fix swa-full logic (ggml-org#22288)

2d91b97

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

server : fix swa-full logic (ggml-org#22288)

c859b21
