
Kimi Linear fix conv state update #19531

Merged
CISC merged 95 commits into ggml-org:master from ymcki:Kimi-Linear
Feb 13, 2026


Conversation

@ymcki
Contributor

@ymcki ymcki commented Feb 12, 2026


The current implementation updates the conv state incorrectly, which corrupts state when running parallel requests in llama-server. This PR fixes that.
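The failure mode described above can be illustrated with a minimal sketch. This is not llama.cpp's actual code; the names (`conv_cache`, `d_conv`, `seq_id`) are illustrative assumptions. The idea: a short-convolution layer keeps, per sequence, the last `d_conv - 1` inputs as a shift register, and under parallel serving each slot must update its own state keyed by sequence id, otherwise interleaved requests overwrite each other's states.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical sketch, not llama.cpp's real API.
// A short-convolution layer of width d_conv needs the last (d_conv - 1)
// inputs per sequence.  Parallel slots must each own their state.

constexpr int d_conv = 4;                          // convolution width (assumed)
using conv_state = std::array<float, d_conv - 1>;  // shift register per sequence

struct conv_cache {
    std::vector<conv_state> states;                // one state per sequence slot

    explicit conv_cache(int n_seq) : states(n_seq, conv_state{}) {}

    // Shift the given sequence's state left and append the new input.
    // The fix amounts to always indexing by seq_id here, so that
    // interleaved updates for different sequences cannot clobber each other.
    void update(int seq_id, float x) {
        conv_state & s = states[seq_id];
        for (int i = 0; i + 1 < d_conv - 1; ++i) {
            s[i] = s[i + 1];
        }
        s[d_conv - 2] = x;
    }
};
```

With two slots, interleaving `update(0, ...)` and `update(1, ...)` calls leaves each sequence's register containing only its own history, which is the invariant the fix restores.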

./build/bin/llama-server -c 16384 --parallel 8 --mmap -m ~/Kimi-Linear-48B-A3B-Instruct-GGUF/Kimi-Linear-48B-A3B-Instruct-jp-imatrix.IQ3_M.gguf -ngl 100

This PR also includes the block implementation, which speeds up prompt processing by 20% and reduces VRAM usage.

@ymcki ymcki requested a review from CISC as a code owner February 12, 2026 01:12
@CISC
Member

CISC commented Feb 12, 2026

This PR also includes the block implementation, which speeds up prompt processing by 20% and reduces VRAM usage.

Please take this out again and make this purely a bugfix PR.

The block implementation can be done in a separate PR; however, it's worth noting that there are several incoming improvements in #19375 that could also be applied here.

@ymcki ymcki changed the title Kimi Linear (correct conv state update + block implementation) Kimi Linear fix conv state update Feb 13, 2026
@ymcki
Contributor Author

ymcki commented Feb 13, 2026

This PR now only fixes the conv state update.

@CISC CISC merged commit 33a56f9 into ggml-org:master Feb 13, 2026
6 of 76 checks passed
liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
* fix conv state update for llama-server parallel serving

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request Mar 3, 2026

Labels

model Model specific


3 participants