Kimi Linear fix conv state update by ymcki · Pull Request #19531 · ggml-org/llama.cpp

ymcki · 2026-02-12T01:12:28Z


The current implementation updates the conv state incorrectly, which corrupts state when llama-server runs with parallel slots. This PR fixes that.

./build/bin/llama-server -c 16384 --parallel 8 --mmap -m ~/Kimi-Linear-48B-A3B-Instruct-GGUF/Kimi-Linear-48B-A3B-Instruct-jp-imatrix.IQ3_M.gguf -ngl 100
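The PR text does not spell out the exact defect, so the following is only a minimal sketch of the general requirement, not the code from this PR. It assumes a causal 1D convolution with kernel size `d_conv`, whose state is the last `d_conv - 1` inputs per channel, stored separately for each sequence (server slot). The names `ConvStateCache`, `seq_state` and `update_conv_state` are hypothetical and do not exist in llama.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: none of these names come from llama.cpp.
// Each sequence (server slot) owns a rolling window holding the last
// d_conv - 1 inputs per channel.
struct ConvStateCache {
    int n_seq;       // number of parallel sequences (slots)
    int n_channels;  // channels per token
    int d_conv;      // convolution kernel size
    std::vector<float> data;  // layout: [n_seq][d_conv - 1][n_channels]

    ConvStateCache(int seqs, int channels, int kernel)
        : n_seq(seqs), n_channels(channels), d_conv(kernel),
          data((size_t) seqs * (kernel - 1) * channels, 0.0f) {}

    // Pointer to the window that belongs to one sequence.
    float * seq_state(int seq_id) {
        return data.data() + (size_t) seq_id * (d_conv - 1) * n_channels;
    }
};

// Append n_tokens new inputs (row-major [n_tokens][n_channels]) to the state
// of ONE sequence and keep only the most recent d_conv - 1 rows.
// Only the slice owned by seq_id is read or written.
void update_conv_state(ConvStateCache & cache, int seq_id,
                       const float * x, int n_tokens) {
    const std::ptrdiff_t w = cache.d_conv - 1;
    const std::ptrdiff_t c = cache.n_channels;
    float * s = cache.seq_state(seq_id);

    // Old window followed by the new tokens.
    std::vector<float> joined(s, s + w * c);
    joined.insert(joined.end(), x, x + (std::ptrdiff_t) n_tokens * c);

    // The last w rows become the new state, whether n_tokens is smaller
    // or larger than w.
    std::copy(joined.end() - w * c, joined.end(), s);
}
```

With two slots, one channel and `d_conv = 4`, feeding tokens `{1, 2}` to slot 0 and `{10, 20, 30, 40}` to slot 1 leaves slot 0 holding `{0, 1, 2}` and slot 1 holding `{20, 30, 40}`. Neither update touches the other slot's window, which is the property that parallel serving depends on.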

This PR also includes the block implementation, which speeds up prompt processing (pp) by 20% and saves VRAM.


CISC · 2026-02-12T18:09:11Z

> This PR also includes the block implementation that speeds up 20% pp and VRAM saving.

Please take this out again and make this a purely bugfix PR.

The block implementation can be done in a separate PR; however, it's worth noting that there are several incoming improvements to this in #19375 that can also be applied here.

ymcki · 2026-02-13T00:07:38Z

This PR now only fixes the conv state update.

* fix conv state update for llama-server parallel serving

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>

ymcki and others added 30 commits December 2, 2025 08:35

kimi linear model implementation

27baad4

kimi linear convert_hf_to_gguf

84f822c

kimi linear constants.py tensor_mapping.py

57cca52

Kimi Linear ggml.h

6167f39

kimi linear ggml-cpu

26a6553

Kimi Linear ggml-cuda

bf42bc0

Kimi Linear ggml.c

d73d3e5

kimi linear src/llama

e308026

remove "const int64_t n_seq_tokens = q->ne[2];" to get rid of unused variable warning

139548d

remove type mismatch warning

83d328d

read MoE params

772ca88

removed some hard coded code

9f1265f

removed all hard code

a0269af

use DeepseekV2 tokenizer

ef5bc30

removed unnecessary internal methods called by the old set_vocab of KimiLinear

ae9771d

rewrite get_vocab for KimiLinear. Removed all kda_scan code

f9a11d7

removed all traces of kda_scan

776294c

reduce OP count by 1 due to removal of kda_scan

f67a42d

Move KIMI_LINEAR to llm_arch_is_hybrid to enable KV cache

f85e5c7

set n_embd_head_k/v to ensure kv cache works

8bd617e

don't quantize conv1d of Kimi Linear

a4020d8

Kimi Linear backend agnostic

66c0c5d

removed LOG_INFO

aba181e

naive chunking form implemented

cfed14e

fixed some comments

e3542ff

add Kimi-K2 specific tokens to be recognized as EOG

67bee56

sync fork from b7240 to b7243

30d883c

Merge branch 'ggml-org:master' into Kimi-Linear

40f6118

build_kda_autoregressive is implemented to replace build_kda_recurrent for faster inference. sync'd to b7682

1099cbf

replaced Akk and Aqk with mul_mat and clamp

f99913d

ymcki and others added 15 commits February 3, 2026 08:15

Merge branch 'ggml-org:master' into Kimi-Linear

07f9979

removed if else for required parameters kv_lora_rank and qk_rope_head_dim

efaea45

add back ggml_cont for Vcur

000fded

minor changes

8ec5b08

removed extra line in llama-vocab.cpp. Added back the comment in llama-graph.cpp

82215a0

f16 gguf cannot run without context length

a82103e

made a mistake of adding back n_ctx parsing

6456393

4x4 16x16 blocks computation for Akk and Aqk

17cd6e8

sync to latest plus replace chunkify with get_slice_2d

97f229c

Merge branch 'ggml-org:master' into Kimi-Linear

cc16e49

replace ggml_acc with ggml_set for vulkan compatibility

06f0728

Merge branch 'master' of github.com:ymcki/llama.cpp into Kimi-Linear

906abc3

sync with latest

Merge branch 'Kimi-Linear' of github.com:ymcki/llama.cpp into Kimi-Linear

3dfebbb

sync to latest

Merge branch 'ggml-org:master' into Kimi-Linear

19cf704

fix conv state update for llama-server parallel serving

63a15e3

ymcki requested a review from CISC as a code owner February 12, 2026 01:12

Merge branch 'ggml-org:master' into Kimi-Linear

b2d02ad

github-actions bot added the model Model specific label Feb 12, 2026

loci-dev mentioned this pull request Feb 12, 2026

UPSTREAM PR #19531: Kimi Linear (correct conv state update + block implementation) auroralabs-loci/llama.cpp#1165 (Open)

ymcki mentioned this pull request Feb 12, 2026

Unified delta net handling for Qwen3Next and Kimi Linear models #18792 (Closed)

revert back to normal implementation

6286253

ymcki changed the title from "Kimi Linear (correct conv state update + block implementation)" to "Kimi Linear fix conv state update" on Feb 13, 2026

Merge branch 'ggml-org:master' into Kimi-Linear

a46782c

CISC approved these changes Feb 13, 2026


CISC merged commit 33a56f9 into ggml-org:master Feb 13, 2026
6 of 76 checks passed

CISC merged 95 commits into ggml-org:master from ymcki:Kimi-Linear

3 participants
