model : avoid ggml_cont_3d for fused QKV weights by ggerganov · Pull Request #15662 · ggml-org/llama.cpp

ggerganov · 2025-08-29T10:34:14Z

ref #15602 (comment)

Avoid Vcur = ggml_cont_3d(...) when the QKV weights are merged into a single tensor
Make llama_kv_cache::cpy_k and cpy_v more readable
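The idea behind avoiding the copy can be illustrated with a minimal C sketch. This is not llama.cpp code. It assumes the fused QKV projection output is stored row-major with one row per token laid out as [Q | K | V], where Q has n_embd values and K and V have n_embd_gqa = head_dim * n_head_kv values each. The names view3d, fused_v_view, view3d_get, view3d_is_contiguous and view3d_cont are hypothetical. V is addressed in place through byte strides (the same element-count/byte-stride idea ggml uses for views), whereas a "cont" step materializes a dense copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 3D strided view over a float buffer. ne[] holds element counts
   and nb[] holds byte strides, in the spirit of how ggml describes tensors.
   Dimensions: [head_dim, n_head_kv, n_tokens]. */
typedef struct {
    const char *data; /* base pointer of the fused QKV buffer */
    size_t offset;    /* byte offset of element (0, 0, 0) */
    int ne[3];        /* element counts per dimension */
    size_t nb[3];     /* byte strides per dimension */
} view3d;

/* Read one element of the view; no data is moved. */
static float view3d_get(const view3d *v, int i0, int i1, int i2) {
    assert(i0 >= 0 && i0 < v->ne[0]);
    assert(i1 >= 0 && i1 < v->ne[1]);
    assert(i2 >= 0 && i2 < v->ne[2]);
    const char *p = v->data + v->offset + (size_t)i0 * v->nb[0]
                  + (size_t)i1 * v->nb[1] + (size_t)i2 * v->nb[2];
    float x;
    memcpy(&x, p, sizeof x); /* avoids alignment/aliasing assumptions */
    return x;
}

/* Describe the V slice of a fused QKV buffer whose token rows are laid out as
   [Q: n_embd | K: n_embd_gqa | V: n_embd_gqa]. Only strides are computed. */
static view3d fused_v_view(const float *qkv, int n_embd, int head_dim,
                           int n_head_kv, int n_tokens) {
    const int n_embd_gqa = head_dim * n_head_kv;
    view3d v;
    v.data   = (const char *)qkv;
    v.offset = (size_t)(n_embd + n_embd_gqa) * sizeof(float); /* skip Q and K */
    v.ne[0] = head_dim;
    v.ne[1] = n_head_kv;
    v.ne[2] = n_tokens;
    v.nb[0] = sizeof(float);
    v.nb[1] = (size_t)head_dim * sizeof(float);
    v.nb[2] = (size_t)(n_embd + 2 * n_embd_gqa) * sizeof(float); /* full fused row */
    return v;
}

/* True only if the view is densely packed in all three dimensions. */
static int view3d_is_contiguous(const view3d *v) {
    return v->nb[0] == sizeof(float)
        && v->nb[1] == (size_t)v->ne[0] * v->nb[0]
        && v->nb[2] == (size_t)v->ne[1] * v->nb[1];
}

/* Materialize a dense copy: the extra work a "cont" step performs.
   The caller frees the result. */
static float *view3d_cont(const view3d *v) {
    const size_t n = (size_t)v->ne[0] * v->ne[1] * v->ne[2];
    float *out = malloc(n * sizeof *out);
    if (out == NULL) return NULL;
    size_t k = 0;
    for (int i2 = 0; i2 < v->ne[2]; ++i2)
        for (int i1 = 0; i1 < v->ne[1]; ++i1)
            for (int i0 = 0; i0 < v->ne[0]; ++i0)
                out[k++] = view3d_get(v, i0, i1, i2);
    return out;
}
```

For example, with n_embd = 8, head_dim = 2 and n_head_kv = 2, each fused row holds 16 floats and V starts at float 12 of each row. view3d_is_contiguous reports false because the token stride is 16 floats rather than 4, yet each head is still a contiguous run of head_dim floats, so a consumer that accepts strided input could read V without the extra copy.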

ggml-ci

CISC

Tested with ~~CodeQwen1.5~~, Phi2, jina-embeddings-v3 and PLaMo2.


CISC · 2025-09-08T07:03:08Z

Hmmm, https://github.com/ggml-org/ci/blob/results/llama.cpp/60/d6e7c6fd8bacac0892b8722f5d5c585139cb43/ggml-4-x86-cuda-v100/stdall#L1957

ggerganov · 2025-09-08T07:03:56Z

> Hmmm, https://github.com/ggml-org/ci/blob/results/llama.cpp/60/d6e7c6fd8bacac0892b8722f5d5c585139cb43/ggml-4-x86-cuda-v100/stdall#L1957

This is due to #15687

CISC · 2025-09-08T07:06:51Z

> > Hmmm, https://github.com/ggml-org/ci/blob/results/llama.cpp/60/d6e7c6fd8bacac0892b8722f5d5c585139cb43/ggml-4-x86-cuda-v100/stdall#L1957
>
> This is due to #15687

Ah, I get a segfault locally though at the first REPEAT test after ARGMAX.

ggerganov · 2025-09-08T07:16:02Z

On my end, all tests except GET_ROWS and the new IM2COL_3D are passing.

CISC · 2025-09-08T07:22:47Z

> On my end, all tests except GET_ROWS and the new IM2COL_3D are passing.

Nvm, it must have been some other issue pre-rebase. I pulled the latest changes and applied #15868, and everything is fine now.

Edit: Eh, almost: I got GGML_ASSERT(ggml_is_contiguous(src0)) on PAD, but that's surely not related. It's the pad_ext test with v == true. Fixed in #15869.


* model : avoid ggml_cont_3d for fused QKV weights
  ggml-ci
* kv-cache : make cpy_k and cpy_v implementation more readable
  ggml-ci
* cont : add comments
  ggml-ci
* cont : minor fix [no ci]
* cont : one more fix
* cont : clarity
  ggml-ci
* kv-cache : require contiguous heads of k_cur and v_cur
  ggml-ci
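The final commit in the list above requires contiguous heads of k_cur and v_cur. One plausible reading, which is an assumption here and not taken from the PR, is that the values inside each head, and the heads within one token, are packed back to back, while consecutive tokens may still be strided (as with a view into fused QKV output). Under that assumption, each token's slice can be written to a dense buffer with a single memcpy. The descriptor strided3d and the helpers heads_are_packed and copy_tokens_packed are hypothetical and only sketch why such a requirement could simplify a cache copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for a [head_dim, n_head, n_tokens] float tensor,
   with ne[] = element counts and nb[] = byte strides. */
typedef struct {
    const char *data; /* points at element (0, 0, 0) */
    int ne[3];
    size_t nb[3];
} strided3d;

/* Assumed meaning of "contiguous heads": elements of a head are adjacent and
   the heads of one token follow each other with no gap. Tokens may be strided. */
static int heads_are_packed(const strided3d *t) {
    return t->nb[0] == sizeof(float)
        && t->nb[1] == (size_t)t->ne[0] * sizeof(float);
}

/* Copy into a dense [head_dim * n_head, n_tokens] buffer using one memcpy per
   token. Returns -1 (and copies nothing) if the heads are not packed. */
static int copy_tokens_packed(float *dst, const strided3d *src) {
    assert(dst != NULL && src != NULL);
    if (!heads_are_packed(src)) return -1;
    const size_t token_bytes = (size_t)src->ne[0] * (size_t)src->ne[1] * sizeof(float);
    for (int t = 0; t < src->ne[2]; ++t) {
        memcpy((char *)dst + (size_t)t * token_bytes,
               src->data + (size_t)t * src->nb[2], token_bytes);
    }
    return 0;
}
```

In this sketch, only the token stride nb[2] is free to differ from the dense layout. If heads were spaced apart within a token, the copy would need a per-head loop instead, which is the case heads_are_packed rejects.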

ggerganov mentioned this pull request Aug 29, 2025

Feature Request: Repeated Unecessary Activation Quantization Ops #15602

Open


CISC mentioned this pull request Aug 29, 2025

Add support for CogVLM model #15002

Merged


ggerganov marked this pull request as ready for review September 7, 2025 17:24

ggerganov added 3 commits September 8, 2025 09:13

model : avoid ggml_cont_3d for fused QKV weights

bb1202b

ggml-ci

kv-cache : make cpy_k and cpy_v implementation more readable

85a5ea3

ggml-ci

cont : add comments

3dec397

ggml-ci

ggerganov force-pushed the gg/model-avoid-cont3d branch from f15d515 to 3dec397 September 8, 2025 06:47

ggerganov added 3 commits September 8, 2025 09:49

cont : minor fix [no ci]

c62c354

cont : one more fix

1efa9e8

cont : clarity

d6be191

ggml-ci

CISC approved these changes Sep 8, 2025


kv-cache : require contiguous heads of k_cur and v_cur

60d6e7c

ggml-ci

ggerganov merged commit cf0e3ba into master Sep 8, 2025
52 of 55 checks passed

ggerganov deleted the gg/model-avoid-cont3d branch September 8, 2025 07:25

CISC mentioned this pull request Sep 9, 2025

Eval bug: GGML_ASSERT(ggml_is_contiguous(a)) with Jina reranker model #15895

Closed

ggerganov mentioned this pull request Oct 3, 2025

Eval bug: Jina embeddings v2 base code crashes with GGML_ASSERT(ggml_can_mul_mat(a, b)) failed #16392

Closed

CISC mentioned this pull request Oct 3, 2025

llama : fix shapes for bert/mpt q/k norm #16409

Merged

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 26, 2025

Revert "model : avoid ggml_cont_3d for fused QKV weights (ggml-org#15662)"

679abcd

This reverts commit cf0e3ba.

ggerganov merged 7 commits into master from gg/model-avoid-cont3d


2 participants
