model : fix plamo2 attention_key/value_length regression by CISC · Pull Request #24317 · ggml-org/llama.cpp

CISC · 2026-06-08T20:04:36Z

Overview

Fixes incorrect tensor sizes and a floating-point exception (FPE) caused by a bad assert.

Additional information

At some point after #16075, possibly during one of the refactors (it is hard to tell), these metadata overrides got lost.

The assert was probably copy-pasted from mamba-base. There, n_head is reassigned, whereas here the variable holding the same value (hparams.ssm_dt_rank) is called n_heads, so the assert checked the wrong variable.
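For illustration only, here is a minimal sketch of how an assert on the wrong variable can turn into an FPE. This is not the llama.cpp code: it assumes the assert performs an integer modulo by a head count that can be zero on some layers, and the helpers `is_divisible` and `ssm_head_dim` and the parameter `d_inner` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: divisibility check that never divides by zero.
// A plain `value % divisor` with divisor == 0 is undefined behavior in C++
// and typically raises SIGFPE on x86 before the assert can report anything.
static bool is_divisible(int64_t value, int64_t divisor) {
    return divisor != 0 && value % divisor == 0;
}

// Hypothetical helper: derive the per-head dimension from the SSM head count
// (a stand-in for the value the PR text refers to as hparams.ssm_dt_rank).
// The assert checks the same variable that is used as the divisor below.
static int64_t ssm_head_dim(int64_t d_inner, int64_t n_heads) {
    assert(is_divisible(d_inner, n_heads));
    return d_inner / n_heads;
}
```

Under these assumptions, `ssm_head_dim(4096, 64)` returns 64, while checking divisibility against a zero head count fails cleanly instead of trapping.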

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: nakuuq

* upstream/HEAD: (329 commits)
  * vendor : update LibreSSL to 4.3.2 (ggml-org#24397)
  * Remove padding and multiple D2D copies for MTP (ggml-org#24086)
  * chat: fix LFM2/LFM2.5 ignoring json_schema (ggml-org#24377)
  * CUDA: Fix ssm_scan_f32 data-races (ggml-org#24360)
  * ci : bump komac version (ggml-org#24396)
  * speculative : fix "ngram-map-k4v" name in logging (ggml-org#24253)
  * webui: implement pinned conversations support (ggml-org#21387)
  * graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (ggml-org#24357)
  * ci : fix windows release (ggml-org#24369)
  * ui: add opt-in run_javascript frontend tool (ggml-org#24244)
  * mtmd: build_vit batching (ggml-org#24352)
  * vulkan: reduce iq1 shared memory usage for mul_mm (ggml-org#24287)
  * vulkan: add `v_dot2_f32_f16` support in matrix-matrix multiplication and Flash Attention (ggml-org#24123)
  * ui: Fix excessive style recalculation on hover (ggml-org#24243)
  * mtmd: refactor video subproc handling (ggml-org#24316)
  * server: log prompts to directory (ggml-org#22031)
  * ui: fix mobile chat form overflow and bust stale bundle cache (ggml-org#24158)
  * ggml : add GGML_OP_COL2IM_1D (ggml-org#24206)
  * server : do not clear slots without unified KV cache (ggml-org#24190)
  * models : fix plamo2 attention_key/value_length regression (ggml-org#24317)
  * ...

fix plamo2 attention_key/value_length regression

502f0d1

github-actions bot added the "model" label (Model specific) Jun 8, 2026

CISC requested a review from ggerganov June 8, 2026 20:05

CISC mentioned this pull request Jun 8, 2026

fix: handle hybrid models where layer 0 has no attention heads #24068

Closed

CISC added the "merge ready" label (a maintainer can use this label to indicate that they consider the changes final and ready to merge) Jun 9, 2026

ggerganov merged commit f0152ef into master Jun 9, 2026
25 checks passed

CISC deleted the cisc/plamo2-attn-kv-fix branch June 9, 2026 12:47

ggerganov merged 1 commit into master from cisc/plamo2-attn-kv-fix
