model : fix plamo2 attention_key/value_length regression #24317

Merged
ggerganov merged 1 commit into master from cisc/plamo2-attn-kv-fix on Jun 9, 2026

@CISC commented Jun 8, 2026

Member

Overview

Fixes incorrect tensor sizes and a floating-point exception (FPE) caused by a bad assert.

Additional information

At some point after #16075, possibly during one of the refactors (it is hard to tell), these metadata overrides got lost.

The assert was probably copy-pasted from mamba-base, but there n_head is reassigned, whereas here the same value (hparams.ssm_dt_rank) is held in a variable called n_heads.
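
To illustrate the kind of pitfall described above, here is a minimal sketch. It is not the actual llama.cpp code or the change in this PR: the struct `hparams_sketch`, the function `ssm_head_dim`, and the field `ssm_d_inner` are hypothetical names chosen for illustration, and the assumption is that the check divides by the head count. The sketch validates the variable that is actually used as the divisor and rejects zero before taking the modulo, so the check itself cannot trap.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the relevant hyperparameters (not the real llama.cpp struct).
struct hparams_sketch {
    int64_t n_head;       // attention head count; assumed to be possibly 0 for recurrent layers
    int64_t ssm_dt_rank;  // reused as the SSM head count
    int64_t ssm_d_inner;  // inner dimension that is split across the SSM heads
};

// Returns the per-head dimension, checking the divisor that is actually used.
// A check written against n_head instead of n_heads could divide by zero.
int64_t ssm_head_dim(const hparams_sketch & hp) {
    const int64_t n_heads = hp.ssm_dt_rank; // local name differs from mamba-base's n_head
    if (n_heads <= 0 || hp.ssm_d_inner % n_heads != 0) {
        throw std::invalid_argument("ssm_d_inner must be a positive multiple of n_heads");
    }
    return hp.ssm_d_inner / n_heads;
}
```

With `n_head` equal to 0 and `ssm_dt_rank` equal to 4, the check passes because it only looks at `n_heads`; an integer modulo by a zero `n_head` would instead be undefined behavior and typically raises SIGFPE on x86.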

Requirements

@github-actions bot added the model (Model specific) label on Jun 8, 2026
@CISC requested a review from ggerganov on June 8, 2026 20:05
@CISC added the merge ready label on Jun 9, 2026
@ggerganov merged commit f0152ef into master on Jun 9, 2026
25 checks passed
@CISC deleted the cisc/plamo2-attn-kv-fix branch on June 9, 2026 12:47
Jcfunk added a commit to Jcfunk/llama.cpp that referenced this pull request Jun 11, 2026
* upstream/HEAD: (329 commits)
  vendor : update LibreSSL to 4.3.2 (ggml-org#24397)
  Remove padding and multiple D2D copies for MTP (ggml-org#24086)
  chat: fix LFM2/LFM2.5 ignoring json_schema (ggml-org#24377)
  CUDA: Fix ssm_scan_f32 data-races (ggml-org#24360)
  ci : bump komac version (ggml-org#24396)
  speculative : fix "ngram-map-k4v" name in logging (ggml-org#24253)
  webui: implement pinned conversations support (ggml-org#21387)
  graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (ggml-org#24357)
  ci : fix windows release (ggml-org#24369)
  ui: add opt-in run_javascript frontend tool (ggml-org#24244)
  mtmd: build_vit batching (ggml-org#24352)
  vulkan: reduce iq1 shared memory usage for mul_mm (ggml-org#24287)
  vulkan: add `v_dot2_f32_f16` support in matrix-matrix multiplication and Flash Attention (ggml-org#24123)
  ui: Fix excessive style recalculation on hover (ggml-org#24243)
  mtmd: refactor video subproc handling (ggml-org#24316)
  server: log prompts to directory (ggml-org#22031)
  ui: fix mobile chat form overflow and bust stale bundle cache (ggml-org#24158)
  ggml : add GGML_OP_COL2IM_1D (ggml-org#24206)
  server : do not clear slots without unified KV cache (ggml-org#24190)
  models : fix plamo2 attention_key/value_length regression (ggml-org#24317)
  ...

Labels

merge ready: A maintainer can use this label to indicate that they consider the changes final and ready to merge.
model: Model specific

2 participants