
context : fix worst-case reserve outputs #12545

Merged
ggerganov merged 1 commit into master from gg/pooling-fix-reserve on Mar 25, 2025


@ggerganov (Member)

fix #12517

Set the correct number of outputs when reserving the worst-case graphs.

ggerganov merged commit 2d77d88 into master on Mar 25, 2025
55 of 56 checks passed
ggerganov deleted the gg/pooling-fix-reserve branch on March 25, 2025 07:19
ZeroV0LT pushed a commit to ZeroV0LT/llama.cpp that referenced this pull request Mar 12, 2026
The chunked fused Gated Delta Net detection in sched_reserve() calls
graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs.
This creates a dimension mismatch in build_pooling() for embedding models
with mean/rank pooling: build_inp_mean() creates a tensor with shape
[n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...]
via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b).

Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation,
matching the pattern used by the pp/tg worst-case reservations.

Regression introduced by ggml-org#20340 (d28961d).
Same class of bug as ggml-org#12517, fixed by ggml-org#12545.
ggerganov pushed a commit that referenced this pull request Mar 13, 2026
…0468)

* llama : fix pooling assertion crash in chunked GDN detection path

* server : add mean pooling tests to embedding test suite

Add test_embedding_pooling_mean and test_embedding_pooling_mean_multiple
to cover the --pooling mean codepath, which was previously untested.

These tests would have caught the regression introduced by #20340 where
build_pooling() crashes with a ggml_mul_mat assertion due to mismatched
dimensions in the chunked GDN detection path.

---------

Co-authored-by: Domenico Crupi <domenico@zerovolt.it>
Development

Linked issue closed by this pull request: Regression: e0dbec0 (aka #12181) breaks pooled embeddings: mean (#12517)
