Fix garbled output with REPACK at high thread counts by NoahOksuz · Pull Request #16956 · ggml-org/llama.cpp

NoahOksuz · 2025-11-02T23:53:03Z

Fixed a race condition in the REPACK matrix multiplication code that caused garbled output when using 26+ threads (model-dependent threshold). The issue occurred because with high thread counts, the code forced chunk count to equal thread count, creating many small chunks. After aligning these chunks to NB_COLS boundaries, adjacent chunks could overlap, causing data corruption and race conditions. The fix enforces minimum chunk sizes based on NB_COLS and caps maximum chunk count to prevent creating too many tiny chunks, ensuring proper alignment without overlaps.
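The failure mode described above can be sketched in isolation. This is a hypothetical model, not the code in `repack.cpp`: the helper names (`naive_chunks`, `capped_chunks`, `has_overlap`) are made up for illustration, and the alignment rule (round the start down and the end up to a multiple of `nb_cols`) is an assumption chosen to show one way aligned chunks can collide. It also assumes the row count `nr` is a multiple of `nb_cols`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Half-open row range [first, second) handled by one chunk.
using Range = std::pair<int64_t, int64_t>;

static int64_t round_up(int64_t x, int64_t m)   { return (x + m - 1) / m * m; }
static int64_t round_down(int64_t x, int64_t m) { return x / m * m; }

// Hypothetical "before": one chunk per thread, aligned afterwards.
// ASSUMPTION: alignment widens each chunk outward to nb_cols boundaries.
std::vector<Range> naive_chunks(int64_t nr, int64_t nth, int64_t nb_cols) {
    const int64_t dr = (nr + nth - 1) / nth;  // rows per chunk, may be < nb_cols
    std::vector<Range> out;
    for (int64_t i = 0; i < nth; ++i) {
        const int64_t start = dr * i;
        if (start >= nr) break;
        const int64_t end = std::min(start + dr, nr);
        out.push_back({round_down(start, nb_cols),
                       std::min(round_up(end, nb_cols), nr)});
    }
    return out;
}

// Hypothetical "after": cap the chunk count so every chunk holds at least
// nb_cols rows, and make the chunk size a multiple of nb_cols so every
// boundary is already aligned.
std::vector<Range> capped_chunks(int64_t nr, int64_t nth, int64_t nb_cols) {
    const int64_t max_nchunk = std::max<int64_t>(1, (nr + nb_cols - 1) / nb_cols);
    const int64_t nchunk     = std::min(nth, max_nchunk);
    const int64_t dr         = round_up((nr + nchunk - 1) / nchunk, nb_cols);
    std::vector<Range> out;
    for (int64_t start = 0; start < nr; start += dr) {
        out.push_back({start, std::min(start + dr, nr)});
    }
    return out;
}

// True if any two ranges share a row. Ranges must be sorted by start,
// which both generators above guarantee.
bool has_overlap(const std::vector<Range> & ranges) {
    for (std::size_t i = 1; i < ranges.size(); ++i) {
        if (ranges[i].first < ranges[i - 1].second) return true;
    }
    return false;
}
```

Under these assumptions, `naive_chunks(64, 26, 8)` yields 3-row chunks that all widen to 8-row blocks, so neighbouring chunks cover the same rows, while `capped_chunks(64, 26, 8)` produces eight disjoint 8-row chunks. Threads beyond the capped chunk count simply have no chunk to pick up.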

NoahOksuz · 2025-11-03T11:35:41Z

#16942

catap · 2025-11-03T12:06:00Z

I can't reproduce #16960 with this fix anymore.

ggerganov

Think this is OK, but would be nice if @max-krasnyansky can take a look as well.

ggml/src/ggml-cpu/repack.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

max-krasnyansky · 2025-11-03T20:14:00Z

Hmm. The change looks good but I'm seeing a significant regression in token rates compared to my earlier testing
(around 10 t/s lower on the prompt). Reverted to my original commit and the token rates are the same as before.
It's possible that some other changes merged since then are causing that. I'm looking into it ...

max-krasnyansky · 2025-11-04T05:04:34Z

Hmm. The change looks good but I'm seeing a significant regression in token rates compared to my earlier testing (around 10 t/s lower on the prompt). Reverted to my original commit and the token rates are the same as before. It's possible that some other changes merged since then are causing that. I'm looking into it ...

False alarm. Merging ...

* origin/master: (21 commits)
  vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (ggml-org#16919)
  examples(gguf): GGUF example outputs (ggml-org#17025)
  mtmd: allow QwenVL to process larger image by default (ggml-org#17020)
  server : do not default to multiple slots with speculative decoding (ggml-org#17017)
  mtmd: improve struct initialization (ggml-org#16981)
  docs: Clarify the endpoint that webui uses (ggml-org#17001)
  model : add openPangu-Embedded (ggml-org#16941)
  ggml webgpu: minor set rows optimization (ggml-org#16810)
  sync : ggml
  ggml : fix conv2d_dw SVE path (ggml/1380)
  CUDA: update ops.md (ggml-org#17005)
  opencl: update doc (ggml-org#17011)
  refactor: replace sprintf with snprintf for safer string handling in dump functions (ggml-org#16913)
  vulkan: remove the need for the dryrun (ggml-org#16826)
  server : do context shift only while generating (ggml-org#17000)
  readme : update hot topics (ggml-org#17002)
  ggml-cpu : bicubic interpolation (ggml-org#16891)
  ci : apply model label to models (ggml-org#16994)
  chore : fix models indent after refactor (ggml-org#16992)
  Fix garbled output with REPACK at high thread counts (ggml-org#16956)
  ...

joseph777111 · 2025-11-05T22:24:58Z

Thank you! This fixed Unsloth's Qwen3-VL-30B-A3B-Instruct-1M GGUF quants. It has also made other quants more coherent, including Wayfarer-2. Great work guys! I think this will positively affect more GGUF quants than we can rightly quantify. For reference: I tested with the METAL backend on an M1 MacBook Pro 16GB (Unified Memory).

* Fix garbled output with REPACK at high thread counts

Fixed a race condition in the REPACK matrix multiplication code that caused garbled output when using 26+ threads (model-dependent threshold). The issue occurred because with high thread counts, the code forced chunk count to equal thread count, creating many small chunks. After aligning these chunks to NB_COLS boundaries, adjacent chunks could overlap, causing data corruption and race conditions. The fix enforces minimum chunk sizes based on NB_COLS and caps maximum chunk count to prevent creating too many tiny chunks, ensuring proper alignment without overlaps.

* Update ggml/src/ggml-cpu/repack.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-cpu/repack.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

NoahOksuz requested review from ggerganov and slaren as code owners November 2, 2025 23:53

github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) Nov 3, 2025

DajanaV mentioned this pull request Nov 3, 2025

UPSTREAM PR #16956: Fix garbled output with REPACK at high thread counts auroralabs-loci/llama.cpp#46

Closed

ggerganov mentioned this pull request Nov 3, 2025

Eval bug: Qwen3-VL-30B-A3B models produces garbage #16960

Closed

ggerganov requested a review from max-krasnyansky November 3, 2025 12:08

ggerganov approved these changes Nov 3, 2025

ggerganov reviewed Nov 3, 2025

NoahOksuz and others added 2 commits November 3, 2025 14:17

Update ggml/src/ggml-cpu/repack.cpp

fe79c34

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Update ggml/src/ggml-cpu/repack.cpp

1135d0c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

max-krasnyansky approved these changes Nov 4, 2025

max-krasnyansky merged commit 1f5accb into ggml-org:master Nov 4, 2025
65 of 69 checks passed

max-krasnyansky merged 3 commits into ggml-org:master from NoahOksuz:fix16942

5 participants
