
llama-quant : correct n_attention_wv usage #20357

Merged
ggerganov merged 2 commits into ggml-org:master from ddh0:fix-quant-counter-init
Mar 10, 2026

@ddh0 ddh0 commented Mar 10, 2026

In #19770, I made a mistake in the way the quantize_state_impl counter values were initialized. I was incrementing and using n_attention_wv in the same loop, when the value should have been fixed by the time we're deciding tensor types in llama_tensor_get_type_impl (for use_more_bits).

I never observed a difference in any of my tests - it was only after @bartowski kindly pointed this out that I realized it was incorrect. Thanks. :)
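
To illustrate the kind of ordering problem described above, here is a minimal, self-contained sketch. It is not the code from #19770 or from this PR: `quant_state`, `is_attn_v`, `make_names`, `pick_single_pass`, and `pick_two_pass` are hypothetical names, and the body of `use_more_bits` is an assumed layer-selection rule used only for illustration. The point is that a rule taking `(i_layer, n_layers)` can give different answers when `n_layers` is still being counted while it is read.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the counters kept in quantize_state_impl.
struct quant_state {
    int n_attention_wv = 0; // total number of attn_v tensors
    int i_attention_wv = 0; // index of the attn_v tensor being decided
};

// Assumed selection rule (illustrative): first eighth, last eighth,
// and every third layer in between get more bits.
static bool use_more_bits(int i_layer, int n_layers) {
    return i_layer < n_layers/8 || i_layer >= 7*n_layers/8 ||
           (i_layer - n_layers/8) % 3 == 2;
}

// Hypothetical name check for attention V weights.
static bool is_attn_v(const std::string & name) {
    return name.find("attn_v.weight") != std::string::npos;
}

// Build a toy tensor list: one attn_v and one ffn_down tensor per layer.
static std::vector<std::string> make_names(int n_layers) {
    std::vector<std::string> names;
    for (int i = 0; i < n_layers; ++i) {
        names.push_back("blk." + std::to_string(i) + ".attn_v.weight");
        names.push_back("blk." + std::to_string(i) + ".ffn_down.weight");
    }
    return names;
}

// Problematic ordering: the total is incremented in the same loop that reads it,
// so each decision sees a partial count.
static std::vector<bool> pick_single_pass(const std::vector<std::string> & names) {
    quant_state qs;
    std::vector<bool> more_bits;
    for (const auto & name : names) {
        if (!is_attn_v(name)) continue;
        ++qs.n_attention_wv; // still growing at this point
        more_bits.push_back(use_more_bits(qs.i_attention_wv, qs.n_attention_wv));
        ++qs.i_attention_wv;
    }
    return more_bits;
}

// Corrected ordering: count every attn_v tensor first, then decide with a fixed total.
static std::vector<bool> pick_two_pass(const std::vector<std::string> & names) {
    quant_state qs;
    for (const auto & name : names) {
        if (is_attn_v(name)) ++qs.n_attention_wv;
    }
    std::vector<bool> more_bits;
    for (const auto & name : names) {
        if (!is_attn_v(name)) continue;
        more_bits.push_back(use_more_bits(qs.i_attention_wv, qs.n_attention_wv));
        ++qs.i_attention_wv;
    }
    return more_bits;
}
```

With 16 toy layers, `pick_two_pass` marks 8 of the attn_v tensors for more bits, while `pick_single_pass` marks all 16, because the partial count makes every tensor look like it sits in the last eighth. This only shows that the ordering can matter under the assumed rule; it says nothing about how often real quantization output was affected.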

@ddh0 ddh0 requested a review from ggerganov as a code owner March 10, 2026 17:02
@ggerganov ggerganov merged commit 10e5b14 into ggml-org:master Mar 10, 2026
13 of 75 checks passed
asyncd1spatch pushed a commit to asyncd1spatch/llama.cpp that referenced this pull request Mar 10, 2026
* llama-quant : correct `n_attention_wv` usage

* simplify
@ddh0 ddh0 deleted the fix-quant-counter-init branch March 10, 2026 20:25