convert : fix RuntimeError when stripping FP8 KV-cache scales #22818

Merged
ggerganov merged 2 commits into ggml-org:master from pich:fix/nvfp4-convert-dict-iter
May 8, 2026

pich (Contributor) commented May 7, 2026

Bug

convert_hf_to_gguf.py raises RuntimeError: dictionary changed size during iteration for any ModelOpt-quantised NVFP4 checkpoint that also has FP8 KV-cache scales — e.g. mmangkad/Qwen3.6-35B-A3B-NVFP4 (and any hf_quant_config.json with kv_cache_quant_algo: FP8).

File "convert_hf_to_gguf.py", line 721, in _generate_nvfp4_tensors
    for name in self.model_tensors.keys():
RuntimeError: dictionary changed size during iteration

Cause

In ModelBase._generate_nvfp4_tensors the final cleanup loop iterates self.model_tensors.keys() while calling del self.model_tensors[name] on the same dict. As soon as the first .k_scale / .v_scale tensor is found and deleted, the dict's size changes and the next step of the iteration raises the RuntimeError.
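
The failure mode can be reproduced without any model files. The following is a minimal sketch, not the converter's code: a plain dict stands in for self.model_tensors, the tensor names are made up for illustration, and the suffix check is an assumption about what the cleanup loop matches.

```python
# Hypothetical stand-in for self.model_tensors; names and values are illustrative only.
model_tensors = {
    "model.layers.0.self_attn.q_proj.weight": 0,
    "model.layers.0.self_attn.k_proj.k_scale": 1,
    "model.layers.0.self_attn.v_proj.v_scale": 2,
}


def strip_kv_scales_buggy(tensors: dict) -> None:
    # Deletes from the dict while iterating over its live keys view.
    for name in tensors.keys():
        if name.endswith((".k_scale", ".v_scale")):
            del tensors[name]


try:
    strip_kv_scales_buggy(model_tensors)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)  # dictionary changed size during iteration
```

Only the first matching entry is deleted before the exception fires, so the dict is left partially cleaned.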

The earlier loops in the same function avoid this by collecting names into consumed and popping them after iteration; this trailing loop was missed.
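
The collect-then-pop approach used by the earlier loops could look roughly like this. It is a sketch under assumptions: only the name consumed comes from the description above, while the function name, the suffix check and the sample tensor names are hypothetical.

```python
from __future__ import annotations


def strip_kv_scales_collect(tensors: dict) -> list[str]:
    # First pass: only read from the dict and remember what to drop.
    consumed = []
    for name in tensors.keys():
        if name.endswith((".k_scale", ".v_scale")):
            consumed.append(name)
    # Second pass: mutate the dict once iteration over it has finished.
    for name in consumed:
        tensors.pop(name, None)
    return consumed


# Illustrative tensor names, not taken from a real checkpoint.
demo_tensors = {
    "blk.0.attn_q.weight": 0,
    "blk.0.attn_k.k_scale": 1,
    "blk.0.attn_v.v_scale": 2,
}
removed = strip_kv_scales_collect(demo_tensors)
print(removed)
print(list(demo_tensors))
```

Because the dict is never modified while its keys view is being iterated, the size check in the iterator is never triggered.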

Fix

Wrap .keys() in list() so the deletions happen against a snapshot. One-line change.
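
In isolation, the snapshot approach looks roughly like the sketch below. The function name, the suffix check and the sample names are assumptions for illustration; the actual condition in convert_hf_to_gguf.py is not reproduced here.

```python
def strip_kv_scales(tensors: dict) -> None:
    # list() copies the keys up front, so deleting from the dict
    # no longer affects the sequence being iterated.
    for name in list(tensors.keys()):
        if name.endswith((".k_scale", ".v_scale")):
            del tensors[name]


# Illustrative tensor names, not taken from a real checkpoint.
sample_tensors = {
    "blk.0.attn_q.weight": 0,
    "blk.0.attn_k.k_scale": 1,
    "blk.0.attn_v.v_scale": 2,
    "blk.1.attn_q.weight": 3,
}
strip_kv_scales(sample_tensors)
print(list(sample_tensors))
```

The snapshot costs one extra list of key references, which is negligible next to the tensors themselves.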

Repro

hf download mmangkad/Qwen3.6-35B-A3B-NVFP4 --local-dir model-nvfp4
python convert_hf_to_gguf.py model-nvfp4 --outfile out.gguf --outtype auto

Without the patch: crashes after repacking experts. With the patch: writes out.gguf cleanly (973 tensors, 22 GB). Verified the resulting GGUF loads in llama-bench / llama-perplexity and gives sensible results on RTX PRO 4000 Blackwell (sm_120, native NVFP4).

In ModelBase._generate_nvfp4_tensors the final cleanup loop iterates
self.model_tensors.keys() and calls del on the same dict, which raises
RuntimeError: dictionary changed size during iteration when a ModelOpt
NVFP4 model also has FP8 KV-cache scales (e.g. mmangkad/Qwen3.6-35B-A3B-NVFP4
and any modelopt config with kv_cache_quant_algo: FP8).

Wrap the keys view in list() so the deletions happen on a snapshot.
pich requested a review from CISC as a code owner May 7, 2026 19:57

CISC (Member) left a comment

Oops, my bad, I removed a bunch of pointless lists, and didn't spot that this one was necessary, thanks!

pich (Contributor, Author) left a comment

We need a second code review; I'm not able to approve your change.

pich requested a review from CISC May 7, 2026 20:33
CISC (Member) commented May 7, 2026

> We need a second code review; I'm not able to approve your change.

Don't worry about it, I'll set it as merge ready once CIs go green.

github-actions bot added the "python" label (python script changes) May 7, 2026
CISC added the "merge ready" label (A maintainer can use this label to indicate that they consider the changes final and ready to merge.) May 7, 2026
ggerganov merged commit 1d72d87 into ggml-org:master May 8, 2026
6 checks passed
cetarthoriphros pushed a commit to cetarthoriphros/llama.cpp that referenced this pull request May 9, 2026
…rg#22818)

* convert : fix RuntimeError when stripping FP8 KV-cache scales

In ModelBase._generate_nvfp4_tensors the final cleanup loop iterates
self.model_tensors.keys() and calls del on the same dict, which raises
RuntimeError: dictionary changed size during iteration when a ModelOpt
NVFP4 model also has FP8 KV-cache scales (e.g. mmangkad/Qwen3.6-35B-A3B-NVFP4
and any modelopt config with kv_cache_quant_algo: FP8).

Wrap the keys view in list() so the deletions happen on a snapshot.

* re-add another accidentally removed list

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
meh pushed a commit to meh/llama.cpp that referenced this pull request May 10, 2026
pwilkin added a commit to pwilkin/llama.cpp that referenced this pull request May 13, 2026
Ports 15 upstream commits (05e141a..5d44db6) that touched the
monolithic convert_hf_to_gguf.py into the new conversion/*.py layout
introduced by the refactor split.

New text/mmproj architectures registered:
  GraniteSpeechForConditionalGeneration, MiMoV2ForCausalLM,
  MiniCPMV4_6ForConditionalGeneration, Sarashina2VisionForCausalLM,
  SarvamMoEForCausalLM (+ modeling_sarvam_moe.SarvamMoEForCausalLM).

Notable changes:
- filter_tensors classmethod added to ModelBase/TextModel/MmprojModel
  and wired into index_tensors; many model classes refactored to move
  tensor-name skip/rename logic out of modify_tensors and into
  filter_tensors (upstream ggml-org#22597).
- LlamaModel._repack_nvfp4 override (Q/K RoPE permutation, ggml-org#22611).
- MistralModel yarn apply_scale support (ggml-org#22612).
- Gemma4Model._generate_nvfp4_tensors override for 26B NVFP4 (ggml-org#22804).
- LlavaVisionModel image-break token fallback for Mistral params.json
  -1 placeholders (ggml-org#22914).
- Pixtral 12B --mistral-format conversion fixes (ggml-org#22981).
- FP8 KV-cache scales fix (ggml-org#22818) and uint dtype byteswap disable
  (ggml-org#18908).

New files:
  conversion/sarashina2.py (Sarashina2VL text + vision)
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 19, 2026
baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026
winstonma pushed a commit to winstonma/llama.cpp that referenced this pull request May 27, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

Labels

merge ready (A maintainer can use this label to indicate that they consider the changes final and ready to merge.), python (python script changes)
