convert : fix Pixtral 12B --mistral-format conversion (3 bugs) #22981
Merged
ngxson merged 1 commit on May 12, 2026
CISC approved these changes on May 12, 2026
ngxson approved these changes on May 12, 2026
pwilkin added a commit to pwilkin/llama.cpp that referenced this pull request on May 13, 2026:

Ports 15 upstream commits (05e141a..5d44db6) that touched the monolithic convert_hf_to_gguf.py into the new conversion/*.py layout introduced by the refactor split.

New text/mmproj architectures registered: GraniteSpeechForConditionalGeneration, MiMoV2ForCausalLM, MiniCPMV4_6ForConditionalGeneration, Sarashina2VisionForCausalLM, SarvamMoEForCausalLM (+ modeling_sarvam_moe.SarvamMoEForCausalLM).

Notable changes:
- filter_tensors classmethod added to ModelBase/TextModel/MmprojModel and wired into index_tensors; many model classes refactored to move tensor-name skip/rename logic out of modify_tensors and into filter_tensors (upstream ggml-org#22597).
- LlamaModel._repack_nvfp4 override (Q/K RoPE permutation, ggml-org#22611).
- MistralModel yarn apply_scale support (ggml-org#22612).
- Gemma4Model._generate_nvfp4_tensors override for 26B NVFP4 (ggml-org#22804).
- LlavaVisionModel image-break token fallback for Mistral params.json -1 placeholders (ggml-org#22914).
- Pixtral 12B --mistral-format conversion fixes (ggml-org#22981).
- FP8 KV-cache scales fix (ggml-org#22818) and uint dtype byteswap disable (ggml-org#18908).

New files: conversion/sarashina2.py (Sarashina2VL text + vision)
Summary
Three small fixes in `convert_hf_to_gguf.py` to make `--mistral-format` work end-to-end on Pixtral 12B (2409) consolidated weights. Without these, conversion crashes before writing the GGUF; with them, F16 -> Q4_K_M -> mtmd inference produces correct image output. No inference-side changes required -- `tools/mtmd/clip.cpp` already reads the bias tensors when present.

Bugs
1. `LlamaModel.__init__` crashes on mistral-format input (line ~2867)

`ModelBase.load_hparams(self.dir_model, is_mistral_format=False)` is called unconditionally to read `architectures` for `origin_hf_arch` (used downstream to detect SmolVLM2 etc.). Mistral consolidated layouts have no `config.json`, so this raises `FileNotFoundError` for any `--mistral-format` run.

Fix: skip the HF-only lookup when `self.is_mistral_format` is True and set `self.origin_hf_arch = None`. The field is only consulted by HF-specific branches, so `None` is safe in the mistral path.

2. `PixtralModel.set_gguf_parameters` requires `mm_projector_id` (line ~13321)

`self.find_vparam([mm_projector_id])` is mandatory, but Pixtral 12B 2409 uses a plain linear projector and ships no `mm_projector_id` in `params.json` -- only the newer Mistral Small 3.1 sets it to `patch_merge`. Result: `KeyError` on every Pixtral 12B conversion.

Fix: pass `optional=True`. The body inside the `if` only runs when the value equals `patch_merge`, so `None` short-circuits correctly and the existing patch-merge path is unaffected.
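
The first two fixes follow the same pattern: treat an HF-only or newer-model-only field as optional instead of mandatory. The sketch below illustrates that pattern in isolation. It is not the converter's code: `read_origin_hf_arch`, `find_param`, and `uses_patch_merge` are hypothetical stand-ins for the logic in `LlamaModel.__init__`, `find_vparam`, and `PixtralModel.set_gguf_parameters`, and the real signatures may differ.

```python
import json
from pathlib import Path
from typing import Any, Optional


def read_origin_hf_arch(dir_model: Path, is_mistral_format: bool) -> Optional[str]:
    """Return the first HF architecture name, or None for Mistral-format input."""
    if is_mistral_format:
        # Consolidated Mistral layouts have no config.json, so the HF-only
        # lookup is skipped instead of raising FileNotFoundError.
        return None
    with open(dir_model / "config.json", encoding="utf-8") as f:
        hparams = json.load(f)
    architectures = hparams.get("architectures") or [None]
    return architectures[0]


def find_param(params: dict, keys: list, optional: bool = False) -> Any:
    """Return the value of the first key present in params (hypothetical helper)."""
    for key in keys:
        if key in params:
            return params[key]
    if optional:
        return None
    raise KeyError(f"could not find any of: {keys}")


def uses_patch_merge(vision_params: dict) -> bool:
    """True only when params.json explicitly selects the patch_merge projector."""
    projector_id = find_param(vision_params, ["mm_projector_id"], optional=True)
    # A missing key yields None, which never equals "patch_merge".
    return projector_id == "patch_merge"
```

With an empty parameter dict (the Pixtral 12B 2409 case) `uses_patch_merge` returns False rather than raising, while a dict that sets `mm_projector_id` to `patch_merge` still takes the patch-merge path.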
3. `PixtralModel.map_tensor_name` rejects adapter biases (line ~13330)

Only `.weight` is mapped for `vision_language_adapter.w_in` / `w_out`; `.bias` falls through to `super().map_tensor_name` and raises. Pixtral 12B 2409's `consolidated.safetensors` ships both biases.

Fix: add the two `.bias` branches mapping to `mm.1.bias` / `mm.2.bias`. The inference side already supports this -- see `tools/mtmd/clip.cpp:2143-2148`, which loads `mm.1.bias` / `mm.2.bias` with the optional flag set. The GGUF writer was simply refusing tensors the runtime is happy to consume.
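
The third fix amounts to extending a name mapping with two entries. A minimal sketch, assuming a table-driven lookup: `ADAPTER_TENSOR_MAP` and `map_adapter_tensor_name` are hypothetical names, and the `mm.1.weight` / `mm.2.weight` targets are an assumption inferred from the bias names given above, not taken from the diff.

```python
from typing import Optional

# Hypothetical table of adapter tensor names. The two .bias entries are the
# ones this PR adds; the .weight targets are assumed by analogy.
ADAPTER_TENSOR_MAP = {
    "vision_language_adapter.w_in.weight": "mm.1.weight",
    "vision_language_adapter.w_in.bias": "mm.1.bias",
    "vision_language_adapter.w_out.weight": "mm.2.weight",
    "vision_language_adapter.w_out.bias": "mm.2.bias",
}


def map_adapter_tensor_name(name: str) -> Optional[str]:
    """Return the GGUF name for an adapter tensor, or None if it is not one.

    Returning None lets the caller fall back to its generic mapping, which
    is where the unmapped .bias names used to end up and raise.
    """
    return ADAPTER_TENSOR_MAP.get(name)
```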
Reproduction
Run `convert_hf_to_gguf.py` with `--mistral-format` against the official Mistral release of Pixtral-12B-2409 (consolidated safetensors layout, no `config.json`). Pre-patch this fails at `LlamaModel.__init__` (`FileNotFoundError` on `config.json`). After bypassing that it fails at `set_gguf_parameters` (`KeyError` `mm_projector_id`). After bypassing that it fails inside `map_tensor_name` on `vision_language_adapter.w_in.bias`.

Verification
- `consolidated.safetensors` + `params.json` + `tekken.json`.
- `llama-quantize`, loads cleanly.
- `llama-mtmd-cli` smoke test on a test image returned correct, image-grounded output. CUDA inference at ~47 tok/s eval on RTX 3090.
- `convert_hf_to_gguf.py`.

Inference side
`tools/mtmd/clip.cpp:2143-2148` already calls `get_tensor(TN_MM_INP_PROJ_B, /*optional=*/true)` and `get_tensor(TN_MM_OUTP_PROJ_B, /*optional=*/true)` for `mm.1.bias` and `mm.2.bias`, so the converter change makes existing runtime behavior reachable. No C++ change is included or needed.
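
Because the runtime treats both biases as optional, a GGUF that lacks them would still load without complaint, so it can be worth checking the converted file's tensor list explicitly. A minimal sketch, assuming the tensor names have already been read from the file by some GGUF reader (not shown); `missing_projector_biases` is a hypothetical helper and not part of llama.cpp.

```python
from typing import Iterable, List

# The two tensor names the description says the runtime loads optionally.
EXPECTED_PROJECTOR_BIASES = ("mm.1.bias", "mm.2.bias")


def missing_projector_biases(tensor_names: Iterable[str]) -> List[str]:
    """Return the expected projector bias names absent from tensor_names."""
    present = set(tensor_names)
    return [name for name in EXPECTED_PROJECTOR_BIASES if name not in present]
```

An empty result means both biases made it into the file; a non-empty one lists what the converter dropped.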