convert : fix Pixtral 12B --mistral-format conversion (3 bugs) #22981

Merged
ngxson merged 1 commit into ggml-org:master from fredzillman:fix/pixtral-mistral-format-conversion
May 12, 2026

@fredzillman

Summary

Three small fixes in convert_hf_to_gguf.py to make --mistral-format work
end-to-end on Pixtral 12B (2409) consolidated weights. Without these,
conversion crashes before writing the GGUF; with them, F16 -> Q4_K_M -> mtmd
inference produces correct image output. No inference-side changes required --
tools/mtmd/clip.cpp already reads the bias tensors when present.

Bugs

1. LlamaModel.__init__ crashes on mistral-format input (line ~2867)

ModelBase.load_hparams(self.dir_model, is_mistral_format=False) is called
unconditionally to read architectures for origin_hf_arch (used downstream
to detect SmolVLM2 etc.). Mistral consolidated layouts have no config.json,
so this raises FileNotFoundError for any --mistral-format run.

Fix: skip the HF-only lookup when self.is_mistral_format is True and set
self.origin_hf_arch = None. The field is only consulted by HF-specific
branches, so None is safe in the mistral path.
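
A minimal standalone sketch of the guard, for illustration only. The helper name read_origin_hf_arch is hypothetical and stands in for the lookup done in LlamaModel.__init__; it is not the converter's actual code.

```python
import json
from pathlib import Path
from typing import Optional


def read_origin_hf_arch(dir_model: Path, is_mistral_format: bool) -> Optional[str]:
    """Hypothetical stand-in for the origin_hf_arch lookup."""
    if is_mistral_format:
        # Mistral consolidated layouts have no config.json, so skip the
        # HF-only lookup and report no HF architecture.
        return None
    # HF layout: config.json is expected; open() raises FileNotFoundError
    # when it is missing, which is the pre-patch failure mode.
    with open(dir_model / "config.json", encoding="utf-8") as f:
        hparams = json.load(f)
    architectures = hparams.get("architectures") or [None]
    return architectures[0]
```

With is_mistral_format=True the function never touches the filesystem, which is why the missing config.json stops mattering.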

2. PixtralModel.set_gguf_parameters requires mm_projector_id (line ~13321)

self.find_vparam(["mm_projector_id"]) is mandatory, but Pixtral 12B 2409
uses a plain linear projector and ships no mm_projector_id in params.json
-- only the newer Mistral Small 3.1 sets it to "patch_merge". Result: KeyError
on every Pixtral 12B conversion.

Fix: pass optional=True. The body inside the if only runs when the value
equals patch_merge, so None short-circuits correctly and the existing
patch-merge path is unaffected.
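
The short-circuit can be sketched in isolation. find_param and uses_patch_merge below are hypothetical helpers that only imitate the optional-lookup behavior described above; the real find_vparam may differ in signature and error message.

```python
from typing import Any, Iterable, Mapping, Optional


def find_param(params: Mapping[str, Any], keys: Iterable[str], optional: bool = False) -> Optional[Any]:
    """Hypothetical stand-in for find_vparam: return the first key present."""
    keys = list(keys)
    for key in keys:
        if key in params:
            return params[key]
    if optional:
        # Missing key is tolerated: the caller gets None instead of an error.
        return None
    raise KeyError(f"could not find any of: {keys}")


def uses_patch_merge(vision_params: Mapping[str, Any]) -> bool:
    # None == "patch_merge" is False, so a missing mm_projector_id simply
    # skips the patch-merge branch instead of raising.
    return find_param(vision_params, ["mm_projector_id"], optional=True) == "patch_merge"
```

An empty params dict (the Pixtral 12B 2409 case) yields False, while a dict with mm_projector_id set to "patch_merge" still takes the existing branch.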

3. PixtralModel.map_tensor_name rejects adapter biases (line ~13330)

Only .weight is mapped for vision_language_adapter.w_in / w_out; .bias
falls through to super().map_tensor_name and raises. Pixtral 12B 2409's
consolidated.safetensors ships both biases.

Fix: add the two .bias branches mapping to mm.1.bias / mm.2.bias. The
inference side already supports this -- see tools/mtmd/clip.cpp:2143-2148,
which loads mm.1.bias / mm.2.bias with the optional flag set. The GGUF
writer was simply refusing tensors the runtime is happy to consume.
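
As an illustration, the adapter mapping can be written as a lookup table. This is a sketch, not the converter's code: map_adapter_tensor_name is a hypothetical helper, and the mm.1.weight / mm.2.weight targets are assumed by analogy with the bias names given above.

```python
from typing import Optional

# Source names come from the Mistral consolidated checkpoint; the .weight
# targets are an assumption, the .bias targets are the ones this PR adds.
ADAPTER_TENSOR_NAMES = {
    "vision_language_adapter.w_in.weight": "mm.1.weight",
    "vision_language_adapter.w_in.bias": "mm.1.bias",
    "vision_language_adapter.w_out.weight": "mm.2.weight",
    "vision_language_adapter.w_out.bias": "mm.2.bias",
}


def map_adapter_tensor_name(name: str) -> Optional[str]:
    """Return the GGUF name for an adapter tensor, or None for any other tensor."""
    # Returning None signals the caller to fall back to the generic mapping
    # (super().map_tensor_name in the real class).
    return ADAPTER_TENSOR_NAMES.get(name)
```

Before the fix only the two .weight entries had an equivalent branch, so the .bias names reached the generic mapper and raised.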

Reproduction

python convert_hf_to_gguf.py \
    /path/to/pixtral-12b-2409 \
    --mistral-format --outtype f16 \
    --outfile pixtral-12b-2409-f16.gguf

against the official Mistral release of Pixtral-12B-2409 (consolidated
safetensors layout, no config.json). Pre-patch this fails at
LlamaModel.__init__ (FileNotFoundError on config.json). After bypassing
that it fails at set_gguf_parameters (KeyError mm_projector_id). After
bypassing that it fails inside map_tensor_name on
vision_language_adapter.w_in.bias.

Verification

  • F16 GGUF written successfully from consolidated.safetensors +
    params.json + tekken.json.
  • Q4_K_M produced via llama-quantize, loads cleanly.
  • llama-mtmd-cli smoke test on a test image returned correct,
    image-grounded output. CUDA inference at ~47 tok/s eval on RTX 3090.
  • No upstream files touched besides convert_hf_to_gguf.py.
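
For anyone reproducing this, a quick pre-flight check of the model directory can save a failed run. The sketch below is hypothetical and not part of the PR; it only looks for the three files named in the first bullet, and treating them as required is an assumption of the sketch rather than a rule of the converter.

```python
from pathlib import Path
from typing import List

# File names taken from the verification notes above.
EXPECTED_FILES = ("consolidated.safetensors", "params.json", "tekken.json")


def missing_mistral_files(dir_model: Path) -> List[str]:
    """Return the expected consolidated-layout files absent from dir_model."""
    return [name for name in EXPECTED_FILES if not (dir_model / name).is_file()]
```

An empty list means the directory at least looks like the consolidated layout used in this test; it says nothing about the contents of the files.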

Inference side

tools/mtmd/clip.cpp:2143-2148 already calls
get_tensor(TN_MM_INP_PROJ_B, /*optional=*/true) and
get_tensor(TN_MM_OUTP_PROJ_B, /*optional=*/true) for mm.1.bias and
mm.2.bias, so the converter change makes existing runtime behavior
reachable. No C++ change is included or needed.

@fredzillman fredzillman requested a review from CISC as a code owner May 12, 2026 17:06
@CISC CISC requested a review from ngxson May 12, 2026 17:34
@github-actions github-actions Bot added the python (python script changes) label May 12, 2026
@ngxson ngxson merged commit cce09f0 into ggml-org:master May 12, 2026
6 checks passed
pwilkin added a commit to pwilkin/llama.cpp that referenced this pull request May 13, 2026
Ports 15 upstream commits (05e141a..5d44db6) that touched the
monolithic convert_hf_to_gguf.py into the new conversion/*.py layout
introduced by the refactor split.

New text/mmproj architectures registered:
  GraniteSpeechForConditionalGeneration, MiMoV2ForCausalLM,
  MiniCPMV4_6ForConditionalGeneration, Sarashina2VisionForCausalLM,
  SarvamMoEForCausalLM (+ modeling_sarvam_moe.SarvamMoEForCausalLM).

Notable changes:
- filter_tensors classmethod added to ModelBase/TextModel/MmprojModel
  and wired into index_tensors; many model classes refactored to move
  tensor-name skip/rename logic out of modify_tensors and into
  filter_tensors (upstream ggml-org#22597).
- LlamaModel._repack_nvfp4 override (Q/K RoPE permutation, ggml-org#22611).
- MistralModel yarn apply_scale support (ggml-org#22612).
- Gemma4Model._generate_nvfp4_tensors override for 26B NVFP4 (ggml-org#22804).
- LlavaVisionModel image-break token fallback for Mistral params.json
  -1 placeholders (ggml-org#22914).
- Pixtral 12B --mistral-format conversion fixes (ggml-org#22981).
- FP8 KV-cache scales fix (ggml-org#22818) and uint dtype byteswap disable
  (ggml-org#18908).

New files:
  conversion/sarashina2.py (Sarashina2VL text + vision)
xxmustafacooTR pushed a commit to xxPlayground/llama-cpp-turboquant that referenced this pull request May 13, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 19, 2026
baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026
carlosfundora pushed a commit to carlosfundora/llama.cpp-1-bit-turbo that referenced this pull request May 24, 2026
winstonma pushed a commit to winstonma/llama.cpp that referenced this pull request May 27, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026