Gemma4_26B_A4B_NvFp4 hf checkpoint convert to gguf format fixes #22804
Signed-off-by: ynankani <ynankani@nvidia.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
BTW, it doesn't look like we support fused gate/up experts in
@ynankani Looks like GitHub UI snuck in
I see ModelOpt treating gate and up experts as separate for quantization and export (Link). I also see logic for calibration sync between the gate and up projections (Link, Link2).
Is it possible to set eol=lf for Python files in .gitattributes?
I'm not sure what the reason is, but it seems totally random, so I doubt there's anything we can do.
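For context on the question above: a `.gitattributes` rule of the form `*.py text eol=lf` is the usual way to request LF line endings for Python files, though, as the reply suggests, it may not prevent this particular problem. Independently of that, a local check can catch stray CRLF before pushing. The following is a minimal sketch and not part of this PR; the function names `has_crlf`, `to_lf`, and `crlf_files` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """Return True if the byte string contains at least one CRLF sequence."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Rewrite CRLF line endings as LF; lone LF bytes are left untouched."""
    return data.replace(b"\r\n", b"\n")


def crlf_files(root: str, pattern: str = "*.py") -> list[str]:
    """List files under `root` matching `pattern` that contain CRLF."""
    return sorted(
        str(p)
        for p in Path(root).rglob(pattern)
        if p.is_file() and has_crlf(p.read_bytes())
    )
```

Running `crlf_files(".")` from the repository root would list the Python files that still need normalizing; `to_lf` could then be applied to each file's bytes before writing them back.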
* Gemma4_26B_A4B_NvFp4 hf checkpoint convert to gguf format fixes
  Signed-off-by: ynankani <ynankani@nvidia.com>
* Apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Address review comments
  Signed-off-by: ynankani <ynankani@nvidia.com>
* fix CRLF
  Signed-off-by: ynankani <ynankani@nvidia.com>
* Lint error fix
  Signed-off-by: ynankani <ynankani@nvidia.com>
---------
Signed-off-by: ynankani <ynankani@nvidia.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Ports 15 upstream commits (05e141a..5d44db6) that touched the monolithic convert_hf_to_gguf.py into the new conversion/*.py layout introduced by the refactor split.

New text/mmproj architectures registered: GraniteSpeechForConditionalGeneration, MiMoV2ForCausalLM, MiniCPMV4_6ForConditionalGeneration, Sarashina2VisionForCausalLM, SarvamMoEForCausalLM (+ modeling_sarvam_moe.SarvamMoEForCausalLM).

Notable changes:
- filter_tensors classmethod added to ModelBase/TextModel/MmprojModel and wired into index_tensors; many model classes refactored to move tensor-name skip/rename logic out of modify_tensors and into filter_tensors (upstream ggml-org#22597).
- LlamaModel._repack_nvfp4 override (Q/K RoPE permutation, ggml-org#22611).
- MistralModel yarn apply_scale support (ggml-org#22612).
- Gemma4Model._generate_nvfp4_tensors override for 26B NVFP4 (ggml-org#22804).
- LlavaVisionModel image-break token fallback for Mistral params.json -1 placeholders (ggml-org#22914).
- Pixtral 12B --mistral-format conversion fixes (ggml-org#22981).
- FP8 KV-cache scales fix (ggml-org#22818) and uint dtype byteswap disable (ggml-org#18908).

New files: conversion/sarashina2.py (Sarashina2VL text + vision)
The three files touched by mainline 9f5f0e6 are all already at the target state in ygg HEAD:
- convert_hf_to_gguf.py: ygg uses the per-arch refactor (Bundle C ggml-org#17114); Gemma4Model lives in conversion/gemma.py. mainline ggml-org#17114 was merged AFTER ggml-org#22804 in upstream, so the refactor carried the NVFP4 _generate_nvfp4_tensors method and updated filter_tensors along with it. conversion/gemma.py lines 719-752 already contain the upstream additions verbatim.
- gguf-py/gguf/constants.py: MODEL_ARCH.GEMMA4 block (line 2522+) already includes FFN_GATE_EXP and FFN_UP_EXP (auto-merged cleanly).
- src/models/gemma4.cpp: ffn_gate_up_exps TENSOR_NOT_REQUIRED path and build_moe_ffn argument updates already in HEAD (auto-merged cleanly).

Empty commit retained for audit-trail/lineage; pairs with the previous loader port (baddad949 = mainline 42928bc).
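The fused-versus-split expert handling mentioned in the commit message above can be pictured roughly as follows. This is a Python sketch of the lookup order only, not the C++ in src/models/gemma4.cpp. The tensor-name templates and the function `pick_expert_tensors` are assumptions made for illustration, and the idea that separate gate/up tensors serve as the fallback when the fused tensor is absent is inferred from the TENSOR_NOT_REQUIRED wording rather than confirmed by the source.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed tensor-name templates, for illustration only.
FUSED_NAME = "blk.{i}.ffn_gate_up_exps.weight"
GATE_NAME = "blk.{i}.ffn_gate_exps.weight"
UP_NAME = "blk.{i}.ffn_up_exps.weight"


def pick_expert_tensors(
    tensors: Dict[str, Any], layer: int
) -> Tuple[str, Any, Optional[Any]]:
    """Prefer the fused gate/up expert tensor; otherwise require both halves."""
    fused = tensors.get(FUSED_NAME.format(i=layer))
    if fused is not None:
        # Fused tensor is optional: use it when present.
        return ("fused", fused, None)

    gate = tensors.get(GATE_NAME.format(i=layer))
    up = tensors.get(UP_NAME.format(i=layer))
    if gate is None or up is None:
        # Neither layout is complete, so the layer cannot be built.
        raise KeyError(
            f"layer {layer}: need a fused gate_up tensor or both gate and up tensors"
        )
    return ("split", gate, up)
```

Under these assumptions, a checkpoint exported with separate gate and up experts takes the "split" branch, while a checkpoint carrying the fused tensor takes the "fused" branch.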
Overview
Fixes for converting the Gemma4_26B_A4B_NvFp4 HF checkpoint to GGUF format. This PR fixes the following:
Additional information
Tested with https://huggingface.co/nvidia/Gemma-4-26B-A4B-NVFP4
Requirements