convert : add support for Nemotron Nano 3 Omni by danbev · Pull Request #22481 · ggml-org/llama.cpp

danbev · 2026-04-28T16:10:46Z

Overview

This commit adds support for the NVIDIA Nemotron Nano 3 Omni model, enabling it to be converted to GGUF.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: No

CISC · 2026-04-28T20:55:02Z

+    def dequant_model(self):
+        if self._is_nvfp4:
+            # Skip nvfp4 quantization for vision/audio model.
+            return
+        super().dequant_model()


What was the point of this?

This was to enable the mmproj model conversion for the NVFP4 model. It was a very late change as I did not get access to the NVFP4 model until yesterday, so there may be better ways to do this. Below is the commit in isolation, and also the error if we just remove/comment out the above dequant_model function in the NemotronNanoV2VLModel class.

nvfp4 commit

commit 11404c21dc0b5409e85686c426c9ae7c20944147
Author: Daniel Bevenius <daniel.bevenius@gmail.com>
Date:   Tue Apr 28 08:53:45 2026 +0200

    convert : avoid nvfp4 processing for mmproj model

    This commit enables avoiding nvfp4 processing for mmproj models as the
    test language model does not need to be processed for these models and
    they also don't contain the mapping of the text model tensors which will
    cause errors during conversion.

diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py
index f5796cb5d..03aa957f0 100755
--- a/convert_hf_to_gguf.py
+++ b/convert_hf_to_gguf.py
@@ -728,6 +728,9 @@ class ModelBase:
             del experts, merged
 
+    def _needs_nvfp4_processing(self) -> bool:
+        return True
+
     def prepare_tensors(self):
         # detect NVFP4 quantization (ModelOpt format)
         quant_algo = (self.hparams.get("quantization_config") or {}).get("quant_algo")
@@ -758,7 +761,7 @@ class ModelBase:
         # NVFP4 weights are repacked and written directly to gguf_writer.
         # This must run before dequant_model so NVFP4 tensors are removed
         # from model_tensors, leaving only non-NVFP4 (e.g. FP8) for dequant.
-        if self._is_nvfp4:
+        if self._is_nvfp4 and self._needs_nvfp4_processing():
             self._generate_nvfp4_tensors()
 
         self.dequant_model()
@@ -2190,6 +2193,10 @@ class MmprojModel(ModelBase):
         # merge configs
         self.preprocessor_config = {**self.preprocessor_config, **cfg}
 
+    def _needs_nvfp4_processing(self) -> bool:
+        # nvfp4 quantization applies to the text model only.
+        return False
+
     def get_vision_config(self) -> dict[str, Any] | None:
         config_name = "vision_config" if not self.is_mistral_format else "vision_encoder"
         return self.global_config.get(config_name)
@@ -4450,6 +4457,12 @@ class NemotronNanoV2VLModel(MmprojModel):
         }
         return vision_config
 
+    def dequant_model(self):
+        if self._is_nvfp4:
+            # Skip nvfp4 quantization for vision/audio model.
+            return
+        super().dequant_model()
+
     def set_gguf_parameters(self):
         if "image_mean" not in self.preprocessor_config:
             self.preprocessor_config["image_mean"] = [0.485, 0.456, 0.406]
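The control flow introduced by the commit above can be sketched with toy classes. This is only an illustration of the hook-override pattern: the `Toy*` classes, the `steps` list, and the `run` helper are hypothetical stand-ins and not part of `convert_hf_to_gguf.py`.

```python
class ToyModelBase:
    """Toy stand-in for the converter base class (hypothetical, for illustration)."""

    def __init__(self, is_nvfp4: bool) -> None:
        self._is_nvfp4 = is_nvfp4
        self.steps: list[str] = []  # records which stages actually ran

    def _needs_nvfp4_processing(self) -> bool:
        # Default: a text model with NVFP4 weights goes through repacking.
        return True

    def _generate_nvfp4_tensors(self) -> None:
        self.steps.append("nvfp4")

    def dequant_model(self) -> None:
        self.steps.append("dequant")

    def prepare_tensors(self) -> None:
        # Mirrors the patched condition: both the flag and the hook must agree.
        if self._is_nvfp4 and self._needs_nvfp4_processing():
            self._generate_nvfp4_tensors()
        self.dequant_model()


class ToyMmprojModel(ToyModelBase):
    def _needs_nvfp4_processing(self) -> bool:
        # Projector models opt out of NVFP4 repacking.
        return False


class ToyNemotronMmproj(ToyMmprojModel):
    def dequant_model(self) -> None:
        # Skip dequantization too when the checkpoint is NVFP4.
        if self._is_nvfp4:
            return
        super().dequant_model()


def run(cls: type[ToyModelBase], is_nvfp4: bool) -> list[str]:
    model = cls(is_nvfp4)
    model.prepare_tensors()
    return model.steps
```

In this sketch, `run(ToyMmprojModel, True)` still reaches the dequantization stage, which is the situation the extra `dequant_model` override addresses for the Nemotron projector class.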

error

INFO:hf-to-gguf:Exporting model...
Traceback (most recent call last):
  File "/home/danbev/work/llama.cpp/examples/model-conversion/../../convert_hf_to_gguf.py", line 13586, in <module>
    main()
  File "/home/danbev/work/llama.cpp/examples/model-conversion/../../convert_hf_to_gguf.py", line 13580, in main
    model_instance.write()
  File "/home/danbev/work/llama.cpp/examples/model-conversion/../../convert_hf_to_gguf.py", line 933, in write
    self.prepare_tensors()
  File "/home/danbev/work/llama.cpp/examples/model-conversion/../../convert_hf_to_gguf.py", line 775, in prepare_tensors
    for name, data_torch in chain(self.generate_extra_tensors(), self.get_tensors()):
  File "/home/danbev/work/llama.cpp/examples/model-conversion/../../convert_hf_to_gguf.py", line 527, in get_tensors
    yield name, gen()
                ^^^^^
  File "/home/danbev/work/llama.cpp/examples/model-conversion/../../convert_hf_to_gguf.py", line 511, in <lambda>
    self.model_tensors[weight_name] = lambda w=w, s=s: dequant_simple(w(), s(), None)
                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/danbev/work/llama.cpp/examples/model-conversion/../../convert_hf_to_gguf.py", line 328, in dequant_simple
    return weight.float() * scale
           ~~~~~~~~~~~~~~~^~~~~~~
  File "/home/danbev/work/llama.cpp/examples/model-conversion/../../gguf-py/gguf/lazy.py", line 40, in wrapped_special_op
    return type(self)._wrap_fn(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/danbev/work/llama.cpp/examples/model-conversion/../../gguf-py/gguf/lazy.py", line 126, in wrapped_fn
    res = fn(*meta_args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/danbev/work/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/wrappers.py", line 291, in _fn
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/danbev/work/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/wrappers.py", line 143, in _fn
    result = fn(**bound.arguments)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/danbev/work/llama.cpp/venv/lib/python3.12/site-packages/torch/_refs/__init__.py", line 1095, in _ref
    a, b = _maybe_broadcast(a, b)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/danbev/work/llama.cpp/venv/lib/python3.12/site-packages/torch/_refs/__init__.py", line 437, in _maybe_broadcast
    common_shape = _broadcast_shapes(
                   ^^^^^^^^^^^^^^^^^^
  File "/home/danbev/work/llama.cpp/venv/lib/python3.12/site-packages/torch/_refs/__init__.py", line 425, in _broadcast_shapes
    torch._check(
  File "/home/danbev/work/llama.cpp/venv/lib/python3.12/site-packages/torch/__init__.py", line 1656, in _check
    _check_with(RuntimeError, cond, message)
  File "/home/danbev/work/llama.cpp/venv/lib/python3.12/site-packages/torch/__init__.py", line 1638, in _check_with
    raise error_type(message_evaluated)
RuntimeError: Attempting to broadcast a dimension of length 116 at -1! Mismatching argument at index 1 had torch.Size([2688, 116]); but expected shape should be broadcastable to [2688, 928]
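The shape mismatch in the traceback can be checked without PyTorch. The sketch below implements the standard trailing-dimension broadcasting rule in plain Python and applies it to the two shapes from the error message. The helper `broadcast_shape` is hypothetical. The interpretation in the final comment (two 4-bit values packed per byte, one scale per 16 elements) is an assumption about the NVFP4 layout, not something stated in the thread.

```python
def broadcast_shape(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Compare dimensions from the trailing end; missing leading dims count as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"cannot broadcast {a} with {b}: dim -{i} is {da} vs {db}")
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


weight_shape = (2688, 928)  # from the error message
scale_shape = (2688, 116)   # from the error message

try:
    broadcast_shape(weight_shape, scale_shape)
    compatible = True
except ValueError:
    compatible = False

# The last dimensions differ by a whole-number factor.
ratio = weight_shape[-1] // scale_shape[-1]

# Assumption: if each weight byte packs two 4-bit values and there is one
# scale per 16 elements, then 928 bytes -> 1856 elements -> 116 scales.
scales_if_block_16 = weight_shape[-1] * 2 // 16
```

Under that assumption the scale tensor holds one value per block rather than one per row, so a plain elementwise `weight.float() * scale` cannot apply it, which matches the failure in `dequant_simple`.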

This suggests you are now left with weight_scale tensors unaccounted for. Are you sure this created a working GGUF?

Edit: Oh, wait, I get it, it's because you're skipping the whole process for mmproj, so the NVFP4 tensors are left as-is.

There is probably a cleaner way to do this, I'll look into it.
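One possible shape for a cleaner approach is to drop the text-model tensors before any quantization handling runs, so the projector conversion never sees NVFP4 weights or their scales. The sketch below is hypothetical: the prefixes, tensor names, and the `prefilter` helper are invented for illustration and are not taken from the converter or from any later change.

```python
from typing import Callable

# Hypothetical name prefixes; a real checkpoint may use different ones.
MMPROJ_PREFIXES = ("vision_model.", "audio_model.")


def is_mmproj_tensor(name: str) -> bool:
    """Keep only tensors that belong to the vision/audio projector."""
    return name.startswith(MMPROJ_PREFIXES)


def prefilter(tensors: dict[str, object], keep: Callable[[str], bool]) -> dict[str, object]:
    """Return a new mapping containing only the tensors selected by `keep`."""
    return {name: tensor for name, tensor in tensors.items() if keep(name)}


# Toy checkpoint: values are placeholders standing in for real tensors.
checkpoint = {
    "vision_model.encoder.weight": "bf16",
    "audio_model.encoder.weight": "bf16",
    "language_model.layers.0.mlp.weight": "nvfp4-packed",
    "language_model.layers.0.mlp.weight_scale": "fp8",
}

mmproj_tensors = prefilter(checkpoint, is_mmproj_tensor)
```

With the text-model entries removed up front, no `weight`/`weight_scale` pair from the quantized language model remains for a later dequantization step to trip over.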

This commit adds support for NVIDIA Nemotron Nano 3 Omni model enabling this model to be converted to GGUF. (cherry picked from commit 5d56eff)

convert : add support for Nemotron Nano 3 Omni

891e06c

danbev requested a review from CISC as a code owner April 28, 2026 16:10

danbev requested a review from ggerganov April 28, 2026 16:15

github-actions Bot added the python (python script changes) label Apr 28, 2026

ggerganov approved these changes Apr 28, 2026

danbev merged commit 5d56eff into ggml-org:master Apr 28, 2026
6 checks passed

CISC reviewed Apr 28, 2026

cnsiva pushed a commit to saas-home/llama.cpp that referenced this pull request Apr 29, 2026

convert : add support for Nemotron Nano 3 Omni (ggml-org#22481)

95a4beb

rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

convert : add support for Nemotron Nano 3 Omni (ggml-org#22481)

f7ad7da

CISC mentioned this pull request May 1, 2026

convert : add filter_tensors method to pre-filter tensors #22597

Merged

samuraieng pushed a commit to samuraieng/llama.cpp that referenced this pull request May 6, 2026

convert : add support for Nemotron Nano 3 Omni (ggml-org#22481)

ff437ed

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

convert : add support for Nemotron Nano 3 Omni (ggml-org#22481)

156afb1

meh pushed a commit to meh/llama.cpp that referenced this pull request May 10, 2026

convert : add support for Nemotron Nano 3 Omni (ggml-org#22481)

b563e80

baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026

convert : add support for Nemotron Nano 3 Omni (ggml-org#22481)

eb906f3

winstonma pushed a commit to winstonma/llama.cpp that referenced this pull request May 27, 2026

convert : add support for Nemotron Nano 3 Omni (ggml-org#22481)

d5d1405

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

convert : add support for Nemotron Nano 3 Omni (ggml-org#22481)

4519819

danbev merged 1 commit into ggml-org:master from danbev:nemotron-nano-3-omni


3 participants
