
llama : add support for Nemotron 3 Super #20411

Merged
danbev merged 3 commits into ggml-org:master from danbev:nemotron-3-super on Mar 11, 2026

Conversation

@danbev (Member) commented Mar 11, 2026

This commit adds support for the Nemotron 3 Super model (120B.A12B), enabling the model to be converted to GGUF format and run in llama.cpp.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Matt Clayton <156335168+mattjcly@users.noreply.github.com>
@danbev danbev requested review from CISC and ggerganov as code owners March 11, 2026 17:13
@github-actions github-actions bot added labels: model (Model specific), python (python script changes), ggml (changes relating to the ggml tensor library for machine learning), Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) on Mar 11, 2026
@danbev danbev merged commit eaf1d79 into ggml-org:master Mar 11, 2026
16 of 79 checks passed
ProgenyAlpha pushed a commit to ProgenyAlpha/llama.cpp that referenced this pull request Mar 12, 2026
* llama : add support for Nemotron 3 Super

This commit adds support for the Nemotron 3 Super model (120B.A12B)
enabling this model to be converted to GGUF format and run in llama.cpp.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Matt Clayton <156335168+mattjcly@users.noreply.github.com>
@vbooka1 commented Mar 13, 2026

Hello, I'm getting the error "Quant method is not yet supported: 'modelopt'" when trying to convert NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 (https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4/) to .gguf:

$ python convert_hf_to_gguf__c7229ade.py --outfile /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4.gguf --outtype f16 --verbose --dry-run /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4/
INFO:hf-to-gguf:Loading model: NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4
WARNING:hf-to-gguf:Failed to load model config from /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4: The repository /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 .
 You can inspect the repository content at https://hf.co//mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: NemotronHForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4: The repository /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 .
 You can inspect the repository content at https://hf.co//mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-00017.safetensors'
...
INFO:hf-to-gguf:gguf: indexing model part 'model-00017-of-00017.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
WARNING:hf-to-gguf:Failed to load model config from /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4: The repository /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 .
 You can inspect the repository content at https://hf.co//mnt/LLM/nemotron/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Exporting model...
Traceback (most recent call last):
  File "/home/username/venv/convert_hf_to_gguf__c7229ade.py", line 12163, in <module>
    main()
    ~~~~^^
  File "/home/username/venv/convert_hf_to_gguf__c7229ade.py", line 12157, in main
    model_instance.write()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/home/username/venv/convert_hf_to_gguf__c7229ade.py", line 715, in write
    self.prepare_tensors()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/home/username/venv/convert_hf_to_gguf__c7229ade.py", line 9884, in prepare_tensors
    super().prepare_tensors()
    ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/username/venv/convert_hf_to_gguf__c7229ade.py", line 2745, in prepare_tensors
    super().prepare_tensors()
    ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/username/venv/convert_hf_to_gguf__c7229ade.py", line 555, in prepare_tensors
    self.dequant_model()
    ~~~~~~~~~~~~~~~~~~^^
  File "/home/username/venv/convert_hf_to_gguf__c7229ade.py", line 474, in dequant_model
    raise NotImplementedError(f"Quant method is not yet supported: {quant_method!r}")
NotImplementedError: Quant method is not yet supported: 'modelopt'

I have tried convert_hf_to_gguf.py from release b8304 (https://github.com/ggml-org/llama.cpp/releases/tag/b8304),

from commit tekintian@7229ade,

and from commit tekintian@9ff11f5:

-rw-r--r--  1 username username    592218 Mar 13 14:00 convert_hf_to_gguf__b8304.py
-rw-r--r--  1 username username    578627 Mar 13 14:04 convert_hf_to_gguf__c7229ade.py
-rw-r--r--  1 username username    595668 Mar 13 13:59 convert_hf_to_gguf__c9ff11f5.py

All three versions produce the same error message. Please tell me whether I am doing something wrong or whether the convert script is indeed broken.
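For context, the failure above comes from the converter's dequantization dispatch: the checkpoint's quantization_config declares quant_method "modelopt" (NVIDIA's NVFP4 format), which the script does not know how to dequantize. The following is a minimal, hypothetical sketch of that dispatch pattern (the supported-methods set here is illustrative, not llama.cpp's actual list):

```python
# Hypothetical, simplified sketch of a quant-method dispatch like the one
# in convert_hf_to_gguf.py's dequant_model(): look up "quant_method" in the
# checkpoint's quantization_config and reject methods that cannot be
# dequantized back to full precision.

def dequant_model(quantization_config: dict) -> str:
    quant_method = quantization_config.get("quant_method")
    supported = {"gptq", "bitsandbytes"}  # illustrative set only
    if quant_method not in supported:
        # This is the path hit by the 'modelopt' (NVFP4) checkpoint above.
        raise NotImplementedError(
            f"Quant method is not yet supported: {quant_method!r}"
        )
    return quant_method


if __name__ == "__main__":
    try:
        dequant_model({"quant_method": "modelopt"})
    except NotImplementedError as e:
        print(e)  # Quant method is not yet supported: 'modelopt'
```

Converting the unquantized BF16 release of the model (rather than the NVFP4 checkpoint) avoids this path entirely, since no dequantization step is needed.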

@CISC (Member) commented Mar 13, 2026

@vbooka1 Requires per-tensor check/repacking, please create an issue.

@vbooka1 commented Mar 13, 2026

@vbooka1 Requires per-tensor check/repacking, please create an issue.

#20504


Labels

Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)), ggml (changes relating to the ggml tensor library for machine learning), model (Model specific), python (python script changes)


4 participants