unsloth==2025.5.1
unsloth_zoo==2025.5.1
After fine-tuning the unsloth/Qwen2.5-VL-3B-Instruct model with unsloth, the model runs normally. However, saving it in GGUF format with the following command fails:
model.save_pretrained_gguf("./save_dir", quantization_type="q8_0")
The following error appears:
RuntimeError Traceback (most recent call last)
Cell In[25], line 1
----> 1 model.save_pretrained_gguf("./save_dir", quantization_type="q8_0")
File ~/anaconda3/envs/mocr/lib/python3.10/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
113 @functools.wraps(func)
114 def decorate_context(*args, **kwargs):
115 with ctx_factory():
--> 116 return func(*args, **kwargs)
File ~/anaconda3/envs/mocr/lib/python3.10/site-packages/unsloth/save.py:2247, in save_to_gguf_generic(model, save_directory, quantization_type, repo_id, token)
2244 install_llama_cpp(just_clone_repo = True)
2245 pass
-> 2247 metadata = _convert_to_gguf(
2248 save_directory,
2249 print_output = True,
2250 quantization_type = quantization_type,
2251 )
2252 if repo_id is not None:
2253 prepare_saving(
2254 model,
2255 repo_id,
2256 is_gguf = True,
2257 save_directory = save_directory,
2258 metadata = metadata,
2259 token = token,
2260 )
File ~/anaconda3/envs/mocr/lib/python3.10/site-packages/unsloth_zoo/llama_cpp.py:692, in convert_to_gguf(input_folder, output_filename, quantization_type, max_shard_size, print_output, print_outputs)
689 pass
691 if metadata is None:
--> 692 raise RuntimeError(f"Unsloth: Failed to convert {conversion_filename} to GGUF.")
694 printed_metadata = "\n".join(metadata)
695 if print_output: print(f"Unsloth: Successfully saved GGUF to:\n{printed_metadata}")
RuntimeError: Unsloth: Failed to convert llama.cpp/unsloth_convert_hf_to_gguf.py to GGUF.
Uploading it to Hugging Face also fails, with a different error, when I run the following command:
model.push_to_hub_gguf("haha/qwen2.5-vl-gguf-q8", tokenizer, quantization_type = "Q8_0")
The following error appears:
TypeError Traceback (most recent call last)
Cell In[27], line 1
----> 1 model.push_to_hub_gguf("haha/qwen2.5-vl-gguf-q8", tokenizer, quantization_type = "Q8_0")
File ~/anaconda3/envs/mocr/lib/python3.10/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
113 @functools.wraps(func)
114 def decorate_context(*args, **kwargs):
115 with ctx_factory():
--> 116 return func(*args, **kwargs)
TypeError: save_to_gguf_generic() got multiple values for argument 'quantization_type'
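For what it's worth, this TypeError looks like an ordinary Python argument collision rather than anything GGUF-specific. Going by the signature shown in the first traceback, save_to_gguf_generic(model, save_directory, quantization_type, repo_id, token), a positional tokenizer would land in the quantization_type slot and then clash with the quantization_type keyword. A minimal sketch with a stand-in function (hypothetical, only mirroring the parameter order from the traceback, not unsloth's actual code) reproduces the same message:

```python
# Stand-in with the parameter order shown in the traceback.
# This is NOT unsloth's implementation; defaults here are assumptions.
def save_to_gguf_generic(model, save_directory, quantization_type="Q8_0",
                         repo_id=None, token=None):
    return {"save_directory": save_directory,
            "quantization_type": quantization_type}

model, tokenizer = object(), object()  # placeholders for the real objects

def collide():
    """Pass the tokenizer positionally AND quantization_type by keyword."""
    try:
        save_to_gguf_generic(model, "haha/qwen2.5-vl-gguf-q8", tokenizer,
                             quantization_type="Q8_0")
    except TypeError as exc:
        return str(exc)
    return None

message = collide()
print(message)

# Without the extra positional argument, the stand-in binds cleanly.
ok = save_to_gguf_generic(model, "haha/qwen2.5-vl-gguf-q8",
                          quantization_type="Q8_0")
```

If that reading is right, the upload error is about how the arguments are forwarded and is separate from the conversion failure above. I have not confirmed whether calling push_to_hub_gguf without the positional tokenizer works for this model.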
Besides q8_0, I have also tried bf16 and f16, and the same errors occur.
However, I can successfully save it in the safetensors format using the following command:
model.save_pretrained_merged("qwen2.5-vl-sat", tokenizer)
This command runs successfully and saves the model.
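Since the merged safetensors export works, one way to surface the converter's own error might be to run llama.cpp's conversion script directly on that folder. The sketch below only builds the command line. The script path (the upstream convert_hf_to_gguf.py inside a local llama.cpp checkout, rather than the unsloth_convert_hf_to_gguf.py copy named in the error) and the output file name are assumptions about the local setup.

```python
import shlex
import sys

def build_convert_command(model_dir, outfile, outtype="q8_0",
                          script="llama.cpp/convert_hf_to_gguf.py"):
    """Build argv for a manual HF -> GGUF conversion.

    The script location and file names are assumptions, not values
    taken from unsloth.
    """
    return [sys.executable, script, model_dir,
            "--outfile", outfile, "--outtype", outtype]

cmd = build_convert_command("qwen2.5-vl-sat", "qwen2.5-vl-q8_0.gguf")
print(shlex.join(cmd))

# To actually run it and read the converter's own error output:
# import subprocess
# subprocess.run(cmd, check=False)
```

Running the printed command by hand should show whether the script rejects the Qwen2.5-VL architecture or fails for some other reason.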
Is it possible that unsloth does not support saving Qwen2.5-VL models in GGUF format?