I'm having problems saving GGUFs of Gemma 3 finetunes. I first hit this in my container environment and assumed that something had gone wrong during training and caused other files to be generated, but that is not the case. My Jupyter notebook is almost identical to the one on the blog, yet I still cannot get model.save_pretrained_gguf() to run successfully with Gemma 3 models. In fact, it doesn't even work on the current version of the notebook from the blog!
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_(4B).ipynb#scrollTo=FqfebeAdT073
I have tried multiple methods of generating the config.json file:
model.save_pretrained("Careconnect-gemma-3-4b-it")
tokenizer.save_pretrained("Careconnect-gemma-3-4b-it")
model.config.save_pretrained("Careconnect-gemma-3-4b-it")  # Try to ensure config is saved
from transformers import AutoConfig
# Force auto generate because model.config.save_pretrained() isn't working
config = AutoConfig.from_pretrained("Careconnect-gemma-3-4b-it")
config.save_pretrained("Careconnect-gemma-3-4b-it")
Doing this, I was able to generate a config.json. However, model.save_pretrained_gguf() still fails. I believe this is caused by the limited external internet access on my container and is possibly unrelated to the config issue:
Unsloth: Updating system package directories
Unsloth: Install GGUF and other packages
RuntimeError: Unsloth: Could not obtain https://github.com/ggerganov/llama.cpp/raw/refs/heads/master/convert_hf_to_gguf.py. Maybe you don't have internet ocnnection?
Traceback:
File "Cell [cell17]", line 1, in <module>
model.save_pretrained_gguf(
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/unsloth/save.py", line 2246, in save_to_gguf_generic
metadata = _convert_to_gguf(
File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/llama_cpp.py", line 653, in convert_to_gguf
conversion_filename, supported_types = _download_convert_hf_to_gguf()
File "/opt/conda/lib/python3.10/site-packages/unsloth_zoo/llama_cpp.py", line 353, in _download_convert_hf_to_gguf
raise RuntimeError(
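To confirm that the failure really is a network restriction and not something else, a small check like the one below can be run inside the container. This is only a sketch: the URL is copied from the RuntimeError above, while the helper name `can_fetch` and the 5 second timeout are my own choices and not part of Unsloth.

```python
import urllib.error
import urllib.request

# URL copied verbatim from the RuntimeError in the traceback.
CONVERT_SCRIPT_URL = (
    "https://github.com/ggerganov/llama.cpp/raw/refs/heads/master/convert_hf_to_gguf.py"
)


def can_fetch(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds, False on any network error.

    `can_fetch` is a hypothetical helper for this issue, not an Unsloth function.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers DNS failures, refused connections and HTTP errors;
        # OSError covers socket timeouts; ValueError covers malformed URLs.
        return False
```

If `can_fetch(CONVERT_SCRIPT_URL)` returns False in the container but True on a machine with normal internet access, the error above is most likely just the blocked download.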
As a workaround, I simply ran the llama.cpp convert_hf_to_gguf.py script directly and got this output:
python convert_hf_to_gguf.py Careconnect-gemma-3-4b-it --outtype q8_0 --outfile ./gguf_model
INFO:hf-to-gguf:Loading model: Careconnect-gemma-3-4b-it
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Has vision encoder, but it will be ignored
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 2
INFO:gguf.vocab:Setting special token type eos to 106
INFO:gguf.vocab:Setting special token type unk to 3
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
{%- if messages[0]['content'] is string -%}
{%- set first_user_prefix = messages[0]['content'] + '
' -%}
{%- else -%}
{%- set first_user_prefix = messages[0]['content'][0]['text'] + '
' -%}
{%- endif -%}
{%- set loop_messages = messages[1:] -%}
{%- else -%}
{%- set first_user_prefix = "" -%}
{%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
{%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
{{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
{%- endif -%}
{%- if (message['role'] == 'assistant') -%}
{%- set role = "model" -%}
{%- else -%}
{%- set role = message['role'] -%}
{%- endif -%}
{{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
{%- if message['content'] is string -%}
{{ message['content'] | trim }}
{%- elif message['content'] is iterable -%}
{%- for item in message['content'] -%}
{%- if item['type'] == 'image' -%}
{{ '<start_of_image>' }}
{%- elif item['type'] == 'text' -%}
{{ item['text'] | trim }}
{%- endif -%}
{%- endfor -%}
{%- else -%}
{{ raise_exception("Invalid content type") }}
{%- endif -%}
{{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{'<start_of_turn>model
'}}
{%- endif -%}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:gguf_model: n_tensors = 0, total_size = negligible - metadata only
Writing: 0.00byte [00:00, ?byte/s]
INFO:hf-to-gguf:NOTE: this script only convert the language model to GGUF
INFO:hf-to-gguf: for the vision model, please use gemma3_convert_encoder_to_gguf.py
INFO:hf-to-gguf:Model successfully exported to gguf_model
Writing 0.00 bytes? In 0:00 seconds? This cannot be right! The log reports n_tensors = 0, and the generated .gguf file is only 6.3 MB:
-rw-r--r--. 1 root root 6.3M Mar 19 01:54 /home/app/Careconnect-gemma-3-4b-it.gguf
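The tensor count can also be read straight from the generated file, independent of the converter's log. The sketch below assumes the GGUF header layout used by version 2 and later (4-byte magic `GGUF`, a uint32 version, then uint64 tensor and metadata counts, little-endian as the log states). The function name `read_gguf_header` is mine.

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes a little-endian GGUF v2+ header: magic, uint32, uint64, uint64.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

Running `read_gguf_header("/home/app/Careconnect-gemma-3-4b-it.gguf")` should show whether the file really contains zero tensors, which would mean it holds only metadata and the tokenizer.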
This could be a problem with llama.cpp's Gemma 3 GGUF conversion and not with Unsloth. But since the issue also appears in the Colab notebooks, I believe it should be looked into by more people / devs. I see that Unsloth does have Hugging Face repositories with the GGUF files, so there should be a workaround.
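One thing that may be worth checking before blaming the converter is what the saved directory actually contains. The sketch below is based on an unconfirmed guess: if the model is still a PEFT/LoRA-wrapped model, save_pretrained() normally writes only adapter files (adapter_config.json, adapter_model.safetensors) and no full weights, which would be consistent with both the missing config.json and n_tensors = 0. The helper name `describe_checkpoint` and the filename prefixes it looks for are assumptions based on the usual Hugging Face naming.

```python
from pathlib import Path


def describe_checkpoint(model_dir):
    """Summarise which kinds of files a saved model directory contains.

    Hypothetical diagnostic helper; the filename patterns are the common
    Hugging Face / PEFT conventions, not something confirmed for this setup.
    """
    names = sorted(p.name for p in Path(model_dir).iterdir() if p.is_file())
    # PEFT adapters are normally saved as adapter_config.json / adapter_model.*
    adapter_files = [n for n in names if n.startswith("adapter_")]
    # Full weights are normally model*.safetensors or pytorch_model*.bin
    weight_files = [
        n
        for n in names
        if n.startswith(("model", "pytorch_model"))
        and n.endswith((".safetensors", ".bin"))
    ]
    return {
        "has_config": "config.json" in names,
        "adapter_files": adapter_files,
        "weight_files": weight_files,
    }
```

If `describe_checkpoint("Careconnect-gemma-3-4b-it")` reports adapter files but an empty `weight_files` list, the directory probably holds only the LoRA adapter, and the weights would need to be merged into the base model before convert_hf_to_gguf.py has anything to export.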