
Error saving GGUF of vision model #1504

@SergioRubio01


Hey,
Congrats on the work!
I have fine-tuned a vision model, but when I try to save it as GGUF for use with Ollama, it raises this exception:

Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.57s/it]
2025-01-05 04:09:07,810 - ERROR - STDERR: max_steps is given, it will override any value given in num_train_epochs
2025-01-05 04:09:07,810 - ERROR - STDERR: ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
2025-01-05 04:09:07,810 - ERROR - STDERR: \\   /|    Num examples = 54 | Num Epochs = 1
2025-01-05 04:09:07,820 - ERROR - STDERR: O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
2025-01-05 04:09:07,820 - ERROR - STDERR: \        /    Total batch size = 8 | Total steps = 5
2025-01-05 04:09:07,824 - ERROR - STDERR: "-____-"     Number of trainable parameters = 16,793,600
2025-01-05 04:09:07,826 - ERROR - STDERR: 🦥 Unsloth needs about 1-3 minutes to load everything - please wait!
100%|██████████| 5/5 [01:20<00:00, 16.08s/it]
2025-01-05 04:09:07,829 - ERROR - STDERR: Traceback (most recent call last):
2025-01-05 04:09:07,832 - ERROR - STDERR: File "/home/Ubuntu/finetune.py", line 145, in <module>
2025-01-05 04:09:07,833 - ERROR - STDERR: model.save_pretrained_gguf(MODEL_NAME, tokenizer)
2025-01-05 04:09:07,835 - ERROR - STDERR: File "/home/Ubuntu/miniconda3/lib/python3.12/site-packages/unsloth/save.py", line 2238, in not_implemented_save 
2025-01-05 04:09:07,836 - ERROR - STDERR: raise NotImplementedError("Unsloth: Sorry GGUF is currently not supported for vision models!")
2025-01-05 04:09:07,837 - ERROR - STDERR: NotImplementedError: Unsloth: Sorry GGUF is currently not supported for vision models!
2025-01-05 04:09:07,840 - ERROR - Command failed with non-zero exit status: bash -l -c 'eval "$(~/miniconda3/bin/conda shell.bash hook)" &&         conda activate &&         python3 finetune.py'

Is GGUF export for vision models currently being developed? Or is there another way to use my fine-tuned vision models in Ollama?
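One way to keep a finished training run from ending on this error is to try GGUF export first and fall back when Unsloth raises `NotImplementedError`, as it does in the traceback above. This is a minimal sketch, not an Unsloth feature: `save_for_ollama` and its directory arguments are hypothetical names, and the fallback assumes the installed Unsloth version exposes `save_pretrained_merged` on the vision model (the code checks for it instead of assuming).

```python
def save_for_ollama(model, tokenizer, gguf_dir, merged_dir):
    """Hypothetical helper (not part of Unsloth).

    Tries GGUF export first; if it is not implemented for this model,
    falls back to saving merged weights. Returns "gguf" or "merged".
    """
    try:
        # In the Unsloth version from the traceback, this raises
        # NotImplementedError for vision models.
        model.save_pretrained_gguf(gguf_dir, tokenizer)
        return "gguf"
    except NotImplementedError as err:
        print(f"GGUF export unavailable: {err}")

    # Assumption: the model may expose save_pretrained_merged; look it up
    # first so a missing method produces a clear error message.
    save_merged = getattr(model, "save_pretrained_merged", None)
    if save_merged is None:
        raise RuntimeError(
            "No merged-save method available; keep the LoRA adapter "
            "written by save_pretrained instead."
        )
    save_merged(merged_dir, tokenizer)
    return "merged"
```

In the script, this would replace the final `model.save_pretrained_gguf(MODEL_NAME, tokenizer)` call. Merged weights are not an Ollama model by themselves. Whether they can be imported into Ollama or converted to GGUF depends on whether the tooling supports that specific vision architecture.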

I followed the steps from the Docs:

...
trainer_stats = trainer.train()

model.save_pretrained(MODEL_NAME) # Local saving
tokenizer.save_pretrained(MODEL_NAME)
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

used_memory = round(torch.cuda.max_memory_reserved()/1024/1024/1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory/max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100,3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

# Save model in GGUF format
model.save_pretrained_gguf(MODEL_NAME, tokenizer)
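The memory statistics in the snippet above convert bytes to what the script labels GB (strictly GiB, 1024³ bytes) and express usage as a percentage of total GPU memory. The following torch-free sketch reproduces the same arithmetic. `memory_report` is a hypothetical helper name. It assumes that `start_gpu_memory` and `max_memory`, which are defined in the elided part of the script, are already in GB, as the subtraction implies.

```python
def memory_report(peak_reserved_bytes, start_gpu_memory_gb, max_memory_gb):
    """Hypothetical helper mirroring the script's peak-memory arithmetic.

    peak_reserved_bytes: what torch.cuda.max_memory_reserved() returns (bytes)
    start_gpu_memory_gb: reserved memory before training, assumed in GB
    max_memory_gb: total GPU memory, assumed in GB
    """
    gib = 1024 ** 3  # bytes per GiB, same as /1024/1024/1024 in the script
    used_memory = round(peak_reserved_bytes / gib, 3)
    used_memory_for_lora = round(used_memory - start_gpu_memory_gb, 3)
    used_percentage = round(used_memory / max_memory_gb * 100, 3)
    lora_percentage = round(used_memory_for_lora / max_memory_gb * 100, 3)
    return used_memory, used_memory_for_lora, used_percentage, lora_percentage


# Example: 8 GiB peak, 2 GB reserved before training, 16 GB card.
print(memory_report(8 * 1024 ** 3, 2.0, 16.0))  # (8.0, 6.0, 50.0, 37.5)
```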
