Hi!
I'm not using Kaggle, so why am I getting this exception?
I'm just trying to save a GGUF model:
`model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")`
RuntimeError: Unsloth: Quantization failed for ./model/unsloth.F16.gguf
You are in a Kaggle environment, which might be the reason this is failing.
Kaggle only provides 20GB of disk space. Merging to 16bit for 7b models use 16GB of space.
This means using `model.{save_pretrained/push_to_hub}_merged` works, but
`model.{save_pretrained/push_to_hub}_gguf` will use too much disk space.
I suggest you to save the 16bit model first, then use manual llama.cpp conversion.
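The numbers in the message can be checked outside Kaggle too. A rough sketch, assuming about 2 bytes per parameter at 16 bits and assuming that an F16 GGUF export briefly keeps both the merged 16-bit checkpoint and the GGUF file (of similar size) on disk. All helper names below are hypothetical, not part of Unsloth. For 7B parameters this gives about 14 GB of weights (the message's 16GB presumably includes some overhead) and about 28 GB at peak, so running out of disk is plausible on any machine with limited free space. The last helper builds the command for the manual llama.cpp step the message suggests. The script name `convert_hf_to_gguf.py` is from recent llama.cpp checkouts; older ones call it `convert-hf-to-gguf.py`.

```python
import shutil

BYTES_PER_PARAM_F16 = 2  # float16/bfloat16 weights: 2 bytes per parameter


def estimate_f16_gb(num_params: float) -> float:
    """Rough size in GB (1e9 bytes) of model weights stored at 16 bits."""
    return num_params * BYTES_PER_PARAM_F16 / 1e9


def gguf_export_needs_gb(num_params: float) -> float:
    """Assumed peak disk use of an F16 GGUF export:
    merged 16-bit checkpoint plus an F16 GGUF file of about the same size."""
    return 2 * estimate_f16_gb(num_params)


def free_gb(path: str = ".") -> float:
    """Free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def llama_cpp_convert_cmd(model_dir: str, outfile: str,
                          llama_cpp_dir: str = "llama.cpp") -> list:
    """Hypothetical helper: argv for converting a saved 16-bit HF model to GGUF."""
    return [
        "python", f"{llama_cpp_dir}/convert_hf_to_gguf.py", model_dir,
        "--outfile", outfile, "--outtype", "f16",
    ]


if __name__ == "__main__":
    n = 7e9  # assumed parameter count of a "7b" model
    print(f"16-bit weights:    ~{estimate_f16_gb(n):.0f} GB")
    print(f"merged + F16 GGUF: ~{gguf_export_needs_gb(n):.0f} GB")
    print(f"free here:          {free_gb():.1f} GB")
    print(" ".join(llama_cpp_convert_cmd("model", "model/unsloth.F16.gguf")))
```

If free space is below the estimate, the two-step route from the message applies: save the merged 16-bit model first (in Unsloth, `model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")`), then run the printed llama.cpp command, ideally writing the GGUF to a disk with more room.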