Note: Please do not remove the questions. Answer beside them.
- Did you update?
pip install --upgrade unsloth unsloth_zoo: Yes. I am using an H100 GPU on Databricks, and only this Unsloth version works: unsloth[cu124-torch260]==2025.9.8. I also installed unsloth_zoo.
- Colab or Kaggle or local / cloud: On Databricks, using Serverless GPU with an H100 accelerator
- Number GPUs used, use
nvidia-smi: 8
- Which notebook? Please link!
- Which Unsloth version, TRL version, transformers version, PyTorch version? Unsloth==2025.9.8; TRL==0.22.2; transformers==4.56.2; torch==2.6.0
- Which trainer?
SFTTrainer, GRPOTrainer etc: SFTTrainer
I am fine-tuning this model:
unsloth/Llama-3.1-8B-Instruct-bnb-4bit
After fine-tuning, I call save_pretrained on the model and tokenizer, then load the saved result again:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = lora_save_path,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
Then I save it to GGUF:
model.save_pretrained_gguf(
    volume_save_path,  # Saving model to Databricks volume
    tokenizer,
    quantization_method="q4_k_m"
)
I keep getting this error: RuntimeError: Unsloth: Quantization failed for {my save path}/unsloth.BF16.gguf
The log shows this as the underlying error:
File "{my workspace path}/llama.cpp/convert_hf_to_gguf.py", line 35
try:
^
IndentationError: expected an indented block after 'try' statement on line 34
I tried deleting the llama.cpp directory that is generated automatically when calling save_pretrained_gguf, and rebuilding it directly from the repo:
%sh
git clone --recursive https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
And I am getting the same issue.
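To narrow this down, the converter script can be checked for syntax errors on its own, without going through save_pretrained_gguf. The following is only a minimal sketch: check_syntax is a hypothetical helper (not part of Unsloth or llama.cpp), and the path in the main block is an assumed location that should be replaced with the real workspace path. If a fresh clone parses cleanly here but the error still appears during export, that would suggest the file is being changed after the clone rather than being broken upstream.

```python
import ast
from pathlib import Path


def check_syntax(script_path):
    """Return None if the file parses as Python, else a short error description.

    Hypothetical helper for diagnosis only; it parses the file without running it.
    """
    source = Path(script_path).read_text(encoding="utf-8")
    try:
        ast.parse(source, filename=str(script_path))
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return f"{type(err).__name__} at line {err.lineno}: {err.msg}"
    return None


if __name__ == "__main__":
    # Assumed location of the converter; adjust to the actual workspace path.
    script = Path("llama.cpp/convert_hf_to_gguf.py")
    if script.exists():
        print(check_syntax(script) or "no syntax errors found")
    else:
        print(f"{script} not found")
```

Running this once right after the git clone and once after a failed save_pretrained_gguf call would show whether the file differs between the two points.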