- Have you tried uninstalling Unsloth and upgrading?
Unsloth: Converting llama model. Can use fast conversion = False.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.
Unsloth: llama.cpp found in the system. We shall skip installation.
Unsloth: [1] Converting model at model into bf16 GGUF format.
The output location will be /home/ubuntu/myaipj/Llama3-finetuning/model/unsloth.BF16.gguf
This might take 3 minutes...
Traceback (most recent call last):
File "/usr/local/bin/convert_hf_to_gguf.py", line 2572, in <module>
class PhiMoeModel(Phi3MiniModel):
File "/usr/local/bin/convert_hf_to_gguf.py", line 2573, in PhiMoeModel
model_arch = gguf.MODEL_ARCH.PHIMOE
^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/.conda/envs/unsloth_env/lib/python3.11/enum.py", line 786, in __getattr__
raise AttributeError(name) from None
AttributeError: PHIMOE
Traceback (most recent call last):
File "/home/ubuntu/myaipj/Llama3-finetuning/main.py", line 154, in <module>
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
File "/home/ubuntu/.conda/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py", line 1735, in unsloth_save_pretrained_gguf
all_file_locations, want_full_precision = save_to_gguf(
^^^^^^^^^^^^^
File "/home/ubuntu/.conda/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py", line 1196, in save_to_gguf
raise RuntimeError(
RuntimeError: Unsloth: Quantization failed for /home/ubuntu/myaipj/Llama3-finetuning/model/unsloth.BF16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
- Otherwise, describe your problem or feature request:
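
The first traceback fails inside `/usr/local/bin/convert_hf_to_gguf.py` while looking up `gguf.MODEL_ARCH.PHIMOE`, so one possible cause is that the `gguf` Python package in `unsloth_env` is older than the converter script that the system-wide llama.cpp install provides. This is a guess from the log, not a confirmed diagnosis. A minimal sketch for checking it, where the helper names `missing_members` and `check_gguf` are hypothetical and only `gguf.MODEL_ARCH` is taken from the traceback:

```python
import enum
import importlib


def missing_members(enum_cls, names):
    """Return the names from `names` that are not members of `enum_cls`."""
    return [name for name in names if name not in enum_cls.__members__]


def check_gguf(names=("PHIMOE",)):
    """Check the installed `gguf` package for the given MODEL_ARCH members.

    Returns None if `gguf` cannot be imported, otherwise the list of
    requested names that the installed package does not define.
    """
    try:
        gguf = importlib.import_module("gguf")
    except ImportError:
        return None  # package not installed in this environment
    return missing_members(gguf.MODEL_ARCH, names)
```

Running `print(check_gguf())` with the same Python interpreter that Unsloth uses would print `['PHIMOE']` if the installed package lacks that member, `[]` if it is present, and `None` if `gguf` is not importable at all.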