Last week I tested saving to GGUF with `save_pretrained_gguf` and it worked flawlessly. This weekend I let the model train longer, and afterwards saving suddenly failed. I have retried several times and the result is always the same: saving as GGUF no longer works for me.
I also tried recompiling llama.cpp as the error message suggests, but to no avail. I'm testing on Google Colab on an A100.
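One quick way to narrow this down is to check whether the converter wrote any GGUF file at all, possibly in an unexpected directory. The sketch below is a hypothetical helper (`find_gguf_files` is not part of Unsloth or llama.cpp) that walks a directory tree and reports every `*.gguf` file, its size, and whether its first four bytes are the GGUF magic `b"GGUF"`. It only uses the standard library and does not touch the model.

```python
import os

GGUF_MAGIC = b"GGUF"  # GGUF files start with these four ASCII bytes


def find_gguf_files(root):
    """Return sorted (path, size_in_bytes, has_magic) tuples for *.gguf files under root.

    Hypothetical diagnostic helper, not part of Unsloth or llama.cpp.
    A missing root simply yields an empty list (os.walk ignores the error).
    """
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".gguf"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                has_magic = f.read(4) == GGUF_MAGIC
            results.append((path, os.path.getsize(path), has_magic))
    return sorted(results)
```

On Colab this could be run as `for p, size, ok in find_gguf_files("/content"): print(p, size, ok)`; the search root `/content` is an assumption based on the paths in the log. An empty result would suggest the conversion step never produced a file, while a file without the magic bytes would point at a truncated or failed write.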
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 61.18 out of 83.48 RAM for saving.
100%|██████████| 22/22 [00:00<00:00, 138.58it/s]
tokenizer config file saved in /content/test-save-q4_k_m/tokenizer_config.json
Special tokens file saved in /content/test-save-q4_k_m/special_tokens_map.json
Model config LlamaConfig {
"_name_or_path": "unsloth/tinyllama-bnb-4bit",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 2048,
"initializer_range": 0.02,
"intermediate_size": 5632,
"max_position_embeddings": 2048,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 22,
"num_key_value_heads": 4,
"pad_token_id": 0,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.38.2",
"unsloth_version": "2024.3",
"use_cache": true,
"vocab_size": 32000
}
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Configuration saved in /content/test-save-q4_k_m/config.json
Configuration saved in /content/test-save-q4_k_m/generation_config.json
Model weights saved in /content/test-save-q4_k_m/model.safetensors
Done.
==((====))== Unsloth: Conversion from QLoRA to GGUF information
\\ /| [0] Installing llama.cpp will take 3 minutes.
O^O/ \_/ \ [1] Converting HF to GUUF 16bits will take 3 minutes.
\ / [2] Converting GGUF 16bits to q8_0 will take 20 minutes.
"-____-" In total, you will have to wait around 26 minutes.
Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at /content/test-save-q4_k_m into q8_0 GGUF format.
The output location will be .//content/test-save-q4_k_m-unsloth.Q8_0.gguf
This will take 3 minutes...
Loading model file /content/test-save-q4_k_m/model.safetensors
params = Params(n_vocab=32000, n_embd=2048, n_layer=22, n_ctx=2048, n_ff=5632, n_head=32, n_head_kv=4, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=10000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyQ8_0: 7>, path_model=PosixPath('/content/test-save-q4_k_m'))
Found vocab files: {'spm': PosixPath('/content/test-save-q4_k_m/tokenizer.model'), 'bpe': None, 'hfft': PosixPath('/content/test-save-q4_k_m/tokenizer.json')}
Loading vocab file PosixPath('/content/test-save-q4_k_m/tokenizer.json'), type 'hfft'
fname_tokenizer: /content/test-save-q4_k_m
Vocab info: <HfVocab with 32000 base tokens and 0 added tokens>
Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 2, 'unk': 0, 'pad': 0}, add special tokens {'bos': True, 'eos': False}>
Permuting layer 0
Permuting layer 1
Permuting layer 2
Permuting layer 3
Permuting layer 4
Permuting layer 5
Permuting layer 6
Permuting layer 7
Permuting layer 8
Permuting layer 9
Permuting layer 10
Permuting layer 11
Permuting layer 12
Permuting layer 13
Permuting layer 14
Permuting layer 15
Permuting layer 16
Permuting layer 17
Permuting layer 18
Permuting layer 19
Permuting layer 20
Permuting layer 21
lm_head.weight -> output.weight | BF16 | [32000, 2048]
model.embed_tokens.weight -> token_embd.weight | BF16 | [32000, 2048]
model.layers.0.input_layernorm.weight -> blk.0.attn_norm.weight | BF16 | [2048]
model.layers.0.mlp.down_proj.weight -> blk.0.ffn_down.weight | BF16 | [2048, 5632]
model.layers.0.mlp.gate_proj.weight -> blk.0.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.0.mlp.up_proj.weight -> blk.0.ffn_up.weight | BF16 | [5632, 2048]
model.layers.0.post_attention_layernorm.weight -> blk.0.ffn_norm.weight | BF16 | [2048]
model.layers.0.self_attn.k_proj.weight -> blk.0.attn_k.weight | BF16 | [256, 2048]
model.layers.0.self_attn.o_proj.weight -> blk.0.attn_output.weight | BF16 | [2048, 2048]
model.layers.0.self_attn.q_proj.weight -> blk.0.attn_q.weight | BF16 | [2048, 2048]
model.layers.0.self_attn.v_proj.weight -> blk.0.attn_v.weight | BF16 | [256, 2048]
model.layers.1.input_layernorm.weight -> blk.1.attn_norm.weight | BF16 | [2048]
model.layers.1.mlp.down_proj.weight -> blk.1.ffn_down.weight | BF16 | [2048, 5632]
model.layers.1.mlp.gate_proj.weight -> blk.1.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.1.mlp.up_proj.weight -> blk.1.ffn_up.weight | BF16 | [5632, 2048]
model.layers.1.post_attention_layernorm.weight -> blk.1.ffn_norm.weight | BF16 | [2048]
model.layers.1.self_attn.k_proj.weight -> blk.1.attn_k.weight | BF16 | [256, 2048]
model.layers.1.self_attn.o_proj.weight -> blk.1.attn_output.weight | BF16 | [2048, 2048]
model.layers.1.self_attn.q_proj.weight -> blk.1.attn_q.weight | BF16 | [2048, 2048]
model.layers.1.self_attn.v_proj.weight -> blk.1.attn_v.weight | BF16 | [256, 2048]
model.layers.10.input_layernorm.weight -> blk.10.attn_norm.weight | BF16 | [2048]
model.layers.10.mlp.down_proj.weight -> blk.10.ffn_down.weight | BF16 | [2048, 5632]
model.layers.10.mlp.gate_proj.weight -> blk.10.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.10.mlp.up_proj.weight -> blk.10.ffn_up.weight | BF16 | [5632, 2048]
model.layers.10.post_attention_layernorm.weight -> blk.10.ffn_norm.weight | BF16 | [2048]
model.layers.10.self_attn.k_proj.weight -> blk.10.attn_k.weight | BF16 | [256, 2048]
model.layers.10.self_attn.o_proj.weight -> blk.10.attn_output.weight | BF16 | [2048, 2048]
model.layers.10.self_attn.q_proj.weight -> blk.10.attn_q.weight | BF16 | [2048, 2048]
model.layers.10.self_attn.v_proj.weight -> blk.10.attn_v.weight | BF16 | [256, 2048]
model.layers.11.input_layernorm.weight -> blk.11.attn_norm.weight | BF16 | [2048]
model.layers.11.mlp.down_proj.weight -> blk.11.ffn_down.weight | BF16 | [2048, 5632]
model.layers.11.mlp.gate_proj.weight -> blk.11.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.11.mlp.up_proj.weight -> blk.11.ffn_up.weight | BF16 | [5632, 2048]
model.layers.11.post_attention_layernorm.weight -> blk.11.ffn_norm.weight | BF16 | [2048]
model.layers.11.self_attn.k_proj.weight -> blk.11.attn_k.weight | BF16 | [256, 2048]
model.layers.11.self_attn.o_proj.weight -> blk.11.attn_output.weight | BF16 | [2048, 2048]
model.layers.11.self_attn.q_proj.weight -> blk.11.attn_q.weight | BF16 | [2048, 2048]
model.layers.11.self_attn.v_proj.weight -> blk.11.attn_v.weight | BF16 | [256, 2048]
model.layers.12.input_layernorm.weight -> blk.12.attn_norm.weight | BF16 | [2048]
model.layers.12.mlp.down_proj.weight -> blk.12.ffn_down.weight | BF16 | [2048, 5632]
model.layers.12.mlp.gate_proj.weight -> blk.12.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.12.mlp.up_proj.weight -> blk.12.ffn_up.weight | BF16 | [5632, 2048]
model.layers.12.post_attention_layernorm.weight -> blk.12.ffn_norm.weight | BF16 | [2048]
model.layers.12.self_attn.k_proj.weight -> blk.12.attn_k.weight | BF16 | [256, 2048]
model.layers.12.self_attn.o_proj.weight -> blk.12.attn_output.weight | BF16 | [2048, 2048]
model.layers.12.self_attn.q_proj.weight -> blk.12.attn_q.weight | BF16 | [2048, 2048]
model.layers.12.self_attn.v_proj.weight -> blk.12.attn_v.weight | BF16 | [256, 2048]
model.layers.13.input_layernorm.weight -> blk.13.attn_norm.weight | BF16 | [2048]
model.layers.13.mlp.down_proj.weight -> blk.13.ffn_down.weight | BF16 | [2048, 5632]
model.layers.13.mlp.gate_proj.weight -> blk.13.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.13.mlp.up_proj.weight -> blk.13.ffn_up.weight | BF16 | [5632, 2048]
model.layers.13.post_attention_layernorm.weight -> blk.13.ffn_norm.weight | BF16 | [2048]
model.layers.13.self_attn.k_proj.weight -> blk.13.attn_k.weight | BF16 | [256, 2048]
model.layers.13.self_attn.o_proj.weight -> blk.13.attn_output.weight | BF16 | [2048, 2048]
model.layers.13.self_attn.q_proj.weight -> blk.13.attn_q.weight | BF16 | [2048, 2048]
model.layers.13.self_attn.v_proj.weight -> blk.13.attn_v.weight | BF16 | [256, 2048]
model.layers.14.input_layernorm.weight -> blk.14.attn_norm.weight | BF16 | [2048]
model.layers.14.mlp.down_proj.weight -> blk.14.ffn_down.weight | BF16 | [2048, 5632]
model.layers.14.mlp.gate_proj.weight -> blk.14.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.14.mlp.up_proj.weight -> blk.14.ffn_up.weight | BF16 | [5632, 2048]
model.layers.14.post_attention_layernorm.weight -> blk.14.ffn_norm.weight | BF16 | [2048]
model.layers.14.self_attn.k_proj.weight -> blk.14.attn_k.weight | BF16 | [256, 2048]
model.layers.14.self_attn.o_proj.weight -> blk.14.attn_output.weight | BF16 | [2048, 2048]
model.layers.14.self_attn.q_proj.weight -> blk.14.attn_q.weight | BF16 | [2048, 2048]
model.layers.14.self_attn.v_proj.weight -> blk.14.attn_v.weight | BF16 | [256, 2048]
model.layers.15.input_layernorm.weight -> blk.15.attn_norm.weight | BF16 | [2048]
model.layers.15.mlp.down_proj.weight -> blk.15.ffn_down.weight | BF16 | [2048, 5632]
model.layers.15.mlp.gate_proj.weight -> blk.15.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.15.mlp.up_proj.weight -> blk.15.ffn_up.weight | BF16 | [5632, 2048]
model.layers.15.post_attention_layernorm.weight -> blk.15.ffn_norm.weight | BF16 | [2048]
model.layers.15.self_attn.k_proj.weight -> blk.15.attn_k.weight | BF16 | [256, 2048]
model.layers.15.self_attn.o_proj.weight -> blk.15.attn_output.weight | BF16 | [2048, 2048]
model.layers.15.self_attn.q_proj.weight -> blk.15.attn_q.weight | BF16 | [2048, 2048]
model.layers.15.self_attn.v_proj.weight -> blk.15.attn_v.weight | BF16 | [256, 2048]
model.layers.16.input_layernorm.weight -> blk.16.attn_norm.weight | BF16 | [2048]
model.layers.16.mlp.down_proj.weight -> blk.16.ffn_down.weight | BF16 | [2048, 5632]
model.layers.16.mlp.gate_proj.weight -> blk.16.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.16.mlp.up_proj.weight -> blk.16.ffn_up.weight | BF16 | [5632, 2048]
model.layers.16.post_attention_layernorm.weight -> blk.16.ffn_norm.weight | BF16 | [2048]
model.layers.16.self_attn.k_proj.weight -> blk.16.attn_k.weight | BF16 | [256, 2048]
model.layers.16.self_attn.o_proj.weight -> blk.16.attn_output.weight | BF16 | [2048, 2048]
model.layers.16.self_attn.q_proj.weight -> blk.16.attn_q.weight | BF16 | [2048, 2048]
model.layers.16.self_attn.v_proj.weight -> blk.16.attn_v.weight | BF16 | [256, 2048]
model.layers.17.input_layernorm.weight -> blk.17.attn_norm.weight | BF16 | [2048]
model.layers.17.mlp.down_proj.weight -> blk.17.ffn_down.weight | BF16 | [2048, 5632]
model.layers.17.mlp.gate_proj.weight -> blk.17.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.17.mlp.up_proj.weight -> blk.17.ffn_up.weight | BF16 | [5632, 2048]
model.layers.17.post_attention_layernorm.weight -> blk.17.ffn_norm.weight | BF16 | [2048]
model.layers.17.self_attn.k_proj.weight -> blk.17.attn_k.weight | BF16 | [256, 2048]
model.layers.17.self_attn.o_proj.weight -> blk.17.attn_output.weight | BF16 | [2048, 2048]
model.layers.17.self_attn.q_proj.weight -> blk.17.attn_q.weight | BF16 | [2048, 2048]
model.layers.17.self_attn.v_proj.weight -> blk.17.attn_v.weight | BF16 | [256, 2048]
model.layers.18.input_layernorm.weight -> blk.18.attn_norm.weight | BF16 | [2048]
model.layers.18.mlp.down_proj.weight -> blk.18.ffn_down.weight | BF16 | [2048, 5632]
model.layers.18.mlp.gate_proj.weight -> blk.18.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.18.mlp.up_proj.weight -> blk.18.ffn_up.weight | BF16 | [5632, 2048]
model.layers.18.post_attention_layernorm.weight -> blk.18.ffn_norm.weight | BF16 | [2048]
model.layers.18.self_attn.k_proj.weight -> blk.18.attn_k.weight | BF16 | [256, 2048]
model.layers.18.self_attn.o_proj.weight -> blk.18.attn_output.weight | BF16 | [2048, 2048]
model.layers.18.self_attn.q_proj.weight -> blk.18.attn_q.weight | BF16 | [2048, 2048]
model.layers.18.self_attn.v_proj.weight -> blk.18.attn_v.weight | BF16 | [256, 2048]
model.layers.19.input_layernorm.weight -> blk.19.attn_norm.weight | BF16 | [2048]
model.layers.19.mlp.down_proj.weight -> blk.19.ffn_down.weight | BF16 | [2048, 5632]
model.layers.19.mlp.gate_proj.weight -> blk.19.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.19.mlp.up_proj.weight -> blk.19.ffn_up.weight | BF16 | [5632, 2048]
model.layers.19.post_attention_layernorm.weight -> blk.19.ffn_norm.weight | BF16 | [2048]
model.layers.19.self_attn.k_proj.weight -> blk.19.attn_k.weight | BF16 | [256, 2048]
model.layers.19.self_attn.o_proj.weight -> blk.19.attn_output.weight | BF16 | [2048, 2048]
model.layers.19.self_attn.q_proj.weight -> blk.19.attn_q.weight | BF16 | [2048, 2048]
model.layers.19.self_attn.v_proj.weight -> blk.19.attn_v.weight | BF16 | [256, 2048]
model.layers.2.input_layernorm.weight -> blk.2.attn_norm.weight | BF16 | [2048]
model.layers.2.mlp.down_proj.weight -> blk.2.ffn_down.weight | BF16 | [2048, 5632]
model.layers.2.mlp.gate_proj.weight -> blk.2.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.2.mlp.up_proj.weight -> blk.2.ffn_up.weight | BF16 | [5632, 2048]
model.layers.2.post_attention_layernorm.weight -> blk.2.ffn_norm.weight | BF16 | [2048]
model.layers.2.self_attn.k_proj.weight -> blk.2.attn_k.weight | BF16 | [256, 2048]
model.layers.2.self_attn.o_proj.weight -> blk.2.attn_output.weight | BF16 | [2048, 2048]
model.layers.2.self_attn.q_proj.weight -> blk.2.attn_q.weight | BF16 | [2048, 2048]
model.layers.2.self_attn.v_proj.weight -> blk.2.attn_v.weight | BF16 | [256, 2048]
model.layers.20.input_layernorm.weight -> blk.20.attn_norm.weight | BF16 | [2048]
model.layers.20.mlp.down_proj.weight -> blk.20.ffn_down.weight | BF16 | [2048, 5632]
model.layers.20.mlp.gate_proj.weight -> blk.20.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.20.mlp.up_proj.weight -> blk.20.ffn_up.weight | BF16 | [5632, 2048]
model.layers.20.post_attention_layernorm.weight -> blk.20.ffn_norm.weight | BF16 | [2048]
model.layers.20.self_attn.k_proj.weight -> blk.20.attn_k.weight | BF16 | [256, 2048]
model.layers.20.self_attn.o_proj.weight -> blk.20.attn_output.weight | BF16 | [2048, 2048]
model.layers.20.self_attn.q_proj.weight -> blk.20.attn_q.weight | BF16 | [2048, 2048]
model.layers.20.self_attn.v_proj.weight -> blk.20.attn_v.weight | BF16 | [256, 2048]
model.layers.21.input_layernorm.weight -> blk.21.attn_norm.weight | BF16 | [2048]
model.layers.21.mlp.down_proj.weight -> blk.21.ffn_down.weight | BF16 | [2048, 5632]
model.layers.21.mlp.gate_proj.weight -> blk.21.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.21.mlp.up_proj.weight -> blk.21.ffn_up.weight | BF16 | [5632, 2048]
model.layers.21.post_attention_layernorm.weight -> blk.21.ffn_norm.weight | BF16 | [2048]
model.layers.21.self_attn.k_proj.weight -> blk.21.attn_k.weight | BF16 | [256, 2048]
model.layers.21.self_attn.o_proj.weight -> blk.21.attn_output.weight | BF16 | [2048, 2048]
model.layers.21.self_attn.q_proj.weight -> blk.21.attn_q.weight | BF16 | [2048, 2048]
model.layers.21.self_attn.v_proj.weight -> blk.21.attn_v.weight | BF16 | [256, 2048]
model.layers.3.input_layernorm.weight -> blk.3.attn_norm.weight | BF16 | [2048]
model.layers.3.mlp.down_proj.weight -> blk.3.ffn_down.weight | BF16 | [2048, 5632]
model.layers.3.mlp.gate_proj.weight -> blk.3.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.3.mlp.up_proj.weight -> blk.3.ffn_up.weight | BF16 | [5632, 2048]
model.layers.3.post_attention_layernorm.weight -> blk.3.ffn_norm.weight | BF16 | [2048]
model.layers.3.self_attn.k_proj.weight -> blk.3.attn_k.weight | BF16 | [256, 2048]
model.layers.3.self_attn.o_proj.weight -> blk.3.attn_output.weight | BF16 | [2048, 2048]
model.layers.3.self_attn.q_proj.weight -> blk.3.attn_q.weight | BF16 | [2048, 2048]
model.layers.3.self_attn.v_proj.weight -> blk.3.attn_v.weight | BF16 | [256, 2048]
model.layers.4.input_layernorm.weight -> blk.4.attn_norm.weight | BF16 | [2048]
model.layers.4.mlp.down_proj.weight -> blk.4.ffn_down.weight | BF16 | [2048, 5632]
model.layers.4.mlp.gate_proj.weight -> blk.4.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.4.mlp.up_proj.weight -> blk.4.ffn_up.weight | BF16 | [5632, 2048]
model.layers.4.post_attention_layernorm.weight -> blk.4.ffn_norm.weight | BF16 | [2048]
model.layers.4.self_attn.k_proj.weight -> blk.4.attn_k.weight | BF16 | [256, 2048]
model.layers.4.self_attn.o_proj.weight -> blk.4.attn_output.weight | BF16 | [2048, 2048]
model.layers.4.self_attn.q_proj.weight -> blk.4.attn_q.weight | BF16 | [2048, 2048]
model.layers.4.self_attn.v_proj.weight -> blk.4.attn_v.weight | BF16 | [256, 2048]
model.layers.5.input_layernorm.weight -> blk.5.attn_norm.weight | BF16 | [2048]
model.layers.5.mlp.down_proj.weight -> blk.5.ffn_down.weight | BF16 | [2048, 5632]
model.layers.5.mlp.gate_proj.weight -> blk.5.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.5.mlp.up_proj.weight -> blk.5.ffn_up.weight | BF16 | [5632, 2048]
model.layers.5.post_attention_layernorm.weight -> blk.5.ffn_norm.weight | BF16 | [2048]
model.layers.5.self_attn.k_proj.weight -> blk.5.attn_k.weight | BF16 | [256, 2048]
model.layers.5.self_attn.o_proj.weight -> blk.5.attn_output.weight | BF16 | [2048, 2048]
model.layers.5.self_attn.q_proj.weight -> blk.5.attn_q.weight | BF16 | [2048, 2048]
model.layers.5.self_attn.v_proj.weight -> blk.5.attn_v.weight | BF16 | [256, 2048]
model.layers.6.input_layernorm.weight -> blk.6.attn_norm.weight | BF16 | [2048]
model.layers.6.mlp.down_proj.weight -> blk.6.ffn_down.weight | BF16 | [2048, 5632]
model.layers.6.mlp.gate_proj.weight -> blk.6.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.6.mlp.up_proj.weight -> blk.6.ffn_up.weight | BF16 | [5632, 2048]
model.layers.6.post_attention_layernorm.weight -> blk.6.ffn_norm.weight | BF16 | [2048]
model.layers.6.self_attn.k_proj.weight -> blk.6.attn_k.weight | BF16 | [256, 2048]
model.layers.6.self_attn.o_proj.weight -> blk.6.attn_output.weight | BF16 | [2048, 2048]
model.layers.6.self_attn.q_proj.weight -> blk.6.attn_q.weight | BF16 | [2048, 2048]
model.layers.6.self_attn.v_proj.weight -> blk.6.attn_v.weight | BF16 | [256, 2048]
model.layers.7.input_layernorm.weight -> blk.7.attn_norm.weight | BF16 | [2048]
model.layers.7.mlp.down_proj.weight -> blk.7.ffn_down.weight | BF16 | [2048, 5632]
model.layers.7.mlp.gate_proj.weight -> blk.7.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.7.mlp.up_proj.weight -> blk.7.ffn_up.weight | BF16 | [5632, 2048]
model.layers.7.post_attention_layernorm.weight -> blk.7.ffn_norm.weight | BF16 | [2048]
model.layers.7.self_attn.k_proj.weight -> blk.7.attn_k.weight | BF16 | [256, 2048]
model.layers.7.self_attn.o_proj.weight -> blk.7.attn_output.weight | BF16 | [2048, 2048]
model.layers.7.self_attn.q_proj.weight -> blk.7.attn_q.weight | BF16 | [2048, 2048]
model.layers.7.self_attn.v_proj.weight -> blk.7.attn_v.weight | BF16 | [256, 2048]
model.layers.8.input_layernorm.weight -> blk.8.attn_norm.weight | BF16 | [2048]
model.layers.8.mlp.down_proj.weight -> blk.8.ffn_down.weight | BF16 | [2048, 5632]
model.layers.8.mlp.gate_proj.weight -> blk.8.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.8.mlp.up_proj.weight -> blk.8.ffn_up.weight | BF16 | [5632, 2048]
model.layers.8.post_attention_layernorm.weight -> blk.8.ffn_norm.weight | BF16 | [2048]
model.layers.8.self_attn.k_proj.weight -> blk.8.attn_k.weight | BF16 | [256, 2048]
model.layers.8.self_attn.o_proj.weight -> blk.8.attn_output.weight | BF16 | [2048, 2048]
model.layers.8.self_attn.q_proj.weight -> blk.8.attn_q.weight | BF16 | [2048, 2048]
model.layers.8.self_attn.v_proj.weight -> blk.8.attn_v.weight | BF16 | [256, 2048]
model.layers.9.input_layernorm.weight -> blk.9.attn_norm.weight | BF16 | [2048]
model.layers.9.mlp.down_proj.weight -> blk.9.ffn_down.weight | BF16 | [2048, 5632]
model.layers.9.mlp.gate_proj.weight -> blk.9.ffn_gate.weight | BF16 | [5632, 2048]
model.layers.9.mlp.up_proj.weight -> blk.9.ffn_up.weight | BF16 | [5632, 2048]
model.layers.9.post_attention_layernorm.weight -> blk.9.ffn_norm.weight | BF16 | [2048]
model.layers.9.self_attn.k_proj.weight -> blk.9.attn_k.weight | BF16 | [256, 2048]
model.layers.9.self_attn.o_proj.weight -> blk.9.attn_output.weight | BF16 | [2048, 2048]
model.layers.9.self_attn.q_proj.weight -> blk.9.attn_q.weight | BF16 | [2048, 2048]
model.layers.9.self_attn.v_proj.weight -> blk.9.attn_v.weight | BF16 | [256, 2048]
model.norm.weight -> output_norm.weight | BF16 | [2048]
Writing content/test-save-q4_k_m-unsloth.Q8_0.gguf, format 7
Ignoring added_tokens.json since model matches vocab size without it.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-15-9024c7b554f4> in <cell line: 1>()
----> 1 model.save_pretrained_gguf("/content/test-save-q4_k_m", tokenizer)
2
3 # Save to 8bit Q8_0
4 if False: model.save_pretrained_gguf(MODEL_OUTPUT_DIR + MODEL_NAME, tokenizer,)
5 if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")
1 frames
/usr/local/lib/python3.10/dist-packages/unsloth/save.py in save_to_gguf(model_type, model_directory, quantization_method, first_conversion, _run_installer)
894 # Check if quantization succeeded!
895 if not os.path.isfile(final_location):
--> 896 raise RuntimeError(
897 f"Unsloth: Quantization failed for {final_location}\n"\
898 "You might have to compile llama.cpp yourself, then run this again.\n"\
RuntimeError: Unsloth: Quantization failed for .//content/test-save-q4_k_m-unsloth.Q8_0.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUBLAS=1 make all -j
Once that's done, redo the quantization.
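The log hints at a path problem that may be worth ruling out. The expected output is reported as `.//content/test-save-q4_k_m-unsloth.Q8_0.gguf`, while the converter says it is `Writing content/test-save-q4_k_m-unsloth.Q8_0.gguf`, which is a relative path. That looks like `"./"` being prepended to the absolute save directory `/content/test-save-q4_k_m`. If so, both paths resolve against the current working directory, and with a working directory of `/content` the file would be expected at `/content/content/...`, a folder that may not exist. The sketch below only reproduces this path arithmetic with the standard library; the `"./" + save_dir` construction and the helper name `resolve_output` are assumptions inferred from the printed paths, not taken from Unsloth's source.

```python
import posixpath  # Colab runs Linux, so use POSIX path rules explicitly


def resolve_output(save_dir, quant_tag, cwd):
    """Show how a GGUF output path built as "./" + save_dir would resolve.

    Hypothetical helper. Assumption: the output name is
    f"./{save_dir}-unsloth.{quant_tag}.gguf", which matches the log output.
    """
    reported = f"./{save_dir}-unsloth.{quant_tag}.gguf"
    # normpath drops the leading "." component and collapses "//",
    # leaving a relative path
    relative = posixpath.normpath(reported)
    # a relative path is interpreted against the current working directory
    absolute = posixpath.normpath(posixpath.join(cwd, relative))
    return reported, relative, absolute


reported, relative, absolute = resolve_output("/content/test-save-q4_k_m", "Q8_0", "/content")
print(reported)  # .//content/test-save-q4_k_m-unsloth.Q8_0.gguf
print(relative)  # content/test-save-q4_k_m-unsloth.Q8_0.gguf
print(absolute)  # /content/content/test-save-q4_k_m-unsloth.Q8_0.gguf
```

Under the same assumption, saving to a relative directory while the working directory is `/content` (for example `model.save_pretrained_gguf("test-save-q4_k_m", tokenizer)`) would resolve to `/content/test-save-q4_k_m-unsloth.Q8_0.gguf`, so it may be a cheap workaround to try before recompiling llama.cpp again.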