Saving as GGUF fails

I tested last week the saving of the GGUF method ( save_pretrained_gguf ) and it worked flawlessly. This weekend I let it train longer and afterwards it suddenly failed. I tried a few tries but it's always the same: No saving as GGUF works for me.

I also tried the recompiling mentioned, but to no avail. I'm testing on Google Colab on a A100.

```
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 61.18 out of 83.48 RAM for saving.
100%|██████████| 22/22 [00:00<00:00, 138.58it/s]
tokenizer config file saved in /content/test-save-q4_k_m/tokenizer_config.json
Special tokens file saved in /content/test-save-q4_k_m/special_tokens_map.json
Model config LlamaConfig {
  "_name_or_path": "unsloth/tinyllama-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "unsloth_version": "2024.3",
  "use_cache": true,
  "vocab_size": 32000
}

Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Configuration saved in /content/test-save-q4_k_m/config.json
Configuration saved in /content/test-save-q4_k_m/generation_config.json
Model weights saved in /content/test-save-q4_k_m/model.safetensors
Done.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp will take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GUUF 16bits will take 3 minutes.
\        /    [2] Converting GGUF 16bits to q8_0 will take 20 minutes.
 "-____-"     In total, you will have to wait around 26 minutes.

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at /content/test-save-q4_k_m into q8_0 GGUF format.
The output location will be .//content/test-save-q4_k_m-unsloth.Q8_0.gguf
This will take 3 minutes...
Loading model file /content/test-save-q4_k_m/model.safetensors
params = Params(n_vocab=32000, n_embd=2048, n_layer=22, n_ctx=2048, n_ff=5632, n_head=32, n_head_kv=4, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=10000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyQ8_0: 7>, path_model=PosixPath('/content/test-save-q4_k_m'))
Found vocab files: {'spm': PosixPath('/content/test-save-q4_k_m/tokenizer.model'), 'bpe': None, 'hfft': PosixPath('/content/test-save-q4_k_m/tokenizer.json')}
Loading vocab file PosixPath('/content/test-save-q4_k_m/tokenizer.json'), type 'hfft'
fname_tokenizer: /content/test-save-q4_k_m
Vocab info: <HfVocab with 32000 base tokens and 0 added tokens>
Special vocab info: <SpecialVocab with 0 merges, special tokens {'bos': 1, 'eos': 2, 'unk': 0, 'pad': 0}, add special tokens {'bos': True, 'eos': False}>
Permuting layer 0
Permuting layer 1
Permuting layer 2
Permuting layer 3
Permuting layer 4
Permuting layer 5
Permuting layer 6
Permuting layer 7
Permuting layer 8
Permuting layer 9
Permuting layer 10
Permuting layer 11
Permuting layer 12
Permuting layer 13
Permuting layer 14
Permuting layer 15
Permuting layer 16
Permuting layer 17
Permuting layer 18
Permuting layer 19
Permuting layer 20
Permuting layer 21
lm_head.weight                                   -> output.weight                            | BF16   | [32000, 2048]
model.embed_tokens.weight                        -> token_embd.weight                        | BF16   | [32000, 2048]
model.layers.0.input_layernorm.weight            -> blk.0.attn_norm.weight                   | BF16   | [2048]
model.layers.0.mlp.down_proj.weight              -> blk.0.ffn_down.weight                    | BF16   | [2048, 5632]
model.layers.0.mlp.gate_proj.weight              -> blk.0.ffn_gate.weight                    | BF16   | [5632, 2048]
model.layers.0.mlp.up_proj.weight                -> blk.0.ffn_up.weight                      | BF16   | [5632, 2048]
model.layers.0.post_attention_layernorm.weight   -> blk.0.ffn_norm.weight                    | BF16   | [2048]
model.layers.0.self_attn.k_proj.weight           -> blk.0.attn_k.weight                      | BF16   | [256, 2048]
model.layers.0.self_attn.o_proj.weight           -> blk.0.attn_output.weight                 | BF16   | [2048, 2048]
model.layers.0.self_attn.q_proj.weight           -> blk.0.attn_q.weight                      | BF16   | [2048, 2048]
model.layers.0.self_attn.v_proj.weight           -> blk.0.attn_v.weight                      | BF16   | [256, 2048]
model.layers.1.input_layernorm.weight            -> blk.1.attn_norm.weight                   | BF16   | [2048]
model.layers.1.mlp.down_proj.weight              -> blk.1.ffn_down.weight                    | BF16   | [2048, 5632]
model.layers.1.mlp.gate_proj.weight              -> blk.1.ffn_gate.weight                    | BF16   | [5632, 2048]
model.layers.1.mlp.up_proj.weight                -> blk.1.ffn_up.weight                      | BF16   | [5632, 2048]
model.layers.1.post_attention_layernorm.weight   -> blk.1.ffn_norm.weight                    | BF16   | [2048]
model.layers.1.self_attn.k_proj.weight           -> blk.1.attn_k.weight                      | BF16   | [256, 2048]
model.layers.1.self_attn.o_proj.weight           -> blk.1.attn_output.weight                 | BF16   | [2048, 2048]
model.layers.1.self_attn.q_proj.weight           -> blk.1.attn_q.weight                      | BF16   | [2048, 2048]
model.layers.1.self_attn.v_proj.weight           -> blk.1.attn_v.weight                      | BF16   | [256, 2048]
model.layers.10.input_layernorm.weight           -> blk.10.attn_norm.weight                  | BF16   | [2048]
model.layers.10.mlp.down_proj.weight             -> blk.10.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.10.mlp.gate_proj.weight             -> blk.10.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.10.mlp.up_proj.weight               -> blk.10.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.10.post_attention_layernorm.weight  -> blk.10.ffn_norm.weight                   | BF16   | [2048]
model.layers.10.self_attn.k_proj.weight          -> blk.10.attn_k.weight                     | BF16   | [256, 2048]
model.layers.10.self_attn.o_proj.weight          -> blk.10.attn_output.weight                | BF16   | [2048, 2048]
model.layers.10.self_attn.q_proj.weight          -> blk.10.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.10.self_attn.v_proj.weight          -> blk.10.attn_v.weight                     | BF16   | [256, 2048]
model.layers.11.input_layernorm.weight           -> blk.11.attn_norm.weight                  | BF16   | [2048]
model.layers.11.mlp.down_proj.weight             -> blk.11.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.11.mlp.gate_proj.weight             -> blk.11.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.11.mlp.up_proj.weight               -> blk.11.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.11.post_attention_layernorm.weight  -> blk.11.ffn_norm.weight                   | BF16   | [2048]
model.layers.11.self_attn.k_proj.weight          -> blk.11.attn_k.weight                     | BF16   | [256, 2048]
model.layers.11.self_attn.o_proj.weight          -> blk.11.attn_output.weight                | BF16   | [2048, 2048]
model.layers.11.self_attn.q_proj.weight          -> blk.11.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.11.self_attn.v_proj.weight          -> blk.11.attn_v.weight                     | BF16   | [256, 2048]
model.layers.12.input_layernorm.weight           -> blk.12.attn_norm.weight                  | BF16   | [2048]
model.layers.12.mlp.down_proj.weight             -> blk.12.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.12.mlp.gate_proj.weight             -> blk.12.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.12.mlp.up_proj.weight               -> blk.12.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.12.post_attention_layernorm.weight  -> blk.12.ffn_norm.weight                   | BF16   | [2048]
model.layers.12.self_attn.k_proj.weight          -> blk.12.attn_k.weight                     | BF16   | [256, 2048]
model.layers.12.self_attn.o_proj.weight          -> blk.12.attn_output.weight                | BF16   | [2048, 2048]
model.layers.12.self_attn.q_proj.weight          -> blk.12.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.12.self_attn.v_proj.weight          -> blk.12.attn_v.weight                     | BF16   | [256, 2048]
model.layers.13.input_layernorm.weight           -> blk.13.attn_norm.weight                  | BF16   | [2048]
model.layers.13.mlp.down_proj.weight             -> blk.13.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.13.mlp.gate_proj.weight             -> blk.13.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.13.mlp.up_proj.weight               -> blk.13.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.13.post_attention_layernorm.weight  -> blk.13.ffn_norm.weight                   | BF16   | [2048]
model.layers.13.self_attn.k_proj.weight          -> blk.13.attn_k.weight                     | BF16   | [256, 2048]
model.layers.13.self_attn.o_proj.weight          -> blk.13.attn_output.weight                | BF16   | [2048, 2048]
model.layers.13.self_attn.q_proj.weight          -> blk.13.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.13.self_attn.v_proj.weight          -> blk.13.attn_v.weight                     | BF16   | [256, 2048]
model.layers.14.input_layernorm.weight           -> blk.14.attn_norm.weight                  | BF16   | [2048]
model.layers.14.mlp.down_proj.weight             -> blk.14.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.14.mlp.gate_proj.weight             -> blk.14.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.14.mlp.up_proj.weight               -> blk.14.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.14.post_attention_layernorm.weight  -> blk.14.ffn_norm.weight                   | BF16   | [2048]
model.layers.14.self_attn.k_proj.weight          -> blk.14.attn_k.weight                     | BF16   | [256, 2048]
model.layers.14.self_attn.o_proj.weight          -> blk.14.attn_output.weight                | BF16   | [2048, 2048]
model.layers.14.self_attn.q_proj.weight          -> blk.14.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.14.self_attn.v_proj.weight          -> blk.14.attn_v.weight                     | BF16   | [256, 2048]
model.layers.15.input_layernorm.weight           -> blk.15.attn_norm.weight                  | BF16   | [2048]
model.layers.15.mlp.down_proj.weight             -> blk.15.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.15.mlp.gate_proj.weight             -> blk.15.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.15.mlp.up_proj.weight               -> blk.15.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.15.post_attention_layernorm.weight  -> blk.15.ffn_norm.weight                   | BF16   | [2048]
model.layers.15.self_attn.k_proj.weight          -> blk.15.attn_k.weight                     | BF16   | [256, 2048]
model.layers.15.self_attn.o_proj.weight          -> blk.15.attn_output.weight                | BF16   | [2048, 2048]
model.layers.15.self_attn.q_proj.weight          -> blk.15.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.15.self_attn.v_proj.weight          -> blk.15.attn_v.weight                     | BF16   | [256, 2048]
model.layers.16.input_layernorm.weight           -> blk.16.attn_norm.weight                  | BF16   | [2048]
model.layers.16.mlp.down_proj.weight             -> blk.16.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.16.mlp.gate_proj.weight             -> blk.16.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.16.mlp.up_proj.weight               -> blk.16.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.16.post_attention_layernorm.weight  -> blk.16.ffn_norm.weight                   | BF16   | [2048]
model.layers.16.self_attn.k_proj.weight          -> blk.16.attn_k.weight                     | BF16   | [256, 2048]
model.layers.16.self_attn.o_proj.weight          -> blk.16.attn_output.weight                | BF16   | [2048, 2048]
model.layers.16.self_attn.q_proj.weight          -> blk.16.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.16.self_attn.v_proj.weight          -> blk.16.attn_v.weight                     | BF16   | [256, 2048]
model.layers.17.input_layernorm.weight           -> blk.17.attn_norm.weight                  | BF16   | [2048]
model.layers.17.mlp.down_proj.weight             -> blk.17.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.17.mlp.gate_proj.weight             -> blk.17.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.17.mlp.up_proj.weight               -> blk.17.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.17.post_attention_layernorm.weight  -> blk.17.ffn_norm.weight                   | BF16   | [2048]
model.layers.17.self_attn.k_proj.weight          -> blk.17.attn_k.weight                     | BF16   | [256, 2048]
model.layers.17.self_attn.o_proj.weight          -> blk.17.attn_output.weight                | BF16   | [2048, 2048]
model.layers.17.self_attn.q_proj.weight          -> blk.17.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.17.self_attn.v_proj.weight          -> blk.17.attn_v.weight                     | BF16   | [256, 2048]
model.layers.18.input_layernorm.weight           -> blk.18.attn_norm.weight                  | BF16   | [2048]
model.layers.18.mlp.down_proj.weight             -> blk.18.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.18.mlp.gate_proj.weight             -> blk.18.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.18.mlp.up_proj.weight               -> blk.18.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.18.post_attention_layernorm.weight  -> blk.18.ffn_norm.weight                   | BF16   | [2048]
model.layers.18.self_attn.k_proj.weight          -> blk.18.attn_k.weight                     | BF16   | [256, 2048]
model.layers.18.self_attn.o_proj.weight          -> blk.18.attn_output.weight                | BF16   | [2048, 2048]
model.layers.18.self_attn.q_proj.weight          -> blk.18.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.18.self_attn.v_proj.weight          -> blk.18.attn_v.weight                     | BF16   | [256, 2048]
model.layers.19.input_layernorm.weight           -> blk.19.attn_norm.weight                  | BF16   | [2048]
model.layers.19.mlp.down_proj.weight             -> blk.19.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.19.mlp.gate_proj.weight             -> blk.19.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.19.mlp.up_proj.weight               -> blk.19.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.19.post_attention_layernorm.weight  -> blk.19.ffn_norm.weight                   | BF16   | [2048]
model.layers.19.self_attn.k_proj.weight          -> blk.19.attn_k.weight                     | BF16   | [256, 2048]
model.layers.19.self_attn.o_proj.weight          -> blk.19.attn_output.weight                | BF16   | [2048, 2048]
model.layers.19.self_attn.q_proj.weight          -> blk.19.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.19.self_attn.v_proj.weight          -> blk.19.attn_v.weight                     | BF16   | [256, 2048]
model.layers.2.input_layernorm.weight            -> blk.2.attn_norm.weight                   | BF16   | [2048]
model.layers.2.mlp.down_proj.weight              -> blk.2.ffn_down.weight                    | BF16   | [2048, 5632]
model.layers.2.mlp.gate_proj.weight              -> blk.2.ffn_gate.weight                    | BF16   | [5632, 2048]
model.layers.2.mlp.up_proj.weight                -> blk.2.ffn_up.weight                      | BF16   | [5632, 2048]
model.layers.2.post_attention_layernorm.weight   -> blk.2.ffn_norm.weight                    | BF16   | [2048]
model.layers.2.self_attn.k_proj.weight           -> blk.2.attn_k.weight                      | BF16   | [256, 2048]
model.layers.2.self_attn.o_proj.weight           -> blk.2.attn_output.weight                 | BF16   | [2048, 2048]
model.layers.2.self_attn.q_proj.weight           -> blk.2.attn_q.weight                      | BF16   | [2048, 2048]
model.layers.2.self_attn.v_proj.weight           -> blk.2.attn_v.weight                      | BF16   | [256, 2048]
model.layers.20.input_layernorm.weight           -> blk.20.attn_norm.weight                  | BF16   | [2048]
model.layers.20.mlp.down_proj.weight             -> blk.20.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.20.mlp.gate_proj.weight             -> blk.20.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.20.mlp.up_proj.weight               -> blk.20.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.20.post_attention_layernorm.weight  -> blk.20.ffn_norm.weight                   | BF16   | [2048]
model.layers.20.self_attn.k_proj.weight          -> blk.20.attn_k.weight                     | BF16   | [256, 2048]
model.layers.20.self_attn.o_proj.weight          -> blk.20.attn_output.weight                | BF16   | [2048, 2048]
model.layers.20.self_attn.q_proj.weight          -> blk.20.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.20.self_attn.v_proj.weight          -> blk.20.attn_v.weight                     | BF16   | [256, 2048]
model.layers.21.input_layernorm.weight           -> blk.21.attn_norm.weight                  | BF16   | [2048]
model.layers.21.mlp.down_proj.weight             -> blk.21.ffn_down.weight                   | BF16   | [2048, 5632]
model.layers.21.mlp.gate_proj.weight             -> blk.21.ffn_gate.weight                   | BF16   | [5632, 2048]
model.layers.21.mlp.up_proj.weight               -> blk.21.ffn_up.weight                     | BF16   | [5632, 2048]
model.layers.21.post_attention_layernorm.weight  -> blk.21.ffn_norm.weight                   | BF16   | [2048]
model.layers.21.self_attn.k_proj.weight          -> blk.21.attn_k.weight                     | BF16   | [256, 2048]
model.layers.21.self_attn.o_proj.weight          -> blk.21.attn_output.weight                | BF16   | [2048, 2048]
model.layers.21.self_attn.q_proj.weight          -> blk.21.attn_q.weight                     | BF16   | [2048, 2048]
model.layers.21.self_attn.v_proj.weight          -> blk.21.attn_v.weight                     | BF16   | [256, 2048]
model.layers.3.input_layernorm.weight            -> blk.3.attn_norm.weight                   | BF16   | [2048]
model.layers.3.mlp.down_proj.weight              -> blk.3.ffn_down.weight                    | BF16   | [2048, 5632]
model.layers.3.mlp.gate_proj.weight              -> blk.3.ffn_gate.weight                    | BF16   | [5632, 2048]
model.layers.3.mlp.up_proj.weight                -> blk.3.ffn_up.weight                      | BF16   | [5632, 2048]
model.layers.3.post_attention_layernorm.weight   -> blk.3.ffn_norm.weight                    | BF16   | [2048]
model.layers.3.self_attn.k_proj.weight           -> blk.3.attn_k.weight                      | BF16   | [256, 2048]
model.layers.3.self_attn.o_proj.weight           -> blk.3.attn_output.weight                 | BF16   | [2048, 2048]
model.layers.3.self_attn.q_proj.weight           -> blk.3.attn_q.weight                      | BF16   | [2048, 2048]
model.layers.3.self_attn.v_proj.weight           -> blk.3.attn_v.weight                      | BF16   | [256, 2048]
model.layers.4.input_layernorm.weight            -> blk.4.attn_norm.weight                   | BF16   | [2048]
model.layers.4.mlp.down_proj.weight              -> blk.4.ffn_down.weight                    | BF16   | [2048, 5632]
model.layers.4.mlp.gate_proj.weight              -> blk.4.ffn_gate.weight                    | BF16   | [5632, 2048]
model.layers.4.mlp.up_proj.weight                -> blk.4.ffn_up.weight                      | BF16   | [5632, 2048]
model.layers.4.post_attention_layernorm.weight   -> blk.4.ffn_norm.weight                    | BF16   | [2048]
model.layers.4.self_attn.k_proj.weight           -> blk.4.attn_k.weight                      | BF16   | [256, 2048]
model.layers.4.self_attn.o_proj.weight           -> blk.4.attn_output.weight                 | BF16   | [2048, 2048]
model.layers.4.self_attn.q_proj.weight           -> blk.4.attn_q.weight                      | BF16   | [2048, 2048]
model.layers.4.self_attn.v_proj.weight           -> blk.4.attn_v.weight                      | BF16   | [256, 2048]
model.layers.5.input_layernorm.weight            -> blk.5.attn_norm.weight                   | BF16   | [2048]
model.layers.5.mlp.down_proj.weight              -> blk.5.ffn_down.weight                    | BF16   | [2048, 5632]
model.layers.5.mlp.gate_proj.weight              -> blk.5.ffn_gate.weight                    | BF16   | [5632, 2048]
model.layers.5.mlp.up_proj.weight                -> blk.5.ffn_up.weight                      | BF16   | [5632, 2048]
model.layers.5.post_attention_layernorm.weight   -> blk.5.ffn_norm.weight                    | BF16   | [2048]
model.layers.5.self_attn.k_proj.weight           -> blk.5.attn_k.weight                      | BF16   | [256, 2048]
model.layers.5.self_attn.o_proj.weight           -> blk.5.attn_output.weight                 | BF16   | [2048, 2048]
model.layers.5.self_attn.q_proj.weight           -> blk.5.attn_q.weight                      | BF16   | [2048, 2048]
model.layers.5.self_attn.v_proj.weight           -> blk.5.attn_v.weight                      | BF16   | [256, 2048]
model.layers.6.input_layernorm.weight            -> blk.6.attn_norm.weight                   | BF16   | [2048]
model.layers.6.mlp.down_proj.weight              -> blk.6.ffn_down.weight                    | BF16   | [2048, 5632]
model.layers.6.mlp.gate_proj.weight              -> blk.6.ffn_gate.weight                    | BF16   | [5632, 2048]
model.layers.6.mlp.up_proj.weight                -> blk.6.ffn_up.weight                      | BF16   | [5632, 2048]
model.layers.6.post_attention_layernorm.weight   -> blk.6.ffn_norm.weight                    | BF16   | [2048]
model.layers.6.self_attn.k_proj.weight           -> blk.6.attn_k.weight                      | BF16   | [256, 2048]
model.layers.6.self_attn.o_proj.weight           -> blk.6.attn_output.weight                 | BF16   | [2048, 2048]
model.layers.6.self_attn.q_proj.weight           -> blk.6.attn_q.weight                      | BF16   | [2048, 2048]
model.layers.6.self_attn.v_proj.weight           -> blk.6.attn_v.weight                      | BF16   | [256, 2048]
model.layers.7.input_layernorm.weight            -> blk.7.attn_norm.weight                   | BF16   | [2048]
model.layers.7.mlp.down_proj.weight              -> blk.7.ffn_down.weight                    | BF16   | [2048, 5632]
model.layers.7.mlp.gate_proj.weight              -> blk.7.ffn_gate.weight                    | BF16   | [5632, 2048]
model.layers.7.mlp.up_proj.weight                -> blk.7.ffn_up.weight                      | BF16   | [5632, 2048]
model.layers.7.post_attention_layernorm.weight   -> blk.7.ffn_norm.weight                    | BF16   | [2048]
model.layers.7.self_attn.k_proj.weight           -> blk.7.attn_k.weight                      | BF16   | [256, 2048]
model.layers.7.self_attn.o_proj.weight           -> blk.7.attn_output.weight                 | BF16   | [2048, 2048]
model.layers.7.self_attn.q_proj.weight           -> blk.7.attn_q.weight                      | BF16   | [2048, 2048]
model.layers.7.self_attn.v_proj.weight           -> blk.7.attn_v.weight                      | BF16   | [256, 2048]
model.layers.8.input_layernorm.weight            -> blk.8.attn_norm.weight                   | BF16   | [2048]
model.layers.8.mlp.down_proj.weight              -> blk.8.ffn_down.weight                    | BF16   | [2048, 5632]
model.layers.8.mlp.gate_proj.weight              -> blk.8.ffn_gate.weight                    | BF16   | [5632, 2048]
model.layers.8.mlp.up_proj.weight                -> blk.8.ffn_up.weight                      | BF16   | [5632, 2048]
model.layers.8.post_attention_layernorm.weight   -> blk.8.ffn_norm.weight                    | BF16   | [2048]
model.layers.8.self_attn.k_proj.weight           -> blk.8.attn_k.weight                      | BF16   | [256, 2048]
model.layers.8.self_attn.o_proj.weight           -> blk.8.attn_output.weight                 | BF16   | [2048, 2048]
model.layers.8.self_attn.q_proj.weight           -> blk.8.attn_q.weight                      | BF16   | [2048, 2048]
model.layers.8.self_attn.v_proj.weight           -> blk.8.attn_v.weight                      | BF16   | [256, 2048]
model.layers.9.input_layernorm.weight            -> blk.9.attn_norm.weight                   | BF16   | [2048]
model.layers.9.mlp.down_proj.weight              -> blk.9.ffn_down.weight                    | BF16   | [2048, 5632]
model.layers.9.mlp.gate_proj.weight              -> blk.9.ffn_gate.weight                    | BF16   | [5632, 2048]
model.layers.9.mlp.up_proj.weight                -> blk.9.ffn_up.weight                      | BF16   | [5632, 2048]
model.layers.9.post_attention_layernorm.weight   -> blk.9.ffn_norm.weight                    | BF16   | [2048]
model.layers.9.self_attn.k_proj.weight           -> blk.9.attn_k.weight                      | BF16   | [256, 2048]
model.layers.9.self_attn.o_proj.weight           -> blk.9.attn_output.weight                 | BF16   | [2048, 2048]
model.layers.9.self_attn.q_proj.weight           -> blk.9.attn_q.weight                      | BF16   | [2048, 2048]
model.layers.9.self_attn.v_proj.weight           -> blk.9.attn_v.weight                      | BF16   | [256, 2048]
model.norm.weight                                -> output_norm.weight                       | BF16   | [2048]
Writing content/test-save-q4_k_m-unsloth.Q8_0.gguf, format 7
Ignoring added_tokens.json since model matches vocab size without it.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[<ipython-input-15-9024c7b554f4>](https://hkjdhk20zt-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240321-060149_RC00_617803427#) in <cell line: 1>()
----> 1 model.save_pretrained_gguf("/content/test-save-q4_k_m", tokenizer)
      2 
      3 # Save to 8bit Q8_0
      4 if False: model.save_pretrained_gguf(MODEL_OUTPUT_DIR + MODEL_NAME, tokenizer,)
      5 if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

1 frames
[/usr/local/lib/python3.10/dist-packages/unsloth/save.py](https://hkjdhk20zt-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab_20240321-060149_RC00_617803427#) in save_to_gguf(model_type, model_directory, quantization_method, first_conversion, _run_installer)
    894     # Check if quantization succeeded!
    895     if not os.path.isfile(final_location):
--> 896         raise RuntimeError(
    897             f"Unsloth: Quantization failed for {final_location}\n"\
    898             "You might have to compile llama.cpp yourself, then run this again.\n"\

RuntimeError: Unsloth: Quantization failed for .//content/test-save-q4_k_m-unsloth.Q8_0.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUBLAS=1 make all -j
Once that's done, redo the quantization.
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Saving as GGUF fails #275

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Saving as GGUF fails #275

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions