Error saving GGUF of Gemma27B (but not Gemma4B) on DGX Spark #3581

@weoieoeo

After successfully vision-finetuning Gemma27B (4bit), saving the model to GGUF fails with the error below. The merge to 16-bit completes; the failure happens in the step that converts the merged model to bf16 GGUF. The process uses only about 65 GB of the available 128 GB of unified RAM. The error does not occur when I finetune the smaller Gemma4B (4bit) with the same vision dataset.

I am grateful for any advice.

{'loss': 0.0248, 'grad_norm': 0.3881801664829254, 'learning_rate': 8.695652173913045e-09, 'epoch': 20.0}
{'train_runtime': 196532.9404, 'train_samples_per_second': 0.192, 'train_steps_per_second': 0.006, 'train_loss': 0.07668430322393154, 'epoch': 20.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 1200/1200 [54:35:32<00:00, 163.78s/it]
Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### Your chat template has a BOS token. We shall remove it temporarily.
Unsloth: Merging model weights to 16-bit format...
Detected local model directory: /workspace/AIEngine/medgemma-27b-it
Copied tokenizer.model from local model directory
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Unsloth: Preparing safetensor model files: 0%| | 0/12 [00:00<?, ?it/s]Copied model-00003-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 8%|█████▍ | 1/12 [00:02<00:22, 2.02s/it]Copied model-00006-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 17%|██████████▊ | 2/12 [00:04<00:25, 2.52s/it]Copied model-00012-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 25%|████████████████▎ | 3/12 [00:05<00:13, 1.45s/it]Copied model-00009-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 33%|█████████████████████▋ | 4/12 [00:06<00:12, 1.62s/it]Copied model-00002-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 42%|███████████████████████████ | 5/12 [00:08<00:12, 1.76s/it]Copied model-00007-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 50%|████████████████████████████████▌ | 6/12 [00:10<00:10, 1.82s/it]Copied model-00010-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 58%|█████████████████████████████████████▉ | 7/12 [00:13<00:09, 1.96s/it]Copied model-00008-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 67%|███████████████████████████████████████████▎ | 8/12 [00:15<00:08, 2.00s/it]Copied model-00004-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 75%|████████████████████████████████████████████████▊ | 9/12 [00:17<00:06, 2.00s/it]Copied model-00001-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 83%|█████████████████████████████████████████████████████▎ | 10/12 [00:21<00:05, 2.60s/it]Copied model-00011-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 92%|██████████████████████████████████████████████████████████▋ | 11/12 [00:23<00:02, 2.45s/it]Copied model-00005-of-00012.safetensors from local model directory
Unsloth: Preparing safetensor model files: 100%|████████████████████████████████████████████████████████████████| 12/12 [00:25<00:00, 2.10s/it]
Unsloth: Merging weights into 16bit: 100%|██████████████████████████████████████████████████████████████████████| 12/12 [07:34<00:00, 37.89s/it]
Unsloth: Merge process complete. Saved to /home/ollam3/unsloth_finetune
Unsloth: Converting to GGUF format...
==((====))== Unsloth: Conversion from HF to GGUF information
\ /| [0] Installing llama.cpp might take 3 minutes.
O^O/ _/ \ [1] Converting HF to GGUF bf16 might take 3 minutes.
\ / [2] Converting GGUF bf16 to ['q4_k_m'] might take 10 minutes each.
"-____-" In total, you will have to wait at least 16 minutes.

Unsloth: llama.cpp found in the system. Skipping installation.
Unsloth: Preparing converter script...
Unsloth: [1] Converting model into bf16 GGUF format.
This might take 3 minutes...
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/unsloth_zoo/llama_cpp.py", line 991, in convert_to_gguf
subprocess.run(command, shell=True, check=True, capture_output=True)
File "/usr/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python llama.cpp/unsloth_convert_hf_to_gguf.py --outfile medgemma-27b-it.BF16.gguf --outtype bf16 --split-max-size 50G unsloth_finetune' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/unsloth/save.py", line 1835, in unsloth_save_pretrained_gguf
all_file_locations, want_full_precision, is_vlm_update = save_to_gguf(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth/save.py", line 1099, in save_to_gguf
initial_files, is_vlm_update = convert_to_gguf(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth_zoo/llama_cpp.py", line 995, in convert_to_gguf
raise RuntimeError(f"Unsloth: Failed to convert {description} to GGUF: {e}")
RuntimeError: Unsloth: Failed to convert text model to GGUF: Command 'python llama.cpp/unsloth_convert_hf_to_gguf.py --outfile medgemma-27b-it.BF16.gguf --outtype bf16 --split-max-size 50G unsloth_finetune' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/ollam3/finetunevisionGemma3_Herz.py", line 217, in <module>
model.save_pretrained_gguf("unsloth_finetune", tokenizer, quantization_method = "q4_k_m")
File "/usr/local/lib/python3.12/dist-packages/unsloth/save.py", line 1855, in unsloth_save_pretrained_gguf
raise RuntimeError(f"Unsloth: GGUF conversion failed: {e}")
RuntimeError: Unsloth: GGUF conversion failed: Unsloth: Failed to convert text model to GGUF: Command 'python llama.cpp/unsloth_convert_hf_to_gguf.py --outfile medgemma-27b-it.BF16.gguf --outtype bf16 --split-max-size 50G unsloth_finetune' returned non-zero exit status 1.
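
The traceback reports only `exit status 1` because `convert_to_gguf` calls `subprocess.run(..., capture_output=True)`, so the converter's own error message never reaches the console. A minimal sketch for surfacing it is to rerun the same command by hand and print the end of its stderr. The command arguments are copied from the traceback; the helper name `run_and_report` and the `tail_lines` parameter are hypothetical, introduced here only for illustration.

```python
import subprocess

# Converter invocation copied from the traceback above, split into arguments.
CONVERT_CMD = [
    "python", "llama.cpp/unsloth_convert_hf_to_gguf.py",
    "--outfile", "medgemma-27b-it.BF16.gguf",
    "--outtype", "bf16",
    "--split-max-size", "50G",
    "unsloth_finetune",
]


def run_and_report(command, cwd=None, tail_lines=40):
    """Run a command without raising on failure.

    Returns (exit_code, stderr_tail), where stderr_tail holds the last
    `tail_lines` lines the process wrote to stderr.
    """
    result = subprocess.run(command, capture_output=True, text=True, cwd=cwd)
    stderr_tail = "\n".join(result.stderr.splitlines()[-tail_lines:])
    return result.returncode, stderr_tail
```

Calling `code, tail = run_and_report(CONVERT_CMD, cwd="/home/ollam3")` and then `print(code, tail)` should show the converter's actual failure. The `cwd` value is an assumption based on the merge output path in the log; it needs to be the directory that contains both `llama.cpp/` and `unsloth_finetune/`.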
