[Bug] [Qwen3 4-bit] GGUF Conversion in v2025.6.12 Produces Garbage Output (GGGG...)

Environment:
    unsloth==2025.6.12 (latest)
    unsloth-zoo==2025.6.12
    transformers==4.52.4
    torch==2.7.0+cu126
    CUDA 12.6
    Ubuntu 22.04
    RTX 4060 Ti (16GB VRAM)

Critical Observation:
The issue persists in the latest Unsloth release even though #2098, which was meant to fix it, has been closed.

Technical Details:
    Quantization Artifacts:
        NaN values appear only during GGUF conversion (not in original HF model)
        Error occurs at layer blk.3.attn_k.weight (BF16→q4_k_m)
    Memory Analysis:
```bash
# During conversion:
GPU Memory: 14.2/16.0 GB utilized
System RAM: 32GB (70% free)
```

Reproduction Script:
```python
from unsloth import FastLanguageModel
model, _ = FastLanguageModel.from_pretrained("unsloth/Qwen3-4B-unsloth-bnb-4bit")
model.save_pretrained_gguf("test", quantization_method="q4_k_m")  # Fails
```
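
For anyone trying to confirm the symptom from a script rather than by eye, here is a minimal sketch of a check for the `GGGG...` pattern. `longest_run_fraction`, `looks_degenerate`, and the 0.5 threshold are hypothetical names and values chosen for illustration; they are not part of Unsloth or llama.cpp, and the text to check would come from whatever runtime loads the GGUF file.

```python
def longest_run_fraction(text: str) -> float:
    """Length of the longest run of one repeated character, divided by len(text)."""
    if not text:
        return 0.0
    longest = current = 1
    for prev, ch in zip(text, text[1:]):
        # Extend the current run if the character repeats, otherwise start a new run
        current = current + 1 if ch == prev else 1
        longest = max(longest, current)
    return longest / len(text)


def looks_degenerate(text: str, threshold: float = 0.5) -> bool:
    """Flag output that is mostly a single repeated character (assumed heuristic)."""
    return longest_run_fraction(text.strip()) >= threshold


if __name__ == "__main__":
    print(looks_degenerate("GGGGGGGGGGGGGGGG"))
    print(looks_degenerate("Hello, how can I help you today?"))
```

The threshold is a judgment call: a lower value catches partially broken output but may also flag legitimate text such as long runs of whitespace or separators.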

Request:
    Please provide:
        Recommended workaround for Qwen3 4-bit
        Expected timeline for hotfix
      
Full script that produced the log below:
```python
from unsloth import FastLanguageModel
from peft import PeftModel
import torch
import os
import gc

def clean_memory():
    """Clear GPU cache and free memory"""
    torch.cuda.empty_cache()
    gc.collect()

# 1. Load base model in original format
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-4B-unsloth-bnb-4bit",
    load_in_4bit=True,
    device_map="auto",
)

# 2. Load adapter with file validation
adapter_path = "Qwen3-4B-unsloth-bnb-4bit+dataset_PB_eng"
try:
    # Check for required adapter files
    required_files = ['adapter_config.json', 'adapter_model.safetensors']
    for file in required_files:
        if not os.path.exists(os.path.join(adapter_path, file)):
            raise FileNotFoundError(f"Missing adapter file: {file}")
    
    model = PeftModel.from_pretrained(model, adapter_path)
    print(f"✅ Adapter loaded. Model type: {type(model)}")
except Exception as e:
    print(f"❌ Adapter loading error: {e}")
    raise SystemExit(1)  # exit with a non-zero status so the failure is visible to callers

# 3. Merge adapter with NaN check
try:
    print("Checking for NaN values...")
    nan_found = False
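    with torch.no_grad():  # in-place edits on parameters must not be tracked by autograd
        for name, param in model.named_parameters():
            # Skip integer tensors (e.g. packed 4-bit weights), which cannot hold NaN
            if not param.is_floating_point():
                continue
            nan_mask = torch.isnan(param)
            if nan_mask.any():
                print(f"⚠️ NaN detected in {name}, replacing with zeros")
                param[nan_mask] = 0
                nan_found = True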
    
    if nan_found:
        print("⚠️ Warning: NaN values were found and replaced")

    with torch.inference_mode():
        model = model.merge_and_unload()
        model.config.use_cache = False
    
    print("✅ Model merged successfully")
    clean_memory()
except Exception as e:
    print(f"❌ Merge error: {e}")
    raise SystemExit(1)  # exit with a non-zero status so the failure is visible to callers

# 4. GGUF conversion with fallback
output_dir = "PB_eng"
os.makedirs(output_dir, exist_ok=True)

try:
    print("Starting GGUF conversion (Q4_K_M)...")
    model.save_pretrained_gguf(
        output_dir,
        tokenizer=tokenizer,
        quantization_method="q4_k_m",
        maximum_memory_usage=0.7,
    )
    
    # Verify output file
    gguf_file = os.path.join(output_dir, "unsloth.Q4_K_M.gguf")
    if os.path.exists(gguf_file):
        size_gb = os.path.getsize(gguf_file) / (1024 ** 3)
        print(f"✅ Conversion successful! File size: {size_gb:.2f}GB")
    else:
        print("❌ GGUF file not created!")
except Exception as e:
    print(f"❌ Q4_K_M conversion failed: {e}")
    
    # Fallback to F16
    try:
        print("Attempting F16 conversion as fallback...")
        model.save_pretrained_gguf(
            output_dir,
            tokenizer=tokenizer,
            quantization_method="f16",
        )
        print("✅ F16 conversion completed")
    except Exception as e:
        print(f"❌ F16 conversion failed: {e}")
        raise SystemExit(1)  # exit with a non-zero status so the failure is visible to callers
```
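
The `hf-to-gguf` lines in the log below report a source dtype for every tensor, and both `torch.bfloat16` and `torch.float32` appear. The following is a small sketch for grouping tensor names by source dtype, assuming the log has been captured into a string. `group_by_source_dtype` is a hypothetical helper, and the regular expression only targets the line format shown in this report.

```python
import re
from collections import defaultdict

# Matches lines such as:
# INFO:hf-to-gguf:blk.3.attn_k.weight,  torch.float32 --> BF16, shape = {2560, 1024}
LINE_RE = re.compile(
    r"INFO:hf-to-gguf:(?P<name>[\w.]+),\s+torch\.(?P<src>\w+)\s+-->\s+(?P<dst>\w+),"
)


def group_by_source_dtype(log_text: str) -> dict:
    """Map each source dtype (e.g. 'float32') to the tensor names logged with it."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:  # non-tensor lines (loading messages, banners) are ignored
            groups[match.group("src")].append(match.group("name"))
    return dict(groups)


if __name__ == "__main__":
    sample = "\n".join([
        "INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00003.safetensors'",
        "INFO:hf-to-gguf:blk.0.attn_k.weight,       torch.bfloat16 --> BF16, shape = {2560, 1024}",
        "INFO:hf-to-gguf:blk.3.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}",
        "INFO:hf-to-gguf:blk.3.attn_k.weight,       torch.float32 --> BF16, shape = {2560, 1024}",
    ])
    for dtype, names in group_by_source_dtype(sample).items():
        print(dtype, len(names), names)
```

Running this over the full log gives a quick view of which blocks were saved in which dtype. It only summarizes what the converter printed and does not by itself show where the NaN values come from.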

===================================================================================

Log:
(u_env) oleg@oleg-MS-7B86:~$ python z.py
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.6.12: Fast Qwen3 patching. Transformers: 4.52.4.
   \\   /|    NVIDIA GeForce RTX 4060 Ti. Num GPUs = 1. Max memory: 15.576 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = True]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
✅ Adapter loaded. Model type: <class 'peft.peft_model.PeftModelForCausalLM'>
Checking for NaN values...
/home/oleg/miniconda3/envs/u_env/lib/python3.10/site-packages/peft/tuners/lora/bnb.py:351: UserWarning: Merge lora module to 4-bit linear may get different generations due to rounding errors.
  warnings.warn(
✅ Model merged successfully
Starting GGUF conversion (Q4_K_M)...
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 37.11 out of 62.72 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...
  0%|                                                    | 0/36 [00:00<?, ?it/s]
We will save to Disk and not RAM now.
100%|███████████████████████████████████████████| 36/36 [00:07<00:00,  5.08it/s]
Unsloth: Saving tokenizer... Done.
Done.
Unsloth: Converting qwen3 model. Can use fast conversion = False.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: [1] Converting model at PB_eng into bf16 GGUF format.
The output location will be /home/oleg/PB_eng/unsloth.BF16.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: PB_eng
INFO:hf-to-gguf:Model architecture: Qwen3ForCausalLM
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00003.safetensors'
INFO:hf-to-gguf:token_embd.weight,         torch.bfloat16 --> BF16, shape = {2560, 151936}
INFO:hf-to-gguf:blk.0.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.ffn_down.weight,     torch.bfloat16 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,     torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.0.ffn_up.weight,       torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.0.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.0.attn_k_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.0.attn_k.weight,       torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.0.attn_output.weight,  torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.0.attn_q_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.0.attn_q.weight,       torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.0.attn_v.weight,       torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.1.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.ffn_down.weight,     torch.bfloat16 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.1.ffn_gate.weight,     torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.1.ffn_up.weight,       torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.1.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.1.attn_k_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.1.attn_k.weight,       torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.1.attn_output.weight,  torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.1.attn_q_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.1.attn_q.weight,       torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.1.attn_v.weight,       torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.10.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.10.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.10.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.10.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.10.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.10.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.10.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.10.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.10.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.11.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.11.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.11.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.11.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.11.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.11.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.11.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.11.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.11.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.12.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.12.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.12.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.12.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.12.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.2.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.ffn_down.weight,     torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.2.ffn_gate.weight,     torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.2.ffn_up.weight,       torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.2.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.2.attn_k_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.2.attn_k.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.2.attn_output.weight,  torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.2.attn_q_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.2.attn_q.weight,       torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.2.attn_v.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.3.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.ffn_down.weight,     torch.bfloat16 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.3.ffn_gate.weight,     torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.3.ffn_up.weight,       torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.3.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.3.attn_k_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.3.attn_k.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.3.attn_output.weight,  torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.3.attn_q_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.3.attn_q.weight,       torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.3.attn_v.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.4.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.ffn_down.weight,     torch.bfloat16 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.4.ffn_gate.weight,     torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.4.ffn_up.weight,       torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.4.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.4.attn_k_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.4.attn_k.weight,       torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.4.attn_output.weight,  torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.4.attn_q_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.4.attn_q.weight,       torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.4.attn_v.weight,       torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.5.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.ffn_down.weight,     torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.5.ffn_gate.weight,     torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.5.ffn_up.weight,       torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.5.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.5.attn_k_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.5.attn_k.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.5.attn_output.weight,  torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.5.attn_q_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.5.attn_q.weight,       torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.5.attn_v.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.6.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.ffn_down.weight,     torch.bfloat16 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.6.ffn_gate.weight,     torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.6.ffn_up.weight,       torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.6.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.6.attn_k_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.6.attn_k.weight,       torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.6.attn_output.weight,  torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.6.attn_q_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.6.attn_q.weight,       torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.6.attn_v.weight,       torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.7.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.ffn_down.weight,     torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.7.ffn_gate.weight,     torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.7.ffn_up.weight,       torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.7.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.7.attn_k_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.7.attn_k.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.7.attn_output.weight,  torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.7.attn_q_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.7.attn_q.weight,       torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.7.attn_v.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.8.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.ffn_down.weight,     torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.8.ffn_gate.weight,     torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.8.ffn_up.weight,       torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.8.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.8.attn_k_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.8.attn_k.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.8.attn_output.weight,  torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.8.attn_q_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.8.attn_q.weight,       torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.8.attn_v.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.9.attn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.ffn_down.weight,     torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.9.ffn_gate.weight,     torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.9.ffn_up.weight,       torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.9.ffn_norm.weight,     torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.9.attn_k_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.9.attn_k.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.9.attn_output.weight,  torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.9.attn_q_norm.weight,  torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.9.attn_q.weight,       torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.9.attn_v.weight,       torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:gguf: loading model part 'model-00002-of-00003.safetensors'
INFO:hf-to-gguf:blk.12.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.12.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.12.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.12.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.13.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.13.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.13.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.13.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.13.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.13.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.13.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.13.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.13.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.14.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.14.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.14.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.14.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.14.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.14.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.14.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.14.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.14.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.15.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.15.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.15.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.15.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.15.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.15.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.15.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.15.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.15.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.16.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.16.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.16.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.16.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.16.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.16.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.16.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.16.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.16.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.17.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.17.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.17.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.17.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.17.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.17.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.17.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.17.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.17.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.18.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.18.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.18.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.18.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.18.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.18.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.18.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.18.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.18.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.19.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.19.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.19.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.19.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.19.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.19.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.19.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.19.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.19.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.20.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.20.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.20.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.20.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.20.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.20.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.20.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.20.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.20.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.21.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.21.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.21.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.21.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.21.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.21.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.21.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.21.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.21.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.22.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.22.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.22.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.22.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.22.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.22.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.22.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.22.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.22.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.23.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.23.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.23.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.23.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.23.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.23.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.23.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.23.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.23.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.24.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.24.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.24.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.24.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.24.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.24.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.24.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.24.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.24.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.25.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.25.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:gguf: loading model part 'model-00003-of-00003.safetensors'
INFO:hf-to-gguf:blk.25.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.25.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.25.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.25.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.25.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.25.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.25.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.26.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.26.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.26.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.26.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.26.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.26.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.26.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.26.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.26.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.27.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.27.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.27.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.27.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.27.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.27.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.27.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.27.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.27.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.28.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.28.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.28.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.28.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.28.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.28.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.28.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.28.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.28.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.28.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.29.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.29.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.29.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.29.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.29.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.29.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.29.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.29.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.29.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.29.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.30.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.30.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.30.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.30.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.30.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.30.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.30.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.30.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.30.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.30.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.31.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.31.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.31.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.31.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.31.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.31.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.31.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.31.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.31.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.31.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.32.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.32.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.32.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.32.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.32.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.32.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.32.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.32.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.32.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.32.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.33.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.33.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.33.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.33.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.33.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.33.attn_k.weight,      torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.33.attn_output.weight, torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.33.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.33.attn_q.weight,      torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.33.attn_v.weight,      torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.34.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.ffn_down.weight,    torch.bfloat16 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.34.ffn_gate.weight,    torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.34.ffn_up.weight,      torch.bfloat16 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.34.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.34.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.34.attn_k.weight,      torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.34.attn_output.weight, torch.bfloat16 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.34.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.34.attn_q.weight,      torch.bfloat16 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.34.attn_v.weight,      torch.bfloat16 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.35.attn_norm.weight,   torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.ffn_down.weight,    torch.float32 --> BF16, shape = {9728, 2560}
INFO:hf-to-gguf:blk.35.ffn_gate.weight,    torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.35.ffn_up.weight,      torch.float32 --> BF16, shape = {2560, 9728}
INFO:hf-to-gguf:blk.35.ffn_norm.weight,    torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:blk.35.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.35.attn_k.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:blk.35.attn_output.weight, torch.float32 --> BF16, shape = {4096, 2560}
INFO:hf-to-gguf:blk.35.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {128}
INFO:hf-to-gguf:blk.35.attn_q.weight,      torch.float32 --> BF16, shape = {2560, 4096}
INFO:hf-to-gguf:blk.35.attn_v.weight,      torch.float32 --> BF16, shape = {2560, 1024}
INFO:hf-to-gguf:output_norm.weight,        torch.bfloat16 --> F32, shape = {2560}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 40960
INFO:hf-to-gguf:gguf: embedding length = 2560
INFO:hf-to-gguf:gguf: feed forward length = 9728
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 1000000
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-06
INFO:hf-to-gguf:gguf: file type = 32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 151387 merge(s).
INFO:gguf.vocab:Setting special token type eos to 151645
INFO:gguf.vocab:Setting special token type pad to 151654
INFO:gguf.vocab:Setting add_bos_token to False
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/home/oleg/PB_eng/unsloth.BF16.gguf: n_tensors = 398, total_size = 8.0G
Writing: 100%|██████████| 8.05G/8.05G [00:32<00:00, 245Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to /home/oleg/PB_eng/unsloth.BF16.gguf
Unsloth: Conversion completed! Output location: /home/oleg/PB_eng/unsloth.BF16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This might take 20 minutes...
main: build = 1 (bb16041)
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: quantizing '/home/oleg/PB_eng/unsloth.BF16.gguf' to '/home/oleg/PB_eng/unsloth.Q4_K_M.gguf' as Q4_K_M using 24 threads
llama_model_loader: loaded meta data with 24 key-value pairs and 398 tensors from /home/oleg/PB_eng/unsloth.BF16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen3
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = PB_eng
llama_model_loader: - kv   3:                         general.size_label str              = 4.0B
llama_model_loader: - kv   4:                          qwen3.block_count u32              = 36
llama_model_loader: - kv   5:                       qwen3.context_length u32              = 40960
llama_model_loader: - kv   6:                     qwen3.embedding_length u32              = 2560
llama_model_loader: - kv   7:                  qwen3.feed_forward_length u32              = 9728
llama_model_loader: - kv   8:                 qwen3.attention.head_count u32              = 32
llama_model_loader: - kv   9:              qwen3.attention.head_count_kv u32              = 8
llama_model_loader: - kv  10:                       qwen3.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  11:     qwen3.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  12:                 qwen3.attention.key_length u32              = 128
llama_model_loader: - kv  13:               qwen3.attention.value_length u32              = 128
llama_model_loader: - kv  14:                          general.file_type u32              = 32
llama_model_loader: - kv  15:               general.quantization_version u32              = 2
llama_model_loader: - kv  16:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  17:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  18:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  19:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  20:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  21:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  22:            tokenizer.ggml.padding_token_id u32              = 151654
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - type  f32:  145 tensors
llama_model_loader: - type bf16:  253 tensors
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
[   1/ 398]                   output_norm.weight - [ 2560,     1,     1,     1], type =    f32, size =    0.010 MB
[   2/ 398]                    token_embd.weight - [ 2560, 151936,     1,     1], type =   bf16, converting to q6_K .. size =   741.88 MiB ->   304.28 MiB
[   3/ 398]                  blk.0.attn_k.weight - [ 2560,  1024,     1,     1], type =   bf16, converting to q4_K .. size =     5.00 MiB ->     1.41 MiB
[   4/ 398]             blk.0.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[   5/ 398]               blk.0.attn_norm.weight - [ 2560,     1,     1,     1], type =    f32, size =    0.010 MB
[   6/ 398]             blk.0.attn_output.weight - [ 4096,  2560,     1,     1], type =   bf16, converting to q4_K .. size =    20.00 MiB ->     5.62 MiB
[   7/ 398]                  blk.0.attn_q.weight - [ 2560,  4096,     1,     1], type =   bf16, converting to q4_K .. size =    20.00 MiB ->     5.62 MiB
[   8/ 398]             blk.0.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[   9/ 398]                  blk.0.attn_v.weight - [ 2560,  1024,     1,     1], type =   bf16, converting to q6_K .. size =     5.00 MiB ->     2.05 MiB
[  10/ 398]                blk.0.ffn_down.weight - [ 9728,  2560,     1,     1], type =   bf16, converting to q6_K .. size =    47.50 MiB ->    19.48 MiB
[  11/ 398]                blk.0.ffn_gate.weight - [ 2560,  9728,     1,     1], type =   bf16, converting to q4_K .. size =    47.50 MiB ->    13.36 MiB
[  12/ 398]                blk.0.ffn_norm.weight - [ 2560,     1,     1,     1], type =    f32, size =    0.010 MB
[  13/ 398]                  blk.0.ffn_up.weight - [ 2560,  9728,     1,     1], type =   bf16, converting to q4_K .. size =    47.50 MiB ->    13.36 MiB
[  14/ 398]                  blk.1.attn_k.weight - [ 2560,  1024,     1,     1], type =   bf16, converting to q4_K .. size =     5.00 MiB ->     1.41 MiB
[  15/ 398]             blk.1.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  16/ 398]               blk.1.attn_norm.weight - [ 2560,     1,     1,     1], type =    f32, size =    0.010 MB
[  17/ 398]             blk.1.attn_output.weight - [ 4096,  2560,     1,     1], type =   bf16, converting to q4_K .. size =    20.00 MiB ->     5.62 MiB
[  18/ 398]                  blk.1.attn_q.weight - [ 2560,  4096,     1,     1], type =   bf16, converting to q4_K .. size =    20.00 MiB ->     5.62 MiB
[  19/ 398]             blk.1.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  20/ 398]                  blk.1.attn_v.weight - [ 2560,  1024,     1,     1], type =   bf16, converting to q6_K .. size =     5.00 MiB ->     2.05 MiB
[  21/ 398]                blk.1.ffn_down.weight - [ 9728,  2560,     1,     1], type =   bf16, converting to q6_K .. size =    47.50 MiB ->    19.48 MiB
[  22/ 398]                blk.1.ffn_gate.weight - [ 2560,  9728,     1,     1], type =   bf16, converting to q4_K .. size =    47.50 MiB ->    13.36 MiB
[  23/ 398]                blk.1.ffn_norm.weight - [ 2560,     1,     1,     1], type =    f32, size =    0.010 MB
[  24/ 398]                  blk.1.ffn_up.weight - [ 2560,  9728,     1,     1], type =   bf16, converting to q4_K .. size =    47.50 MiB ->    13.36 MiB
[  25/ 398]                  blk.2.attn_k.weight - [ 2560,  1024,     1,     1], type =   bf16, converting to q4_K .. size =     5.00 MiB ->     1.41 MiB
[  26/ 398]             blk.2.attn_k_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  27/ 398]               blk.2.attn_norm.weight - [ 2560,     1,     1,     1], type =    f32, size =    0.010 MB
[  28/ 398]             blk.2.attn_output.weight - [ 4096,  2560,     1,     1], type =   bf16, converting to q4_K .. size =    20.00 MiB ->     5.62 MiB
[  29/ 398]                  blk.2.attn_q.weight - [ 2560,  4096,     1,     1], type =   bf16, converting to q4_K .. size =    20.00 MiB ->     5.62 MiB
[  30/ 398]             blk.2.attn_q_norm.weight - [  128,     1,     1,     1], type =    f32, size =    0.000 MB
[  31/ 398]                  blk.2.attn_v.weight - [ 2560,  1024,     1,     1], type =   bf16, converting to q6_K .. size =     5.00 MiB ->     2.05 MiB
[  32/ 398]                blk.2.ffn_down.weight - [ 9728,  2560,     1,     1], type =   bf16, converting to q6_K .. size =    47.50 MiB ->    19.48 MiB
[  33/ 398]                blk.2.ffn_gate.weight - [ 2560,  9728,     1,     1], type =   bf16, converting to q4_K .. size =    47.50 MiB ->    13.36 MiB
[  34/ 398]                blk.2.ffn_norm.weight - [ 2560,     1,     1,     1], type =    f32, size =    0.010 MB
[  35/ 398]                  blk.2.ffn_up.weight - [ 2560,  9728,     1,     1], type =   bf16, converting to q4_K .. size =    47.50 MiB ->    13.36 MiB
ggml_validate_row_data: found 1139 NaNs in row of 2621440 BF16 values
llama_model_quantize: failed to quantize: tensor 'blk.3.attn_k.weight' has invalid data
main: failed to quantize model from '/home/oleg/PB_eng/unsloth.BF16.gguf'
Unsloth: Conversion completed! Output location: /home/oleg/PB_eng/unsloth.Q4_K_M.gguf
✅ Conversion successful! File size: 0.48GB
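
The quantizer message above reports NaNs in `blk.3.attn_k.weight`, a tensor of 2560 × 1024 = 2621440 BF16 values. As a rough illustration of what such a check amounts to, the sketch below counts NaN bit patterns in a raw little-endian BF16 buffer using only the standard library. The helper names `count_bf16_nans` and `float_to_bf16_bytes` are hypothetical, and this is not ggml's actual validation code; it only shows the bit-level rule (all exponent bits set, mantissa nonzero).

```python
import struct

def count_bf16_nans(raw: bytes) -> int:
    """Count NaN values in a buffer of little-endian BF16 numbers.

    BF16 is the upper 16 bits of an IEEE-754 float32: 1 sign bit,
    8 exponent bits, 7 mantissa bits. A value is NaN when all exponent
    bits are set and the mantissa is nonzero (a zero mantissa is +/-inf).
    """
    if len(raw) % 2:
        raise ValueError("BF16 buffer length must be a multiple of 2 bytes")
    nans = 0
    for (bits,) in struct.iter_unpack("<H", raw):
        if (bits & 0x7F80) == 0x7F80 and (bits & 0x007F) != 0:
            nans += 1
    return nans

def float_to_bf16_bytes(x: float) -> bytes:
    """Truncate a float32 to BF16 (hypothetical helper, demo only)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.pack("<H", bits32 >> 16)

# Small demo buffer: one NaN among ordinary values and an infinity.
sample = b"".join(
    float_to_bf16_bytes(v) for v in (1.0, float("nan"), -2.5, float("inf"))
)
print(count_bf16_nans(sample))  # prints 1
```

Running the same kind of scan over the merged 16-bit weights before GGUF export would show whether the NaNs are already present after merging the adapter or only appear during conversion.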


/home/oleg/PB_eng/config.json
{
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2560,
  "initializer_range": 0.02,
  "intermediate_size": 9728,
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "pad_token_id": 151654,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "unsloth_fixed": true,
  "unsloth_version": "2025.6.12",
  "use_cache": false,
  "use_sliding_window": false,
  "vocab_size": 151936
}

/home/oleg/PB_eng/generation_config.json
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "max_length": 40960,
  "pad_token_id": 151654,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.52.4"
}

Log (running the intermediate BF16 GGUF):
(u_env) oleg@oleg-MS-7B86:~$  llama-cli -m /home/oleg/PB_eng/unsloth.BF16.gguf -ngl 99 -p "Hi"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
build: 1 (bb16041) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15081 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 398 tensors from /home/oleg/PB_eng/unsloth.BF16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen3
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = PB_eng
llama_model_loader: - kv   3:                         general.size_label str              = 4.0B
llama_model_loader: - kv   4:                          qwen3.block_count u32              = 36
llama_model_loader: - kv   5:                       qwen3.context_length u32              = 40960
llama_model_loader: - kv   6:                     qwen3.embedding_length u32              = 2560
llama_model_loader: - kv   7:                  qwen3.feed_forward_length u32              = 9728
llama_model_loader: - kv   8:                 qwen3.attention.head_count u32              = 32
llama_model_loader: - kv   9:              qwen3.attention.head_count_kv u32              = 8
llama_model_loader: - kv  10:                       qwen3.rope.freq_base f32              = 1000000,000000
llama_model_loader: - kv  11:     qwen3.attention.layer_norm_rms_epsilon f32              = 0,000001
llama_model_loader: - kv  12:                 qwen3.attention.key_length u32              = 128
llama_model_loader: - kv  13:               qwen3.attention.value_length u32              = 128
llama_model_loader: - kv  14:                          general.file_type u32              = 32
llama_model_loader: - kv  15:               general.quantization_version u32              = 2
llama_model_loader: - kv  16:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  17:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  18:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  19:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  20:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  21:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  22:            tokenizer.ggml.padding_token_id u32              = 151654
llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - type  f32:  145 tensors
llama_model_loader: - type bf16:  253 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = BF16
print_info: file size   = 7,49 GiB (16,00 BPW) 
load: special tokens cache size = 26
load: token to piece cache size = 0,9311 MB
print_info: arch             = qwen3
print_info: vocab_only       = 0
print_info: n_ctx_train      = 40960
print_info: n_embd           = 2560
print_info: n_layer          = 36
print_info: n_head           = 32
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 4
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0,0e+00
print_info: f_norm_rms_eps   = 1,0e-06
print_info: f_clamp_kqv      = 0,0e+00
print_info: f_max_alibi_bias = 0,0e+00
print_info: f_logit_scale    = 0,0e+00
print_info: f_attn_scale     = 0,0e+00
print_info: n_ff             = 9728
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000,0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 40960
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 4B
print_info: model params     = 4,02 B
print_info: general.name     = PB_eng
print_info: vocab type       = BPE
print_info: n_vocab          = 151936
print_info: n_merges         = 151387
print_info: BOS token        = 11 ','
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151654 '<|vision_pad|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 36 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 37/37 layers to GPU
load_tensors:        CUDA0 model buffer size =  7672,62 MiB
load_tensors:   CPU_Mapped model buffer size =   741,88 MiB
.....................................................................................
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch       = 2048
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = 0
llama_context: freq_base     = 1000000,0
llama_context: freq_scale    = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
llama_context:  CUDA_Host  output buffer size =     0,58 MiB
llama_kv_cache_unified:      CUDA0 KV buffer size =   576,00 MiB
llama_kv_cache_unified: size =  576,00 MiB (  4096 cells,  36 layers,  1 seqs), K (f16):  288,00 MiB, V (f16):  288,00 MiB
llama_context:      CUDA0 compute buffer size =   301,75 MiB
llama_context:  CUDA_Host compute buffer size =    13,01 MiB
llama_context: graph nodes  = 1446
llama_context: graph splits = 2
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 6

system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

sampler seed: 3425608011
sampler params: 
	repeat_last_n = 64, repeat_penalty = 1,000, frequency_penalty = 0,000, presence_penalty = 0,000
	dry_multiplier = 0,000, dry_base = 1,750, dry_allowed_length = 2, dry_penalty_last_n = 4096
	top_k = 40, top_p = 0,950, min_p = 0,050, xtc_probability = 0,000, xtc_threshold = 0,100, typical_p = 1,000, top_n_sigma = -1,000, temp = 0,800
	mirostat = 0, mirostat_lr = 0,100, mirostat_ent = 5,000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 0

HiGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
llama_perf_sampler_print:    sampling time =      37,21 ms /   459 runs   (    0,08 ms per token, 12335,06 tokens per second)
llama_perf_context_print:        load time =    2908,46 ms
llama_perf_context_print: prompt eval time =       0,00 ms /     1 tokens (    0,00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =   15375,31 ms /   458 runs   (   33,57 ms per token,    29,79 tokens per second)
llama_perf_context_print:       total time =   15516,83 ms /   459 tokens
Interrupted by user
(u_env) oleg@oleg-MS-7B86:~$ 

Log (running the Q4_K_M GGUF):
(u_env) oleg@oleg-MS-7B86:~$  llama-cli -m /home/oleg/PB_eng/unsloth.Q4_K_M.gguf -ngl 99 -p "Hi"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
build: 1 (bb16041) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15069 MiB free
gguf_init_from_file_impl: invalid magic characters: 'llama_model_load: error loading model: llama_model_loader: failed to load model from /home/oleg/PB_eng/unsloth.Q4_K_M.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/oleg/PB_eng/unsloth.Q4_K_M.gguf'
main: error: unable to load model
(u_env) oleg@oleg-MS-7B86:~$ 
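
The script reported a successful conversion even though `llama-quantize` had failed, and the resulting 0.48GB file is rejected for invalid magic characters. A minimal sketch of a sanity check to run after conversion follows. It assumes only that a valid GGUF file begins with the 4-byte magic `GGUF` followed by a little-endian uint32 version; the function name `looks_like_gguf` is hypothetical and not part of Unsloth or llama.cpp.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"

def looks_like_gguf(path):
    """Return (ok, detail) for a quick header check of a GGUF file."""
    p = Path(path)
    if not p.is_file():
        return False, "file does not exist"
    with p.open("rb") as f:
        header = f.read(8)  # 4-byte magic + 4-byte version
    if len(header) < 8:
        return False, f"file too short ({len(header)} header bytes)"
    if header[:4] != GGUF_MAGIC:
        return False, f"bad magic {header[:4]!r}"
    (version,) = struct.unpack("<I", header[4:8])
    return True, f"GGUF version {version}"
```

Calling `looks_like_gguf("/home/oleg/PB_eng/unsloth.Q4_K_M.gguf")` before printing a success message would catch the truncated output here. Passing the check only means the header is well formed: the BF16 file in the earlier log loads fine and still generates `GGGG...`, so it does not validate tensor data.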


[Bug] [Qwen3 4-bit] GGUF Conversion in v2025.6.12 Produces Garbage Output (GGGG...) #2860
