Description
- Did you update? pip install --upgrade unsloth unsloth_zoo
Yes.
- Colab, Kaggle, or local / cloud?
Local
- Number of GPUs used (nvidia-smi output):
Wed May 13 17:22:55 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142                Driver Version: 580.142        CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB10                    On  |   0000000F:01:00.0  On |                  N/A |
| N/A   40C    P8              4W /  N/A  | Not Supported          |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            3585      G   /usr/lib/xorg/Xorg                      103MiB |
|    0   N/A  N/A            3726      G   /usr/bin/gnome-shell                    127MiB |
|    0   N/A  N/A            4585      G   /usr/share/code/code                     99MiB |
+-----------------------------------------------------------------------------------------+
- Which Unsloth version, TRL version, transformers version, PyTorch version?
- Unsloth version: 2026.4.4 (Unsloth Zoo version: 2026.4.6)
- TRL version: 0.24.0
- Transformers version: 5.5.0
- PyTorch version: 2.10.0a0+b558c986e8.nv25.11
- Which trainer (SFTTrainer, GRPOTrainer, etc.)?
SFTTrainer
After fine-tuning Qwen3.5 with Unsloth and saving the adapters via trainer.save_model, loading the adapters on their own through FastLanguageModel.from_pretrained works perfectly. However, when save_pretrained_merged is used to merge the LoRA adapters into the base model and the merged model is then reloaded, the output is garbage or significantly worse than with the unmerged adapter loading path.
Steps to reproduce
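from unsloth import FastLanguageModel  # import unsloth before trl
from trl import SFTTrainer

# Load the base model in 16-bit (no 4-bit quantization)
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="path/to/model",
max_seq_length=8192,
load_in_4bit=False,
load_in_16bit=True,
)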
model = FastLanguageModel.get_peft_model(
model,
r=32,
lora_alpha=64,
lora_dropout=0.05,
bias="none",
target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"],
use_gradient_checkpointing="unsloth",
random_state=42,
)
trainer = SFTTrainer(...)
trainer.train()
### Save adapters
trainer.save_model("path/to/adapters")
### Save merged model
model.save_pretrained_merged("path/to/merged/model", tokenizer, save_method="merged_16bit")
# Loading adapters produces good output
model, tokenizer = FastLanguageModel.from_pretrained(
"path/to/adapters",
max_seq_length=8192,
load_in_4bit=False,
load_in_16bit=True
)
# Loading merged model produces garbage/significantly worse output
model, tokenizer = FastLanguageModel.from_pretrained(
"path/to/merged/model",
max_seq_length=8192,
load_in_4bit=False,
load_in_16bit=True
)
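
For reference, this is the behavior I expected from the merge. A LoRA merge should preserve outputs: the adapter path computes W x + (lora_alpha / r) * B (A x), and the merged weight W + (lora_alpha / r) * B A should give the same result up to rounding (lora_dropout is only active during training). The sketch below is not Unsloth's merge code. It is a toy pure-Python check with made-up matrix sizes and hypothetical helper names (rand_matrix, matmul, add_scaled, max_abs_diff), and it only keeps the same lora_alpha / r ratio of 2 as the config above.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Return w + scale * delta, element-wise."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rand_matrix(rows, cols):
    """Matrix with standard-normal entries (toy data, not real weights)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(42)

# Toy sizes (assumption); the report uses r=32, lora_alpha=64, i.e. the same ratio.
d_out, d_in, r, lora_alpha = 6, 5, 2, 4
scale = lora_alpha / r

W = rand_matrix(d_out, d_in)  # frozen base weight
A = rand_matrix(r, d_in)      # LoRA down-projection
B = rand_matrix(d_out, r)     # LoRA up-projection
x = rand_matrix(d_in, 1)      # one input column vector

# Adapter path: base output plus the scaled low-rank update.
adapter_out = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Merged path: fold the update into the weight first, then apply it.
W_merged = add_scaled(W, matmul(B, A), scale)
merged_out = matmul(W_merged, x)

diff = max_abs_diff(adapter_out, merged_out)
print(f"max abs difference between adapter and merged paths: {diff:.2e}")
```

In float64 the two paths agree to rounding error. A gap as large as the one I see between the adapter and merged checkpoints therefore suggests something in the merge, save, or reload step rather than LoRA itself.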