🐛 Bug Description
When using model.save_pretrained_torchao(), the function incorrectly uses AutoModel instead of AutoModelForCausalLM to reload the 16-bit model.
This causes the saved config.json in the final -torchao directory to record the base model architecture (e.g., Qwen3Model) instead of the language-modeling-head architecture (e.g., Qwen3ForCausalLM).
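A quick way to confirm the symptom is to inspect the architectures field of the saved config.json. A minimal sketch, assuming a hypothetical ./model-torchao output directory and a Qwen3 base model:

```python
import json

# Inspect the config.json written by save_pretrained_torchao.
# "./model-torchao" is a hypothetical output path for illustration.
with open("./model-torchao/config.json") as f:
    config = json.load(f)

# With the bug, this prints ["Qwen3Model"]; after the fix it should
# print ["Qwen3ForCausalLM"].
print(config["architectures"])
```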
🔍 Reproducing the Bug
You can see this in the unsloth/save.py file, inside the unsloth_save_pretrained_torchao function.
The problematic lines are:
On line 2772:
from transformers import AutoModel, AutoTokenizer, TorchAoConfig
And around line 2791:
model = AutoModel.from_pretrained(...)
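For context, the two auto classes resolve the same checkpoint to different architectures, which is what ends up in the saved config. A minimal sketch (Qwen/Qwen3-0.6B is used purely for illustration; any causal-LM checkpoint shows the same effect):

```python
from transformers import AutoModel, AutoModelForCausalLM

checkpoint = "Qwen/Qwen3-0.6B"  # illustrative checkpoint

# AutoModel loads the bare transformer without the LM head, so the
# base architecture is recorded in the config.
base = AutoModel.from_pretrained(checkpoint)
print(type(base).__name__)  # Qwen3Model

# AutoModelForCausalLM keeps the language-modeling head and the
# correct architecture.
lm = AutoModelForCausalLM.from_pretrained(checkpoint)
print(type(lm).__name__)  # Qwen3ForCausalLM
```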
✅ The Fix
This bug is fixed by changing the function to use AutoModelForCausalLM:
- Change the import to:
  from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig
- Change the model loading line to:
  model = AutoModelForCausalLM.from_pretrained(...)
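For reference, here is a minimal sketch of the corrected reload-quantize-save flow. The paths, the "int8_weight_only" scheme, and the surrounding arguments are illustrative assumptions, not the exact unsloth code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

merged_dir = "./merged-16bit"  # hypothetical path of the merged 16-bit model
out_dir = "./model-torchao"    # hypothetical final output directory

# "int8_weight_only" is just one example torchao quantization scheme;
# requires the torchao package to be installed.
quant_config = TorchAoConfig("int8_weight_only")

# Reload with AutoModelForCausalLM so the LM head and the correct
# architecture are preserved in the config.
model = AutoModelForCausalLM.from_pretrained(
    merged_dir,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained(merged_dir)

# torchao tensors are not safetensors-compatible, so save with
# safe_serialization=False.
model.save_pretrained(out_dir, safe_serialization=False)
tokenizer.save_pretrained(out_dir)

# config.json in out_dir now records Qwen3ForCausalLM (for a Qwen3 model).
```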