[Feature Request] Support DevStral Small 2 on Transformers v5

1. Did you update? `pip install --upgrade unsloth unsloth_zoo` 
Yes

2. `Colab` or `Kaggle` or local / cloud
Local

3. Number of GPUs used (use `nvidia-smi`)
1 GPU (DGX Spark)

4. Which notebook? Please link!
Personal notebook with custom dataset (file: [unsloth_qlora.ipynb](https://github.com/user-attachments/files/26462496/unsloth_qlora.ipynb))

5. Which Unsloth version, TRL version, transformers version, PyTorch version?
`unsloth==2026.4.1`, `unsloth_zoo==2026.4.2`, `trl==0.24.0`, `transformers==5.5.0`, `torch==2.10.0a0+b558c986e8.nv25.11`

6. Which trainer? `SFTTrainer`, `GRPOTrainer` etc
`SFTTrainer`. After saving the fine-tuned Devstral Small 2 model, reloading it prints warnings, and inference with `model.generate(...)` produces gibberish, even though the same model generated good answers right after training. I'm not sure whether this is an Unsloth or a Transformers issue.


```python
# Fine-tuned model saving
model.save_pretrained_merged(r"../finetuned_models/devstral-sft", tokenizer, save_method = "merged_16bit",)
```
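A minimal, stdlib-only sketch for checking what `save_pretrained_merged` actually wrote to disk. It reads only the JSON header of each `*.safetensors` shard (an 8-byte little-endian length followed by a JSON description of every tensor, per the safetensors format) and reports dtype counts, any FP8 tensors, and any `weight_scale_inv` / `activation_scale` tensors like the ones the load report flags as UNEXPECTED. The hypothesis it probes is an assumption, not a confirmed diagnosis: if the merged checkpoint still holds FP8 weights plus their scales while the loader builds an unquantized model, the weights could be loaded without their scales applied, which might explain the gibberish. `read_safetensors_header` and `scan_checkpoint` are hypothetical helper names.

```python
import json
import struct
import sys
from pathlib import Path

# Tensor-name suffixes reported as UNEXPECTED in the load report.
SCALE_SUFFIXES = (".weight_scale_inv", ".activation_scale")
# safetensors dtype strings for the 8-bit float formats.
FP8_DTYPES = {"F8_E4M3", "F8_E5M2"}


def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} without loading tensor data."""
    with open(path, "rb") as f:
        # The file starts with an 8-byte little-endian header length,
        # followed by a UTF-8 JSON header describing every tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def scan_checkpoint(directory):
    """Collect FP8-typed tensors and FP8 scale tensors across all shards."""
    fp8_tensors, scale_tensors, dtype_counts = [], [], {}
    for shard in sorted(Path(directory).glob("*.safetensors")):
        for name, info in read_safetensors_header(shard).items():
            dtype = info["dtype"]
            dtype_counts[dtype] = dtype_counts.get(dtype, 0) + 1
            if dtype in FP8_DTYPES:
                fp8_tensors.append(name)
            if name.endswith(SCALE_SUFFIXES):
                scale_tensors.append(name)
    return fp8_tensors, scale_tensors, dtype_counts


if __name__ == "__main__":
    # Default is the output directory used in the save call above.
    target = sys.argv[1] if len(sys.argv) > 1 else "../finetuned_models/devstral-sft"
    fp8, scales, counts = scan_checkpoint(target)
    print("dtype counts:", counts)
    print(f"{len(fp8)} FP8 tensors, {len(scales)} scale tensors")
    for name in fp8[:5]:
        print("  FP8:", name)
```

Run it as, for example, `python scan_ckpt.py ../finetuned_models/devstral-sft` (the script name is hypothetical). With `save_method = "merged_16bit"` one would expect only `BF16`/`F16` weight tensors and no scale tensors; anything else points at the save step rather than at inference.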

```python
# Fine-tuned model loading
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    model_name = r"../finetuned_models/devstral-sft",
    max_seq_length = 32768, # Choose any for long context!
    load_in_4bit = False,  # 4 bit quantization to reduce memory
    load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
)
```

Loading the fine-tuned model prints the following:

```text
Loading weights: 100%|██████████| 585/585 [00:24<00:00, 23.74it/s]
Mistral3ForConditionalGeneration LOAD REPORT from: ../finetuned_models/devstral-sft
Key                                                              | Status     |  | 
-----------------------------------------------------------------+------------+--+-
language_model.layers.{0...39}.mlp.up_proj.activation_scale      | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.o_proj.weight_scale_inv | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.v_proj.activation_scale | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.k_proj.weight_scale_inv | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.o_proj.activation_scale | UNEXPECTED |  | 
language_model.layers.{0...39}.mlp.down_proj.activation_scale    | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.q_proj.activation_scale | UNEXPECTED |  | 
language_model.layers.{0...39}.mlp.down_proj.weight_scale_inv    | UNEXPECTED |  | 
language_model.layers.{0...39}.mlp.gate_proj.weight_scale_inv    | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.k_proj.activation_scale | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.v_proj.weight_scale_inv | UNEXPECTED |  | 
language_model.layers.{0...39}.mlp.gate_proj.activation_scale    | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.q_proj.weight_scale_inv | UNEXPECTED |  | 
language_model.layers.{0...39}.mlp.up_proj.weight_scale_inv      | UNEXPECTED |  | 

Notes:
- UNEXPECTED:	can be ignored when loading from different task/architecture; not ok if you expect identical arch.
The tokenizer you are loading from '../finetuned_models/devstral-sft' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
```
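The last line of the log names its own workaround. A minimal sketch of reloading the saved tokenizer with `fix_mistral_regex=True`, assuming `AutoTokenizer.from_pretrained` accepts the flag as the warning implies; whether `FastVisionModel.from_pretrained` forwards it is not confirmed here, and `load_tokenizer_with_regex_fix` is a hypothetical helper name.

```python
# Output directory from the report above; adjust as needed.
CHECKPOINT = "../finetuned_models/devstral-sft"


def load_tokenizer_with_regex_fix(path=CHECKPOINT):
    """Load the saved tokenizer with the flag suggested by the Transformers warning.

    Assumption: AutoTokenizer.from_pretrained forwards `fix_mistral_regex`
    to the tokenizer, as the warning text implies. The import is deferred
    so this module can be imported without Transformers installed.
    """
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(path, fix_mistral_regex=True)
```

For example, call `tokenizer = load_tokenizer_with_regex_fix()` and use it in place of the tokenizer returned by `FastVisionModel.from_pretrained` before calling `model.generate(...)`. This addresses only the tokenizer warning; it is not known whether it also accounts for the gibberish output.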

[Feature Request] Support DevStral Small 2 on Transformers v5 #4832

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[Feature Request] Support DevStral Small 2 on Transformers v5 #4832

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions