[Feature Request] Support DevStral Small 2 on Transformers v5 #4832

@thomasmaindron

Description

  1. Did you update? pip install --upgrade unsloth unsloth_zoo
    Yes

  2. Colab or Kaggle or local / cloud
    Local

  3. Number GPUs used, use nvidia-smi
    1 GPU (DGX Spark)

  4. Which notebook? Please link!
    Personal notebook with custom dataset (file: unsloth_qlora.ipynb)

  5. Which Unsloth version, TRL version, transformers version, PyTorch version?
    unsloth==2026.4.1, unsloth_zoo==2026.4.2, trl==0.24.0, transformers==5.5.0, torch==2.10.0a0+b558c986e8.nv25.11

  6. Which trainer? SFTTrainer, GRPOTrainer etc
    SFTTrainer. The saved fine-tuned Devstral Small 2 model emits warnings when loaded and generates gibberish during inference with model.generate(...), even though the same model generated great answers right after training. Not sure whether this is an Unsloth issue or a Transformers issue.

from unsloth import FastVisionModel

# Save the fine-tuned model, merged to 16-bit weights
model.save_pretrained_merged(r"../finetuned_models/devstral-sft", tokenizer, save_method = "merged_16bit")

# Load the saved fine-tuned model
model, tokenizer = FastVisionModel.from_pretrained(
    model_name = r"../finetuned_models/devstral-sft",
    max_seq_length = 32768, # Choose any for long context!
    load_in_4bit = False,  # Set to True for 4 bit quantization to reduce memory
    load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
)

Loading the fine-tuned model outputs the following:

Loading weights: 100%|██████████| 585/585 [00:24<00:00, 23.74it/s]
Mistral3ForConditionalGeneration LOAD REPORT from: ../finetuned_models/devstral-sft
Key                                                              | Status     |  | 
-----------------------------------------------------------------+------------+--+-
language_model.layers.{0...39}.mlp.up_proj.activation_scale      | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.o_proj.weight_scale_inv | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.v_proj.activation_scale | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.k_proj.weight_scale_inv | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.o_proj.activation_scale | UNEXPECTED |  | 
language_model.layers.{0...39}.mlp.down_proj.activation_scale    | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.q_proj.activation_scale | UNEXPECTED |  | 
language_model.layers.{0...39}.mlp.down_proj.weight_scale_inv    | UNEXPECTED |  | 
language_model.layers.{0...39}.mlp.gate_proj.weight_scale_inv    | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.k_proj.activation_scale | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.v_proj.weight_scale_inv | UNEXPECTED |  | 
language_model.layers.{0...39}.mlp.gate_proj.activation_scale    | UNEXPECTED |  | 
language_model.layers.{0...39}.self_attn.q_proj.weight_scale_inv | UNEXPECTED |  | 
language_model.layers.{0...39}.mlp.up_proj.weight_scale_inv      | UNEXPECTED |  | 

Notes:
- UNEXPECTED:	can be ignored when loading from different task/architecture; not ok if you expect identical arch.
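
One way to confirm that these keys were actually written into the merged checkpoint (rather than being a loader quirk) is to look at the shard index on disk. The sketch below assumes the save produced a sharded safetensors checkpoint with a model.safetensors.index.json file; the helper names are hypothetical, and the suffixes are copied from the report above. Judging by their names, they look like leftover quantization scale tensors, but that is an assumption.

```python
import json
from collections import Counter
from pathlib import Path

# Suffixes copied from the UNEXPECTED rows of the load report.
SCALE_SUFFIXES = (".weight_scale_inv", ".activation_scale")


def load_weight_map(checkpoint_dir):
    """Read the tensor-name -> shard-file map of a sharded safetensors checkpoint.

    Assumes model.safetensors.index.json exists in checkpoint_dir.
    """
    index_path = Path(checkpoint_dir) / "model.safetensors.index.json"
    with open(index_path, encoding="utf-8") as f:
        return json.load(f)["weight_map"]


def find_scale_keys(weight_map):
    """Return the tensor names that end with one of the scale suffixes."""
    return sorted(k for k in weight_map if k.endswith(SCALE_SUFFIXES))


def count_by_suffix(keys):
    """Count the matching keys per suffix (the last dotted component)."""
    return Counter(k.rsplit(".", 1)[-1] for k in keys)
```

Usage would be something like count_by_suffix(find_scale_keys(load_weight_map("../finetuned_models/devstral-sft"))). Non-zero counts would mean the scale tensors were saved alongside the merged 16-bit weights.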
The tokenizer you are loading from '../finetuned_models/devstral-sft' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
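
For the tokenizer warning, a minimal sketch of what it suggests, assuming AutoTokenizer.from_pretrained accepts the fix_mistral_regex flag exactly as the message says (I have not verified this on transformers==5.5.0, nor whether Unsloth forwards the flag when it loads the tokenizer itself). The helper name is hypothetical.

```python
# Flag name copied from the warning text; treating it as a from_pretrained
# keyword argument is an assumption based on that message.
TOKENIZER_KWARGS = {"fix_mistral_regex": True}


def load_fixed_tokenizer(checkpoint_dir):
    """Load the tokenizer directly with the regex fix requested by the warning."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(checkpoint_dir, **TOKENIZER_KWARGS)
```

Comparing the token IDs from this tokenizer with the ones from the default load on the same prompt would show whether the regex affects tokenization here.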

Labels: feature request (Feature request pending on roadmap)