Transformers version 5.0.0.dev0 breaks loading of Qwen3VL MoE models #43299

@daulettoibazar

System Info

I am trying to run inference on Qwen3VL MoE models (both 30B-A3B and 235B-A22B), but loading reports a size mismatch between the HF checkpoint and the model weights. Inference is run on an H100 GPU with the standard inference script from the Qwen repo. The error message is shown below.

Note: downgrading transformers to 4.57.0 resolves the issue, but why does v5.0.0.dev0 cause this error?

Qwen3VLMoeForConditionalGeneration LOAD REPORT from: Qwen/Qwen3-VL-30B-A3B-Thinking
Key                                                           | Status   |                                                                                                       
--------------------------------------------------------------+----------+-------------------------------------------------------------------------------------------------------
model.language_model.layers.{0...47}.mlp.experts.gate_up_proj | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 2048, 1536]) vs model:torch.Size([128, 1536, 2048])
model.language_model.layers.{0...47}.mlp.experts.down_proj    | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 768, 2048]) vs model:torch.Size([128, 2048, 768])  

Notes:
- MISMATCH      :ckpt weights were loaded, but they did not match the original empty weight shapes.
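
Reading the two MISMATCH rows side by side, each checkpoint shape looks like the model shape with its last two dimensions swapped. The sketch below checks only that arithmetic, using the shapes copied from the report. The helper names are hypothetical, and the check does not by itself identify what changed in transformers.

```python
# Shapes copied from the load report: name -> (checkpoint shape, model shape).
REPORTED = {
    "mlp.experts.gate_up_proj": ((128, 2048, 1536), (128, 1536, 2048)),
    "mlp.experts.down_proj": ((128, 768, 2048), (128, 2048, 768)),
}


def swap_last_two(shape):
    """Return the shape with its last two dimensions exchanged."""
    return shape[:-2] + (shape[-1], shape[-2])


def is_transposed_pair(ckpt_shape, model_shape):
    """True when the shapes differ only by a swap of the last two dims."""
    return ckpt_shape != model_shape and swap_last_two(ckpt_shape) == model_shape


for name, (ckpt_shape, model_shape) in REPORTED.items():
    print(name, is_transposed_pair(ckpt_shape, model_shape))
```

If both rows pass this check, the mismatch is consistent with the expert weights being stored in one layout and expected in the transposed one, which may help whoever triages the issue narrow down where to look.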

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Install transformers from source: pip install git+https://github.com/huggingface/transformers

Run a simple inference script on a Qwen3VL MoE model
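
A minimal sketch of the loading step, for anyone who wants to reproduce the report without the full inference script. The class name and model ID are taken from the load report in this issue. The function name and the `device_map="auto"` argument are assumptions, since the exact loading arguments used are not given here.

```python
MODEL_ID = "Qwen/Qwen3-VL-30B-A3B-Thinking"  # model named in the load report


def load_qwen3vl_moe(model_id=MODEL_ID):
    """Load the model; per this issue, the MISMATCH report appears at this step."""
    # Imported lazily so this file can be read without transformers installed.
    from transformers import Qwen3VLMoeForConditionalGeneration

    # device_map="auto" is an assumption (it requires accelerate); the issue
    # does not state which loading arguments were used.
    return Qwen3VLMoeForConditionalGeneration.from_pretrained(
        model_id, device_map="auto"
    )


# Usage (downloads the full checkpoint, so it is left commented out):
# model = load_qwen3vl_moe()
```

Loading alone should be enough to trigger the report, because the mismatch is reported while the checkpoint weights are being loaded, before any generation call.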

Expected behavior

The model should load without any shape mismatch. Instead, loading reports a size mismatch between the checkpoint and the model weights, as shown in the report above.
