System Info
I am trying to run inference on the Qwen3-VL MoE models (both 30B-A3B and 235B-A22B), but there is a size mismatch between the HF checkpoint and the model weights. Inference was run on an H100 GPU with the standard inference script from the Qwen repo. The error message is shown below.
Note: downgrading the transformers version to 4.57.0 resolves the issue, but why does v5.0.0 cause this error?
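For anyone else hitting this, the temporary workaround described above is just to pin the last known-good release (this sidesteps the bug rather than fixing it):

```shell
pip install "transformers==4.57.0"
```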
```
Qwen3VLMoeForConditionalGeneration LOAD REPORT from: Qwen/Qwen3-VL-30B-A3B-Thinking
Key                                                           | Status   |
--------------------------------------------------------------+----------+----------------------------------------------------------------------------------------------------------
model.language_model.layers.{0...47}.mlp.experts.gate_up_proj | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 2048, 1536]) vs model: torch.Size([128, 1536, 2048])
model.language_model.layers.{0...47}.mlp.experts.down_proj    | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 768, 2048]) vs model: torch.Size([128, 2048, 768])

Notes:
- MISMATCH: ckpt weights were loaded, but they did not match the original empty weight shapes.
```
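For what it's worth, the two mismatches look consistent with each other: in both rows the checkpoint tensor has exactly the model's expected shape with the last two dimensions swapped, i.e. a per-expert transpose of the MoE projection weights. A minimal sketch (shapes copied from the load report; numpy is used only to illustrate, this is not the actual loading code):

```python
import numpy as np

# Shapes taken verbatim from the load report above:
# (num_experts, dim_a, dim_b)
ckpt_gate_up = np.zeros((128, 2048, 1536))   # layout in the HF checkpoint
model_gate_up = np.zeros((128, 1536, 2048))  # layout the v5 model expects

ckpt_down = np.zeros((128, 768, 2048))
model_down = np.zeros((128, 2048, 768))

# In both cases, swapping the last two axes of the checkpoint tensor
# yields the shape the model expects:
assert ckpt_gate_up.swapaxes(-1, -2).shape == model_gate_up.shape
assert ckpt_down.swapaxes(-1, -2).shape == model_down.shape
print("both mismatches are a transpose of the last two dimensions")
```

This suggests the expected expert-weight layout changed between 4.57.0 and v5.0.0 (rather than the checkpoint being corrupted), though I haven't confirmed which commit changed it.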
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Install transformers from source: pip install git+https://github.com/huggingface/transformers
Run simple inference on a Qwen3-VL MoE model
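A text-only reproduction roughly following the standard snippet from the Qwen model card (the exact script from the Qwen repo may differ; the failure occurs already at `from_pretrained`, before generation). Requires an H100-class GPU and downloading the 30B checkpoint:

```python
from transformers import AutoProcessor, Qwen3VLMoeForConditionalGeneration

model_id = "Qwen/Qwen3-VL-30B-A3B-Thinking"

# With transformers v5.0.0 (installed from source), this load step emits the
# size-mismatch load report shown above and reinitializes the expert weights.
model = Qwen3VLMoeForConditionalGeneration.from_pretrained(
    model_id, dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Hello, who are you?"}]},
]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```

With reinitialized expert weights the model also produces garbage output even when loading "succeeds", which is how the problem surfaces if the load report is missed.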
Expected behavior
Inference should run without errors, as it does on 4.57.0. Instead, loading fails with a size mismatch between the model checkpoint and the model weights, as shown in the load report above.