System Info
I am trying to run inference on the Qwen3-VL MoE models (both 30B-A3B and 235B-A22B), but there is a size mismatch between the HF checkpoint and the model weights. Inference was run on an H100 GPU with the standard inference script from the Qwen repo. The error message is shown below.
Note: downgrading the transformers version to 4.57.0 resolves the issue, but why does v5.0.0 cause this error?
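For anyone else hitting this, the temporary workaround described above is just to pin the last known-good release (this sidesteps the bug rather than fixing it):

```shell
pip install "transformers==4.57.0"
```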
```
Qwen3VLMoeForConditionalGeneration LOAD REPORT from: Qwen/Qwen3-VL-30B-A3B-Thinking
Key                                                           | Status   |
--------------------------------------------------------------+----------+----------------------------------------------------------------------------------------------------------
model.language_model.layers.{0...47}.mlp.experts.gate_up_proj | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 2048, 1536]) vs model: torch.Size([128, 1536, 2048])
model.language_model.layers.{0...47}.mlp.experts.down_proj    | MISMATCH | Reinit due to size mismatch ckpt: torch.Size([128, 768, 2048]) vs model: torch.Size([128, 2048, 768])

Notes:
- MISMATCH: ckpt weights were loaded, but they did not match the original empty weight shapes.
```
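For what it's worth, the two mismatches look consistent with each other: in both rows the checkpoint tensor has exactly the model's expected shape with the last two dimensions swapped, i.e. a per-expert transpose of the MoE projection weights. A minimal sketch (shapes copied from the load report; numpy is used only to illustrate, this is not the actual loading code):

```python
import numpy as np

# Shapes taken verbatim from the load report above:
# (num_experts, dim_a, dim_b)
ckpt_gate_up = np.zeros((128, 2048, 1536))   # layout in the HF checkpoint
model_gate_up = np.zeros((128, 1536, 2048))  # layout the v5 model expects

ckpt_down = np.zeros((128, 768, 2048))
model_down = np.zeros((128, 2048, 768))

# In both cases, swapping the last two axes of the checkpoint tensor
# yields the shape the model expects:
assert ckpt_gate_up.swapaxes(-1, -2).shape == model_gate_up.shape
assert ckpt_down.swapaxes(-1, -2).shape == model_down.shape
print("both mismatches are a transpose of the last two dimensions")
```

This suggests the expected expert-weight layout changed between 4.57.0 and v5.0.0 (rather than the checkpoint being corrupted), though I haven't confirmed which commit changed it.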
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Install transformers from source: pip install git+https://github.com/huggingface/transformers
Run simple inference on a Qwen3-VL MoE model
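A text-only reproduction roughly following the standard snippet from the Qwen model card (the exact script from the Qwen repo may differ; the failure occurs already at `from_pretrained`, before generation). Requires an H100-class GPU and downloading the 30B checkpoint:

```python
from transformers import AutoProcessor, Qwen3VLMoeForConditionalGeneration

model_id = "Qwen/Qwen3-VL-30B-A3B-Thinking"

# With transformers v5.0.0 (installed from source), this load step emits the
# size-mismatch load report shown above and reinitializes the expert weights.
model = Qwen3VLMoeForConditionalGeneration.from_pretrained(
    model_id, dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Hello, who are you?"}]},
]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```

With reinitialized expert weights the model also produces garbage output even when loading "succeeds", which is how the problem surfaces if the load report is missed.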
Expected behavior
Inference should run without errors, as it does on 4.57.0. Instead, loading fails with a size mismatch between the model checkpoint and the model weights, as shown in the load report above.