Error when Qwen1.5 32B/9B de-quantizing GGUF #32526

@kunger97

Description

System Info

  • transformers version: 4.45.0.dev0
  • Platform: Linux-5.15.0-113-generic-x86_64-with-glibc2.35
  • Python version: 3.11.9
  • Huggingface_hub version: 0.24.5
  • Safetensors version: 0.4.3
  • Accelerate version: 0.33.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: (No)

Who can help?

@ArthurZucker @Isotr0py

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run the script below, which loads a GGUF model and saves it as a PyTorch model:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen1.5-32B-Chat-GGUF"
filename = "qwen1_5-32b-chat-q4_k_m.gguf"

# Load the tokenizer and model directly from the GGUF file
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)

print(model)

# Save the de-quantized model in PyTorch format
tokenizer.save_pretrained('/home/u1033079/LLM')
model.save_pretrained('/home/u1033079/LLM')

Expected behavior

Loading should complete and the model should be saved without error. Instead, `from_pretrained` fails with the traceback below:

Converting and de-quantizing GGUF tensors...: 100%|██████████| 771/771 [08:00<00:00,  1.60it/s]
Traceback (most recent call last):
  File "/home/u1033079/LLM/run.py", line 16, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3942, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4339, in _load_pretrained_model
    error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/transformers/modeling_utils.py", line 937, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/u1033079/miniconda3/envs/LLM/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 373, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([152064, 5120]) in "weight" (which has shape torch.Size([151936, 5120])), this looks incorrect.
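For context, the two shapes in the error differ by exactly 128 rows (152064 − 151936): the token-embedding tensor de-quantized from the GGUF file has 152064 rows, while the parameter that transformers builds from its reconstructed config expects 151936, so it looks like a vocab-size mismatch between the GGUF metadata and the config. A minimal sketch of the consistency check that raises (shapes taken from the traceback; this is an illustration, not the actual transformers/accelerate source):

```python
# Illustration only: mimics the shape check that fails inside
# accelerate's set_module_tensor_to_device, using the shapes from the log.
ckpt_shape = (152064, 5120)   # de-quantized GGUF "weight" tensor
param_shape = (151936, 5120)  # parameter shape built from the model config

def set_weight(new_shape, old_shape):
    # Refuse to overwrite a parameter with a tensor of a different shape
    if new_shape != old_shape:
        raise ValueError(
            f'Trying to set a tensor of shape {new_shape} in "weight" '
            f'(which has shape {old_shape}), this looks incorrect.'
        )

print(ckpt_shape[0] - param_shape[0])  # → 128 (the vocab sizes differ by 128 rows)
```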
