I used this command to get the converted model:
```shell
python3 convert-gptq-to-ggml.py "path/to/llama-65b-4bit.pt" "path/to/tokenizer.model" "./models/ggml-llama-65b-q4_0.bin"
```
Then I run it with this command:
```shell
./main -m ./models/ggml-llama-65b-q4_0.bin -n 128
```
And this is what I get at the end of the output:
```
llama_model_load: loading model part 1/8 from './models/ggml-llama-65b-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
llama_init_from_file: failed to load model
main: error: failed to load model './models/ggml-llama-65b-q4_0.bin'
```
P.S. Yes, I'm using the latest (or at least today's) version of this repo. While I'm at it, many thanks to ggerganov and everyone else involved! Great job.