Error loading llama 65b 4bit model (HFv2) converted from .pt format #538

@ch3rn0v

Description

I used this command to convert the model:

python3 convert-gptq-to-ggml.py "path/to/llama-65b-4bit.pt" "path/to/tokenizer.model" "./models/ggml-llama-65b-q4_0.bin"

I run it with this command:

./main -m ./models/ggml-llama-65b-q4_0.bin -n 128

And this is what I get at the end of the output:

llama_model_load: loading model part 1/8 from './models/ggml-llama-65b-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
llama_init_from_file: failed to load model
main: error: failed to load model './models/ggml-llama-65b-q4_0.bin'
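The "wrong size" error means the tensor dimensions the loader computes from the file's hyperparameters don't match the tensor data actually stored in the file. For debugging, it can help to dump the header of the converted file and compare the hyperparameters against what the loader expects. Below is a minimal sketch; the field names and their order (magic, then seven little-endian int32 hyperparameters) are my assumption about the early ggml layout, so check them against the actual loader source before relying on this.

```python
import struct

# Assumed early-ggml header layout: a 4-byte magic value followed by
# seven little-endian int32 hyperparameters (field names are my guess).
FIELDS = ["n_vocab", "n_embd", "n_mult", "n_head", "n_layer", "n_rot", "ftype"]

def read_hparams(data: bytes) -> dict:
    """Parse the assumed header from the first bytes of a ggml model file."""
    magic = struct.unpack_from("<I", data, 0)[0]
    values = struct.unpack_from("<7i", data, 4)
    return {"magic": hex(magic), **dict(zip(FIELDS, values))}

# Synthetic header for demonstration only; the values are illustrative,
# not the real 65B hyperparameters.
header = struct.pack("<I7i", 0x67676D6C, 32000, 8192, 256, 64, 80, 128, 2)
print(read_hparams(header))
```

To inspect a real file, read its first 32 bytes (`open(path, "rb").read(32)`) and pass them to `read_hparams`; if the printed values disagree with what `llama_model_load` reports, the conversion script wrote an incompatible header.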

P.S. Yes, I'm using the latest (or at least today's) version of this repo. While I'm at it, many thanks to ggerganov and everyone else involved! Great job.


    Labels

    invalid (This doesn't seem right)
