I used this command to get the converted model:
```shell
python3 convert-gptq-to-ggml.py "path/to/llama-65b-4bit.pt" "path/to/tokenizer.model" "./models/ggml-llama-65b-q4_0.bin"
```
Then I run it with this command:
```shell
./main -m ./models/ggml-llama-65b-q4_0.bin -n 128
```
And this is what I get at the end of the output:
```
llama_model_load: loading model part 1/8 from './models/ggml-llama-65b-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
llama_init_from_file: failed to load model
main: error: failed to load model './models/ggml-llama-65b-q4_0.bin'
```
P.S. Yes, I'm using the latest (or at least today's) version of this repo. While I'm at it, many thanks to ggerganov and everyone else involved! Great job.