Fails to load 30B model after quantization #27
Closed
Labels
build, Compilation issues
Description
Trying the 30B model on an M1 MBP with 32 GB RAM. I ran quantization on all 4 outputs of the conversion to ggml, but the model fails to load for evaluation:
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 6656
llama_model_load: n_mult = 256
llama_model_load: n_head = 52
llama_model_load: n_layer = 60
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 17920
llama_model_load: ggml ctx size = 20951.50 MB
llama_model_load: memory_size = 1560.00 MB, n_mem = 30720
llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './models/30B/ggml-model-q4_0.bin'
This issue does not happen when I run the 7B model.
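For context on the "wrong size" error: the loader computes how many bytes a tensor should occupy and compares that with what the file actually contains; any mismatch (e.g. a part that was not quantized, or parts quantized inconsistently) triggers this message. The sketch below is an assumption about the early ggml q4_0 layout, where 32 weights are packed per block as 16 bytes of 4-bit quants plus a 4-byte float scale (20 bytes per block); the function name `q4_0_nbytes` is hypothetical, not llama.cpp's API.

```python
# Hedged sketch: expected on-disk size of a q4_0 tensor under the assumed
# early-ggml block layout (32 elements -> 16 bytes of nibbles + 4-byte scale).
def q4_0_nbytes(nelements: int, block_size: int = 32, bytes_per_block: int = 20) -> int:
    # Tensors are quantized in whole blocks, so the element count must divide evenly.
    assert nelements % block_size == 0
    return nelements // block_size * bytes_per_block

# Values taken from the log above for tok_embeddings.weight:
n_vocab, n_embd = 32000, 6656
nelements = n_vocab * n_embd  # 212,992,000 elements
print(q4_0_nbytes(nelements))  # -> 133120000 bytes expected for this tensor
```

If the bytes actually present for the tensor differ from this expected figure (for instance because one of the 4 split files is still f16), the loader reports "has wrong size in model file", which matches the symptom here while the single-file 7B model loads fine.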