Fails to load 30B model after quantization #27
Closed
Labels
build, Compilation issues
Description
Trying the 30B model on an M1 MBP with 32 GB RAM. I ran quantization on all 4 outputs of the conversion to ggml, but the model fails to load for evaluation:
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 6656
llama_model_load: n_mult = 256
llama_model_load: n_head = 52
llama_model_load: n_layer = 60
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 17920
llama_model_load: ggml ctx size = 20951.50 MB
llama_model_load: memory_size = 1560.00 MB, n_mem = 30720
llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './models/30B/ggml-model-q4_0.bin'
This issue does not happen when I run the 7B model.
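For context on the "wrong size" error: the loader computes how many bytes a tensor should occupy and compares that with what the file actually contains; any mismatch (e.g. a part that was not quantized, or parts quantized inconsistently) triggers this message. The sketch below is an assumption about the early ggml q4_0 layout, where 32 weights are packed per block as 16 bytes of 4-bit quants plus a 4-byte float scale (20 bytes per block); the function name `q4_0_nbytes` is hypothetical, not llama.cpp's API.

```python
# Hedged sketch: expected on-disk size of a q4_0 tensor under the assumed
# early-ggml block layout (32 elements -> 16 bytes of nibbles + 4-byte scale).
def q4_0_nbytes(nelements: int, block_size: int = 32, bytes_per_block: int = 20) -> int:
    # Tensors are quantized in whole blocks, so the element count must divide evenly.
    assert nelements % block_size == 0
    return nelements // block_size * bytes_per_block

# Values taken from the log above for tok_embeddings.weight:
n_vocab, n_embd = 32000, 6656
nelements = n_vocab * n_embd  # 212,992,000 elements
print(q4_0_nbytes(nelements))  # -> 133120000 bytes expected for this tensor
```

If the bytes actually present for the tensor differ from this expected figure (for instance because one of the 4 split files is still f16), the loader reports "has wrong size in model file", which matches the symptom here while the single-file 7B model loads fine.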