Conversation
currently erroring out here. I can get past it if I neuter the check:

diff --git a/convert.py b/convert.py
index c593abe..e7e7654 100644
--- a/convert.py
+++ b/convert.py
@@ -969,6 +969,7 @@ def bounded_parallel_map(func: Callable[[In], Out], iterable: Iterable[In], conc
 def check_vocab_size(params: Params, vocab: Vocab) -> None:
+    return
     if params.n_vocab != vocab.vocab_size:
         # GGMLVocab comes from the same file as the model so shouldn't mismatch:
         assert isinstance(vocab, SentencePieceVocab) or isinstance(vocab, XgenVocab)
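For what it's worth, the mismatch presumably comes from the XGen embedding matrix being padded to n_vocab = 51200 while the tokenizer defines fewer tokens (later comments pad exactly that tail range). A gentler alternative to the blanket return would be to let the check tolerate a padded XgenVocab; this is a sketch of mine, not part of the patch:

def check_vocab_size(params: Params, vocab: Vocab) -> None:
    if params.n_vocab == vocab.vocab_size:
        return
    if isinstance(vocab, XgenVocab) and params.n_vocab > vocab.vocab_size:
        # The rows past the tokenizer's real vocab are padding; the writer can
        # emit placeholder tokens for them instead of aborting here.
        return
    raise Exception(f"Vocab size mismatch: model n_vocab={params.n_vocab}, tokenizer vocab_size={vocab.vocab_size}")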
I've tried to use this PR to convert xgen-4k to GGML. It worked during conversion but failed when loading the model for inference. Here is the error message:

@clyang that was with the patch to I wonder instead of

I implemented this and I was able to use the model with

What is needed to finalize this?

this seems to work and could be merged, but i'm not sure about correctness. there may be tokens missing, or a better placeholder token value (vs also i haven't had a chance to look, but on the
    token = self.xt._convert_id_to_token(index)
    yield (token, float(index))
for index in range(self.vocab_size_base, self.vocab_size):
    yield (b'', float(index))
Suggested change:
-    yield (b'', float(index))
+    yield (b'<|unk|>', float(index))
🤷
I've tried the latest commit and this is what I got.
$ python3 convert.py models/xgen-4k-7b-orig --outtype f16
Loading model file models/xgen-4k-7b-orig/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00002-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00003-of-00003.bin
params: n_vocab:51200 n_embd:4096 n_mult:256 n_head:32 n_layer:32
Writing vocab...
Traceback (most recent call last):
  File "/home/user/llama.cpp/convert.py", line 1255, in <module>
    main()
  File "/home/user/llama.cpp/convert.py", line 1250, in main
    OutputFile.write_all(outfile, params, output_type, model, vocab)
  File "/home/user/llama.cpp/convert.py", line 1041, in write_all
    of.write_vocab(vocab)
  File "/home/user/llama.cpp/convert.py", line 1022, in write_vocab
    self.fout.write(text)
TypeError: a bytes-like object is required, not 'str'
@clyang Try changing line 220 of convert.py to the following:
token = self.xt.encoder.decode_single_token_bytes(index)
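For context, this is roughly what the resulting vocab generator looks like once the two fixes above are combined; a minimal sketch assuming the XgenVocab wrapper exposes the tokenizer as self.xt and its tiktoken encoding as self.xt.encoder (names taken from the snippets above; the rest is not the PR's exact code):

def all_tokens(self):
    # Real tokens: tiktoken returns the raw bytes for each id, which is what
    # the GGML vocab writer expects (bytes, not str -- hence the TypeError above).
    for index in range(self.vocab_size_base):
        yield (self.xt.encoder.decode_single_token_bytes(index), float(index))
    # The embedding matrix is padded past the tokenizer's vocab, so fill the
    # remaining ids with the placeholder entry suggested earlier.
    for index in range(self.vocab_size_base, self.vocab_size):
        yield (b'<|unk|>', float(index))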
I can confirm convert.py and quantize both work, thanks @smdesai!!
Btw, you still need to change the EOS, BOS and NL token IDs in llama.cpp to make inference work correctly.
Can you share what you changed in llama.cpp?
@tmm1 Here is what I modified in llama.cpp:

llama_token llama_token_bos() {
    return 50256;  // <|endoftext|> in the GPT-2 style BPE that XGen's tokenizer builds on
}
llama_token llama_token_eos() {
    return 50256;
}
llama_token llama_token_nl() {
    return 198;    // "\n"
}
Hi, following this thread, I tried to convert to q4_0 or q4_1 but seem to be facing problems.

ubuntu@ip-172-31-41-82:~/squeeze-llm/llama.cpp$ python3 convert.py models/xgen-7b-8k-base --outtype q4_1
Loading model file models/xgen-7b-8k-base/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00002-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00003-of-00003.bin
params: n_vocab:51200 n_embd:4096 n_mult:256 n_head:32 n_layer:32
Traceback (most recent call last):
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1255, in <module>
    main()
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1248, in main
    model = convert_to_output_type(model, output_type)
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1086, in convert_to_output_type
    return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1086, in <dictcomp>
    return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 575, in astype
    self.validate_conversion_to(data_type)
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 586, in validate_conversion_to
    raise Exception(f"Can't turn an unquantized tensor into a quantized type ({data_type})")
Exception: Can't turn an unquantized tensor into a quantized type (QuantizedDataType(groupsize=32, have_addends=True, have_g_idx=False))
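As far as I can tell, convert.py at this point could only write q4_* output for checkpoints that were already quantized (e.g. GPTQ); for a plain PyTorch checkpoint the usual route is to convert to f16 first and then run the separate quantize tool, roughly like this (the f16 filename is convert.py's default; adjust if yours differs):

$ python3 convert.py models/xgen-7b-8k-base --outtype f16
$ ./quantize models/xgen-7b-8k-base/ggml-model-f16.bin models/xgen-7b-8k-base/ggml-model-q4_0.bin q4_0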
Maybe we should finalize this and merge it?
The code base has changed substantially since the initial PR. I can confirm that with the code as of today, the Xgen model does convert and run at f32, Q4_1 and Q4_0. In addition, llama.cpp also required changes to work with the Xgen model. I'm not sure of the best way to proceed with those changes; for now I have put them inside an #ifdef/#ifndef XGEN. As for the best way to provide the changes, I'm not sure whether a separate PR would be better; to keep things local, here are the changes as patch files. Of course, the Makefile would require -DXGEN when compiling for this model.
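For reference, a rough sketch of what that compile-time switch could look like, using the token IDs posted earlier in this thread; the actual patch files aren't shown here, so treat this as an illustration rather than the real change:

// llama.cpp: pick XGen's special-token IDs only when built with -DXGEN,
// otherwise keep the stock LLaMA values.
#ifdef XGEN
llama_token llama_token_bos() { return 50256; }  // <|endoftext|>
llama_token llama_token_eos() { return 50256; }
llama_token llama_token_nl()  { return 198; }    // "\n"
#else
llama_token llama_token_bos() { return 1; }
llama_token llama_token_eos() { return 2; }
llama_token llama_token_nl()  { return 13; }
#endif

The Makefile side would just add -DXGEN to the compiler flags when building for this model.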
Obsolete PR?
working toward converting XGen-7B to ggml