convert : update Falcon script for new HF config #3448

cebtenzzre merged 2 commits into ggml-org:master
Conversation
|
is this a continuation of #3049 ? :) |
|
Yeah, but your changes are a subset of the actual changes done for 180B:

2c2
< # HF falcon--> gguf conversion
---
> # HF falcon180B--> gguf conversion
16a17
> from safetensors import safe_open
48c49
< if filename.startswith("pytorch_model-"):
---
> if filename.startswith("model-00"):
90c91
< if hparams["architectures"][0] != "RWForCausalLM":
---
> if hparams["architectures"][0] != "FalconForCausalLM":
103c104
< block_count = hparams["n_layer"]
---
> block_count = hparams["num_hidden_layers"]
111,113c112,114
< gguf_writer.add_head_count(hparams["n_head"])
< if "n_head_kv" in hparams:
< gguf_writer.add_head_count_kv(hparams["n_head_kv"])
---
> gguf_writer.add_head_count(hparams["num_attention_heads"])
> if "num_kv_heads" in hparams:
> gguf_writer.add_head_count_kv(hparams["num_kv_heads"])
181,182c182,183
< n_head = hparams["n_head"]
< n_head_kv = hparams["n_head_kv"] if "n_head_kv" in hparams else 1
---
> n_head = hparams["num_attention_heads"]
> n_head_kv = hparams["num_kv_heads"] if "num_kv_heads" in hparams else 1
193c194
< f"pytorch_model-{n:05}-of-{num_parts:05}.bin" for n in range(1, num_parts + 1)
---
> f"model-{n:05}-of-{num_parts:05}.safetensors" for n in range(1, num_parts + 1)
200c201
< model_part = torch.load(dir_model / part_name, map_location="cpu")
---
> with safe_open(dir_model / part_name, framework="pt", device="cpu") as model_part:
203c204
< data = model_part[name]
---
> data = model_part.get_tensor(name)

(I compared an older version of convert-falcon, because there have been some small changes on master since.) |
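The renamed config keys in the diff above (n_layer → num_hidden_layers, n_head → num_attention_heads, n_head_kv → num_kv_heads) could also be read with a fallback so both old and new config.json files work. A minimal sketch, not the PR's actual code; load_falcon_hparams and its output keys are illustrative names:

```python
# Sketch: read Falcon hyperparameters from config.json, preferring the
# new HF key names and falling back to the pre-update ones.
import json

def load_falcon_hparams(config_text: str) -> dict:
    hparams = json.loads(config_text)

    def get(new_key: str, old_key: str, default=None):
        if new_key in hparams:
            return hparams[new_key]
        if old_key in hparams:
            return hparams[old_key]
        return default

    return {
        "block_count": get("num_hidden_layers", "n_layer"),
        "head_count": get("num_attention_heads", "n_head"),
        # multi-query models without an explicit KV head count default to 1
        "head_count_kv": get("num_kv_heads", "n_head_kv", 1),
    }
```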
Okay, so what do you suggest I do in order to get Falcon <180B working on master again? |
The script in this PR does not know how to read safetensors files, so I think we still need that PR. |
|
By all means: why not merge #3049 first, then abandon it, and finally merge this? No one will lose their contribution records, and Falcon inference will keep working. |
31c88e3 to 94dd85c
|
Could someone test this script with Falcon-180B? |
|
Testing atm - will take some time. |
It works with this repo: https://huggingface.co/tiiuae/falcon-180B/tree/main Which is strange, since this looks like safetensors data, which we don't expect to be able to read with this script. |
I recently added safetensors support to this PR so it can support 180B, which is why I asked. |
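The safetensors loading added to the PR, per the diff above, uses safe_open with the new shard naming. A hedged sketch of that pattern; part_names and load_tensors are illustrative helper names, not functions from the script:

```python
# Sketch: iterate over new-style safetensors shards
# (model-00001-of-00002.safetensors, ...) and yield their tensors.
from pathlib import Path

def part_names(num_parts: int) -> list:
    # new HF checkpoints use zero-padded shard indices
    return [
        f"model-{n:05}-of-{num_parts:05}.safetensors"
        for n in range(1, num_parts + 1)
    ]

def load_tensors(dir_model: Path, num_parts: int):
    # requires: pip install safetensors torch
    from safetensors import safe_open
    for part_name in part_names(num_parts):
        with safe_open(dir_model / part_name, framework="pt", device="cpu") as part:
            for name in part.keys():
                yield name, part.get_tensor(name)
```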
Green-Sky left a comment:
I don't have the means to test the 180B, but it looks like you merged it nicely.
…example

* 'master' of github.com:ggerganov/llama.cpp:
  kv cache slot search improvements (ggml-org#3493)
  prompts : fix editorconfig checks after ggml-org#3416
  parallel : add option to load external prompt file (ggml-org#3416)
  server : reuse llama_sample_token common util (ggml-org#3494)
  llama : correct hparams comparison (ggml-org#3446)
  ci : fix xcodebuild destinations (ggml-org#3491)
  convert : update Falcon script for new HF config (ggml-org#3448)
  build : use std::make_tuple() for compatibility with older GCC versions (ggml-org#3488)
  common : process escape sequences in reverse prompts (ggml-org#3461)
  CLBlast: Fix handling of on-device tensor data
  server : fix incorrect num_tokens_predicted (ggml-org#3480)
  swift : disable ACCELERATE_NEW_LAPACK (ggml-org#3481)
  ci : add swift build via xcodebuild (ggml-org#3482)
Also adds Falcon-180B support. Closes ggml-org#3049 Co-authored-by: jb <jonathan.t.barnard@gmail.com>
|
This change breaks the old architecture (RWForCausalLM). Kind of annoying because there was another change that requires reconverting the original HuggingFace model: |
The official falcon-7b, falcon-40b, and falcon-180B models on HF are compatible with the new version of this script. What is your use case? |
|
I have this model downloaded from Hugging Face: Is this the official version of Falcon? |
The latest commit is 898df13 "Move to in-library checkpoint (for real this time)". That's the one you want. I suppose I could add back support for the older format until the finetunes are updated. |
|
Oh, I see; they updated the config.json file but the actual multi-GB model files did not change. I think this might not be clear to some people, who might end up re-downloading the whole model again, which is not a good user experience. Also, some fine-tunes might not ever be updated. In my opinion, it would be preferable to support both formats, but I will let you decide how to prioritize this. Thank you for looking into this issue. |
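Supporting both formats, as suggested above, could start by detecting which kind of shards a model directory contains. A hypothetical sketch; count_model_parts is an illustrative name, not part of the PR:

```python
# Sketch: detect whether a model directory holds old-style PyTorch shards
# (pytorch_model-*.bin) or new-style safetensors shards (model-00*.safetensors),
# so both checkpoint formats stay convertible.
from pathlib import Path

def count_model_parts(dir_model: Path):
    num_pt = num_st = 0
    for path in dir_model.iterdir():
        if path.name.startswith("pytorch_model-"):
            num_pt += 1
        elif path.name.startswith("model-00") and path.suffix == ".safetensors":
            num_st += 1
    # prefer safetensors when both are present
    if num_st > 0:
        return num_st, "safetensors"
    return num_pt, "pytorch"
```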
This aligns the Falcon convert script with the recent changes to the HuggingFace models.