Hi, I'm one of the maintainers working on LoRA support in llama.cpp.
FYI, we already have a script, convert_lora_to_gguf.py, that can convert any PEFT-compatible LoRA adapter into GGUF without merging it into the base model.
I would like to discuss whether we can take advantage of this to convert a fine-tuned adapter directly into GGUF. One idea could be:
# add save_method = "lora" to export just the adapter, not merging
model.save_pretrained_gguf("dir", tokenizer, save_method = "lora", quantization_method = "f16")
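Under the hood, this could amount to saving the adapter in PEFT format and then calling the existing script. Below is a rough sketch of that second step, not a proposed implementation. The helper names `build_lora_convert_cmd` and `export_lora_gguf` are hypothetical, and the `--base`, `--outfile` and `--outtype` flags are assumed to match your llama.cpp checkout (please check `python convert_lora_to_gguf.py --help`).

```python
import subprocess
import sys


def build_lora_convert_cmd(script, adapter_dir, base_dir, outfile, outtype="f16"):
    """Build the argv for convert_lora_to_gguf.py.

    Hypothetical helper: the flag names below are assumptions and should be
    verified against the script's --help output.
    """
    return [
        sys.executable,
        str(script),            # path to convert_lora_to_gguf.py
        str(adapter_dir),       # directory holding the saved PEFT adapter
        "--base", str(base_dir),        # base model directory (config only, no merge)
        "--outfile", str(outfile),      # where the GGUF adapter is written
        "--outtype", outtype,           # e.g. "f16"
    ]


def export_lora_gguf(script, adapter_dir, base_dir, outfile, outtype="f16"):
    """Run the conversion and return the output path; raises if the script fails."""
    cmd = build_lora_convert_cmd(script, adapter_dir, base_dir, outfile, outtype)
    subprocess.run(cmd, check=True)
    return str(outfile)
```

The resulting adapter file could then be loaded next to an unmodified base GGUF model, so only the small adapter has to be exported and shared.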
As a demo, here is a collection of GGUF LoRA adapters: https://huggingface.co/collections/ggml-org/gguf-lora-adapters-677c49455d8f7ee034dd46f1
Happy to discuss more if you find this interesting.
Thank you.