Llama.cpp

Llama.cpp is a library designed to provide an efficient implementation of the code needed to run inference with large language models (LLMs) of various architectures on both GPUs and CPUs. Llama.cpp is cross-platform and open source, is written primarily in C++, and is distributed under the MIT license.^[1]^[2]^[3]^[4]

Although it was originally inspired by Meta's Python-based LLaMA inference code, Llama.cpp can be used with a wide variety of models. Its lead developer is Georgi Gerganov.^[3]

Llama.cpp is notable for its efficiency and for its ability to run models locally on consumer hardware. It has been used in frameworks such as Ollama, Jan, and LM Studio.^[1] The project has focused on compression and portability, and one of its most important features is support for quantization, a process for compressing machine learning models.^[2]
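
To illustrate the general idea of quantization, the following is a minimal sketch of block-wise 8-bit quantization in Python. It is not Llama.cpp's actual code or file format: the block length of 32, the function names, and the single floating-point scale per block are all assumptions chosen for illustration.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # assumed block length, chosen only for illustration


def quantize_block(values: List[float]) -> Tuple[float, List[int]]:
    """Map a block of floats to signed 8-bit integers that share one scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # An all-zero block needs no scale.
        return 0.0, [0] * len(values)
    scale = max_abs / 127.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quants = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale: float, quants: List[int]) -> List[float]:
    """Recover approximate floats from a quantized block."""
    return [q * scale for q in quants]


def quantize(weights: List[float]) -> List[Tuple[float, List[int]]]:
    """Split the weights into fixed-size blocks and quantize each one."""
    return [
        quantize_block(weights[i:i + BLOCK_SIZE])
        for i in range(0, len(weights), BLOCK_SIZE)
    ]


def dequantize(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    """Concatenate the dequantized blocks back into one list."""
    restored: List[float] = []
    for scale, quants in blocks:
        restored.extend(dequantize_block(scale, quants))
    return restored
```

Under these assumptions, a block of 32 weights stored as 32-bit floats takes 128 bytes, while the quantized block takes 32 bytes of integers plus a 4-byte scale, or 36 bytes in total. The price is a small rounding error in each restored weight, bounded by half of the block's scale.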

References

[1] "Tinker with LLMs in the privacy of your own home using Llama.cpp". The Register, 24 August 2025. [Retrieved: 25 October 2025].
[2] "Honey, I shrunk the LLM! A beginner's guide to quantization – and testing it". The Register, 14 July 2024. [Retrieved: 20 October 2025].
[3] "How this open source LLM chatbot runner hit the gas on x86, Arm CPUs". The Register, 3 April 2024. [Retrieved: 25 October 2025].
[4] Alden, Daroc. "Portable LLMs with llamafile". LWN.net, 14 May 2024.
