LlamaLib is a high-level C++ and C# library for running Large Language Models (LLMs) anywhere - from PCs to mobile devices and VR headsets.

High-Level API
C++ and C# implementations with intuitive object-oriented design.

Self-Contained and Embedded
Runs embedded within your application. No need for a separate server or external processes. Zero external dependencies.

Runs Anywhere
Cross-platform and cross-device. Works on all major platforms:
- Desktop: Windows, macOS, Linux
- Mobile: Android, iOS
- VR/AR: Meta Quest, Apple Vision, Magic Leap

and hardware architectures:
- CPU: Intel, AMD, Apple Silicon
- GPU: NVIDIA, AMD, Metal

Architecture Detection at Runtime
Automatically selects the optimal backend at runtime, supporting all major GPU and CPU architectures.

Tiny Footprint
Integration requires only 10-200 MB depending on the embedded architectures. A custom implementation of tinyBLAS reduces the CUDA integration from 1.3 GB to 130 MB (cuBLAS also supported).

Production Ready
Designed for easy integration into C++ and C# applications. Supports both local and client-server deployment.

Developer experience:
Direct implementation of LLM operations (completion, tokenization, embeddings).
Clean implementation of the LLM service and clients, server-client architecture, and LLM agents.

Universal deployment:
LlamaLib is the only library that lets you build your application for any hardware.
Unlike alternatives that only let you build for a specific GPU vendor or for CPU-only execution, LlamaLib's architecture detection happens at runtime.
If your application targets the GPU, the GPU backend matching the user's hardware (NVIDIA, AMD, Metal) is selected automatically, with fallback to CPU.
CPU detection automatically identifies the user's CPU instruction set to select the optimal backend.
LlamaLib works on all platforms, from PC to mobile and VR.

Production ready:
Embeds directly in your application without opening ports or starting external servers.
LlamaLib has minimal disk space requirements, allowing compact builds, e.g. for mobile deployment.

- Star the repo and spread the word to support development!
- Join our Discord community.
- Contribute with feature requests, bug reports, or pull requests.
- LLM for Unity: The most widely used solution to integrate LLMs in games.
LlamaLib can be used with just a few lines of code.
The main classes are:
- LLMService: Implementation of the LLM service. For desktop environments, it performs runtime detection across multiple GPU and CPU backends.
- LLMClient: Implementation of local or remote clients.
- LLMAgent: High-level conversational AI with persistent chat history.
Core functionality:
- LLM core methods: completion, embeddings, tokenization, LoRAs
- Agent functionality: chat template formatting, chat history
- Server-client functionality: start/stop server, connect to local/remote server, SSL and authentication support
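
As a rough illustration of how these pieces might fit together, the sketch below builds a conversational agent on top of a local service. The LLMService constructor, start(), and completion() follow the completion example later in this section; the LLMAgent constructor and its chat() method are assumptions for illustration and may not match the actual API.

```cpp
#include <iostream>
#include <string>
#include "LlamaLib.h"

int main() {
    // Local service: constructor, start() and completion() follow the examples below
    LLMService llm("path/to/model.gguf");
    llm.start();

    // Direct completion on the service
    std::string reply = llm.completion("Say hi in one sentence.");
    std::cout << reply << std::endl;

    // Hypothetical agent API: keeps a persistent chat history on top of the service
    LLMAgent agent(llm);
    std::cout << agent.chat("Hi, I'm Alice.") << std::endl;
    std::cout << agent.chat("What's my name?") << std::endl;  // earlier turns are remembered

    return 0;
}
```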
The methods API can be found here:
and basic examples here:
More detailed function-level documentation can be found in the docs:
```cpp
#include <iostream>
#include <string>
#include "LlamaLib.h"

// LlamaLib automatically detects your hardware and selects the optimal backend
LLMService llm("path/to/model.gguf");
/* You can also specify:
   threads=-1,   // number of CPU threads to use
   gpu_layers=0, // number of layers to offload to GPU (if 0, GPU is not used)
   num_slots=1   // number of slots / clients supported in parallel
*/

// Start the service
llm.start();

// Generate a completion
std::string response = llm.completion("Hello, how are you?");
std::cout << response << std::endl;
```

```csharp
using System;
using LlamaLib;

// Same API, different language
LLMService llm = new LLMService("path/to/model.gguf");
/* You can also specify:
   threads=-1,   // number of CPU threads to use
   gpu_layers=0, // number of layers to offload to GPU (if 0, GPU is not used)
   num_slots=1   // number of slots / clients supported in parallel
*/

llm.Start();
string response = llm.Completion("Hello, how are you?");
Console.WriteLine(response);
```
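
The comments in both examples mention threads, gpu_layers, and num_slots, but not how they are passed. The line below is only a guess at what such a call might look like; the parameter order and names are assumptions, so check the API reference for the actual signature.

```cpp
// Hypothetical signature, for illustration only; the real constructor may differ.
// Offload 32 layers to the GPU and serve up to 2 clients in parallel.
LLMService llm("path/to/model.gguf", /*threads=*/-1, /*gpu_layers=*/32, /*num_slots=*/2);
```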