A production-ready, open-source inference framework for high-performance, cross-platform LLM deployment on edge devices.

Why LiteRT-LM?

Deploy LLMs across Android, iOS, Web, and Desktop.
Maximize performance with GPU and NPU acceleration.
Support popular LLMs, multimodal input (vision and audio), and tool use.
Run the latest open models optimized for the edge, including Gemma-3n, Gemma-3, FunctionGemma, TranslateGemma, Qwen3, Phi-4, and more.

Start building

Build native Android apps and JVM-based desktop tools.
Integrate natively with iOS and macOS, with specialized Metal support (Swift APIs coming soon).
Run directly in the browser with WebAssembly and WebGPU (JS APIs coming soon).

Join the Community

Contribute to the source code, report issues, and explore examples.
Download pre-converted models and join the discussion.