flutter_gemma is a multi-platform Flutter plugin designed to run Google's Gemma family of models and other Large Language Models (LLMs) locally on-device README.md1-14 By leveraging local execution, the plugin enables advanced AI capabilities—including multimodal vision, audio processing, and function calling—without relying on external servers, thereby enhancing user privacy and offline functionality README.md20-30
The project supports a wide range of model families beyond Gemma, such as Gemma 4, Gemma 3n, DeepSeek, Qwen, Phi, Llama, and SmolLM README.md10-12
The plugin follows the standard Flutter platform interface pattern, providing a unified Dart API that abstracts away platform-specific inference engines.
FlutterGemma class provides a high-level entry point for initialization and model management lib/flutter_gemma.dart23-25dart:ffi) CHANGELOG.md58-62dart:ffi to communicate directly with the LiteRT-LM C API, bypassing heavy overheads like the JVM on Android or separate gRPC servers on desktop CHANGELOG.md58-62The following diagram illustrates how the logical components of the plugin map to specific code entities.
Architecture Component Mapping
Sources: lib/flutter_gemma.dart1-55 CHANGELOG.md58-62 pubspec.yaml67-85 lib/mobile/flutter_gemma_mobile.dart18-21
flutter_gemma provides comprehensive support across mobile, desktop, and web. Recent updates (v0.15.x) have introduced advanced features like NPU acceleration and Speculative Decoding (Multi-Token Prediction) for Gemma 4 models CHANGELOG.md1-13
| Feature | Android | iOS | Web | Desktop (macOS/Win/Lin) |
|---|---|---|---|---|
| Text Inference | ✅ | ✅ | ✅ | ✅ |
| Multimodal (Vision) | ✅ | ✅ | ✅ | ✅ |
| Audio Input | ✅ | ✅ | ❌ | ✅ |
| Function Calling | ✅ | ✅ | ✅ | ✅ |
| Thinking Mode | ✅ | ✅ | ❌ | ✅ |
| Embeddings / RAG | ✅ | ✅ | ✅ | ✅ |
| GPU Acceleration | ✅ | ✅ (Metal) | ✅ (WebGPU) | ✅ (Vulkan/DX12/Metal) |
| NPU Support | ✅ (Qualcomm/Tensor) | ❌ | ❌ | ✅ (Intel LunarLake) |
Sources: README.md85-93 CHANGELOG.md1-14 pubspec.yaml7-13 lib/flutter_gemma_interface.dart44-47
The plugin supports two primary ways of handling models based on their file extension README.md65-85:
.task, .litertlm): The inference engine (MediaPipe or LiteRT-LM SDK) handles chat templates and tokenization internally..bin, .tflite): Requires manual formatting of chat prompts in Dart code before sending to the model.For a detailed catalog of supported model versions (e.g., Gemma 4 E2B, Qwen3) and their specific requirements, see Supported Models and Formats.
Sources: README.md65-85 CHANGELOG.md42-45
hook/build.dart CHANGELOG.md18-19 pubspec.yaml38-40The following diagram bridges the high-level features to the directory structure of the repository.
Codebase Organization
Sources: lib/flutter_gemma.dart1-55 pubspec.yaml67-85 lib/mobile/flutter_gemma_mobile.dart1-38
flutter_gemma to your project, including platform-specific setup like Podfile configurations and Android permissions.ModelType and ModelFileType enums, and which formats work best on which platforms.