Overview

Relevant source files

flutter_gemma is a multi-platform Flutter plugin designed to run Google's Gemma family of models and other Large Language Models (LLMs) locally on-device README.md1-14 By leveraging local execution, the plugin enables advanced AI capabilities—including multimodal vision, audio processing, and function calling—without relying on external servers, thereby enhancing user privacy and offline functionality README.md20-30

The project supports a wide range of model families beyond Gemma, such as Gemma 4, Gemma 3n, DeepSeek, Qwen, Phi, Llama, and SmolLM README.md10-12

High-Level Architecture

The plugin follows the standard Flutter platform interface pattern, providing a unified Dart API that abstracts away platform-specific inference engines.

Modern API Facade: The FlutterGemma class provides a high-level entry point for initialization and model management lib/flutter_gemma.dart23-25
Platform Routing: Depending on the platform and model file format, the plugin routes requests to either MediaPipe (via platform channels/WASM) or LiteRT-LM (via dart:ffi) CHANGELOG.md58-62
Native Bridge: On mobile and desktop, the plugin uses dart:ffi to communicate directly with the LiteRT-LM C API, bypassing heavy overheads like the JVM on Android or separate gRPC servers on desktop CHANGELOG.md58-62

System-to-Code Mapping

The following diagram illustrates how the logical components of the plugin map to specific code entities.

Architecture Component Mapping

Sources: lib/flutter_gemma.dart1-55 CHANGELOG.md58-62 pubspec.yaml67-85 lib/mobile/flutter_gemma_mobile.dart18-21

Supported Platforms and Capabilities

flutter_gemma provides comprehensive support across mobile, desktop, and web. Recent updates (v0.15.x) have introduced advanced features like NPU acceleration and Speculative Decoding (Multi-Token Prediction) for Gemma 4 models CHANGELOG.md1-13

Feature	Android	iOS	Web	Desktop (macOS/Win/Lin)
Text Inference	✅	✅	✅	✅
Multimodal (Vision)	✅	✅	✅	✅
Audio Input	✅	✅	❌	✅
Function Calling	✅	✅	✅	✅
Thinking Mode	✅	✅	❌	✅
Embeddings / RAG	✅	✅	✅	✅
GPU Acceleration	✅	✅ (Metal)	✅ (WebGPU)	✅ (Vulkan/DX12/Metal)
NPU Support	✅ (Qualcomm/Tensor)	❌	❌	✅ (Intel LunarLake)

Sources: README.md85-93 CHANGELOG.md1-14 pubspec.yaml7-13 lib/flutter_gemma_interface.dart44-47

Model Families and Formats

The plugin supports two primary ways of handling models based on their file extension README.md65-85:

Managed Templates (.task, .litertlm): The inference engine (MediaPipe or LiteRT-LM SDK) handles chat templates and tokenization internally.
Manual Templates (.bin, .tflite): Requires manual formatting of chat prompts in Dart code before sending to the model.

For a detailed catalog of supported model versions (e.g., Gemma 4 E2B, Qwen3) and their specific requirements, see Supported Models and Formats.

Sources: README.md65-85 CHANGELOG.md42-45

Key Capabilities

On-Device RAG: Includes a built-in vector store using SQLite and HNSW for local document retrieval pubspec.yaml32-35 lib/flutter_gemma_interface.dart76-116
Thinking Mode: Supports models like DeepSeek and Gemma 4 that provide "reasoning" or "thinking" blocks in their output README.md36 lib/flutter_gemma_interface.dart144-145
Multimodal Support: Capability to process multiple images and PCM audio (16kHz) alongside text lib/flutter_gemma.dart12-17 CHANGELOG.md4-6 lib/mobile/flutter_gemma_mobile.dart107-117
Native Assets: On desktop and mobile FFI paths, native libraries are automatically fetched and bundled at build time via hook/build.dart CHANGELOG.md18-19 pubspec.yaml38-40
Speculative Decoding: Support for Multi-Token Prediction (MTP) on Gemma 4 models to improve inference speed CHANGELOG.md11-12 lib/flutter_gemma_interface.dart44-47

The following diagram bridges the high-level features to the directory structure of the repository.

Codebase Organization

Sources: lib/flutter_gemma.dart1-55 pubspec.yaml67-85 lib/mobile/flutter_gemma_mobile.dart1-38

Overview

Relevant source files

The project supports a wide range of model families beyond Gemma, such as Gemma 4, Gemma 3n, DeepSeek, Qwen, Phi, Llama, and SmolLM README.md10-12

High-Level Architecture

The plugin follows the standard Flutter platform interface pattern, providing a unified Dart API that abstracts away platform-specific inference engines.

Modern API Facade: The FlutterGemma class provides a high-level entry point for initialization and model management lib/flutter_gemma.dart23-25
Platform Routing: Depending on the platform and model file format, the plugin routes requests to either MediaPipe (via platform channels/WASM) or LiteRT-LM (via dart:ffi) CHANGELOG.md58-62
Native Bridge: On mobile and desktop, the plugin uses dart:ffi to communicate directly with the LiteRT-LM C API, bypassing heavy overheads like the JVM on Android or separate gRPC servers on desktop CHANGELOG.md58-62

System-to-Code Mapping

The following diagram illustrates how the logical components of the plugin map to specific code entities.

Architecture Component Mapping

Sources: lib/flutter_gemma.dart1-55 CHANGELOG.md58-62 pubspec.yaml67-85 lib/mobile/flutter_gemma_mobile.dart18-21

Supported Platforms and Capabilities

Feature	Android	iOS	Web	Desktop (macOS/Win/Lin)
Text Inference	✅	✅	✅	✅
Multimodal (Vision)	✅	✅	✅	✅
Audio Input	✅	✅	❌	✅
Function Calling	✅	✅	✅	✅
Thinking Mode	✅	✅	❌	✅
Embeddings / RAG	✅	✅	✅	✅
GPU Acceleration	✅	✅ (Metal)	✅ (WebGPU)	✅ (Vulkan/DX12/Metal)
NPU Support	✅ (Qualcomm/Tensor)	❌	❌	✅ (Intel LunarLake)

Sources: README.md85-93 CHANGELOG.md1-14 pubspec.yaml7-13 lib/flutter_gemma_interface.dart44-47

Model Families and Formats

The plugin supports two primary ways of handling models based on their file extension README.md65-85:

Managed Templates (.task, .litertlm): The inference engine (MediaPipe or LiteRT-LM SDK) handles chat templates and tokenization internally.
Manual Templates (.bin, .tflite): Requires manual formatting of chat prompts in Dart code before sending to the model.

For a detailed catalog of supported model versions (e.g., Gemma 4 E2B, Qwen3) and their specific requirements, see Supported Models and Formats.

Sources: README.md65-85 CHANGELOG.md42-45

Key Capabilities

On-Device RAG: Includes a built-in vector store using SQLite and HNSW for local document retrieval pubspec.yaml32-35 lib/flutter_gemma_interface.dart76-116
Thinking Mode: Supports models like DeepSeek and Gemma 4 that provide "reasoning" or "thinking" blocks in their output README.md36 lib/flutter_gemma_interface.dart144-145
Multimodal Support: Capability to process multiple images and PCM audio (16kHz) alongside text lib/flutter_gemma.dart12-17 CHANGELOG.md4-6 lib/mobile/flutter_gemma_mobile.dart107-117
Native Assets: On desktop and mobile FFI paths, native libraries are automatically fetched and bundled at build time via hook/build.dart CHANGELOG.md18-19 pubspec.yaml38-40
Speculative Decoding: Support for Multi-Token Prediction (MTP) on Gemma 4 models to improve inference speed CHANGELOG.md11-12 lib/flutter_gemma_interface.dart44-47

The following diagram bridges the high-level features to the directory structure of the repository.

Codebase Organization

Sources: lib/flutter_gemma.dart1-55 pubspec.yaml67-85 lib/mobile/flutter_gemma_mobile.dart1-38

Overview

High-Level Architecture

System-to-Code Mapping

Supported Platforms and Capabilities

Model Families and Formats

Key Capabilities

Project Structure and Navigation

Further Reading

On this page

Overview

High-Level Architecture

System-to-Code Mapping

Supported Platforms and Capabilities

Model Families and Formats

Key Capabilities

Project Structure and Navigation

Further Reading

On this page