Granite

Open. Performant. Trusted. Apache 2.0 licensed. Cryptographically signed¹. ISO certified².

IBM Granite 4.1 is powering secure, on-prem AI deployment
Lightweight, performant models, released under an Apache 2.0 license, designed for scalable, enterprise workloads
Learn about Granite 4.1

Why build with Granite?

Build and scale AI faster with customizable, open-source models optimized for enterprise workloads, cost efficiency, and flexible deployments.

Open
Open source under Apache 2.0, Granite ensures transparency, while enabling full customizability and deployment flexibility across any infrastructure.
Download models
Performant
The small, high-performing models are designed to maximize efficiency and scalability for essential enterprise tasks.
Review benchmarks
Trusted
Eliminate the risk of “black box” AI with transparency into training data and processes, harm detection capabilities and built-in guardrails.
Learn more
Introducing Granite 4.1
Granite 4.1 language models

Our most performant dense, non-thinking models yet. Competitive with larger thinking models across a range of enterprise tasks, at a fraction of the cost.

Download language models
Granite 4.1 speech models

Small yet powerful. Industry-leading transcription accuracy across accents, domains and noisy environments. 

Download speech models
Granite 4.1 vision models

Understand documents, charts and images with enterprise-grade precision.

Download vision models
Granite 4.1 guardian models

Guardrails to detect malicious content and harmful outputs. Built for enterprise compliance.

Download guardian models
Granite embedding models

Accurate semantic representations for retrieval, search and classification.

Download embedding models

Explore the benchmarks

These models were evaluated against a large collection of datasets and metrics to cover different aspects of text generation. See additional benchmarks in the Granite technical blog.

| Benchmark | Metric | granite-4.1-3b | granite-4.1-8b | granite-4.1-30b |
|-----------|--------|----------------|----------------|-----------------|
| MMLU      | 5-shot | 67.02 | 73.84 | 80.16 |
| IFEval    | Avg    | 82.3  | 87.06 | 89.65 |
| ArenaHard |        | 37.8  | 68.98 | 71.02 |
| GSM8K     | 8-shot | 86.88 | 92.49 | 94.16 |
| HumanEval | pass@1 | 79.27 | 87.2  | 89.63 |
| BFCL v3   |        | 60.8  | 68.27 | 73.68 |
| MMMLU     | 5-shot | 57.61 | 64.84 | 73.71 |
| AttaQ     |        | 81.88 | 81.19 | 85.76 |

Access and build

Hugging Face

Go to Hugging Face
Ollama

Go to Ollama
LM Studio

Go to LM Studio
watsonx.ai

Go to watsonx
OpenRouter

Go to OpenRouter
Replicate

Go to Replicate
Weights & Biases

Go to Weights & Biases
Unsloth

Go to Unsloth
AnythingLLM

Go to AnythingLLM

Performance and efficiency

Granite 4.1 delivers competitive instruction‑following and tool‑calling performance without relying on long chains of thought, offering predictable latency, stable token usage and lower operational cost. This makes it a strong, production‑ready choice for enterprise workloads where efficiency and reliability matter most.

Horizontal bar chart titled “Granite 4.1 language models offer superior tool calling capabilities,” based on BFCL V3 benchmark scores (higher is better). Granite-4.1-30B ranks highest at 73.7, followed by Gemma-4-31B-it at 72.7, and Granite-4.1-8B at 68.3. Remaining models score between about 61.7 and 67.8, including Gemma-4-26B-A4B-it (67.8), Qwen3-30B-A3B-Instruct-2507 (65.1), Granite-4.0-H-Small (64.7), Qwen3.5-35B-A3B (64.2), Gemma-4-E4B-it (63.2), Qwen3-4B-Instruct-2507 (61.9), and Qwen3.5-9B (61.7). Granite 4.1 models are highlighted in blue and outperform others.

Granite 4.1 language models understand and execute tool-based instructions, enabling seamless integration with various software tools and APIs. This capability allows enterprises to create powerful AI-driven workflows while automating complex tasks.
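As a concrete illustration, tool definitions are typically passed to such models in an OpenAI-style chat request. The sketch below assembles one of these payloads with a hypothetical `get_weather` tool; the model id `granite-4.1-8b` and the exact schema shape are assumptions, so check the documentation of whichever runtime (Ollama, watsonx.ai, etc.) actually serves the model.

```python
import json

def build_tool_call_request(user_query: str) -> dict:
    """Assemble an OpenAI-style chat request advertising one tool.

    The model id and schema conventions here are illustrative, not a
    definitive Granite API; adapt them to your serving runtime.
    """
    get_weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "granite-4.1-8b",  # assumed model id; check your runtime's catalog
        "messages": [{"role": "user", "content": user_query}],
        "tools": [get_weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

request = build_tool_call_request("What's the weather in Austin?")
print(json.dumps(request, indent=2))
```

The runtime returns a structured tool call (name plus JSON arguments) rather than free text, which is what makes the workflow automation described above reliable.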

Run the Granite collection of models locally 

Granite Language Models

Core language models with reasoning, optimized for RAG and agentic workflows.

Granite Vision

Efficient vision-language models for document and image understanding, enabling OCR, chart analysis, and enterprise content extraction.

Granite Speech

Lightweight speech-language models for transcription and translation across 7 languages, delivering strong accuracy and efficiency.

Granite Guardian

Guardrail models detecting hallucinations, bias, harmful content, and jailbreaks, ensuring safe enterprise AI deployment across workflows.

Granite Docling

Ultra-compact vision-language model converting documents into structured, machine-readable formats while preserving layout, tables, and equations.

Granite Embedding

Models generating high-quality text embeddings for semantic search, RAG, and contextual multi-turn information retrieval.

Granite Geospatial

NASA-IBM models for Earth observation, predicting biomass, climate, land temperature, and floods from large-scale satellite data.

Granite Time Series

Lightweight pre-trained models for fast, accurate time-series forecasting, optimized for efficient deployment across hardware environments.

Horizontal bar chart titled “Granite 4.1 language models offer competitive instruction following capabilities,” based on IFEval results. Gemma-4-31B-it ranks highest at 94.1, followed by Gemma-4-26B-A4B-it at 91.3. Granite-4.1-30B scores 89.7, performing slightly above Qwen3.5-35B-A3B at 89.1 and ahead of several models clustered between about 85 and 88, including Gemma-4-E4B-it (87.8), Granite-4.0-H-Small (87.5), Qwen3.5-9B (87.2), and Granite-4.1-8B (87.1). Lower scores include Granite-4.1-3B at 82.1 and Qwen3.5-2B at 70.6. Granite 4.1 models are highlighted in blue, showing competitive but not top performance compared to Gemma models.

Granite 4.1 language models comprehend and adhere to user instructions, ensuring reliable and accurate task completion. This capability is particularly valuable for enterprises looking to automate processes and provide consistent, high-quality results.

Table comparing model performance across evaluation datasets, with columns for Granite-Guardian-4.1-8B, OffsetBias-8B, Skywork-Reward-8B, Skywork-Reward-27B, SFR-Judge-70B, and an Oracle baseline. Granite-Guardian-4.1-8B (highlighted) achieves strong results across all datasets: GSM8k (93.71), MATH (50.79), HumanEval+ (80.08), MBPP+ (70.63), BigCodeBench (43.70), and IFEval (82.81), with an overall score of 70.29. It slightly outperforms other models in most categories, while Oracle scores remain highest overall, including 97.46 on GSM8k and 81.54 overall.

Granite Guardian 4.1 detects key risk dimensions catalogued in the IBM AI Risk Atlas, including (but not limited to) jailbreak attempts, profanity, and hallucinations related to tool calls and retrieval-augmented generation in agent-based systems. Trained on unique data comprising human annotations and synthetic data from internal red-teaming, Guardian outperforms similar models on standard benchmarks.

Grouped bar chart titled “Granite Speech 4.1 outperforms peers in transcription accuracy,” showing English ASR word error rates across nine datasets (lower is better): GigaSpeech, LScln, LSoth, SPGI, AMI_IHM, AMI_SDM, VoxPopuli, TED-LIUM, and Earnings-22. Multiple models are compared, including Whisper-large-v3, Gemini 2.0 Flash, phi-4-mm, Qwen ASR, Canary, and Granite Speech variants (lighter blue). Granite Speech models consistently achieve among the lowest error rates across most datasets. Error rates range from about 1–2 on LScln, 3–5 on LSoth and SPGI, around 9–16 on AMI_IHM, and the highest on AMI_SDM (roughly 22–41). The chart highlights Granite Speech 4.1 as delivering the best overall transcription accuracy relative to competing models.
Granite Speech 4.1 delivers highly accurate, enterprise-ready speech recognition across diverse real-world audio environments, achieving low word error rates on benchmarks spanning conversational speech, meetings, presentations and earnings calls.
Horizontal bar chart titled “Granite Vision 4.1 tops Claude Opus 4.6 in table extraction,” showing average scores across seven extraction benchmarks (higher is better). Granite-Vision-4.1-4B ranks highest with a score of 86.5, followed by Claude-Opus-4.6 at 83.8. Other models score lower: Gemma4-E4B (72.4), Qwen3.5-4B (71.7), Ministral-3-8B (68.2), and InternVL3.5-4B (66.4). Granite Vision is highlighted in blue, Claude in purple, and remaining models in gray, emphasizing Granite Vision as the top performer.

Granite Vision 4.1 delivers industry-leading performance in extracting structured information from visual content, achieving the highest average score across seven benchmarks spanning chart extraction, table extraction and key-value pair (KVP) extraction.


Granite for developers

Recipe: Document summarization

Build a document summarizer with IBM Granite to process documents beyond context window limits.
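The recipe's core idea, splitting a long document into windows and summarizing recursively, can be sketched in plain Python. This is a minimal map-reduce outline, not the recipe's actual code: the `summarize` stub below is a hypothetical placeholder for a real Granite call, and character counts stand in for tokens.

```python
def chunk_text(text: str, max_chars: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping character windows that each fit the
    model's context budget (characters stand in for tokens here)."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap  # step forward, keeping some overlap
    return chunks

def summarize(chunk: str) -> str:
    """Hypothetical stand-in for a Granite model call; it just keeps the
    first sentence so the pipeline runs end to end."""
    return chunk.split(". ")[0].strip()

def map_reduce_summary(document: str) -> str:
    # Map: summarize each window independently, so no single call
    # exceeds the context limit.
    partials = [summarize(c) for c in chunk_text(document)]
    # Reduce: condense the partial summaries in a second pass.
    return summarize(" ".join(partials))

doc = "Granite models are open. " * 40
print(map_reduce_summary(doc))
```

The overlap between windows helps sentences cut at a chunk boundary still appear whole in at least one window.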

RAG with Langchain

Build a RAG pipeline with Granite to answer queries using an external knowledge base.
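The retrieval half of a RAG pipeline reduces to nearest-neighbor search over embedding vectors. A minimal sketch with toy hand-written 3-dimensional vectors follows; a real pipeline would obtain the vectors from a Granite embedding model and typically use a vector store rather than a plain dict.

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings" for illustration only; a real pipeline would get
# these from an embedding model, not write them by hand.
corpus = {
    "Granite is Apache 2.0 licensed.": [0.9, 0.1, 0.0],
    "The cafeteria opens at nine.":    [0.0, 0.2, 0.9],
    "Granite models run on-prem.":     [0.8, 0.3, 0.1],
}

def retrieve(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k corpus documents most similar to the query vector."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]),
                    reverse=True)
    return ranked[:k]

print(retrieve([1.0, 0.2, 0.0]))
```

The retrieved passages are then stuffed into the model's prompt as context, which is the step the LangChain recipe orchestrates.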

Recipe: Multimodal RAG

Build a multimodal RAG pipeline with Granite and Docling to query text, tables, and images.

Guide: Open-Source Models

See how open-source LLMs enable autonomy, cut costs, and help developers with evaluation, tuning, and deployment.

Tutorial: Time series forecasting

Use Granite time series models to perform zero-shot and fine-tuned time series forecasting.
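For intuition about what a zero-shot forecaster must beat, here is a naive seasonal-persistence baseline in plain Python. It simply repeats the last observed cycle; it is a conceptual yardstick, not the Granite time-series models themselves, which learn temporal patterns from large pre-training corpora.

```python
def persistence_forecast(history: list[float], horizon: int,
                         season: int = 1) -> list[float]:
    """Naive baseline: repeat the last `season` observations cyclically
    for `horizon` future steps. season=1 is plain last-value persistence."""
    if not history or season < 1:
        raise ValueError("need non-empty history and season >= 1")
    cycle = history[-season:]  # the most recent full cycle
    return [cycle[i % len(cycle)] for i in range(horizon)]

# Last-value persistence, and a weekly-style cycle with season=2:
print(persistence_forecast([1.0, 2.0, 3.0, 4.0], 3))
print(persistence_forecast([1.0, 2.0, 3.0, 4.0], 4, season=2))
```

Any pre-trained forecaster worth deploying should beat this baseline on held-out data; the tutorial shows how to measure that for both zero-shot and fine-tuned Granite models.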

Granite Agent Cookbook

Granite recipes for agentic tasks.

Tutorial: Local AI co-pilot

Build a local AI co-pilot using IBM Granite Code, Ollama, and Continue.

Granite Cookbook

View the full Granite Cookbook

Build with Granite

Granite models drive the AI behind many IBM products and services. Discover ready-to-use solutions for code generation, application development, and model testing. All powered by IBM Granite.

IBM believes in the creation, deployment and utilization of AI models that advance innovation across the enterprise responsibly. The IBM watsonx AI and data platform has an end-to-end process for building and testing foundation models and generative AI. For IBM-developed models, we search for and remove duplication, and we employ URL blocklists, filters for objectionable content and document quality, sentence splitting and tokenization techniques, all before model training.

During the data training process, we work to prevent misalignments in the model outputs and use supervised fine-tuning to enable better instruction following so that the model can be used to complete enterprise tasks via prompt engineering. We are continuing to develop the Granite models in several directions, including other modalities, industry-specific content and more data annotations for training, while also deploying regular, ongoing data protection safeguards for IBM developed models. 

Given the rapidly changing generative AI technology landscape, our end-to-end processes are expected to continuously evolve and improve. As a testament to the rigor IBM puts into the development and testing of its foundation models, the company provides its standard contractual intellectual property indemnification for IBM-developed models, similar to those it provides for IBM hardware and software products.

Moreover, contrary to some other providers of large language models and consistent with the IBM standard approach on indemnification, IBM does not require its customers to indemnify IBM for a customer's use of IBM-developed models. Also, consistent with the IBM approach to its indemnification obligation, IBM does not cap its indemnification liability for the IBM-developed models.

The current watsonx models now under these protections include:

(1) Slate family of encoder-only models.

(2) Granite family of decoder-only models.

Learn more about licensing for Granite models

¹ As of 29 April 2026, released Granite language, vision, speech, embedding and guardian models are being cryptographically signed.

² ISO certification is for the Granite AI Management System (AIMS) of the Granite language models. The certificate may be found here: https://www.schellman.com/certificate-directory under certificate no. 1102257-1.