feat(google): allow using Gemini image models with generateImage (#12267)

felixarntz · gr2m · web-flow · commit 4c27179e68fa · 2026-02-11T15:49:45.000-06:00
## Background

Gemini image models (e.g., `gemini-2.5-flash-image`) are multimodal output models that are primarily known for their image generation and editing capabilities. While they can technically be used via `generateText()`, `generateImage` provides a more intuitive API when working with these models for image generation tasks.

The lack of dedicated `generateImage()` support meant users had to parse through result parts to find images, and there was no guarantee an image would be produced (as reported in #10674).

## Summary

This PR adds support for using Gemini image models with `generateImage()` in the Google provider. The implementation internally calls the language model API with `responseModalities: ['IMAGE']` to generate images, then parses the response into the format expected by `generateImage()`.

Key changes:

- Added support for Gemini image models (e.g., `gemini-2.5-flash-image`, `gemini-3-pro-image-preview`) in `GoogleGenerativeAIImageModel`
- Implemented `doGenerateGemini()` method that uses `GoogleGenerativeAILanguageModel` internally with `responseModalities: ['IMAGE']`
- Added support for image editing by passing input files/URLs through to the language model
- Proper error handling for unsupported options:
  - Throws error when `n` is explicitly used (Gemini doesn't support generating a set number of images per call)
  - Throws error when `mask` is provided (Gemini doesn't support mask-based inpainting)
  - Returns warning when `size` is used, in accordance with existing behavior for Imagen models (should use `aspectRatio` instead)
- Updated documentation with dedicated "Gemini Image Models" section explaining usage and capabilities
- Added comprehensive test coverage for Gemini image generation and editing scenarios
- Updated image model ID types to include `gemini-2.5-flash-image` and `gemini-3-pro-image-preview`
- Updated examples to use `generateImage()` instead of `generateText()` for Gemini image models

The implementation is backward compatible: existing Imagen models continue to work as before, and Gemini image models can continue to be used with `generateText`.

## Manual Verification

Ran the updated example scripts to verify:

```bash
# Generate image
pnpm tsx examples/ai-functions/src/generate-image/google-gemini-image.ts

# Image editing with local file
pnpm tsx examples/ai-functions/src/generate-image/google-gemini-editing.ts

# Image editing with URL
pnpm tsx examples/ai-functions/src/generate-image/google-gemini-editing-url.ts
```

All examples successfully generated and saved images to the `output/` directory.

## Checklist

- [x] Tests have been added / updated (for bug fixes / features)
- [x] Documentation has been added / updated (for bug fixes / features)
- [x] A _patch_ changeset for relevant packages has been added (for bug fixes / features - run `pnpm changeset` in the project root)
- [x] I have reviewed this pull request (self-review)

## Future Work

We probably want to add the same handling to the `google-vertex` provider for these models.

## Related Issues

Fixes #12252

---------

Co-authored-by: Gregor Martynus <39992+gr2m@users.noreply.github.com>
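
For readers skimming the summary, the option handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in `google-generative-ai-image-model.ts`: the names `GeminiImageCallOptions`, `UnsupportedWarning`, `checkGeminiImageOptions`, and `buildGenerationConfig` are hypothetical, the warning shape is assumed, and passing `aspectRatio` through an `imageConfig` field is an assumption about how the request is built.

```typescript
// Hypothetical option shape for illustration only;
// NOT the provider's real internal types.
interface GeminiImageCallOptions {
  prompt: string;
  n?: number;
  size?: string;
  aspectRatio?: string;
  mask?: Uint8Array;
}

// Hypothetical warning shape (assumed, not taken from the SDK).
interface UnsupportedWarning {
  type: 'unsupported';
  feature: string;
  details: string;
}

// Hypothetical helper mirroring the rules listed in the summary:
// `n` and `mask` throw, `size` only produces a warning.
function checkGeminiImageOptions(
  options: GeminiImageCallOptions,
): UnsupportedWarning[] {
  if (options.n !== undefined) {
    throw new Error(
      'Gemini image models do not support generating a set number of images per call.',
    );
  }
  if (options.mask !== undefined) {
    throw new Error('Gemini image models do not support mask-based inpainting.');
  }

  const warnings: UnsupportedWarning[] = [];
  if (options.size !== undefined) {
    warnings.push({
      type: 'unsupported',
      feature: 'size',
      details: 'Use `aspectRatio` instead of `size`.',
    });
  }
  return warnings;
}

// Hypothetical helper: requests image-only output, as the summary describes.
// Forwarding `aspectRatio` via `imageConfig` is an assumption.
function buildGenerationConfig(options: GeminiImageCallOptions) {
  return {
    responseModalities: ['IMAGE'] as const,
    ...(options.aspectRatio !== undefined
      ? { imageConfig: { aspectRatio: options.aspectRatio } }
      : {}),
  };
}
```

Under these assumptions, a call with `size` set would still proceed and surface a warning, while a call with `n` or `mask` set would fail before any request is made.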
diff --git a/.changeset/fluffy-moles-press.md b/.changeset/fluffy-moles-press.md
@@ -0,0 +1,6 @@
+---
+'@example/ai-functions': patch
+'@ai-sdk/google': patch
+---
+
+feat(google): allow using Gemini image models with `generateImage`
diff --git a/content/providers/01-ai-sdk-providers/15-google-generative-ai.mdx b/content/providers/01-ai-sdk-providers/15-google-generative-ai.mdx
@@ -929,7 +929,7 @@ The `vertexRagStore` tool accepts the following configuration options:
 
 ### Image Outputs
 
-Gemini models with image generation capabilities (`gemini-2.5-flash-image`) support image generation. Images are exposed as files in the response.
+Gemini models with image generation capabilities (e.g. `gemini-2.5-flash-image`) support generating images as part of a multimodal response. Images are exposed as files in the response.
 
 ```ts
 import { google } from '@ai-sdk/google';
@@ -948,6 +948,12 @@ for (const file of result.files) {
 }
 ```
 
+<Note>
+  If you primarily want to generate images without text output, you can also use
+  Gemini image models with the `generateImage()` function. See [Gemini Image
+  Models](#gemini-image-models) for details.
+</Note>
+
 ### Safety Ratings
 
 The safety ratings provide insight into the safety of the model's response.
@@ -1146,9 +1152,18 @@ The following optional provider options are available for Google Generative AI e
 
 ## Image Models
 
-You can create [Imagen](https://ai.google.dev/gemini-api/docs/imagen) models that call the Google Generative AI API using the `.image()` factory method.
+You can create image models that call the Google Generative AI API using the `.image()` factory method.
 For more on image generation with the AI SDK see [generateImage()](/docs/reference/ai-sdk-core/generate-image).
 
+The Google provider supports two types of image models:
+
+- **Imagen models**: Dedicated image generation models using the `:predict` API
+- **Gemini image models**: Multimodal language models with image output capabilities using the `:generateContent` API
+
+### Imagen Models
+
+[Imagen](https://ai.google.dev/gemini-api/docs/imagen) models are dedicated image generation models.
+
 ```ts
 import { google } from '@ai-sdk/google';
 import { generateImage } from 'ai';
@@ -1178,7 +1193,7 @@ const { image } = await generateImage({
 });
 ```
 
-The following provider options are available:
+The following provider options are available for Imagen models:
 
 - **personGeneration** `allow_adult` | `allow_all` | `dont_allow`
   Whether to allow person generation. Defaults to `allow_adult`.
@@ -1188,10 +1203,84 @@ The following provider options are available:
   parameter instead.
 </Note>
 
-#### Model Capabilities
+#### Imagen Model Capabilities
 
 | Model                           | Aspect Ratios             |
 | ------------------------------- | ------------------------- |
 | `imagen-4.0-generate-001`       | 1:1, 3:4, 4:3, 9:16, 16:9 |
 | `imagen-4.0-ultra-generate-001` | 1:1, 3:4, 4:3, 9:16, 16:9 |
 | `imagen-4.0-fast-generate-001`  | 1:1, 3:4, 4:3, 9:16, 16:9 |
+
+### Gemini Image Models
+
+[Gemini image models](https://ai.google.dev/gemini-api/docs/image-generation) (e.g. `gemini-2.5-flash-image`) are technically multimodal output language models, but they can be used with the `generateImage()` function for a simpler image generation experience. Internally, the provider calls the language model API with `responseModalities: ['IMAGE']`.
+
+```ts
+import { google } from '@ai-sdk/google';
+import { generateImage } from 'ai';
+
+const { image } = await generateImage({
+  model: google.image('gemini-2.5-flash-image'),
+  prompt: 'A photorealistic image of a cat wearing a wizard hat',
+  aspectRatio: '1:1',
+});
+```
+
+Gemini image models also support image editing by providing input images:
+
+```ts
+import { google } from '@ai-sdk/google';
+import { generateImage } from 'ai';
+import fs from 'node:fs';
+
+const sourceImage = fs.readFileSync('./cat.png');
+
+const { image } = await generateImage({
+  model: google.image('gemini-2.5-flash-image'),
+  prompt: {
+    text: 'Add a small wizard hat to this cat',
+    images: [sourceImage],
+  },
+});
+```
+
+You can also use URLs for input images:
+
+```ts
+import { google } from '@ai-sdk/google';
+import { generateImage } from 'ai';
+
+const { image } = await generateImage({
+  model: google.image('gemini-2.5-flash-image'),
+  prompt: {
+    text: 'Add a small wizard hat to this cat',
+    images: ['https://example.com/cat.png'],
+  },
+});
+```
+
+<Note>
+  Gemini image models do not support the `size` or `n` parameters. Use
+  `aspectRatio` instead of `size`. Mask-based inpainting is also not supported.
+</Note>
+
+<Note>
+  For more advanced use cases where you need both text and image outputs, or
+  want more control over the generation process, you can use Gemini image models
+  directly with `generateText()`. See [Image Outputs](#image-outputs) for
+  details.
+</Note>
+
+#### Gemini Image Model Capabilities
+
+| Model                        | Image Generation    | Image Editing       | Aspect Ratios                                       |
+| ---------------------------- | ------------------- | ------------------- | --------------------------------------------------- |
+| `gemini-2.5-flash-image`     | <Check size={18} /> | <Check size={18} /> | 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9 |
+| `gemini-3-pro-image-preview` | <Check size={18} /> | <Check size={18} /> | 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9 |
+
+<Note>
+  `gemini-3-pro-image-preview` supports additional features including up to 14
+  reference images for editing (6 objects, 5 humans), resolution options (1K,
+  2K, 4K via `providerOptions.google.imageConfig.imageSize`), and Google Search
+  grounding.
+</Note>
diff --git a/examples/ai-functions/src/generate-image/google-gemini-editing-url.ts b/examples/ai-functions/src/generate-image/google-gemini-editing-url.ts
@@ -1,42 +1,19 @@
 import { google } from '@ai-sdk/google';
-import { generateText } from 'ai';
+import { generateImage } from 'ai';
 import fs from 'node:fs';
 import { run } from '../lib/run';
+import { presentImages } from '../lib/present-image';
 
 run(async () => {
-  const editResult = await generateText({
-    model: google('gemini-2.5-flash-image'),
-    prompt: [
-      {
-        role: 'user',
-        content: [
-          {
-            type: 'text',
-            text: 'Add a small wizard hat to this cat. Keep everything else the same.',
-          },
-          {
-            type: 'image',
-            image: new URL(
-              'https://raw.githubusercontent.com/vercel/ai/refs/heads/main/examples/ai-functions/data/comic-cat.png',
-            ),
-            mediaType: 'image/jpeg',
-          },
-        ],
-      },
-    ],
+  const editResult = await generateImage({
+    model: google.image('gemini-2.5-flash-image'),
+    prompt: {
+      text: 'Add a small wizard hat to this cat. Keep everything else the same.',
+      images: [
+        'https://raw.githubusercontent.com/vercel/ai/refs/heads/main/examples/ai-functions/data/comic-cat.png',
+      ],
+    },
   });
 
-  // Save the edited image
-  const timestamp = Date.now();
-  fs.mkdirSync('output', { recursive: true });
-
-  for (const file of editResult.files) {
-    if (file.mediaType.startsWith('image/')) {
-      await fs.promises.writeFile(
-        `output/edited-${timestamp}.png`,
-        file.uint8Array,
-      );
-      console.log(`Saved edited image: output/edited-${timestamp}.png`);
-    }
-  }
+  presentImages(editResult.images);
 });
diff --git a/examples/ai-functions/src/generate-image/google-gemini-editing.ts b/examples/ai-functions/src/generate-image/google-gemini-editing.ts
@@ -1,65 +1,36 @@
 import { google } from '@ai-sdk/google';
-import { generateText } from 'ai';
+import { generateImage } from 'ai';
 import fs from 'node:fs';
 import { run } from '../lib/run';
+import { presentImages } from '../lib/present-image';
 
 run(async () => {
   console.log('Generating base cat image...');
-  const baseResult = await generateText({
-    model: google('gemini-2.5-flash-image'),
+  const baseResult = await generateImage({
+    model: google.image('gemini-2.5-flash-image'),
     prompt:
       'A photorealistic picture of a fluffy ginger cat sitting on a wooden table',
   });
 
-  let baseImageData: Uint8Array | null = null;
   const timestamp = Date.now();
 
   fs.mkdirSync('output', { recursive: true });
 
-  for (const file of baseResult.files) {
-    if (file.mediaType.startsWith('image/')) {
-      baseImageData = file.uint8Array;
-      await fs.promises.writeFile(
-        `output/cat-base-${timestamp}.png`,
-        file.uint8Array,
-      );
-      console.log(`Saved base image: output/cat-base-${timestamp}.png`);
-      break;
-    }
-  }
-
-  if (!baseImageData) {
-    throw new Error('No base image generated');
-  }
+  const baseImage = baseResult.image;
+  await fs.promises.writeFile(
+    `output/cat-base-${timestamp}.png`,
+    baseImage.uint8Array,
+  );
+  console.log(`Saved base image: output/cat-base-${timestamp}.png`);
 
   console.log('Adding wizard hat...');
-  const editResult = await generateText({
-    model: google('gemini-2.5-flash-image'),
-    prompt: [
-      {
-        role: 'user',
-        content: [
-          {
-            type: 'text',
-            text: 'Add a small wizard hat to this cat. Keep everything else the same.',
-          },
-          {
-            type: 'file',
-            data: baseImageData,
-            mediaType: 'image/png',
-          },
-        ],
-      },
-    ],
+  const editResult = await generateImage({
+    model: google.image('gemini-2.5-flash-image'),
+    prompt: {
+      text: 'Add a small wizard hat to this cat. Keep everything else the same.',
+      images: [baseImage.uint8Array],
+    },
   });
 
-  for (const file of editResult.files) {
-    if (file.mediaType.startsWith('image/')) {
-      await fs.promises.writeFile(
-        `output/cat-wizard-${timestamp}.png`,
-        file.uint8Array,
-      );
-      console.log(`Saved edited image: output/cat-wizard-${timestamp}.png`);
-    }
-  }
+  presentImages(editResult.images);
 });
diff --git a/examples/ai-functions/src/generate-image/google-gemini-image.ts b/examples/ai-functions/src/generate-image/google-gemini-image.ts
@@ -1,28 +1,18 @@
 import { google } from '@ai-sdk/google';
-import { generateText } from 'ai';
+import { generateImage } from 'ai';
 import fs from 'node:fs';
 import { run } from '../lib/run';
+import { presentImages } from '../lib/present-image';
 
 run(async () => {
-  const result = await generateText({
-    model: google('gemini-2.5-flash-image'),
+  const result = await generateImage({
+    model: google.image('gemini-2.5-flash-image'),
     prompt:
       'Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme',
   });
 
-  for (const file of result.files) {
-    if (file.mediaType.startsWith('image/')) {
-      const timestamp = Date.now();
-      const fileName = `nano-banana-${timestamp}.png`;
-
-      fs.mkdirSync('output', { recursive: true });
-      await fs.promises.writeFile(`output/${fileName}`, file.uint8Array);
-
-      console.log(`Generated and saved image: output/${fileName}`);
-    }
-  }
+  presentImages(result.images);
 
   console.log();
   console.log('token usage:', result.usage);
-  console.log('finish reason:', result.finishReason);
 });
diff --git a/examples/ai-functions/src/generate-image/google-gemini-minimal.ts b/examples/ai-functions/src/generate-image/google-gemini-minimal.ts
@@ -1,12 +1,13 @@
 import { google } from '@ai-sdk/google';
-import { generateText } from 'ai';
+import { generateImage } from 'ai';
 import { run } from '../lib/run';
+import { presentImages } from '../lib/present-image';
 
 run(async () => {
-  const { files } = await generateText({
-    model: google('gemini-2.5-flash-image'),
+  const { images } = await generateImage({
+    model: google.image('gemini-2.5-flash-image'),
     prompt: 'A nano banana in a fancy restaurant',
   });
 
-  console.log(`Generated ${files.length} image files`);
+  presentImages(images);
 });
diff --git a/packages/google/src/google-generative-ai-image-model.test.ts b/packages/google/src/google-generative-ai-image-model.test.ts
diff --git a/packages/google/src/google-generative-ai-image-model.ts b/packages/google/src/google-generative-ai-image-model.ts
diff --git a/packages/google/src/google-generative-ai-image-settings.ts b/packages/google/src/google-generative-ai-image-settings.ts