
Commit 4c27179

felixarntz and gr2m authored
feat(google): allow using Gemini image models with generateImage (#12267)
## Background

Gemini image models (e.g., `gemini-2.5-flash-image`) are multimodal output models that are primarily known for their image generation and editing capabilities. While they can technically be used via `generateText()`, `generateImage()` provides a more intuitive API when working with these models for image generation tasks.

The lack of dedicated `generateImage()` support meant users had to parse through result parts to find images, and there was no guarantee an image would be produced (as reported in #10674).

## Summary

This PR adds support for using Gemini image models with `generateImage()` in the Google provider. The implementation internally calls the language model API with `responseModalities: ['IMAGE']` to generate images, then parses the response into the format expected by `generateImage()`.

Key changes:

- Added support for Gemini image models (e.g., `gemini-2.5-flash-image`, `gemini-3-pro-image-preview`) in `GoogleGenerativeAIImageModel`
- Implemented `doGenerateGemini()` method that uses `GoogleGenerativeAILanguageModel` internally with `responseModalities: ['IMAGE']`
- Added support for image editing by passing input files/URLs through to the language model
- Proper error handling for unsupported options:
  - Throws error when `n` is explicitly used (Gemini doesn't support generating a set number of images per call)
  - Throws error when `mask` is provided (Gemini doesn't support mask-based inpainting)
  - Returns warning when `size` is used, in accordance with existing behavior for Imagen models (should use `aspectRatio` instead)
- Updated documentation with dedicated "Gemini Image Models" section explaining usage and capabilities
- Added comprehensive test coverage for Gemini image generation and editing scenarios
- Updated image model ID types to include `gemini-2.5-flash-image` and `gemini-3-pro-image-preview`
- Updated examples to use `generateImage()` instead of `generateText()` for Gemini image models

The implementation is backward compatible: existing Imagen models continue to work as before, and Gemini image models can continue to be used with `generateText()`.

## Manual Verification

Ran the updated example scripts to verify:

```bash
# Generate image
pnpm tsx examples/ai-functions/src/generate-image/google-gemini-image.ts

# Image editing with local file
pnpm tsx examples/ai-functions/src/generate-image/google-gemini-editing.ts

# Image editing with URL
pnpm tsx examples/ai-functions/src/generate-image/google-gemini-editing-url.ts
```

All examples successfully generated and saved images to the `output/` directory.

## Checklist

- [x] Tests have been added / updated (for bug fixes / features)
- [x] Documentation has been added / updated (for bug fixes / features)
- [x] A _patch_ changeset for relevant packages has been added (for bug fixes / features - run `pnpm changeset` in the project root)
- [x] I have reviewed this pull request (self-review)

## Future Work

We probably want to add the same handling to the `google-vertex` provider for these models.

## Related Issues

Fixes #12252

---------

Co-authored-by: Gregor Martynus <39992+gr2m@users.noreply.github.com>
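
The option handling described in the summary (`n` and `mask` rejected, `size` downgraded to a warning) could be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the type names, the function name `checkGeminiImageOptions`, the warning shape, and the way "explicitly used" is detected (here: `n !== undefined`) are all assumptions made for the example.

```typescript
// Hypothetical sketch: every name below is illustrative and is NOT taken
// from the @ai-sdk/google source.

type UnsupportedSettingWarning = {
  type: 'unsupported-setting';
  setting: string;
  details: string;
};

interface GeminiImageOptionsSketch {
  n?: number; // assumption: undefined means the caller did not set it
  size?: string;
  aspectRatio?: string;
  mask?: unknown;
}

// Throws for options the commit message lists as errors and collects a
// warning for `size`, which the commit message says is only warned about.
function checkGeminiImageOptions(
  options: GeminiImageOptionsSketch,
): UnsupportedSettingWarning[] {
  if (options.n !== undefined) {
    throw new Error('Gemini image models do not support the `n` option.');
  }
  if (options.mask !== undefined) {
    throw new Error('Gemini image models do not support mask-based inpainting.');
  }

  const warnings: UnsupportedSettingWarning[] = [];
  if (options.size !== undefined) {
    warnings.push({
      type: 'unsupported-setting',
      setting: 'size',
      details: 'Use `aspectRatio` instead.',
    });
  }
  return warnings;
}
```

Under these assumptions, a call with only `aspectRatio` returns no warnings, a call with `size` returns one warning, and a call with `n` or `mask` throws before any request would be made.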
1 parent 8f7a309 commit 4c27179

9 files changed, +675 -111 lines changed

.changeset/fluffy-moles-press.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+---
+'@example/ai-functions': patch
+'@ai-sdk/google': patch
+---
+
+feat(google): allow using Gemini image models with `generateImage`

content/providers/01-ai-sdk-providers/15-google-generative-ai.mdx

Lines changed: 93 additions & 4 deletions
@@ -929,7 +929,7 @@ The `vertexRagStore` tool accepts the following configuration options:
 
 ### Image Outputs
 
-Gemini models with image generation capabilities (`gemini-2.5-flash-image`) support image generation. Images are exposed as files in the response.
+Gemini models with image generation capabilities (e.g. `gemini-2.5-flash-image`) support generating images as part of a multimodal response. Images are exposed as files in the response.
 
 ```ts
 import { google } from '@ai-sdk/google';
@@ -948,6 +948,12 @@ for (const file of result.files) {
 }
 ```
 
+<Note>
+  If you primarily want to generate images without text output, you can also use
+  Gemini image models with the `generateImage()` function. See [Gemini Image
+  Models](#gemini-image-models) for details.
+</Note>
+
 ### Safety Ratings
 
 The safety ratings provide insight into the safety of the model's response.
@@ -1146,9 +1152,18 @@ The following optional provider options are available for Google Generative AI e
 
 ## Image Models
 
-You can create [Imagen](https://ai.google.dev/gemini-api/docs/imagen) models that call the Google Generative AI API using the `.image()` factory method.
+You can create image models that call the Google Generative AI API using the `.image()` factory method.
 For more on image generation with the AI SDK see [generateImage()](/docs/reference/ai-sdk-core/generate-image).
 
+The Google provider supports two types of image models:
+
+- **Imagen models**: Dedicated image generation models using the `:predict` API
+- **Gemini image models**: Multimodal language models with image output capabilities using the `:generateContent` API
+
+### Imagen Models
+
+[Imagen](https://ai.google.dev/gemini-api/docs/imagen) models are dedicated image generation models.
+
 ```ts
 import { google } from '@ai-sdk/google';
 import { generateImage } from 'ai';
@@ -1178,7 +1193,7 @@ const { image } = await generateImage({
 });
 ```
 
-The following provider options are available:
+The following provider options are available for Imagen models:
 
 - **personGeneration** `allow_adult` | `allow_all` | `dont_allow`
   Whether to allow person generation. Defaults to `allow_adult`.
@@ -1188,10 +1203,84 @@ The following provider options are available:
   parameter instead.
 </Note>
 
-#### Model Capabilities
+#### Imagen Model Capabilities
 
 | Model                           | Aspect Ratios             |
 | ------------------------------- | ------------------------- |
 | `imagen-4.0-generate-001`       | 1:1, 3:4, 4:3, 9:16, 16:9 |
 | `imagen-4.0-ultra-generate-001` | 1:1, 3:4, 4:3, 9:16, 16:9 |
 | `imagen-4.0-fast-generate-001`  | 1:1, 3:4, 4:3, 9:16, 16:9 |
+
+### Gemini Image Models
+
+[Gemini image models](https://ai.google.dev/gemini-api/docs/image-generation) (e.g. `gemini-2.5-flash-image`) are technically multimodal output language models, but they can be used with the `generateImage()` function for a simpler image generation experience. Internally, the provider calls the language model API with `responseModalities: ['IMAGE']`.
+
+```ts
+import { google } from '@ai-sdk/google';
+import { generateImage } from 'ai';
+
+const { image } = await generateImage({
+  model: google.image('gemini-2.5-flash-image'),
+  prompt: 'A photorealistic image of a cat wearing a wizard hat',
+  aspectRatio: '1:1',
+});
+```
+
+Gemini image models also support image editing by providing input images:
+
+```ts
+import { google } from '@ai-sdk/google';
+import { generateImage } from 'ai';
+import fs from 'node:fs';
+
+const sourceImage = fs.readFileSync('./cat.png');
+
+const { image } = await generateImage({
+  model: google.image('gemini-2.5-flash-image'),
+  prompt: {
+    text: 'Add a small wizard hat to this cat',
+    images: [sourceImage],
+  },
+});
+```
+
+You can also use URLs for input images:
+
+```ts
+import { google } from '@ai-sdk/google';
+import { generateImage } from 'ai';
+
+const { image } = await generateImage({
+  model: google.image('gemini-2.5-flash-image'),
+  prompt: {
+    text: 'Add a small wizard hat to this cat',
+    images: ['https://example.com/cat.png'],
+  },
+});
+```
+
+<Note>
+  Gemini image models do not support the `size` or `n` parameters. Use
+  `aspectRatio` instead of `size`. Mask-based inpainting is also not supported.
+</Note>
+
+<Note>
+  For more advanced use cases where you need both text and image outputs, or
+  want more control over the generation process, you can use Gemini image models
+  directly with `generateText()`. See [Image Outputs](#image-outputs) for
+  details.
+</Note>
+
+#### Gemini Image Model Capabilities
+
+| Model                        | Image Generation    | Image Editing       | Aspect Ratios                                       |
+| ---------------------------- | ------------------- | ------------------- | --------------------------------------------------- |
+| `gemini-2.5-flash-image`     | <Check size={18} /> | <Check size={18} /> | 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9 |
+| `gemini-3-pro-image-preview` | <Check size={18} /> | <Check size={18} /> | 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9 |
+
+<Note>
+  `gemini-3-pro-image-preview` supports additional features including up to 14
+  reference images for editing (6 objects, 5 humans), resolution options (1K,
+  2K, 4K via `providerOptions.google.imageConfig.imageSize`), and Google Search
+  grounding.
+</Note>
Lines changed: 11 additions & 34 deletions
@@ -1,42 +1,19 @@
 import { google } from '@ai-sdk/google';
-import { generateText } from 'ai';
+import { generateImage } from 'ai';
 import fs from 'node:fs';
 import { run } from '../lib/run';
+import { presentImages } from '../lib/present-image';
 
 run(async () => {
-  const editResult = await generateText({
-    model: google('gemini-2.5-flash-image'),
-    prompt: [
-      {
-        role: 'user',
-        content: [
-          {
-            type: 'text',
-            text: 'Add a small wizard hat to this cat. Keep everything else the same.',
-          },
-          {
-            type: 'image',
-            image: new URL(
-              'https://raw.githubusercontent.com/vercel/ai/refs/heads/main/examples/ai-functions/data/comic-cat.png',
-            ),
-            mediaType: 'image/jpeg',
-          },
-        ],
-      },
-    ],
+  const editResult = await generateImage({
+    model: google.image('gemini-2.5-flash-image'),
+    prompt: {
+      text: 'Add a small wizard hat to this cat. Keep everything else the same.',
+      images: [
+        'https://raw.githubusercontent.com/vercel/ai/refs/heads/main/examples/ai-functions/data/comic-cat.png',
+      ],
+    },
   });
 
-  // Save the edited image
-  const timestamp = Date.now();
-  fs.mkdirSync('output', { recursive: true });
-
-  for (const file of editResult.files) {
-    if (file.mediaType.startsWith('image/')) {
-      await fs.promises.writeFile(
-        `output/edited-${timestamp}.png`,
-        file.uint8Array,
-      );
-      console.log(`Saved edited image: output/edited-${timestamp}.png`);
-    }
-  }
+  presentImages(editResult.images);
 });
Lines changed: 17 additions & 46 deletions
@@ -1,65 +1,36 @@
 import { google } from '@ai-sdk/google';
-import { generateText } from 'ai';
+import { generateImage } from 'ai';
 import fs from 'node:fs';
 import { run } from '../lib/run';
+import { presentImages } from '../lib/present-image';
 
 run(async () => {
   console.log('Generating base cat image...');
-  const baseResult = await generateText({
-    model: google('gemini-2.5-flash-image'),
+  const baseResult = await generateImage({
+    model: google.image('gemini-2.5-flash-image'),
     prompt:
       'A photorealistic picture of a fluffy ginger cat sitting on a wooden table',
   });
 
-  let baseImageData: Uint8Array | null = null;
   const timestamp = Date.now();
 
   fs.mkdirSync('output', { recursive: true });
 
-  for (const file of baseResult.files) {
-    if (file.mediaType.startsWith('image/')) {
-      baseImageData = file.uint8Array;
-      await fs.promises.writeFile(
-        `output/cat-base-${timestamp}.png`,
-        file.uint8Array,
-      );
-      console.log(`Saved base image: output/cat-base-${timestamp}.png`);
-      break;
-    }
-  }
-
-  if (!baseImageData) {
-    throw new Error('No base image generated');
-  }
+  const baseImage = baseResult.image;
+  await fs.promises.writeFile(
+    `output/cat-base-${timestamp}.png`,
+    baseImage.uint8Array,
+  );
+  console.log(`Saved base image: output/cat-base-${timestamp}.png`);
 
   console.log('Adding wizard hat...');
-  const editResult = await generateText({
-    model: google('gemini-2.5-flash-image'),
-    prompt: [
-      {
-        role: 'user',
-        content: [
-          {
-            type: 'text',
-            text: 'Add a small wizard hat to this cat. Keep everything else the same.',
-          },
-          {
-            type: 'file',
-            data: baseImageData,
-            mediaType: 'image/png',
-          },
-        ],
-      },
-    ],
+  const editResult = await generateImage({
+    model: google.image('gemini-2.5-flash-image'),
+    prompt: {
+      text: 'Add a small wizard hat to this cat. Keep everything else the same.',
+      images: [baseImage.uint8Array],
+    },
   });
 
-  for (const file of editResult.files) {
-    if (file.mediaType.startsWith('image/')) {
-      await fs.promises.writeFile(
-        `output/cat-wizard-${timestamp}.png`,
-        file.uint8Array,
-      );
-      console.log(`Saved edited image: output/cat-wizard-${timestamp}.png`);
-    }
-  }
+  presentImages(editResult.images);
 });
Lines changed: 5 additions & 15 deletions
@@ -1,28 +1,18 @@
 import { google } from '@ai-sdk/google';
-import { generateText } from 'ai';
+import { generateImage } from 'ai';
 import fs from 'node:fs';
 import { run } from '../lib/run';
+import { presentImages } from '../lib/present-image';
 
 run(async () => {
-  const result = await generateText({
-    model: google('gemini-2.5-flash-image'),
+  const result = await generateImage({
+    model: google.image('gemini-2.5-flash-image'),
     prompt:
       'Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme',
   });
 
-  for (const file of result.files) {
-    if (file.mediaType.startsWith('image/')) {
-      const timestamp = Date.now();
-      const fileName = `nano-banana-${timestamp}.png`;
-
-      fs.mkdirSync('output', { recursive: true });
-      await fs.promises.writeFile(`output/${fileName}`, file.uint8Array);
-
-      console.log(`Generated and saved image: output/${fileName}`);
-    }
-  }
+  presentImages(result.images);
 
   console.log();
   console.log('token usage:', result.usage);
-  console.log('finish reason:', result.finishReason);
 });
Lines changed: 5 additions & 4 deletions
@@ -1,12 +1,13 @@
 import { google } from '@ai-sdk/google';
-import { generateText } from 'ai';
+import { generateImage } from 'ai';
 import { run } from '../lib/run';
+import { presentImages } from '../lib/present-image';
 
 run(async () => {
-  const { files } = await generateText({
-    model: google('gemini-2.5-flash-image'),
+  const { images } = await generateImage({
+    model: google.image('gemini-2.5-flash-image'),
     prompt: 'A nano banana in a fancy restaurant',
   });
 
-  console.log(`Generated ${files.length} image files`);
+  presentImages(images);
 });
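
For readers comparing the two approaches: the loops removed in the example diffs all follow the same pattern of scanning `result.files` for entries whose `mediaType` starts with `image/`. A minimal, self-contained sketch of that pattern is shown below. The `GeneratedFileLike` type and both helper names are assumptions made for illustration; they are not part of the AI SDK.

```typescript
// Hypothetical sketch: `GeneratedFileLike`, `pickImageFiles`, and
// `requireFirstImage` are illustrative names, not AI SDK exports.

interface GeneratedFileLike {
  mediaType: string;
  uint8Array: Uint8Array;
}

// Keeps only the files whose media type marks them as images, mirroring the
// `file.mediaType.startsWith('image/')` check in the removed loops.
function pickImageFiles(files: GeneratedFileLike[]): GeneratedFileLike[] {
  return files.filter(file => file.mediaType.startsWith('image/'));
}

// Returns the first image, or throws when the response contained none, which
// is the case the removed `if (!baseImageData)` guard handled by hand.
function requireFirstImage(files: GeneratedFileLike[]): GeneratedFileLike {
  const [first] = pickImageFiles(files);
  if (first === undefined) {
    throw new Error('No image generated');
  }
  return first;
}
```

With `generateImage()`, callers read `result.image` or `result.images` directly, as the updated examples above do, so this filtering step is no longer written by hand.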
