
Commit 47114a3

Authored by vercel-ai-sdk[bot], GidianB, and felixarntz

Backport: fix(provider/google): support multimodal tool-result parts in function responses (#13684)

This is an automated backport of #12777 to the release-v6.0 branch. FYI @GidianB ~~This backport has conflicts that need to be resolved manually.~~ Conflicts resolved.

### `git cherry-pick` output

```
Auto-merging packages/google/src/convert-to-google-generative-ai-messages.test.ts
Auto-merging packages/google/src/convert-to-google-generative-ai-messages.ts
CONFLICT (content): Merge conflict in packages/google/src/convert-to-google-generative-ai-messages.ts
Auto-merging packages/google/src/google-generative-ai-language-model.ts
error: could not apply 18c1970... fix(provider/google): support multimodal tool-result parts in function responses (#12777)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
```

---

Co-authored-by: Gidian <gidianbateman@gmail.com>
Co-authored-by: Felix Arntz <felix.arntz@vercel.com>
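The `git cherry-pick` hints quoted above outline the manual resolution flow used for this backport. It can be sketched end to end in a throwaway repository (everything here is illustrative, not the real AI SDK repository; the resolution keeps the picked change, matching a resolution in favor of the backported fix):

```shell
# Throwaway-repo sketch of the cherry-pick conflict flow. File names and
# commit messages are illustrative assumptions, not the real repository's.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email backport@example.com
git config user.name backport-bot

echo base > convert.ts
git add convert.ts
git commit -qm 'base'

# release branch diverges from the default branch
git checkout -qb release-v6.0
echo release > convert.ts
git commit -qam 'release-only edit'

# the fix lands on the default branch
git checkout -q -
echo fix > convert.ts
git commit -qam 'fix: multimodal tool results'
fix_sha=$(git rev-parse HEAD)

# backport: cherry-pick the fix onto the release branch
git checkout -q release-v6.0
if ! git cherry-pick "$fix_sha" >/dev/null 2>&1; then
  # conflict: resolve in favor of the picked commit, mark resolved, continue
  git checkout --theirs convert.ts
  git add convert.ts
  GIT_EDITOR=true git cherry-pick --continue >/dev/null
fi

cat convert.ts
```

After the `--continue`, the release branch carries the fix with the original commit message, which is the state the commit above describes as "Conflicts resolved."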
1 parent d99eb91 commit 47114a3

14 files changed · +987 −97 lines changed

.changeset/warm-yaks-hunt.md (new file) · 10 additions & 0 deletions

```md
---
'@ai-sdk/google': patch
---

feat(provider/google): Add multimodal tool-result support for Google function responses.

Tool results with `output.type = 'content'` now map media parts into
`functionResponse.parts` for Google models, including `image-data`,
`file-data`, and base64 `data:` URLs in URL-style content parts.
Remote HTTP(S) URLs in URL-style tool-result parts are not supported.
```
content/docs/03-ai-sdk-core/15-tools-and-tool-calling.mdx · 6 additions & 2 deletions

```diff
@@ -1027,10 +1027,14 @@ const { text } = await generateText({
 ## Multi-modal Tool Results

 <Note type="warning">
-  Multi-modal tool results are experimental and only supported by Anthropic and
-  OpenAI.
+  Multi-modal tool results are experimental and supported by Anthropic, OpenAI,
+  and Google (Gemini 3 models).
 </Note>

+For Google, use base64 media parts (`image-data` / `file-data`) or base64
+`data:` URLs in URL-style parts. Remote HTTP(S) URLs in tool-result URL parts
+are not supported.
+
 In order to send multi-modal tool results, e.g. screenshots, back to the model,
 they need to be converted into a specific format.
```
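Per the docs note, a URL-style tool-result part for Google must carry a base64 `data:` URL rather than a remote address. A minimal sketch of building one from raw bytes (the `toDataUrl` helper is illustrative, not an SDK export):

```typescript
import { Buffer } from 'node:buffer';

// Builds the base64 `data:` URL form that Google accepts in URL-style
// tool-result parts; remote HTTP(S) URLs are not supported there.
function toDataUrl(bytes: Buffer, mediaType: string): string {
  return `data:${mediaType};base64,${bytes.toString('base64')}`;
}
```

An `image-url` part would then use `{ type: 'image-url', url: toDataUrl(pngBytes, 'image/png') }`, as in the example files in this commit.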
New file · 55 additions & 0 deletions

```ts
import { google } from '@ai-sdk/google';
import { generateText, stepCountIs, tool } from 'ai';
import fs from 'node:fs/promises';
import path from 'node:path';
import { run } from '../../lib/run';
import { z } from 'zod';

run(async () => {
  const readImage = tool({
    description: `Read and return an image`,
    inputSchema: z.object({}),
    execute: async () => {
      try {
        const imagePath = path.join(__dirname, '../../../data/comic-cat.png');
        const imageData = await fs.readFile(imagePath);

        return {
          success: true,
          description: 'Successfully loaded image',
          imageData: imageData.toString('base64'),
        };
      } catch (error) {
        throw new Error(`Failed to analyze image: ${error}`);
      }
    },
    toModelOutput({ output }) {
      return {
        type: 'content',
        value: [
          {
            type: 'text',
            text: output.description,
          },
          {
            type: 'image-data',
            mediaType: 'image/png',
            data: output.imageData,
          },
        ],
      };
    },
  });

  const result = await generateText({
    model: google('gemini-3-flash-preview'),
    prompt:
      'Please read the image using the tool provided and return the summary of that image',
    tools: {
      readImage,
    },
    stopWhen: stepCountIs(4),
  });

  console.log(`Assistant response : ${JSON.stringify(result.text, null, 2)}`);
});
```
New file · 55 additions & 0 deletions

```ts
import { google } from '@ai-sdk/google';
import { generateText, stepCountIs, tool } from 'ai';
import fs from 'node:fs/promises';
import path from 'node:path';
import { run } from '../../lib/run';
import { z } from 'zod';

run(async () => {
  const readImage = tool({
    description: `Read and return an image`,
    inputSchema: z.object({}),
    execute: async () => {
      try {
        const imagePath = path.join(__dirname, '../../../data/comic-cat.png');
        const imageData = await fs.readFile(imagePath);
        const base64Data = imageData.toString('base64');

        return {
          success: true,
          description: 'Successfully loaded image',
          imageUrl: `data:image/png;base64,${base64Data}`,
        };
      } catch (error) {
        throw new Error(`Failed to analyze image: ${error}`);
      }
    },
    toModelOutput({ output }) {
      return {
        type: 'content',
        value: [
          {
            type: 'text',
            text: output.description,
          },
          {
            type: 'image-url',
            url: output.imageUrl,
          },
        ],
      };
    },
  });

  const result = await generateText({
    model: google('gemini-3-flash-preview'),
    prompt:
      'Please read the image using the tool provided and return the summary of that image',
    tools: {
      readImage,
    },
    stopWhen: stepCountIs(4),
  });

  console.log(`Assistant response : ${JSON.stringify(result.text, null, 2)}`);
});
```

examples/ai-functions/src/generate-text/google/image-tool-results.ts renamed to examples/ai-functions/src/generate-text/google/image-tool-result-old.ts · 1 addition & 1 deletion

```diff
@@ -15,7 +15,7 @@ const imageAnalysisTool = tool({
   inputSchema: z.object({}),
   execute: async ({}) => {
     try {
-      const imagePath = path.join(__dirname, '../../data/comic-cat.png');
+      const imagePath = path.join(__dirname, '../../../data/comic-cat.png');
       const base64Image = await fileToBase64(imagePath);

       return {
```
New file · 56 additions & 0 deletions

```ts
import { google } from '@ai-sdk/google';
import { generateText, stepCountIs, tool } from 'ai';
import fs from 'node:fs/promises';
import path from 'node:path';
import { run } from '../../lib/run';
import { z } from 'zod';

run(async () => {
  const readPDFDocument = tool({
    description: `Read and return a PDF document`,
    inputSchema: z.object({}),
    execute: async () => {
      try {
        const pdfPath = path.join(__dirname, '../../../data/ai.pdf');
        const pdfData = await fs.readFile(pdfPath);

        return {
          success: true,
          description: 'Successfully loaded PDF document',
          pdfData: pdfData.toString('base64'),
        };
      } catch (error) {
        throw new Error(`Failed to analyze PDF: ${error}`);
      }
    },
    toModelOutput({ output }) {
      return {
        type: 'content',
        value: [
          {
            type: 'text',
            text: output.description,
          },
          {
            type: 'file-data',
            data: output.pdfData,
            mediaType: 'application/pdf',
            filename: 'ai.pdf',
          },
        ],
      };
    },
  });

  const result = await generateText({
    model: google('gemini-3-flash-preview'),
    prompt:
      'Please read the pdf document using the tool provided and return the summary of that pdf',
    tools: {
      readPDFDocument,
    },
    stopWhen: stepCountIs(4),
  });

  console.log(`Assistant response : ${JSON.stringify(result.text, null, 2)}`);
});
```
New file · 55 additions & 0 deletions

```ts
import { google } from '@ai-sdk/google';
import { generateText, stepCountIs, tool } from 'ai';
import fs from 'node:fs/promises';
import path from 'node:path';
import { run } from '../../lib/run';
import { z } from 'zod';

run(async () => {
  const readPDFDocument = tool({
    description: `Read and return a PDF document`,
    inputSchema: z.object({}),
    execute: async () => {
      try {
        const pdfPath = path.join(__dirname, '../../../data/ai.pdf');
        const pdfData = await fs.readFile(pdfPath);
        const base64Data = pdfData.toString('base64');

        return {
          success: true,
          description: 'Successfully loaded PDF document',
          pdfUrl: `data:application/pdf;base64,${base64Data}`,
        };
      } catch (error) {
        throw new Error(`Failed to analyze PDF: ${error}`);
      }
    },
    toModelOutput({ output }) {
      return {
        type: 'content',
        value: [
          {
            type: 'text',
            text: output.description,
          },
          {
            type: 'file-url',
            url: output.pdfUrl,
          },
        ],
      };
    },
  });

  const result = await generateText({
    model: google('gemini-3-flash-preview'),
    prompt:
      'Please read the pdf document using the tool provided and return the summary of that pdf',
    tools: {
      readPDFDocument,
    },
    stopWhen: stepCountIs(4),
  });

  console.log(`Assistant response : ${JSON.stringify(result.text, null, 2)}`);
});
```
New file · 82 additions & 0 deletions

```ts
import { google } from '@ai-sdk/google';
import { stepCountIs, streamText, tool } from 'ai';
import fs from 'node:fs/promises';
import path from 'node:path';
import { run } from '../../lib/run';
import { z } from 'zod';

run(async () => {
  const readImage = tool({
    description: `Read and return an image`,
    inputSchema: z.object({}),
    execute: async () => {
      try {
        const imagePath = path.join(__dirname, '../../../data/comic-cat.png');
        const imageData = await fs.readFile(imagePath);

        return {
          success: true,
          description: 'Successfully loaded image',
          imageData: imageData.toString('base64'),
        };
      } catch (error) {
        throw new Error(`Failed to analyze image: ${error}`);
      }
    },
    toModelOutput({ output }) {
      return {
        type: 'content',
        value: [
          {
            type: 'text',
            text: output.description,
          },
          {
            type: 'image-data',
            mediaType: 'image/png',
            data: output.imageData,
          },
        ],
      };
    },
  });

  const result = streamText({
    model: google('gemini-3-flash-preview'),
    prompt:
      'Please read the image using the tool provided and return the summary of that image',
    tools: {
      readImage,
    },
    stopWhen: stepCountIs(4),
  });

  for await (const part of result.fullStream) {
    switch (part.type) {
      case 'text-delta':
        process.stdout.write(part.text);
        break;
      case 'tool-call':
        process.stdout.write(
          `Tool call: ${part.toolName}(${JSON.stringify(part.input)})\n`,
        );
        break;
      case 'tool-result':
        process.stdout.write(
          `Tool result: ${part.toolName} -> ${JSON.stringify(part.output)}\n`,
        );
        break;
      case 'finish-step':
        process.stdout.write('\n');
        process.stdout.write(`Finish step: ${part.finishReason}\n`);
        break;
      case 'finish':
        process.stdout.write('\n');
        process.stdout.write(`Finish reason: ${part.finishReason}\n`);
        break;
      case 'error':
        process.stderr.write(`Error: ${part.error}\n`);
        break;
    }
  }
});
```
