Description
Technically speaking, the Gemini image models are multimodal output models. The AI SDK doesn't have first-class support for that yet, but they can be used pretty much without limitations via generateText.
However, the Gemini image models (Nano Banana) are widely known purely for their image generation and image editing capabilities. It's unlikely people use them to generate text without any images.
For more intuitive DX, we should consider offering their image capabilities via generateImage. The lack of this can cause problems like #10674. It also means users have no guarantee an image was produced, and they'll have to cycle through the result parts to "find" the image.
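To illustrate the "find the image" step users currently have to write themselves, here is a minimal sketch. The `GeneratedFileLike` type and the `findFirstImage` helper are hypothetical names made up for this example; the shape is only loosely modeled on the files that come back from generateText and is not the SDK's actual type.

```typescript
// Hypothetical shape for illustration only; not the AI SDK's real type.
interface GeneratedFileLike {
  mediaType: string; // e.g. "image/png"
  uint8Array: Uint8Array; // raw file bytes
}

// Scan the returned files for the first image.
// Returns undefined when the model answered without producing an image,
// which is exactly the "no guarantee" case described above.
function findFirstImage(
  files: GeneratedFileLike[],
): GeneratedFileLike | undefined {
  return files.find((file) => file.mediaType.startsWith("image/"));
}

// Example inputs: one response without files, one with a non-image and an image.
const textOnly: GeneratedFileLike[] = [];
const withImage: GeneratedFileLike[] = [
  { mediaType: "text/plain", uint8Array: new Uint8Array([104, 105]) },
  { mediaType: "image/png", uint8Array: new Uint8Array([137, 80, 78, 71]) },
];
```

Every caller has to handle the `undefined` case on their own, which is the boilerplate a generateImage entry point would remove.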
We don't need to reinvent the wheel to make that possible. For Gemini models, Google's image model class could call Google's text generation model class internally, set the response modalities to image, and parse the response into the format expected by generateImage.
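The delegation idea could look roughly like the sketch below. Everything here is an assumption for illustration: `TextModelLike`, `ResponsePart`, `partsToImages`, and `generateImagesViaTextModel` are hypothetical names, the `"IMAGE"` modality value is assumed, and none of this reflects the provider's real class interfaces.

```typescript
// Hypothetical response parts a text generation call might return.
type ResponsePart =
  | { type: "text"; text: string }
  | { type: "file"; mediaType: string; data: Uint8Array };

// Hypothetical stand-in for the provider's text generation model class.
interface TextModelLike {
  generate(options: {
    prompt: string;
    responseModalities: string[];
  }): Promise<ResponsePart[]>;
}

// Keep only image parts; fail loudly if the model produced none,
// so callers get the guarantee that generateText cannot give them.
function partsToImages(parts: ResponsePart[]): Uint8Array[] {
  const images: Uint8Array[] = [];
  for (const part of parts) {
    if (part.type === "file" && part.mediaType.startsWith("image/")) {
      images.push(part.data);
    }
  }
  if (images.length === 0) {
    throw new Error("Model response contained no image");
  }
  return images;
}

// Image-model entry point that delegates to the text model.
async function generateImagesViaTextModel(
  model: TextModelLike,
  prompt: string,
): Promise<{ images: Uint8Array[] }> {
  const parts = await model.generate({
    prompt,
    responseModalities: ["IMAGE"], // assumed modality value
  });
  return { images: partsToImages(parts) };
}
```

Throwing when no image comes back is one possible design choice; the real implementation might prefer a dedicated error type or a warning instead.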
Related: #5949 (comment) (this will basically add support for image editing to the Google provider)
AI SDK Version
No response