Allow using Gemini image models (Nano Banana) with generateImage #12252

@felixarntz

Description

Technically speaking, the Gemini image models are multimodal output models. The AI SDK doesn't have first-class support for that yet, but they can be used pretty much without limitations via generateText.

However, the Gemini image models (Nano Banana) are widely known purely for their image generation and image editing capabilities. It's unlikely people use them to generate text without any images.

For more intuitive DX, we should consider offering their image capabilities via generateImage. The lack of this can cause problems like #10674. It also means users have no guarantee an image was produced, and they'll have to cycle through the result parts to "find" the image.
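Today's workaround can be sketched as below. The model id and provider-options shape are assumptions based on the @ai-sdk/google documentation, and `findFirstImage` is a hypothetical helper illustrating the manual part-scanning the issue describes:

```typescript
// Hedged sketch of today's workaround: request image output from a Gemini
// model via generateText, then scan the returned file parts for an image.
type FilePart = { mediaType: string; base64: string };

// Scan result parts for the first image -- the manual step that generateImage
// support would make unnecessary.
function findFirstImage(files: FilePart[]): FilePart | undefined {
  return files.find((f) => f.mediaType.startsWith('image/'));
}

// Illustrative call site (requires an API key, so shown as a comment):
//
//   const result = await generateText({
//     model: google('gemini-2.5-flash-image-preview'),
//     providerOptions: { google: { responseModalities: ['TEXT', 'IMAGE'] } },
//     prompt: 'A banana wearing a tiny hat',
//   });
//   const image = findFirstImage(result.files); // may be undefined!
```

Note that the caller has to handle the `undefined` case themselves, since nothing guarantees the model emitted an image part.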

We don't need to reinvent the wheel to make that possible. For Gemini models, the image model class in the Google provider could call the provider's text generation model class, set the response modalities to image, and parse the response into the format expected by generateImage.
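That delegation could look roughly like the sketch below. The interfaces are hypothetical stand-ins, not the AI SDK's real provider-spec classes; the text-generation call is injected so the reshaping logic is visible on its own:

```typescript
// Hedged sketch of the proposed delegation: an image-model wrapper calls the
// text-generation path with image response modality and reshapes the output
// into generateImage's expected { images } form. All names are illustrative.
type Part = { mediaType: string; base64: string };
type TextModelCall = (opts: {
  prompt: string;
  responseModalities: string[];
}) => Promise<{ files: Part[] }>;

async function generateImageViaTextModel(callTextModel: TextModelCall, prompt: string) {
  const result = await callTextModel({ prompt, responseModalities: ['IMAGE'] });
  const images = result.files.filter((f) => f.mediaType.startsWith('image/'));
  // Unlike the generateText workaround, the wrapper can guarantee an image
  // was produced, or fail loudly.
  if (images.length === 0) {
    throw new Error('Gemini model returned no image parts');
  }
  return { images };
}
```

Failing loudly here is what gives callers the guarantee they currently lack when cycling through generateText result parts.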

Related: #5949 (comment) (this will basically add support for image editing to the Google provider)

AI SDK Version

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    ai/core — core functions like generateText, streamText, etc. Provider utils, and provider spec.
    provider/google — Issues related to the @ai-sdk/google provider
