Fix proper Google API implementation #159

felixarntz · 2025-12-30T19:23:04Z

Follow up to #155.

I hadn't yet tested #155 when it was merged, so this PR contains the fixes needed to make it actually work.

Note: A workaround is included that returns the image-generation-specific class when using Gemini multimodal output models that are primarily used for image generation. This is only needed because we don't yet have model class implementations that are actually multimodal. Let's discuss the best approach for a proper solution in a separate issue. It doesn't have to block this work.
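
For readers unfamiliar with the code, the workaround described above could look roughly like the sketch below. It is only an illustration under stated assumptions: every class and function name (`ModelInterface`, `AbstractModel`, `TextGenerationModel`, `ImageGenerationModel`, `createModel`) is hypothetical and not taken from this PR, and deciding by the declared output modalities is an assumed heuristic rather than the merged logic.

```php
<?php
// Hypothetical names throughout; none of these are taken from the actual codebase.

interface ModelInterface
{
    public function getModelId(): string;
}

abstract class AbstractModel implements ModelInterface
{
    private string $modelId;

    public function __construct(string $modelId)
    {
        $this->modelId = $modelId;
    }

    public function getModelId(): string
    {
        return $this->modelId;
    }
}

final class TextGenerationModel extends AbstractModel {}

final class ImageGenerationModel extends AbstractModel {}

/**
 * Picks a model class for the given model.
 *
 * @param string   $modelId          Model identifier, e.g. "gemini-2.5-flash-image".
 * @param string[] $outputModalities Modalities the model can emit (assumed input).
 */
function createModel(string $modelId, array $outputModalities): ModelInterface
{
    // Temporary workaround: with no multimodal model class available, any
    // model that can output images is treated as an image generation model.
    if (in_array('image', $outputModalities, true)) {
        return new ImageGenerationModel($modelId);
    }

    return new TextGenerationModel($modelId);
}
```

With this sketch, a model declaring both text and image output resolves to the image generation class, which means its text output is not reachable through that class. That limitation is why a dedicated multimodal class is the proper long-term solution.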

github-actions · 2025-12-30T19:23:14Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: felixarntz <flixos90@git.wordpress.org>
Co-authored-by: JasonTheAdams <jason_the_adams@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

felixarntz · 2025-12-30T19:26:55Z

Quick experiment I did to test this:

Imagen 4 image generation

php cli.php 'Photo of a tricolor Cavalier King Charles Spaniel on an airfield in the desert of Peru' --outputFormat=image-base64 --providerId=google

Output:

Gemini Nano Banana image generation

php cli.php 'Photo of a tricolor Cavalier King Charles Spaniel on an airfield in the desert of Peru' --outputFormat=image-base64 --providerId=google --modelId=gemini-2.5-flash-image

Output:

Summary

This confirms what's widely known: multimodal output models that do image generation understand prompts much better. Both images look solid, but only the Gemini Nano Banana image shows a tricolor Cavalier King Charles Spaniel, which is what I asked for in both cases :)

The other thing is that those models create more realistic-looking images, while classic diffusion models create more "artsy" images. The depth of field in the Imagen-generated image is way too extreme: it looks cool, but not realistic.

JasonTheAdams

Glad you tested! Let me know which Issue you open to discuss multi-modal models.

felixarntz added 2 commits December 30, 2025 14:18

Fix remaining quirks to make Google API implementation work as expected.

6319bba

Add temporary workaround for multimodal image output models in lieu of proper multimodal model classes.

807658e

felixarntz added this to the 0.4.0 milestone Dec 30, 2025

felixarntz requested a review from JasonTheAdams December 30, 2025 19:23

felixarntz added the "[Type] Bug" label (An existing feature does not function as intended) Dec 30, 2025

felixarntz mentioned this pull request Dec 30, 2025

Implement proper multimodal output model classes #160

Open

JasonTheAdams approved these changes Dec 30, 2025

felixarntz merged commit e0acf10 into trunk Dec 30, 2025
7 checks passed

felixarntz deleted the add/proper-google-provider-implementation branch December 30, 2025 19:41
