Fix proper Google API implementation #159
…f proper multimodal model classes.
Quick experiment I did to test this:

Imagen 4 image generation
php cli.php 'Photo of a tricolor Cavalier King Charles Spaniel on an airfield in the desert of Peru' --outputFormat=image-base64 --providerId=google

Gemini Nano Banana image generation
php cli.php 'Photo of a tricolor Cavalier King Charles Spaniel on an airfield in the desert of Peru' --outputFormat=image-base64 --providerId=google --modelId=gemini-2.5-flash-image

Summary
Confirms what's widely known: multimodal output models that do image generation understand things much better. Both images look solid, but only the Gemini Nano Banana image has a tricolor Cavalier King Charles Spaniel, like I asked for in both cases :) The other thing is that those models create more realistic looking images, while classic diffusion models create more "artsy" images. The depth of field in the Imagen-generated image is way too extreme - it looks cool, but not realistic.
JasonTheAdams
left a comment
Glad you tested! Let me know which Issue you open to discuss multi-modal models.


Follow up to #155.
I hadn't yet tested things when it was merged, so this PR has the fixes needed to actually make things work.
Note: A workaround is included to return the image-generation-specific model class when using Gemini multimodal output models that are primarily used for image generation. This is only needed because we don't yet have model class implementations that are actually multimodal. Let's discuss the best approach for a proper solution in a separate issue. It doesn't have to block this work.