@ericstj ericstj commented Jul 23, 2025

Microsoft Reviewers: Open in CodeFlow

@github-actions github-actions bot added the area-ai Microsoft.Extensions.AI libraries label Jul 23, 2025
ericstj added 4 commits July 25, 2025 10:10
These are all nullable now so that the client can use defaults where
appropriate.

Remove quality default since it's not consistent across models.

Also remove setting ResponseFormat since this is not supported by
gpt-image-1.
@ericstj ericstj marked this pull request as ready for review July 30, 2025 14:46
Copilot AI review requested due to automatic review settings July 30, 2025 14:46
@ericstj ericstj requested a review from a team as a code owner July 30, 2025 14:46
@ericstj ericstj marked this pull request as draft July 30, 2025 14:46

Copilot AI left a comment

Pull Request Overview

This pull request adds text-to-image support to the Microsoft.Extensions.AI libraries. It introduces a new ITextToImageClient interface with an OpenAI implementation, dependency injection support, a middleware pipeline, and tests for each component.

  • Defines core text-to-image abstractions including ITextToImageClient, TextToImageOptions, and TextToImageResponse
  • Implements OpenAI text-to-image client integration with support for image generation and editing
  • Adds middleware pipeline support with logging and configuration options

Reviewed Changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/ | Core abstractions and interfaces for text-to-image functionality |
| src/Libraries/Microsoft.Extensions.AI/TextToImage/ | Client builder, middleware, and dependency injection extensions |
| src/Libraries/Microsoft.Extensions.AI.OpenAI/ | OpenAI-specific implementation of ITextToImageClient |
| test/Libraries/Microsoft.Extensions.AI.*Tests/TextToImage/ | Comprehensive test coverage for all text-to-image components |

ericstj added 3 commits July 30, 2025 11:22
OpenAI's image API supports multiple images and this does seem to be
common functionality and a better generalization.  The client library
doesn't expose this yet, but we should account for it.

Image models may be capable of things like
"Combine the subjects of these images into a single image" or
"Create a single image that uses the subject from the first image and
background for the second" etc.
@ericstj ericstj marked this pull request as ready for review August 8, 2025 20:08
@ericstj ericstj merged commit 7d2ea04 into dotnet:main Aug 12, 2025
6 checks passed

ericstj commented Aug 12, 2025

/backport to release/9.8

@github-actions

Started backporting to release/9.8: https://github.com/dotnet/extensions/actions/runs/16914549812

@github-actions

@ericstj backporting to "release/9.8" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Add ITextToImageClient
Applying: Remove URI based edit since it's not available
Applying: Add filename for edit
Applying: Add OpenAI implmentation of ITextToImageClient
.git/rebase-apply/patch:158: trailing whitespace.
    /// <summary>Initializes a new instance of the <see cref="OpenAITextToImageClient"/> class for the specified <see cref="OpenAIClient"/> and model.  
.git/rebase-apply/patch:161: trailing whitespace.
    /// <param name="model">The default model to use for image generation.</param>  
.git/rebase-apply/patch:371: trailing whitespace.
        
.git/rebase-apply/patch:421: trailing whitespace.
        
.git/rebase-apply/patch:430: trailing whitespace.
        
warning: squelched 1 whitespace error
warning: 6 lines add whitespace errors.
Using index info to reconstruct a base tree...
M	src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs
Falling back to patching base and 3-way merge...
Auto-merging src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs
CONFLICT (content): Merge conflict in src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0004 Add OpenAI implmentation of ITextToImageClient
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!
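
One way to do the manual backport is to cherry-pick the merged commit onto the release branch and resolve the conflict by hand. The sketch below assumes the change landed on main as the single commit 7d2ea04 and that a remote named `upstream` points at dotnet/extensions. The `backport_commit` helper and the work-branch name are hypothetical and are not part of this PR or the backport bot.

```shell
#!/usr/bin/env bash
# Sketch of a manual backport. The helper, the remote name, and the
# work-branch name are illustrative assumptions, not part of the PR.

# Usage: backport_commit REMOTE TARGET_BRANCH COMMIT WORK_BRANCH
backport_commit() {
  local remote="$1" target="$2" commit="$3" work="$4"

  # Update remote-tracking branches so the target branch and the
  # commit to port are both available locally.
  git fetch "$remote" || return 1

  # Start a work branch from the tip of the release branch.
  git checkout --no-track -b "$work" "$remote/$target" || return 1

  # -x appends "(cherry picked from commit ...)" to the new commit message.
  if git cherry-pick -x "$commit"; then
    echo "backport applied cleanly on $work"
    return 0
  fi

  # On conflict, list the unmerged files and stop so they can be fixed by hand.
  echo "conflicts in:" >&2
  git diff --name-only --diff-filter=U >&2
  echo "resolve, 'git add' the files, then run 'git cherry-pick --continue'" >&2
  return 2
}

# Example invocation (assumes a remote named 'upstream'):
#   backport_commit upstream release/9.8 7d2ea04 backport/image-generator-9.8
```

If the cherry-pick stops on src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs, as the `git am` run above did, resolve that file, `git add` it, and run `git cherry-pick --continue` before pushing the work branch.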

ericstj added a commit to ericstj/extensions that referenced this pull request Aug 12, 2025
* Add ITextToImageClient

* Remove URI based edit since it's not available

* Add filename for edit

* Add OpenAI implmentation of ITextToImageClient

* Fix tests

* Add tests for TextToImage

* Add DeletgatingTextToImageClient and tests

* Add integration test and fix some bugs

* Add remaining support to MEAI for TextToImage

* Make all TextToImageOptions optional

These are all nullable now so that the client can use defaults where
appropriate.

Remove quality default since it's not consistent across models.

Also remove setting ResponseFormat since this is not supported by
gpt-image-1.

* Address feedback

* Document some exceptions

* Address feedback

* Make EditImageAsync plural

OpenAI's image API supports multiple images and this does seem to be
common functionality and a better generalization.  The client library
doesn't expose this yet, but we should account for it.

Image models may be capable of things like
"Combine the subjects of these images into a single image" or
"Create a single image that uses the subject from the first image and
background for the second" etc.

* Address feedback and add/fix tests.

* Fix bad merge

* Address feedback

* Fix test

* Use DataContent.Name for filename.

* Add extensions for EditImageAsync

Extension that accepts a single DataContent and one that accepts a byte[].

I've left out streams and file paths, since these require more opinions
about how to load them.  I filed dotnet#6683 to address streams.

* Fix test

* Remove use of `_model` field.

* Rename ImageToText to Image

* Rename TextToImage directories to Image

* Rename files TextToImage -> Image

* Add new request and response type

* Make GenerateImagesAsync accept ImageRequest

* Remove EditImageAsync

* Adding GenerateStreamingImagesAsync

* Update docs

* Rename ImageClient ImageGenerator

* Fix up some text-to-image references

* Rename Image(Options|Request|Response)

* Remove `Images` from `GenerateImagesAsync`

* Remove streaming method

We don't yet have any good public support for streaming to vet this API

We can guess at how it might behave for OpenAI, but that doesn't
really give enough confidence to build the API around it.

* Address feedback

* Provide OpenAI an appropriate filename

* Remove Style from ImageGenerationOptions
jeffhandley pushed a commit that referenced this pull request Aug 12, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Sep 12, 2025