Closed
Labels
.NET: Issue or Pull requests regarding .NET code
Description
When using the Vision models of Azure OpenAI and OpenAI, the API allows an image to be attached either as a URL or as a Data URI. In Semantic Kernel we do have ImageContent as a KernelContent component, but unfortunately it only accepts a System.Uri as the parameter for the object.
I suggest the following addition to ImageContent:
Proposed API
public sealed class ImageContent : KernelContent
{
    // Existing property. I think we should just change its type.
    // This is a breaking change, but I don't think we should have both;
    // alternatively, we could add another property 'DataUri' instead of changing the type.
    // public Uri? Uri { get; set; }

    // Replaced by this:
    public string Uri { get; set; }

    // Existing constructor, changed to call the new overload instead of base.
    public ImageContent(
        Uri uri,
        string? modelId = null,
        object? innerContent = null,
        Encoding? encoding = null,
        IReadOnlyDictionary<string, object?>? metadata = null)
        : this(uri.ToString(), modelId, innerContent, encoding, metadata)
    {
    }

    // New constructor that takes the image reference as a string, so a Data URI can be passed.
    public ImageContent(
        string dataUri,
        string? modelId = null,
        object? innerContent = null,
        Encoding? encoding = null,
        IReadOnlyDictionary<string, object?>? metadata = null)
        : base(innerContent, modelId, metadata)
    {
        this.Uri = dataUri;
    }
}

This would allow us to use Data URIs with embedded images for the Vision APIs.
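Once the property is a plain string, consuming code may need to tell an embedded image apart from a remote URL. A minimal sketch of such a check, assuming a simple prefix test on the `data:` scheme is sufficient; `DataUriHelper` and `IsDataUri` are hypothetical names and not part of Semantic Kernel:

```csharp
using System;

// Hypothetical helper for illustration only; not part of Semantic Kernel.
public static class DataUriHelper
{
    // Returns true when the value uses the "data:" scheme, i.e. the image is
    // embedded in the string itself rather than referenced by a remote URL.
    public static bool IsDataUri(string value)
    {
        if (value is null)
        {
            throw new ArgumentNullException(nameof(value));
        }

        return value.StartsWith("data:", StringComparison.OrdinalIgnoreCase);
    }
}
```

The check works on the raw string and never constructs a System.Uri, so it does not depend on how System.Uri treats long Data URIs.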
Usage
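var chatHistory = new ChatHistory();
// imageBytes and imageContentType (e.g. "image/png") are assumed to be available here.
var image = System.Convert.ToBase64String(imageBytes);
var imageDataLink = $"data:{imageContentType};base64,{image}";
// ChatHistory holds ChatMessageContent items, so the image is added as part of a user message.
chatHistory.AddUserMessage(new ChatMessageContentItemCollection { new ImageContent(imageDataLink) });
var result = await chat.GetChatMessageContentAsync(chatHistory);
Alternate Design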
This would be a breaking change, since we change the type of the Uri property on ImageContent, so instead:
- We could add another property, DataUri, to ImageContent, but this would also mean that we have to check which of the two properties is used and prefer one over the other if both are filled.
- We could also just make a private string field that holds the Data URI; when the Uri ctor is used, we just store the Uri as a string in that field and only use the field at the implementation site.
- We could also make ImageContent take a ROS, a byte[], or even just a Stream of the image and handle all the Base64 encoding in the ImageContent ctor (I think I would prefer a static Create method over a ctor for this option, since we can then read the stream async). But this would mean that we also need a second parameter with the image content type, and it would not allow for non-Base64-encoded Data URIs.
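The second and third alternatives can be combined. The following is a rough sketch only: `ImageContentSketch`, `CreateAsync`, and the `mediaType` parameter are hypothetical names chosen for illustration, and the class deliberately does not derive from KernelContent so that it stays self-contained.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for ImageContent; the names and shape are assumptions.
public sealed class ImageContentSketch
{
    // Single backing field that holds either a regular URL or a Data URI as text.
    private readonly string _uriOrDataUri;

    // Existing-style ctor: the Uri is stored as a string in the private field.
    public ImageContentSketch(Uri uri)
    {
        if (uri is null) throw new ArgumentNullException(nameof(uri));
        _uriOrDataUri = uri.ToString();
    }

    // String ctor: accepts a Data URI (or URL) as-is, without parsing it.
    public ImageContentSketch(string dataUri)
    {
        _uriOrDataUri = dataUri ?? throw new ArgumentNullException(nameof(dataUri));
    }

    // Static factory instead of a ctor, so the stream can be read asynchronously.
    // The media type (e.g. "image/png") has to be supplied by the caller.
    public static async Task<ImageContentSketch> CreateAsync(
        Stream image,
        string mediaType,
        CancellationToken cancellationToken = default)
    {
        using var buffer = new MemoryStream();
        await image.CopyToAsync(buffer, 81920, cancellationToken).ConfigureAwait(false);
        var base64 = Convert.ToBase64String(buffer.ToArray());
        return new ImageContentSketch($"data:{mediaType};base64,{base64}");
    }

    // The implementation site would read only this value.
    public override string ToString() => _uriOrDataUri;
}
```

A caller would write something like `await ImageContentSketch.CreateAsync(stream, "image/png")`. As noted in the list above, this shape always produces Base64 output, so non-Base64 Data URIs would still have to go through the string ctor.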
I am happy to work on the implementation, since I have to do it anyway, because we need the solution fairly soon.
Status
Sprint: Done