.Net: Add overload ctor to ImageContent that takes a string "url" #4781

@dersia

When using the Vision models of AzureOpenAI and OpenAI, the API allows an image to be attached either as a URL or as a data URI. In Semantic Kernel we do have ImageContent as a KernelContent component, but unfortunately it only takes a System.Uri as the parameter for the object.

I suggest the following addition to ImageContent:

Proposed API

public sealed class ImageContent : KernelContent
{
    // Existing property. I think we should just change its type.
    // This is a breaking change, but I don't think we should have both.
    // Alternatively, we could add another property 'DataUri' instead of changing the type.
    // public Uri? Uri { get; set; }
    // Replaced by this:
    public string Uri { get; set; }

    // Existing ctor, changed to call the new overload instead of base.
    // The new overload assigns the Uri property, so the body stays empty.
    public ImageContent(
        Uri uri,
        string? modelId = null,
        object? innerContent = null,
        Encoding? encoding = null,
        IReadOnlyDictionary<string, object?>? metadata = null)
        : this(uri.ToString(), modelId, innerContent, encoding, metadata)
    {
    }

    // New overload that takes a URL or a data URI as a string.
    public ImageContent(
        string dataUri,
        string? modelId = null,
        object? innerContent = null,
        Encoding? encoding = null,
        IReadOnlyDictionary<string, object?>? metadata = null)
        : base(innerContent, modelId, metadata)
    {
        this.Uri = dataUri;
    }
}

This would allow us to use Data URIs with embedded images for the Vision APIs.

Usage

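// Assumes imageBytes (byte[]), imageContentType (e.g. "image/png"), and chat (the chat completion service) are already defined.
var chatHistory = new ChatHistory();
var image = System.Convert.ToBase64String(imageBytes);
var imageDataLink = $"data:{imageContentType};base64,{image}";
// ChatHistory stores ChatMessageContent items, so the image is wrapped in a user message.
chatHistory.AddUserMessage(new ChatMessageContentItemCollection { new ImageContent(imageDataLink) });
var result = await chat.GetChatMessageContentAsync(chatHistory);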
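
For reference, the data URI built in the usage snippet follows the `data:<media type>;base64,<payload>` shape described in RFC 2397. Below is a minimal sketch of a helper that builds and recognizes such strings. `DataUriHelper`, `ToDataUri`, and `IsDataUri` are hypothetical names used only for illustration; they are not part of Semantic Kernel.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Semantic Kernel.
public static class DataUriHelper
{
    // Builds "data:<contentType>;base64,<payload>" from raw image bytes.
    public static string ToDataUri(byte[] imageBytes, string contentType)
    {
        if (imageBytes is null)
        {
            throw new ArgumentNullException(nameof(imageBytes));
        }

        if (string.IsNullOrWhiteSpace(contentType))
        {
            throw new ArgumentException("Content type is required.", nameof(contentType));
        }

        return $"data:{contentType};base64,{Convert.ToBase64String(imageBytes)}";
    }

    // Returns true when the string uses the "data:" scheme (case-insensitive).
    public static bool IsDataUri(string value) =>
        value is not null && value.StartsWith("data:", StringComparison.OrdinalIgnoreCase);
}
```

With this sketch, `DataUriHelper.ToDataUri(new byte[] { 1, 2, 3 }, "image/png")` yields `data:image/png;base64,AQID`, which could then be passed to the proposed string overload.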

Alternate Design

The change proposed above would be a breaking change, since it changes the type of the Uri property on ImageContent. Instead:

  • We could add another property, DataUri, to ImageContent. This would also mean that we have to check which of the two properties is used and prefer one over the other if both are filled.
  • We could also just add a private string field that holds the data URI. When the Uri ctor is used, we store the Uri as a string in that field and only use the field at the implementation site.
  • We could also make ImageContent take a ROS, a byte[], or even just a Stream of the image and handle all the Base64 encoding in the ImageContent ctor. (I think I would prefer a static Create method over a ctor for this option, since we can then read the stream asynchronously.) This would mean that we also need a second parameter with the image content type, and it would not allow for non-Base64-encoded data URIs.
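
To make the last two alternatives more concrete, here is a rough sketch that combines them: a private string field that holds either a URL or a data URI, plus a static async factory that reads a Stream. `ImageContentSketch` and `CreateAsync` are hypothetical names, and the class is a simplified stand-in that does not derive from KernelContent, so treat it as an illustration under those assumptions rather than the final design.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for ImageContent; the real class derives from KernelContent.
public sealed class ImageContentSketch
{
    // Private field that holds either a plain URL or a data URI as a string.
    private readonly string _uriOrDataUri;

    // Existing-style ctor: the Uri is stored as a string in the private field.
    public ImageContentSketch(Uri uri)
    {
        if (uri is null)
        {
            throw new ArgumentNullException(nameof(uri));
        }

        _uriOrDataUri = uri.ToString();
    }

    private ImageContentSketch(string dataUri) => _uriOrDataUri = dataUri;

    // Reads the stream asynchronously and Base64-encodes it into a data URI.
    // The caller must supply the content type, as noted in the alternative above.
    public static async Task<ImageContentSketch> CreateAsync(
        Stream image,
        string contentType,
        CancellationToken cancellationToken = default)
    {
        if (image is null)
        {
            throw new ArgumentNullException(nameof(image));
        }

        using var buffer = new MemoryStream();
        await image.CopyToAsync(buffer, 81920, cancellationToken).ConfigureAwait(false);
        string payload = Convert.ToBase64String(buffer.ToArray());
        return new ImageContentSketch($"data:{contentType};base64,{payload}");
    }

    // The string a connector would send to the service.
    public override string ToString() => _uriOrDataUri;
}
```

A caller would write something like `var content = await ImageContentSketch.CreateAsync(stream, "image/png");`. As noted in the list above, this shape always produces Base64 data URIs, so a string-based overload would still be needed for other data URI encodings.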

I am happy to work on the implementation, since I have to do it anyway, because we need the solution fairly soon.
