
Conversation

@stephentoub
Member

@stephentoub stephentoub commented Jul 16, 2025

Closes #6614

cc: @rogerbarreto


@github-actions github-actions bot added the area-ai Microsoft.Extensions.AI libraries label Jul 16, 2025
@rogerbarreto
Contributor

rogerbarreto commented Jul 21, 2025

Proposal: CodeExecutionResultContent

Context

The Microsoft.Extensions.AI.Abstractions library needed a unified AIContent subclass to represent code execution results from various AI providers. This content type is used in both ChatMessage.Contents (non-streaming) and ChatResponseUpdate.Contents (streaming) scenarios to provide a consistent abstraction across different provider implementations.

Decision

Implemented CodeExecutionResultContent as a minimal, provider-agnostic abstraction that focuses only on properties proven to be common across multiple AI providers, while leveraging existing AIContent types for specialized use cases like file handling.

Design Investigation & Rationale

Provider Research Findings

We investigated code execution response formats from four major AI providers:

1. Anthropic Claude

Response Format:

{
  "role": "assistant",
  "container": {
    "id": "container_011CPR5CNjB747bTd36fQLFk",
    "expires_at": "2025-05-23T21:13:31.749448Z"
  },
  "content": [
    {
      "type": "text",
      "text": "I'll calculate the mean and standard deviation for you."
    },
    {
      "type": "server_tool_use",
      "id": "srvtoolu_01A2B3C4D5E6F7G8H9I0J1K2",
      "name": "code_execution",
      "input": {
        "code": "import numpy as np\ndata = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\nmean = np.mean(data)\nstd = np.std(data)\nprint(f\"Mean: {mean}\")\nprint(f\"Standard deviation: {std}\")"
      }
    },
    {
      "type": "code_execution_tool_result",
      "tool_use_id": "srvtoolu_01A2B3C4D5E6F7G8H9I0J1K2",
      "content": {
        "type": "code_execution_result",
        "stdout": "Mean: 5.5\nStandard deviation: 2.8722813232690143\n",
        "stderr": "",
        "return_code": 0,
        "content": [
           {
              "file_id": "generated_file_id",
              "type": "code_execution_output"
           }
        ]
      }
    },
    {
      "type": "text",
      "text": "The mean of the dataset is 5.5 and the standard deviation is approximately 2.87."
    }
  ],
  "id": "msg_01BqK2v4FnRs4xTjgL8EuZxz",
  "model": "claude-opus-4-20250514",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 45,
    "output_tokens": 187,
  }
}

Key Features:

  • ✅ Provides stdout, stderr, return_code
  • ✅ Returns executed code
  • ✅ Supports streaming via Server-Sent Events
  • ✅ Supports built-in file generation

2. Google Vertex AI/Gemini

Response Format:

{
  "executableCode": {
    "language": "PYTHON",
    "code": "total = 0\nfor i in range(1, 11):\n    total += i\nprint(f'{total=}')\n"
  },
  "codeExecutionResult": {
    "outcome": "OUTCOME_OK",
    "output": "total=55\n"
  }
}

Key Features:

  • ✅ Provides executed code and output
  • ✅ Uses outcome enum instead of numeric exit codes
  • ✅ Supports inline images: "The output files are returned as inline images in the response"
  • ✅ Supports streaming via streamGenerateContent
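
The Gemini shape above maps directly onto the CodeExecutionResultContent abstraction proposed later in this document. A minimal, hypothetical mapping sketch (MapGeminiCodeExecutionResult is an illustrative helper, not a real SDK API):

using System.Text.Json;
using Microsoft.Extensions.AI;

static CodeExecutionResultContent MapGeminiCodeExecutionResult(JsonElement part)
{
    // "outcome" is an enum string (e.g., "OUTCOME_OK"); "output" is the combined text output.
    JsonElement result = part.GetProperty("codeExecutionResult");
    string output = result.GetProperty("output").GetString() ?? string.Empty;
    bool ok = result.GetProperty("outcome").GetString() == "OUTCOME_OK";

    // Gemini does not separate stdout from stderr, so route the single output stream by outcome.
    return ok
        ? new CodeExecutionResultContent(stdout: output)
        : new CodeExecutionResultContent(stderr: output);
}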

3. OpenAI Code Interpreter (Assistants API)

Key Features:

  • Uses Assistants API pattern with tool calls and results
  • ✅ Provides execution output
  • ❌ Does NOT return executed code
  • ✅ Supports generated files with file IDs
  • ✅ Supports streaming for code execution

Response Format:

{
  "id": "msg_abc123",
  "object": "thread.message",
  "created_at": 1698983503,
  "thread_id": "thread_abc123",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": {
        "value": "Hi! How can I help you today?",
        "annotations": [
           {
              "type": "file_path",
              "end_index": "integer",
              "start_index": "integer",
              "file_path": 
              { 
                 "file_id": "string" 
              }
           }
        ]
      }
    }
  ],
  "assistant_id": "asst_abc123",
  "run_id": "run_abc123",
  "attachments": 
  [
     {
        "file_id", "string",
        "tools": [{ "type": "code_interpreter" }]
     }
  ],
  "metadata": {}
}

Streaming Response Format:
https://platform.openai.com/docs/api-reference/assistants-streaming/run-step-delta-object

{
  "id": "step_123",
  "object": "thread.run.step.delta",
  "delta": {
    "step_details": {
      "type": "tool_calls",
      "tool_calls": [
        {
          "index": 0,
          "id": "call_123",
          "type": "code_interpreter",
          "code_interpreter": 
          { 
             "input": "string", 
             "outputs": 
             [
               { 
                  "type": "logs", 
                  "logs": "string" 
               },
               { 
                  "type": "image", 
                  "image": { "file_id", "string" }
               }
            ]
          }
        }
      ]
    }
  }
}

3b. OpenAI Responses API (NEW)

Response Format:

{
  "type": "code_interpreter_call",
  "outputs": [
    {
      "type": "output_text",
      "text": "Calculation result: 42"
    },
    {
      "type": "output_image",
      "image": "base64_encoded_image_data"
    }
  ]
}

Streaming Events:

response.code_interpreter_call.in_progress
response.code_interpreter_call_code.delta  // Streams executed code
response.code_interpreter_call_output.delta // Streams execution output
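
A provider adapter consuming these events would typically accumulate the deltas before surfacing complete values. A rough sketch under stated assumptions (StreamEvent and Accumulate are hypothetical; only the event names come from the list above):

using System.Collections.Generic;
using System.Text;

// Hypothetical parsed server-sent event; real payload shapes come from the OpenAI Responses API.
record StreamEvent(string Type, string Delta);

static (string Code, string Output) Accumulate(IEnumerable<StreamEvent> events)
{
    var code = new StringBuilder();
    var output = new StringBuilder();

    foreach (var evt in events)
    {
        switch (evt.Type)
        {
            case "response.code_interpreter_call_code.delta":
                code.Append(evt.Delta);   // executed code arrives incrementally
                break;
            case "response.code_interpreter_call_output.delta":
                output.Append(evt.Delta); // execution output arrives incrementally
                break;
        }
    }

    return (code.ToString(), output.ToString());
}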

Key Features:

  • ✅ Provides execution output via output_text and output_image
  • ✅ STREAMS executed code via code.delta events
  • ✅ STREAMS execution output via output.delta events
  • ✅ Supports generated files and images
  • ✅ Full streaming support for code execution

4. Azure AI Foundry

Response Format:

  • Follows OpenAI Responses API pattern (supports both Assistants and Responses APIs)
  • ✅ Supports "file path annotations" and generated files
  • ✅ STREAMS executed code (via Responses API)
  • ✅ STREAMS execution output (via Responses API)
  • ✅ Full streaming support for code execution (via Responses API)

Decision-Making Process

Based on this analysis, we applied the principle of minimal proven abstractions:

Properties INCLUDED (Multi-Provider Support):

  • Stdout: Supported by Anthropic (stdout), Google Vertex AI (output), OpenAI/Azure (execution output)
  • Stderr: Supported by Anthropic (stderr), others use similar error patterns
  • GeneratedContents: Supported by Google Vertex AI (inline images), Azure AI Foundry (file annotations), OpenAI (file references)

Properties EXCLUDED (Not Proven Across Multiple Providers):

  • Code: Only Google Vertex AI and OpenAI Responses API return executed code (2/5 provider implementations, but OpenAI has two different APIs)
  • ExitCode: Only Anthropic Claude uses return_code (1/5 provider implementations)
  • Language: Provider-specific metadata
  • ExecutionTimeMs: Provider-specific metadata
  • ErrorCode/ErrorMessage: Provider-specific error handling

Note: While OpenAI Responses API does stream executed code, the Code property was excluded because:

  1. OpenAI has two different APIs (Assistants vs Responses) with different capabilities
  2. The executed code is primarily useful for debugging/transparency rather than core functionality
  3. The abstraction focuses on execution results rather than the code itself
  4. Providers can include executed code in GeneratedContents if needed

Streaming Investigation

Providers with Streaming Support:

  • Anthropic Claude: Full streaming with fine-grained tool streaming
  • Google Vertex AI: Real-time streaming with incremental processing
  • OpenAI Responses API: Full streaming including executed code and output
  • Azure AI Foundry: Full streaming support (via Responses API)
  • OpenAI Assistants API: Full streaming support (via delta)

Streaming Patterns Identified:

  1. Incremental Output Streaming: Real-time stdout/stderr updates
  2. Complete Result Streaming: Single update with full execution results
  3. Mixed Content Streaming: Code execution mixed with text explanations

Final Design Implementation

CodeExecutionResultContent Class

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.CodeAnalysis;
using System.Text.Json.Serialization;

namespace Microsoft.Extensions.AI;

/// <summary>
/// Represents the result of code execution in a chat.
/// </summary>
/// <remarks>
/// This content type represents the core output from executing code in various AI provider environments,
/// focusing on the essential properties: stdout, stderr, and generated content.
/// It provides a minimal unified format that accommodates the common execution result patterns
/// from providers such as Anthropic Claude, Azure AI Foundry, OpenAI Code Interpreter, and Google Vertex AI.
/// Generated files are represented using existing AIContent types like DataContent, UriContent, or HostedFileContent.
/// </remarks>
[DebuggerDisplay("{DebuggerDisplay,nq}")]
public sealed class CodeExecutionResultContent : AIContent
{
    /// <summary>The standard output from code execution.</summary>
    private string? _stdout;

    /// <summary>The standard error output from code execution.</summary>
    private string? _stderr;

    /// <summary>
    /// Initializes a new instance of the <see cref="CodeExecutionResultContent"/> class.
    /// </summary>
    /// <param name="stdout">The standard output from execution.</param>
    /// <param name="stderr">The standard error output from execution.</param>
    /// <param name="generatedContents">Collection of content generated during execution (images, files, etc.).</param>
    [JsonConstructor]
    public CodeExecutionResultContent(string? stdout = null, string? stderr = null, IList<AIContent>? generatedContents = null)
    {
        _stdout = stdout;
        _stderr = stderr;
        GeneratedContents = generatedContents;
    }

    /// <summary>
    /// Gets or sets the standard output from code execution.
    /// </summary>
    /// <remarks>
    /// This contains the text output that would normally be written to stdout during code execution,
    /// such as print statements, successful computation results, etc.
    /// </remarks>
    [AllowNull]
    public string Stdout
    {
        get => _stdout ?? string.Empty;
        set => _stdout = value;
    }

    /// <summary>
    /// Gets or sets the standard error output from code execution.
    /// </summary>
    /// <remarks>
    /// This contains error messages, warnings, and other diagnostic information that would
    /// normally be written to stderr during code execution.
    /// </remarks>
    [AllowNull]
    public string Stderr
    {
        get => _stderr ?? string.Empty;
        set => _stderr = value;
    }

    /// <summary>
    /// Gets or sets the collection of content generated during code execution.
    /// </summary>
    /// <remarks>
    /// This includes images, charts, data files, and other artifacts created by the executed code.
    /// Generated content is represented using existing AIContent types such as DataContent for inline data,
    /// UriContent for downloadable files, or HostedFileContent for provider-hosted files.
    /// This property is supported by multiple providers including Google Vertex AI (inline images),
    /// Azure AI Foundry (file path annotations), and OpenAI Code Interpreter (file references).
    /// </remarks>
    public IList<AIContent>? GeneratedContents { get; set; }

    /// <summary>
    /// Gets the combined text from code execution, including both stdout and stderr.
    /// </summary>
    /// <remarks>
    /// This property combines the standard output and standard error streams into a single string.
    /// If both stdout and stderr have content, stderr is appended after stdout.
    /// If only one stream has content, only that content is returned.
    /// </remarks>
    [JsonIgnore]
    public string Text
    {
        get
        {
            var hasStdout = !string.IsNullOrEmpty(Stdout);
            var hasStderr = !string.IsNullOrEmpty(Stderr);

            if (hasStdout && hasStderr)
            {
                return $"{Stdout}\n{Stderr}";
            }

            if (hasStdout)
            {
                return Stdout;
            }

            if (hasStderr)
            {
                return Stderr;
            }
            
            return string.Empty;
        }
    }

    /// <summary>
    /// Gets a value indicating whether the code execution was successful.
    /// </summary>
    /// <remarks>
    /// This is determined by checking if there are no errors in stderr.
    /// </remarks>
    [JsonIgnore]
    public bool IsSuccess => string.IsNullOrEmpty(Stderr);

    /// <inheritdoc/>
    public override string ToString() => Text;

    /// <summary>Gets a string representing this instance to display in the debugger.</summary>
    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    private string DebuggerDisplay
    {
        get
        {
            var status = IsSuccess ? "Success" : "Failed";
            var output = !string.IsNullOrEmpty(Text) ? $", Text: \"{Text.Substring(0, Math.Min(50, Text.Length))}\"" : string.Empty;
            var generated = GeneratedContents?.Count > 0 ? $", Generated: {GeneratedContents.Count}" : string.Empty;

            return $"CodeExecution = {status}{output}{generated}";
        }
    }
}

AIContent.cs JsonDerivedType Addition

/// <summary>Represents content used by AI services.</summary>
[JsonPolymorphic(TypeDiscriminatorPropertyName = "$type")]
[JsonDerivedType(typeof(CodeExecutionResultContent), typeDiscriminator: "codeExecutionResult")]
[JsonDerivedType(typeof(DataContent), typeDiscriminator: "data")]
[JsonDerivedType(typeof(ErrorContent), typeDiscriminator: "error")]
// ... other existing types
public class AIContent
{
    // ... existing implementation
}
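
With the discriminator registered, the proposed type would round-trip polymorphically through the standard serializer. A minimal sketch, assuming the registration above and the library's AIJsonUtilities.DefaultOptions:

using System;
using System.Text.Json;
using Microsoft.Extensions.AI;

AIContent content = new CodeExecutionResultContent(
    stdout: "Mean: 5.5\n",
    generatedContents: [new HostedFileContent("file_abc123")]);

// The "$type": "codeExecutionResult" discriminator preserves the concrete type.
string json = JsonSerializer.Serialize(content, AIJsonUtilities.DefaultOptions);

var roundTripped = (CodeExecutionResultContent)JsonSerializer.Deserialize<AIContent>(json, AIJsonUtilities.DefaultOptions)!;
Console.WriteLine(roundTripped.Stdout); // prints "Mean: 5.5"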

Text Property Rationale

The Text property combines stdout and stderr into a single string because:

  1. Simplifies consumption: ToString() can simply return Text
  2. Avoids naming confusion: "Output" implied only stdout, "Text" is neutral
  3. Consistent with other AIContent types: Similar to how TextContent has a Text property
  4. Streaming friendly: Allows incremental text updates in streaming scenarios

Consumer Usage Examples

Non-Streaming Scenario

using Microsoft.Extensions.AI;

IChatClient client = // ... initialize your provider client

// Request code execution
var response = await client.GetResponseAsync([
    new ChatMessage(ChatRole.User, "Calculate the mean of [1, 2, 3, 4, 5] and create a chart")
]);

// Process the response
foreach (var content in response.Message.Contents)
{
    switch (content)
    {
        case TextContent textContent:
            Console.WriteLine($"AI Response: {textContent.Text}");
            break;

        case CodeExecutionResultContent codeResult:
            Console.WriteLine($"Code Execution Output: {codeResult.Text}");
            Console.WriteLine($"Success: {codeResult.IsSuccess}");

            // Handle generated files
            if (codeResult.GeneratedContents != null)
            {
                foreach (var generated in codeResult.GeneratedContents)
                {
                    switch (generated)
                    {
                        case DataContent dataContent:
                            Console.WriteLine($"Generated inline file: {dataContent.MediaType}");
                            // Save dataContent.Data to file
                            break;

                        case HostedFileContent hostedFile:
                            Console.WriteLine($"Generated hosted file: {hostedFile.FileId}");
                            // Download file using provider's file API
                            break;

                        case UriContent uriContent:
                            Console.WriteLine($"Generated file available at: {uriContent.Uri}");
                            // Download from URI
                            break;
                    }
                }
            }
            break;
    }
}

Streaming Scenario

using Microsoft.Extensions.AI;

IChatClient client = // ... initialize your provider client

// Request streaming code execution
await foreach (var update in client.GetStreamingResponseAsync([
    new ChatMessage(ChatRole.User, "Run a data analysis and generate visualizations")
]))
{
    foreach (var content in update.Contents)
    {
        switch (content)
        {
            case TextContent textContent:
                Console.Write(textContent.Text); // Stream text as it arrives
                break;

            case CodeExecutionResultContent codeResult:
                // Handle incremental execution output
                if (!string.IsNullOrEmpty(codeResult.Stdout))
                {
                    Console.Write($"[STDOUT] {codeResult.Stdout}");
                }

                if (!string.IsNullOrEmpty(codeResult.Stderr))
                {
                    Console.Write($"[STDERR] {codeResult.Stderr}");
                }

                // Handle generated content as it becomes available
                if (codeResult.GeneratedContents?.Count > 0)
                {
                    Console.WriteLine($"\n[FILES] Generated {codeResult.GeneratedContents.Count} files");
                    foreach (var generated in codeResult.GeneratedContents)
                    {
                        Console.WriteLine($"  - {generated.GetType().Name}");
                    }
                }
                break;
        }
    }
}

Different Content Combinations

// Example 1: Successful execution with output
var successResult = new CodeExecutionResultContent(
    stdout: "Calculation complete!\nMean: 3.0\nStandard deviation: 1.58",
    stderr: ""
);
Console.WriteLine(successResult.Text); // Prints the stdout
Console.WriteLine(successResult.IsSuccess); // True

// Example 2: Failed execution with error
var errorResult = new CodeExecutionResultContent(
    stdout: "Starting calculation...",
    stderr: "NameError: name 'undefined_variable' is not defined"
);
Console.WriteLine(errorResult.Text); // Prints: "Starting calculation...\nNameError: name 'undefined_variable' is not defined"
Console.WriteLine(errorResult.IsSuccess); // False

// Example 3: Execution with generated files
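// (chartImageBytes and csvData are placeholder byte arrays produced by the execution environment)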
var resultWithFiles = new CodeExecutionResultContent(
    stdout: "Chart generated successfully",
    generatedContents: [
        new DataContent(chartImageBytes, "image/png"),
        new DataContent(csvData, "text/csv")
    ]
);

Streaming Scenarios

Based on this investigation, we identified three streaming patterns that the design accommodates:

Pattern 1: Incremental Output Streaming (Anthropic/Google/OpenAI Responses Style)

Real-time streaming of stdout/stderr as code executes. Supported by: Anthropic Claude, Google Vertex AI, OpenAI Responses API, Azure AI Foundry:

// Provider implementation for incremental streaming
public async IAsyncEnumerable<ChatResponseUpdate> StreamCodeExecution(string code)
{
    // Stream 1: Execution start
    yield return new ChatResponseUpdate(ChatRole.Assistant, [
        new CodeExecutionResultContent(stdout: "Starting execution...\n")
    ]);

    // Stream 2: Incremental output
    yield return new ChatResponseUpdate(ChatRole.Assistant, [
        new CodeExecutionResultContent(stdout: "Processing data...\n")
    ]);

    // Stream 3: More output
    yield return new ChatResponseUpdate(ChatRole.Assistant, [
        new CodeExecutionResultContent(stdout: "Generating visualization...\n")
    ]);

    // Stream 4: Final result with generated content
    yield return new ChatResponseUpdate(ChatRole.Assistant, [
        new CodeExecutionResultContent(
            stdout: "Execution complete!\nResults saved.\n",
            generatedContents: [
                new DataContent(chartImageBytes, "image/png"),
                new DataContent(csvData, "text/csv")
            ]
        )
    ]);
}

Pattern 2: Complete Result Streaming (OpenAI Assistants Style)

Single update with complete execution results. Supported by: OpenAI Assistants API (legacy):

// Provider implementation for complete result streaming
public async IAsyncEnumerable<ChatResponseUpdate> StreamCodeExecution(string code)
{
    // Execute code completely, then stream result
    var executionResult = await ExecuteCodeAsync(code);

    yield return new ChatResponseUpdate(ChatRole.Assistant, [
        new CodeExecutionResultContent(
            stdout: executionResult.CompleteOutput,
            stderr: executionResult.ErrorOutput,
            generatedContents: executionResult.GeneratedFiles?.Select(f =>
                (AIContent)new HostedFileContent(f.FileId)).ToList() // cast needed: IList<T> is invariant
        )
    ]);
}

Pattern 3: Mixed Content Streaming

Code execution mixed with explanatory text:

// Provider implementation for mixed content streaming
public async IAsyncEnumerable<ChatResponseUpdate> StreamCodeExecution(string code)
{
    // Stream 1: AI explanation
    yield return new ChatResponseUpdate(ChatRole.Assistant, [
        new TextContent("I'll run this code to analyze your data:")
    ]);

    // Stream 2: Code execution result
    yield return new ChatResponseUpdate(ChatRole.Assistant, [
        new CodeExecutionResultContent(
            stdout: "Data analysis complete\nMean: 42.5, Std: 12.3",
            generatedContents: [
                new DataContent(chartBytes, "image/png")
            ]
        )
    ]);

    // Stream 3: Follow-up explanation
    yield return new ChatResponseUpdate(ChatRole.Assistant, [
        new TextContent("The results show a normal distribution with...")
    ]);
}

Consumer Streaming Handling

Consumers can handle all streaming patterns uniformly:

var allExecutionResults = new List<CodeExecutionResultContent>();
var allText = new StringBuilder();

await foreach (var update in client.GetStreamingResponseAsync(messages))
{
    foreach (var content in update.Contents)
    {
        switch (content)
        {
            case CodeExecutionResultContent codeResult:
                allExecutionResults.Add(codeResult);

                // Handle incremental output
                if (!string.IsNullOrEmpty(codeResult.Text))
                {
                    Console.Write(codeResult.Text);
                    allText.Append(codeResult.Text);
                }

                // Handle generated content
                if (codeResult.GeneratedContents?.Count > 0)
                {
                    ProcessGeneratedContent(codeResult.GeneratedContents);
                }
                break;

            case TextContent textContent:
                Console.Write(textContent.Text);
                break;
        }
    }
}

// Final combined result
var combinedResult = new CodeExecutionResultContent(
    stdout: allText.ToString(),
    generatedContents: allExecutionResults
        .SelectMany(r => r.GeneratedContents ?? [])
        .ToList()
);

Benefits of This Design

  1. Minimal Surface Area: Only includes properties proven across multiple providers
  2. Leverages Existing Abstractions: Uses DataContent, UriContent, HostedFileContent for files
  3. Streaming Compatible: Works with both incremental and complete-result streaming (all surveyed provider implementations support streaming)
  4. Provider Agnostic: Accommodates different provider response formats including both OpenAI APIs
  5. Future-Proof: Design accommodates the evolution from OpenAI Assistants API to Responses API
  6. Extensible: Can add properties later if they prove common across providers
  7. Type Safe: Full polymorphic JSON serialization support
  8. Developer Friendly: Simple Text property and ToString() for easy consumption

Alternatives Considered

  1. Comprehensive Design: Including all provider-specific properties (rejected due to complexity)
  2. Provider-Specific Subclasses: Separate classes per provider (rejected due to fragmentation)
  3. Generic Content with Metadata: Using AdditionalProperties for everything (rejected due to lack of type safety)
  4. Custom File Type: New CodeExecutionFile class (rejected in favor of existing AIContent types)

Future Considerations

  • Monitor provider evolution for new common patterns
  • Consider adding properties if they become proven across multiple providers
  • Evaluate streaming optimizations based on real-world usage
  • Assess need for provider-specific extensions through AdditionalProperties

@stephentoub
Member Author

stephentoub commented Jul 21, 2025

@rogerbarreto:

Stderr: Supported by Anthropic (stderr), others use similar error patterns

I only see error information from Anthropic. How is it represented in the others? I also find it a bit odd calling it stdout/stderr, as that presumes a particular mode of execution.

Could error information not be conveyed using ErrorContent in the AIContent collection?

And could textual output not just be TextContent in the AIContent collection, along with any other generated output? Then a Text property could just do what ChatResponse/ChatMessage/etc. do, which is have Text just concat any TextContent in their collection.

That yields:

public sealed class CodeExecutionResultContent : AIContent
{
    public IList<AIContent>? GeneratedContents { get; set; }

    [JsonIgnore]
    public string Text { get; }
}

Is that sufficient / sufficiently flexible?
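
For illustration, the Text getter in that shape could mirror ChatResponse by concatenating any TextContent items; a hedged sketch, not part of the actual proposal:

using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Serialization;
using Microsoft.Extensions.AI;

public sealed class CodeExecutionResultContent : AIContent
{
    public IList<AIContent>? GeneratedContents { get; set; }

    // Concatenates any TextContent in the collection, mirroring ChatMessage.Text semantics.
    [JsonIgnore]
    public string Text =>
        GeneratedContents is null
            ? string.Empty
            : string.Concat(GeneratedContents.OfType<TextContent>().Select(t => t.Text));
}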

Code: Only Google Vertex AI and OpenAI Responses API return executed code (2/5 provider implementations, but OpenAI has two different APIs)

Presumably we'd just store the code as DataContent (with an appropriate mime type for the language) in GeneratedContents? How would we handle streaming? I believe the code itself can be streamed as it's generated... would we have distinct DataContent objects for each part, or would we hold onto that content until we can reassemble it, a la function call arguments?

@rogerbarreto
Contributor

Could error information not be conveyed using ErrorContent in the AIContent collection?

I think this is a good point. Stderr could use the ErrorContent when available in the list instead of a string, or it could be a standalone ErrorContent? Error property.

And could textual output not just be TextContent in the AIContent collection, along with any other generated output? Then a Text property could just do what ChatResponse/ChatMessage/etc. do, which is have Text just concat any TextContent in their collection.
Is that sufficient / sufficiently flexible?

IMHO, that is too flexible. We definitely could, but my initial thought was that having a dedicated type lets us make a clear distinction between code execution data and code artifacts.

This extra-flexible approach (using just GeneratedContents) can be problematic if you want to iterate over the contents and need to tell a code execution block apart from generated file/artifact information, which can happen more often in streaming handling code.

As an alternative, we might try TextContent? Output { get; } for this. Thoughts?

I believe the code itself can be streamed as it's generated... would we have distinct DataContent objects for each part, or would we hold onto that content until we can reassemble it, ala function call arguments?

Ideally, as far as I can see in the OpenAI Responses API, we have the tool_id, which I probably missed in the design. We should consider having an identifier for the tool; this can be very useful when streaming multiple code executions.

@stephentoub
Member Author

having a dedicated type lets us make a clear distinction between code execution data and code artifacts

I was imagining that would be TextContent vs everything else, e.g. there's a distinction between TextContent and DataContent(..., "text/plain"). TextContent would be the tool output, and everything else is a DataContent with an appropriate media type.
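
In that model, consumers could separate tool output from generated artifacts purely by content type. A rough sketch of the convention (the PrintContents helper is illustrative, not committed API behavior):

using System;
using Microsoft.Extensions.AI;

static void PrintContents(CodeExecutionResultContent codeResult)
{
    foreach (AIContent item in codeResult.GeneratedContents ?? [])
    {
        switch (item)
        {
            case TextContent text:
                Console.WriteLine($"Tool output: {text.Text}");             // textual execution output
                break;
            case DataContent data:
                Console.WriteLine($"Generated artifact: {data.MediaType}"); // files, including "text/plain"
                break;
            case HostedFileContent hosted:
                Console.WriteLine($"Hosted artifact: {hosted.FileId}");
                break;
        }
    }
}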

@stephentoub
Member Author

@rogerbarreto, I've started looking at this more, and I'm seeing what look like discrepancies in the AI-generated analysis you shared. OpenAI Assistants appears to provide not only the code that was generated but also any inputs generated to that code. Anthropic does provide the generated code. Can you double-check some of your analysis?

@rogerbarreto
Contributor

Can you double-check some of your analysis?

I double-checked; I had missed the details in the docs, they are quite hidden in the expandables. I've provided updated information.

@rogerbarreto
Contributor

I was imagining that would be TextContent vs everything else, e.g. there's a distinction between TextContent and DataContent(..., "text/plain"). TextContent would be the tool output, and everything else is a DataContent with an appropriate media type.

We can start with this assumption, so any generated content is a DataContent, including plain text.

@rogerbarreto
Contributor

rogerbarreto commented Jul 24, 2025

One thing to note, after having to deal with the FileSearch, is that we now have other hosted references like VectorSearch and File, and for everything that is hosted the Id will be a commonality. Given that, we might consider having an abstract class HostedContent { string? Id { get; set; } }.

This adds a 3-level hierarchy to the repo (HostedFileContent -> HostedContent -> AIContent); not sure how I feel about that, though. Or we just use HostedContent for both, for now: HostedContent : AIContent.

Note: VectorSearch actually is a HostedService not a "Content", but we might consider the reference to it the "content", thoughts?
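
For reference, the hierarchy under discussion would look roughly like this (a sketch of the two options being weighed, not merged code):

// Option A: shared abstract base, three levels deep.
public abstract class HostedContent : AIContent
{
    public string? Id { get; set; }
}

public sealed class HostedFileContent : HostedContent { }
public sealed class HostedVectorStoreContent : HostedContent { }

// Option B: a single concrete HostedContent : AIContent used for both files and stores,
// trading away the file-vs-store distinction in the type system.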

@stephentoub
Member Author

we just use HostedContent for both, for now: HostedContent : AIContent

How would an implementation distinguish whether an incoming ID was for a file or a store?

we might consider having an abstract class HostedContent { string? Id { get; set; } }.

In what situation would someone want to use the base type directly (if the only benefit of the base type is avoiding duplicating an Id property, it's not worth it, it needs to have some polymorphic use)?

VectorSearch actually is a HostedService not a "Content", but we might consider the reference to it the "content", thoughts?

I'm not understanding the distinction being made. VectorSearch as a tool wouldn't be an AIContent, it'd be an AITool. And VectorStore as content is because the content is the ID / reference / whatever data is being passed around.

@stephentoub
Member Author

I double-checked; I had missed the details in the docs, they are quite hidden in the expandables. I've provided updated information.

  1. OpenAI Code Interpreter (Assistants API)
    ❌ Does NOT return executed code

It does, actually. In the response where it provides "input" to the code execution tool, that's the code.

@rogerbarreto
Contributor

It does, actually. In the response where it provides "input" to the code execution tool, that's the code.

It was not clear to me whether this was the input message (natural language) for generating the code or the code itself to be executed.

@rogerbarreto
Contributor

How would an implementation distinguish whether an incoming ID was for a file or a store?

In what situation would someone want to use the base type directly (if the only benefit of the base type is avoiding duplicating an Id property, it's not worth it, it needs to have some polymorphic use)?

If you are the caller, when dealing with HostedContent, you shouldn't need to know whether it is a File or a Vector. But as you also made a good point, I don't think this is a common situation; most of the time you want to know the concrete type directly, so it isn't worth having that abstraction.

I'm not understanding the distinction being made. VectorSearch as a tool wouldn't be an AIContent, it'd be an AITool. And VectorStore as content is because the content is the ID / reference / whatever data is being passed around.

I misspelled "VectorStore" as "VectorSearch"; agreed, happy to go forward with VectorStore as content.

@stephentoub
Member Author

@rogerbarreto, other than the design of the code interpreter content type, how are you feeling about everything else in this PR? Does the rest of it look good to you design-wise? I'm wondering if we should get everything else merged, and then work on how we want to represent output content from all of these tools.

@rogerbarreto
Contributor

rogerbarreto commented Jul 31, 2025

@stephentoub I'm happy with the current state representing the inputs/definition of the tooling. Happy to progress to the output content in later additions.

Sorry, not understanding, can you elaborate?

Thinking of a name for the contents that can be used in both input and output scenarios,
i.e.: Inputs.Add(new CodeExecutionContent)

For example, code I want to be executed server-side by the tool.

@stephentoub stephentoub changed the title Add HostedFileContent and HostedCodeInterpreterTool.Inputs Add HostedFile/VectorStoreContent, HostedFileSearchTool, and HostedCodeInterpreterTool.Inputs Aug 1, 2025
@stephentoub stephentoub marked this pull request as ready for review August 1, 2025 02:35
Copilot AI review requested due to automatic review settings August 1, 2025 02:35
@stephentoub stephentoub requested a review from a team as a code owner August 1, 2025 02:35


@stephentoub
Member Author

I've removed the CodeInterpreterResultContent from this PR, though I left some of the support that will be necessary to add it back (e.g. changes in the coalescing code). Ready for review.

@stephentoub stephentoub requested a review from Copilot August 1, 2025 02:39
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds support for hosted file and vector store content types, along with new tools for file search and enhanced code interpreter functionality. It introduces new content types for representing files and vector stores that are hosted by AI services, as well as extending existing tools with input capabilities.

  • Adds HostedFileContent and HostedVectorStoreContent classes for representing AI service-hosted resources
  • Introduces HostedFileSearchTool for file search operations with configurable inputs and result limits
  • Extends HostedCodeInterpreterTool with an Inputs property to support file attachments

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

File summary:

  • Contents/HostedFileContent.cs: New content type for AI service-hosted files referenced by ID
  • Contents/HostedVectorStoreContent.cs: New content type for AI service-hosted vector stores referenced by ID
  • HostedFileSearchTool.cs: New tool for file search operations with input files and result count limits
  • HostedCodeInterpreterTool.cs: Added Inputs property to support file attachments for code execution
  • OpenAIAssistantsChatClient.cs: Implementation of new hosted tools for the OpenAI Assistants API
  • OpenAIResponsesChatClient.cs: Implementation support for new content types in the OpenAI Responses API
  • OpenAIChatClient.cs: Content conversion support for hosted file content
  • ChatResponseExtensions.cs: Enhanced text content coalescing algorithm
  • Test files: Comprehensive unit tests for new classes and integration tests

@stephentoub stephentoub force-pushed the hostedfilecontent branch 2 times, most recently from 2c8f92e to f1f6afa Compare August 7, 2025 01:14

@eavanvalkenburg eavanvalkenburg left a comment


small nit

@stephentoub stephentoub enabled auto-merge (squash) August 9, 2025 17:37
@stephentoub stephentoub merged commit ed336d1 into dotnet:main Aug 9, 2025
6 checks passed
stephentoub added a commit to stephentoub/extensions that referenced this pull request Aug 10, 2025
…deInterpreterTool.Inputs (dotnet#6620)

* Add HostedFileContent, HostedVectorStoreContent, HostedFileSearchTool, and HostedCodeInterpreterTool.Inputs
joperezr pushed a commit that referenced this pull request Aug 11, 2025
* Update Azure.AI.OpenAI test dependency to 2.3.0-beta.1 (#6698)

* Bring back per library CHANGELOGS for M.E.AI (#6697)

* Revert "Delete M.E.AI changelog files (#6467)"

This reverts commit 2ab21ec.

* Bring back per library CHANGELOGS for M.E.AI

By popular demand.

* Fix typos

* Add HostedFile/VectorStoreContent, HostedFileSearchTool, and HostedCodeInterpreterTool.Inputs (#6620)

* Add HostedFileContent, HostedVectorStoreContent, HostedFileSearchTool, and HostedCodeInterpreterTool.Inputs
@github-actions github-actions bot locked and limited conversation to collaborators Sep 9, 2025
@stephentoub stephentoub deleted the hostedfilecontent branch December 12, 2025 23:01

Labels

area-ai Microsoft.Extensions.AI libraries


Development

Successfully merging this pull request may close these issues.

[API Proposal]: Add Files resource to HostedCodeInterpreterTool abstraction.

5 participants