Add multimodal uri, file, and blob parts to GenAI JSON Schemas#2754

Merged
lmolkova merged 20 commits into open-telemetry:main from aabmass:genai-multimodal-1556
Oct 29, 2025

@aabmass
Member

@aabmass aabmass commented Sep 9, 2025

Fixes #1556

Changes

Added two new types to the MessagePart union for capturing multimodal prompt/response data:

  • BlobPart which contains inline base64.
  • FileDataPart which contains a URI referencing data.

Also updated the ipynb to directly write to the JSON Schemas for simpler updating.

Prototypes

Merge requirement checklist

  • CONTRIBUTING.md guidelines followed.
  • Change log entry added, according to the guidelines in When to add a changelog entry.
    • If your PR does not need a change log, start the PR title with [chore]
  • Links to the prototypes or existing instrumentations (when adding or changing conventions)

@github-actions github-actions bot added the enhancement label Sep 9, 2025
@aabmass aabmass marked this pull request as ready for review September 9, 2025 04:44
@aabmass aabmass requested review from a team as code owners September 9, 2025 04:44
@alexmojaki
Contributor

Noting the questions I asked in #1556 (comment), especially:

  • Do we distinguish between inline data and URLs? Note that e.g. OpenAI's image_url field doesn't, it just uses a data: URL for inline data. This seems pretty simple and convenient. Maybe an additional boolean to indicate which it is? Or people could just check if the URL starts with data:. Alternatively, we have two different types, e.g. media_inline and media_url, or 2 per media type e.g. image_inline etc.

You've chosen two types:

from typing import Literal

from pydantic import BaseModel, Field

class BlobPart(BaseModel):
    type: Literal["blob"] = Field(description="The type of the content captured in this part.")
    mime_type: str = Field(description="The IANA MIME type of the attached data.")
    data: bytes = Field(description="base64 encoded bytes of the attached data.")

    class Config:
        extra = "allow"

class FileDataPart(BaseModel):
    type: Literal["file_data"] = Field(description="The type of the content captured in this part.")
    mime_type: str = Field(description="The IANA MIME type of the attached data.")
    file_uri: str = Field(description="A URI referencing the attached data. Should be recorded without modification, as it was sent to the model.")

    class Config:
        extra = "allow"

The only difference between these part types is the data vs file_uri. What do you think is the advantage of having an extra type for this instead of using data URIs containing base64?

@aabmass
Member Author

aabmass commented Sep 9, 2025

What do you think is the advantage of having an extra type for this instead of using data URIs containing base64?

I hadn't considered this before, mainly because I'm more familiar with the Gemini format that inspired this. But a few thoughts:

  • Having the type as data: bytes would encode to OTLP protobuf as AnyValue.bytes_value, so you don't have to pay the base64 size overhead. I think the same applies to many storage formats, such as Parquet, compared with requiring a base64 URI in a string.

    The way I wrote the description requires base64, so I would need to tweak it.

  • Using data URLs for raw bytes does feel a little arbitrary because you could make the same argument for text: data:text/plain,This%20is%20a%20plain%20text%20string. Ofc, the escaping is ugly and takes extra space.
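To make the size argument concrete, here is a minimal sketch using only the Python standard library. The payload is random bytes standing in for an image (an assumption for illustration, not data from this PR), and the text example reuses the plain-text data URL quoted above.

```python
import base64
import os
from urllib.parse import quote

# Hypothetical payload standing in for raw image bytes.
raw = os.urandom(3_000)

# Base64 maps every 3 input bytes to 4 ASCII characters (about 33% larger).
encoded = base64.b64encode(raw)

# A data URL adds a small fixed prefix on top of the base64 text.
data_url = "data:image/jpeg;base64," + encoded.decode("ascii")

print(len(raw))       # 3000
print(len(encoded))   # 4000
print(len(data_url))  # 4023

# The same trick applied to plain text, using percent-encoding instead.
text = "This is a plain text string"
text_url = "data:text/plain," + quote(text)
print(text_url)       # data:text/plain,This%20is%20a%20plain%20text%20string
print(len(text), len(text_url))  # 27 53
```

Whether a given exporter or backend actually avoids this overhead depends on how it stores bytes values, which this sketch does not cover.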

@alexmojaki
Contributor

I see bytes in the protobuf, but not the spec, although byte arrays are allowed: https://opentelemetry.io/docs/specs/otel/logs/data-model/#type-any

@aabmass
Member Author

aabmass commented Sep 9, 2025

Is bytes not a byte array? This applies to attributes too https://opentelemetry.io/docs/specs/otel/common/attribute-type-mapping/#byte-sequences.

@alexmojaki
Contributor

I guess that must be right and I misinterpreted.

Do we have any precedent/guidance for bytes that can end up inside a complex attribute on a span? Right now it has to convert to JSON, which means base64 encoding, and then to the backend the field will just look like a string. The backend then has to know that it should decode that based on semconv if it wants to end up with the same thing as if it had received the attribute in a log body using protobuf instead of JSON.
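The round trip described here can be sketched as follows. This is an illustration only: the helper names `to_json_attribute` and `from_json_attribute` are hypothetical, and the part shape follows the BlobPart quoted earlier in this thread rather than any released schema.

```python
import base64
import json

# Hypothetical part, shaped like the BlobPart quoted earlier in this thread.
part = {"type": "blob", "mime_type": "image/png", "data": b"\x89PNG\r\n\x1a\n"}


def to_json_attribute(p: dict) -> str:
    """Serialize a part to a JSON string, base64-encoding any bytes values."""
    safe = {
        k: base64.b64encode(v).decode("ascii") if isinstance(v, bytes) else v
        for k, v in p.items()
    }
    return json.dumps(safe)


def from_json_attribute(s: str) -> dict:
    """Backend side: nothing in the JSON marks 'data' as base64.

    The decoder has to know from the convention that a part with
    type == "blob" carries base64 text in its 'data' field.
    """
    p = json.loads(s)
    if p.get("type") == "blob":
        p["data"] = base64.b64decode(p["data"])
    return p


wire = to_json_attribute(part)
# On the wire the payload is indistinguishable from an ordinary string.
assert isinstance(json.loads(wire)["data"], str)

restored = from_json_attribute(wire)
assert restored == part
```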

@alexmojaki
Contributor

cc @lmolkova for the above question

@lmolkova
Member

I guess that must be right and I misinterpreted.

Do we have any precedent/guidance for bytes that can end up inside a complex attribute on a span? Right now it has to convert to JSON, which means base64 encoding, and then to the backend the field will just look like a string. The backend then has to know that it should decode that based on semconv if it wants to end up with the same thing as if it had received the attribute in a log body using protobuf instead of JSON.

Serializing to a JSON string is temporary. A byte array is a good long-term solution and would work nicely with the spec changes that are in flight (open-telemetry/opentelemetry-specification#4651), specifically:

https://github.com/open-telemetry/opentelemetry-specification/blob/4f92d3dce6224797045da43e680a1a3dcebeac89/specification/common/README.md?plain=1#L38

@aabmass
Member Author

aabmass commented Sep 10, 2025

One other thing to note: we have this code for non-complex attributes, which attempts to interpret Python bytes as Unicode and drops the attribute if that fails. I think this predates bytes attributes in OTLP, and I think we can remove it independently of open-telemetry/opentelemetry-specification#4651.
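The behavior described above could look roughly like the sketch below. This is not the actual SDK code, which is not quoted in this thread; the function name `clean_bytes_value` is hypothetical and only illustrates "decode as Unicode, otherwise drop".

```python
from typing import Any, Optional


def clean_bytes_value(value: Any) -> Optional[Any]:
    """Hypothetical helper: return a usable attribute value, or None to drop it."""
    if isinstance(value, bytes):
        try:
            # Treat the bytes as UTF-8 text and record them as a string.
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid text, so the attribute would be dropped.
            return None
    return value


text_result = clean_bytes_value(b"hello")
binary_result = clean_bytes_value(b"\xff\xfe")
print(text_result)    # hello
print(binary_result)  # None
```

Under this kind of handling, binary content such as image bytes would be lost, which is why it matters for blob parts.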

aabmass and others added 2 commits September 10, 2025 18:57
Also updated the ipynb to directly write to JSON Schemas to make it easier to update things. This might be easier to convert to a script though and would be easy to add to the Makefile
Co-authored-by: Liudmila Molkova <neskazu@gmail.com>
@aabmass aabmass force-pushed the genai-multimodal-1556 branch from c8311e6 to 5075167 September 10, 2025 18:57
@alexmojaki
Contributor

alexmojaki commented Sep 11, 2025

How should this be instrumented? https://platform.openai.com/docs/guides/images-vision?api-mode=responses&format=base64-encoded#analyze-images

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "role": "user",
            "content": [
                { "type": "input_text", "text": "what's in this image?" },
                {
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                },
            ],
        }
    ],
)

A FileDataPart with image_url kept as is, or a BlobPart where the data URL is decoded?
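The two options can be sketched side by side. The helper `part_from_image_url` is hypothetical, and the dictionaries use the field names from the BlobPart and FileDataPart quoted earlier in this thread. `mime_type` is left out of the URI case because it cannot be known from the URL alone in this sketch.

```python
import base64


def part_from_image_url(image_url: str) -> dict:
    """Hypothetical mapping from an OpenAI-style image_url to a message part."""
    prefix = "data:"
    if not image_url.startswith(prefix):
        # Ordinary URL: keep it unmodified, as it was sent to the model.
        return {"type": "file_data", "file_uri": image_url}

    # data:[<media type>][;base64],<payload>
    header, _, payload = image_url[len(prefix):].partition(",")
    media_type, *params = header.split(";")
    if "base64" not in params:
        # Percent-encoded data URLs are passed through in this sketch.
        return {"type": "file_data", "file_uri": image_url}

    return {
        "type": "blob",
        # An omitted media type in a data URL defaults to text/plain.
        "mime_type": media_type or "text/plain",
        "data": base64.b64decode(payload),
    }


jpeg_header = b"\xff\xd8\xff"
inline = "data:image/jpeg;base64," + base64.b64encode(jpeg_header).decode("ascii")
print(part_from_image_url(inline)["type"])                         # blob
print(part_from_image_url("https://example.com/cat.jpg")["type"])  # file_data
```

Keeping the data URL as a `file_uri` would instead skip the decoding branch entirely and record the string as is.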

@aabmass
Copy link
Member Author

aabmass commented Sep 11, 2025

My preference would be to generate a Blob for the reason above about bytes being serialized more efficiently. I appreciate the ambiguity though.

@alexmojaki do you have a strong preference for using data URLs?

@alexmojaki
Copy link
Contributor

Not strongly, but it feels like something should be stated in the PR.

@lmolkova lmolkova moved this from Untriaged to Needs More Approval in Semantic Conventions Triage Sep 15, 2025
@lmolkova lmolkova moved this from Needs More Approval to Awaiting codeowners approval in Semantic Conventions Triage Sep 22, 2025
- `FileData.file_uri` -> `FileData.uri`
- `Blob.data` -> `Blob.content`
- `mime_type` fields optional
- added `file_id` field
Contributor

@alexmojaki alexmojaki left a comment

Going offline now until the 20th. I'm leaving my approval to technically unblock this PR, but please get approval from @Kludex in my place, or from another approver.

@github-project-automation github-project-automation bot moved this from Awaiting codeowners approval to Needs More Approval in Semantic Conventions Triage Oct 9, 2025
@aabmass aabmass changed the title from "Extend gen_ai content JSON Schemas with multimodal blob and file_data" to "Add multimodal uri, file, and blob parts to GenAI JSON Schemas" Oct 14, 2025
@aabmass aabmass requested a review from Kludex October 14, 2025 03:17
@lmolkova lmolkova self-requested a review October 14, 2025 16:56
Member

@Cirilla-zmh Cirilla-zmh left a comment

@aabmass
It's what we really need. 👍
However, what does the 'file id' of 'FilePart' mean? Is it related to the 'file id' and the 'video id' in OpenAI?

@alexmojaki alexmojaki self-requested a review October 20, 2025 13:24
Member

@lmolkova lmolkova left a comment

Looks great, with some final concerns about bytes vs. base64 string for content.

Co-authored-by: Liudmila Molkova <neskazu@gmail.com>
Co-authored-by: Liudmila Molkova <neskazu@gmail.com>
@lmolkova lmolkova added this pull request to the merge queue Oct 29, 2025
Merged via the queue into open-telemetry:main with commit b2a742e Oct 29, 2025
16 checks passed
@aabmass aabmass deleted the genai-multimodal-1556 branch November 3, 2025 19:18
Labels

area:gen-ai, enhancement

Development

Successfully merging this pull request may close these issues.

Support for Multi modal inputs and generations

7 participants