embeddings: base64 encoding fix #12715

Merged
npardal merged 3 commits into main from nicole/embeddings
Oct 22, 2025

Contributor

@npardal npardal commented Oct 21, 2025

WHAT
Added support for the encoding_format parameter in the OpenAI-compatible /v1/embeddings endpoint. Clients can now request embeddings in either float (JSON array) or base64 (base64-encoded string) format.
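
As a rough illustration of the parameter described above, the sketch below parses a request body and resolves the requested format. The struct `embedRequest` and the helper `parseEncodingFormat` are hypothetical stand-ins, not the PR's `openai.EmbedRequest` code; the JSON field names follow the OpenAI embeddings API, and falling back to float when the field is omitted is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embedRequest is a hypothetical stand-in for openai.EmbedRequest.
type embedRequest struct {
	Model          string `json:"model"`
	Input          any    `json:"input"`
	EncodingFormat string `json:"encoding_format,omitempty"`
}

// parseEncodingFormat decodes a request body and returns the requested
// format. An omitted field falls back to "float" (assumed default).
func parseEncodingFormat(body []byte) (string, error) {
	var req embedRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	switch req.EncodingFormat {
	case "":
		return "float", nil
	case "float", "base64":
		return req.EncodingFormat, nil
	default:
		return "", fmt.Errorf("unsupported encoding_format %q", req.EncodingFormat)
	}
}

func main() {
	body := []byte(`{"model":"example-model","input":"hello","encoding_format":"base64"}`)
	format, err := parseEncodingFormat(body)
	fmt.Println(format, err)
}
```

Rejecting unknown values is a design choice made for this sketch only; the PR does not say how unrecognized formats are handled.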

WHY
The /v1/embeddings endpoint was not respecting the encoding_format parameter that OpenAI clients send in requests. This caused compatibility issues with:

  • C# clients (Azure.AI.OpenAI SDK), which request base64 format by default and crashed with "Invalid base64 string" errors when the response contained a float array instead
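
To make the failure mode concrete, here is a minimal sketch of what a client that asked for base64 might do with the response. It is not the Azure.AI.OpenAI SDK's code; `decodeEmbedding` is a hypothetical helper, and the wire format (little-endian float32 bytes, base64-encoded) is an assumption. Handing it a JSON float array produces a base64 decoding error, which is one plausible reading of the crash described above.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"math"
)

// decodeEmbedding reverses the assumed wire format: base64 text wrapping
// little-endian float32 values.
func decodeEmbedding(s string) ([]float32, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return nil, fmt.Errorf("invalid base64 string: %w", err)
	}
	if len(raw)%4 != 0 {
		return nil, fmt.Errorf("payload length %d is not a multiple of 4", len(raw))
	}
	out := make([]float32, len(raw)/4)
	for i := range out {
		bits := binary.LittleEndian.Uint32(raw[i*4:])
		out[i] = math.Float32frombits(bits)
	}
	return out, nil
}

func main() {
	// A well-formed base64 payload decodes to a vector.
	vec, err := decodeEmbedding("AACAPwAAAEA=")
	fmt.Println(vec, err)

	// A JSON float array is not valid base64, so decoding fails.
	_, err = decodeEmbedding("[0.1, 0.2]")
	fmt.Println(err)
}
```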

CHANGES

  • Added EncodingFormat field to openai.EmbedRequest struct to accept the parameter from clients
  • Modified Embedding struct to use any type for the Embedding field, allowing it to hold either []float32 or string
  • Updated ToEmbeddingList() function to accept an encodingFormat parameter and return each embedding as either a JSON float array or a base64-encoded string
  • Updated middleware to pass encoding_format from request through to response transformation
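
The changes listed above could be sketched roughly as follows. This is not the PR's implementation: `embedding` and `toEmbedding` are hypothetical stand-ins for the Embedding struct and ToEmbeddingList(), and serializing the vector as little-endian float32 bytes before base64-encoding is an assumption about the wire format.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// embedding is a hypothetical stand-in for the PR's Embedding struct.
// The Embedding field is typed any so it can hold []float32 or string.
type embedding struct {
	Object    string `json:"object"`
	Embedding any    `json:"embedding"`
	Index     int    `json:"index"`
}

// toEmbedding builds one response item in the requested format.
// Any format other than "base64" keeps the float slice as-is.
func toEmbedding(values []float32, index int, encodingFormat string) (embedding, error) {
	e := embedding{Object: "embedding", Embedding: values, Index: index}
	if encodingFormat == "base64" {
		var buf bytes.Buffer
		// Assumed layout: raw little-endian float32 bytes.
		if err := binary.Write(&buf, binary.LittleEndian, values); err != nil {
			return embedding{}, err
		}
		e.Embedding = base64.StdEncoding.EncodeToString(buf.Bytes())
	}
	return e, nil
}

func main() {
	for _, format := range []string{"float", "base64"} {
		e, err := toEmbedding([]float32{1, 2}, 0, format)
		if err != nil {
			panic(err)
		}
		out, err := json.Marshal(e)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Typing the field as any keeps a single response struct for both formats, at the cost of a type assertion wherever the value is read back.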

@npardal npardal changed the title [not ready] embeddings: base64 encoding fix embeddings: base64 encoding fix Oct 21, 2025
@npardal npardal requested a review from jmorganca October 21, 2025 17:45
@npardal npardal marked this pull request as ready for review October 21, 2025 17:45
if inputArray, ok := req.Input.([]any); ok {
	if len(inputArray) > 0 {
		if _, isNestedArray := inputArray[0].([]any); isNestedArray {
			c.AbortWithStatusJSON(http.StatusBadRequest, openai.NewError(http.StatusBadRequest, "Tokenized input not supported. Please send text input instead of token IDs."))
Member

Double-check that this error is similar to the one OpenAI sends; it seems close!

Contributor Author

@npardal npardal Oct 21, 2025

Hmm, never mind: OpenAI supports tokenized input (we do not). I think this relates directly to this issue?

image

I removed the validation but lmk if you prefer keeping it until the fix is implemented!

@npardal npardal marked this pull request as draft October 21, 2025 18:48
@npardal npardal marked this pull request as ready for review October 21, 2025 21:25
@npardal npardal requested review from jmorganca and mxyng October 21, 2025 21:25
Member

@jmorganca jmorganca left a comment

LGTM

@npardal npardal marked this pull request as draft October 21, 2025 21:28
@npardal npardal marked this pull request as ready for review October 21, 2025 21:32
@npardal npardal merged commit e0ead1a into main Oct 22, 2025
8 checks passed
@npardal npardal deleted the nicole/embeddings branch October 22, 2025 18:27

3 participants