Summary
In src/embedding_client.py, the Gemini transport forwards output_dimensionality=self.vector_dimensions to the provider, but the OpenAI transport never passes the corresponding dimensions parameter on embeddings.create(...). As a result, EMBEDDING_VECTOR_DIMENSIONS is validated against the response via _validate_embedding_dimensions, but never actually requested from the provider. Any OpenAI-compatible server whose native embedding size differs from the configured value fails validation, even when the underlying model fully supports MRL (Matryoshka) truncation.
Where (on main)
Three OpenAI call sites in src/embedding_client.py:
- _EmbeddingClient.embed (single query)
- _EmbeddingClient.simple_batch_embed
- _EmbeddingClient._process_batch
Each invokes self.client.embeddings.create(model=..., input=...) with no dimensions= argument. The Gemini branch alongside each does pass config={"output_dimensionality": self.vector_dimensions}.
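Related work
- … output_dimensionality read from configuration instead of hardcoded 1536. That covered the Gemini branch and the pgvector column, not the OpenAI branch.
- … VECTOR_STORE_DIMENSIONS=768 but honcho still expects 1536), but it conflates model-name hardcoding, deployment concerns, and the dimensions issue; not a clean report.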
Motivating example
Running Qwen3-Embedding-4B (2560-dim native, MRL-trained) locally via oMLX with honcho's pgvector mode at EMBEDDING_VECTOR_DIMENSIONS=1536:
- Honcho calls self.client.embeddings.create(model="qwen3-embedding-4b", input=[query]), with no dimensions argument.
- oMLX's OpenAI-compatible /v1/embeddings endpoint returns 2560 floats (the model's native size).
- _validate_embedding_dimensions raises: Embedding dimension mismatch ... Expected 1536, got 2560.
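To make the failure mode concrete, here is a minimal standalone sketch of the check that trips. It does not call honcho: check_dimensions is a hypothetical stand-in for _validate_embedding_dimensions, and its error text is illustrative rather than honcho's exact message.

```python
def check_dimensions(vectors: list[list[float]], expected: int) -> None:
    """Hypothetical stand-in for honcho's _validate_embedding_dimensions."""
    for vec in vectors:
        if len(vec) != expected:
            raise ValueError(
                f"Embedding dimension mismatch: expected {expected}, got {len(vec)}"
            )


# What the server returns when no `dimensions` is sent: the native 2560-dim vector.
native = [[0.0] * 2560]
try:
    check_dimensions(native, expected=1536)
except ValueError as exc:
    print(exc)  # Embedding dimension mismatch: expected 1536, got 2560

# What it would return if dimensions=1536 were requested: validation passes.
check_dimensions([[0.0] * 1536], expected=1536)
```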
oMLX's endpoint already honors the OpenAI dimensions request parameter end to end (it truncates the vector server-side and L2-renormalizes it, which is correct for MRL-trained models). If honcho passed dimensions=1536, it would receive a valid, correctly renormalized 1536-dim vector and pgvector ingestion would work. The same applies to any MRL-capable model served by an OpenAI-compatible backend (Nomic v1.5, mxbai-embed-large-v1, OpenAI's own text-embedding-3-small/-large, etc.).
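For readers unfamiliar with MRL truncation, the server-side step amounts to roughly the following: keep the leading dims components and rescale them to unit L2 norm. This is an illustrative sketch of the general technique, not oMLX's actual code; truncate_and_renormalize is a hypothetical name.

```python
import math


def truncate_and_renormalize(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components (the MRL prefix) and rescale to unit L2 norm."""
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0]  # toy "native" vector
print(truncate_and_renormalize(full, 2))  # [0.6, 0.8]
```

Renormalizing keeps the truncated vector unit-length, which matters when the store compares vectors by inner product rather than cosine distance.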
Related server-side work on oMLX: jundot/omlx#901 (adding a per-model default so even clients that don't send dimensions get the right size). That is a complementary fix on the server side; honcho is the right place to fix it on the client side regardless of which backend it talks to.
Proposal
Pass dimensions=self.vector_dimensions on the three OpenAI embeddings.create calls, mirroring the existing Gemini behavior.
Backward-compatibility note
OpenAI rejects the dimensions parameter for models older than text-embedding-3 (notably text-embedding-ada-002, for which the API returns a 400 if dimensions is supplied). To avoid regressing ada-002 users, the cleanest option is a config toggle:
EMBEDDING.SEND_DIMENSIONS: bool = True # opt-out for ada-002 or providers that reject the field
An alternative would be to condition on model-name heuristics (e.g., skip when the model starts with text-embedding-ada), but that gets fragile against OpenAI-compatible servers using custom model names. A config flag is clearer.
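The flag-gated variant could look roughly like this. EmbeddingSettings and openai_embed_kwargs are hypothetical; honcho's actual settings class and the final spelling of SEND_DIMENSIONS would follow whatever the maintainers prefer.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical shape of the relevant settings; field names are illustrative."""

    model: str
    vector_dimensions: int = 1536
    send_dimensions: bool = True  # opt-out for ada-002 or servers that reject the field


def openai_embed_kwargs(settings: EmbeddingSettings, inputs: list[str]) -> dict[str, Any]:
    """Build kwargs for client.embeddings.create(**kwargs) at any of the three call sites."""
    kwargs: dict[str, Any] = {"model": settings.model, "input": inputs}
    if settings.send_dimensions:
        kwargs["dimensions"] = settings.vector_dimensions
    return kwargs


print(openai_embed_kwargs(EmbeddingSettings("text-embedding-3-small"), ["hi"]))
# {'model': 'text-embedding-3-small', 'input': ['hi'], 'dimensions': 1536}
print(openai_embed_kwargs(EmbeddingSettings("text-embedding-ada-002", send_dimensions=False), ["hi"]))
# {'model': 'text-embedding-ada-002', 'input': ['hi']}
```

Centralizing the kwargs in one helper keeps the three call sites from drifting apart. Because the flag only controls whether the field is sent, _validate_embedding_dimensions still catches a genuine mismatch when it is turned off.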
Scope
Happy to open a PR from a fork with the three-call-site change plus the SEND_DIMENSIONS flag and a docs update (.env.template, docs/v3/contributing/configuration.mdx) if the direction above works for you. Let me know if you'd prefer a different approach (e.g., always-send vs. flag-gated, or model-name heuristic) before I put the PR together.