Summary
Honcho's embedding client (src/embedding_client.py) hardcodes both the model name and, for two of three providers, the base URL. That makes it impossible to use a self-hosted embedding endpoint (e.g. Ollama with bge-m3, llama.cpp, TEI, Infinity) without forking the repo or deploying a translation proxy.
Current state
Three providers supported:
| Provider | Model (hardcoded) | Base URL |
| --- | --- | --- |
| openai | text-embedding-3-small | api.openai.com (no override) |
| gemini | gemini-embedding-001 | Google (no override) |
| openrouter | openai/text-embedding-3-small | honors LLM_OPENAI_COMPATIBLE_BASE_URL |
The openrouter path is the closest to self-hosted-friendly because it accepts a custom base URL, but it still sends a fixed model string (openai/text-embedding-3-small) that a local embedder typically doesn't recognize.
Why this matters
- Sovereignty / cost: operators running Honcho alongside their own LLM stack (e.g. Ollama on a Jetson) want to keep embedding workloads local for both cost and data-residency reasons.
- Model flexibility: some domains need specific embedders (multilingual, code, long-context). text-embedding-3-small and gemini-embedding-001 are reasonable defaults but not universally optimal.
- Consistency with chat provider config: Honcho already lets users set DERIVER_PROVIDER, DERIVER_MODEL, DIALECTIC_PROVIDER, DIALECTIC_MODEL, DREAM_PROVIDER, DREAM_MODEL, including a custom provider with LLM_OPENAI_COMPATIBLE_BASE_URL for chat. Embeddings are the odd one out.
Proposed
Add these env vars (matching the chat-side naming convention):
- LLM_EMBEDDING_MODEL: free-form model string; defaults to the current hardcoded value for each provider.
- LLM_EMBEDDING_BASE_URL: custom base URL override; applies regardless of provider when set.

Also accept custom as a new EMBEDDING_PROVIDER value (mirroring the chat-side custom provider) that uses LLM_EMBEDDING_BASE_URL and LLM_EMBEDDING_MODEL, and reuses LLM_OPENAI_COMPATIBLE_API_KEY for auth.
The code change in _EmbeddingClient.__init__ should be minimal: thread api_key, base_url, and model through the existing provider branches rather than hardcoding them.
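To make the proposal concrete, here is a minimal sketch of how the settings could be resolved before the provider branches run. It is not taken from src/embedding_client.py: the names EmbeddingSettings and resolve_embedding_settings are hypothetical, the provider is passed in as an argument rather than read from a specific env var, and API key lookup for the existing providers is left out.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults copied from the table under "Current state".
DEFAULT_MODELS = {
    "openai": "text-embedding-3-small",
    "gemini": "gemini-embedding-001",
    "openrouter": "openai/text-embedding-3-small",
}


@dataclass(frozen=True)
class EmbeddingSettings:  # hypothetical container, not an existing Honcho class
    provider: str
    model: str
    base_url: Optional[str]  # None means "use the SDK's default endpoint"
    api_key: Optional[str]   # only resolved for "custom" in this sketch


def resolve_embedding_settings(
    provider: str, env: Mapping[str, str]
) -> EmbeddingSettings:
    """Apply the proposed overrides on top of the current per-provider defaults."""
    model = env.get("LLM_EMBEDDING_MODEL") or DEFAULT_MODELS.get(provider)
    base_url = env.get("LLM_EMBEDDING_BASE_URL") or None
    api_key = None

    if provider == "custom":
        # A custom endpoint has no sensible defaults, so both values are required.
        if not model or not base_url:
            raise ValueError(
                "custom provider needs LLM_EMBEDDING_BASE_URL and LLM_EMBEDDING_MODEL"
            )
        api_key = env.get("LLM_OPENAI_COMPATIBLE_API_KEY")
    elif provider == "openrouter":
        # Keep today's behaviour as the fallback when no embedding URL is set.
        base_url = base_url or env.get("LLM_OPENAI_COMPATIBLE_BASE_URL")
    elif provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown embedding provider: {provider}")

    return EmbeddingSettings(provider, model, base_url, api_key)
```

With this shape, a local Ollama setup would only need something like provider custom, LLM_EMBEDDING_BASE_URL=http://localhost:11434/v1 and LLM_EMBEDDING_MODEL=bge-m3 (example values), while an empty environment keeps the current defaults for openai, gemini, and openrouter.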
Workaround today
Operators are left writing a proxy service that accepts the OpenAI embeddings shape, rewrites the hardcoded model name, and forwards to their embedder. Happy to contribute a PR if the design above is acceptable.
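For reference, the workaround looks roughly like the stdlib-only sketch below. UPSTREAM_URL and LOCAL_MODEL are hypothetical example values, the handler assumes the upstream embedder speaks the OpenAI embeddings format, and error handling and auth are omitted.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values for illustration; point these at the local embedder.
UPSTREAM_URL = "http://localhost:11434/v1/embeddings"
LOCAL_MODEL = "bge-m3"


def rewrite_embedding_request(body: bytes, model: str) -> bytes:
    """Return the OpenAI-style embeddings payload with its model field replaced."""
    payload = json.loads(body)
    payload["model"] = model
    return json.dumps(payload).encode("utf-8")


class EmbeddingProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read the incoming request and swap the hardcoded model name.
        length = int(self.headers.get("Content-Length", 0))
        body = rewrite_embedding_request(self.rfile.read(length), LOCAL_MODEL)

        upstream = urllib.request.Request(
            UPSTREAM_URL,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # Upstream failures and non-2xx responses are not handled in this sketch.
        with urllib.request.urlopen(upstream) as resp:
            data = resp.read()
            status = resp.status

        # Relay the embedder's response unchanged.
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Run the proxy until interrupted."""
    HTTPServer(("0.0.0.0", port), EmbeddingProxyHandler).serve_forever()
```

Calling serve() and pointing LLM_OPENAI_COMPATIBLE_BASE_URL at the proxy is enough for the openrouter path, but it is an extra service to deploy and monitor purely to change one string.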
Environment
Honcho 3.0.6 (image ghcr.io/plastic-labs/honcho:latest), self-hosted in K3s, with LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.x.ai/v1 already set for chat providers.