Allow configurable embedding model name + custom base URL for self-hosted embeddings #578

@vkrmch

Summary

Honcho's embedding client (src/embedding_client.py) hardcodes both the model name and, for two of three providers, the base URL. That makes it impossible to use a self-hosted embedding endpoint (e.g. Ollama with bge-m3, llama.cpp, TEI, Infinity) without forking the repo or deploying a translation proxy.

Current state

Three providers supported:

| Provider | Model (hardcoded) | Base URL |
|---|---|---|
| openai | text-embedding-3-small | api.openai.com (no override) |
| gemini | gemini-embedding-001 | Google (no override) |
| openrouter | openai/text-embedding-3-small | honors LLM_OPENAI_COMPATIBLE_BASE_URL |

The openrouter path is the closest to self-hosted-friendly because it accepts a custom base URL, but it still sends a fixed model string (openai/text-embedding-3-small) that a local embedder typically doesn't recognize.

Why this matters

  1. Sovereignty / cost — operators running Honcho alongside their own LLM stack (e.g. Ollama on a Jetson) want to keep embedding workloads local for both cost and data-residency reasons.
  2. Model flexibility — some domains need specific embedders (multilingual, code, long-context). text-embedding-3-small and gemini-embedding-001 are reasonable defaults but not universally optimal.
  3. Consistency with chat provider config — Honcho already lets users set DERIVER_PROVIDER, DERIVER_MODEL, DIALECTIC_PROVIDER, DIALECTIC_MODEL, DREAM_PROVIDER, DREAM_MODEL, including a custom provider with LLM_OPENAI_COMPATIBLE_BASE_URL for chat. Embeddings are the odd one out.

Proposed

Add these env vars (matching the chat-side naming convention):

  • LLM_EMBEDDING_MODEL — free-form model string; default to current hardcoded value per provider
  • LLM_EMBEDDING_BASE_URL — custom base URL override; applies regardless of provider when set
  • Accept custom as a new EMBEDDING_PROVIDER value (mirrors the chat-side custom provider) that uses LLM_EMBEDDING_BASE_URL + LLM_EMBEDDING_MODEL + reuses LLM_OPENAI_COMPATIBLE_API_KEY for auth.

Minimal code change in _EmbeddingClient.__init__ — thread api_key, base_url, model through the existing provider branches rather than hardcoding them.
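To make the proposal concrete, here is a rough sketch of how the settings could be resolved before they are threaded into the provider branches. It is not Honcho's actual code: `EmbeddingSettings`, `resolve_embedding_settings`, and the fallback to `openai` when `EMBEDDING_PROVIDER` is unset are hypothetical, and the exact name of the provider variable is taken from the wording above rather than from the source. API key handling for the existing providers is left out because it would stay as it is today.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Per-provider defaults, copied from the "Current state" table in this issue.
DEFAULT_MODELS = {
    "openai": "text-embedding-3-small",
    "gemini": "gemini-embedding-001",
    "openrouter": "openai/text-embedding-3-small",
}


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical container for the values __init__ would receive."""

    provider: str
    model: str
    base_url: Optional[str]  # None means "use the provider's default endpoint"
    api_key: Optional[str]   # only resolved here for the proposed custom provider


def resolve_embedding_settings(env: Mapping[str, str] = os.environ) -> EmbeddingSettings:
    # Assumption: the provider variable is EMBEDDING_PROVIDER and defaults to openai.
    provider = env.get("EMBEDDING_PROVIDER", "openai")
    if provider != "custom" and provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown embedding provider: {provider!r}")

    # LLM_EMBEDDING_MODEL wins; otherwise keep today's hardcoded value.
    model = env.get("LLM_EMBEDDING_MODEL") or DEFAULT_MODELS.get(provider)

    # LLM_EMBEDDING_BASE_URL applies to every provider when set.
    base_url = env.get("LLM_EMBEDDING_BASE_URL") or None
    if base_url is None and provider == "openrouter":
        # Preserve the existing behaviour of the openrouter path.
        base_url = env.get("LLM_OPENAI_COMPATIBLE_BASE_URL") or None

    api_key = None
    if provider == "custom":
        if not base_url or not model:
            raise ValueError(
                "custom provider needs LLM_EMBEDDING_BASE_URL and LLM_EMBEDDING_MODEL"
            )
        api_key = env.get("LLM_OPENAI_COMPATIBLE_API_KEY")

    return EmbeddingSettings(provider, model, base_url, api_key)


# Example: a local Ollama instance serving bge-m3 on its OpenAI-compatible path.
local = resolve_embedding_settings(
    {
        "EMBEDDING_PROVIDER": "custom",
        "LLM_EMBEDDING_BASE_URL": "http://localhost:11434/v1",
        "LLM_EMBEDDING_MODEL": "bge-m3",
    }
)
```

With no new variables set, the function returns the same model and endpoint choices as the table under "Current state", so existing deployments would be unaffected.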

Workaround today

Operators are left writing a proxy service that accepts the OpenAI embeddings shape, rewrites the hardcoded model name, and forwards to their embedder. Happy to contribute a PR if the design above is acceptable.
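For reference, the core of such a proxy is small. The sketch below shows only the rewrite step, assuming the request body follows the OpenAI embeddings shape (a JSON object with `model` and `input`); `rewrite_embedding_request` is a hypothetical helper, and the HTTP server and forwarding logic around it are omitted.

```python
import json


def rewrite_embedding_request(body: bytes, local_model: str) -> bytes:
    """Swap the hardcoded model name for one the local embedder recognizes."""
    payload = json.loads(body)
    # Every other field (input, encoding_format, ...) is passed through untouched.
    payload["model"] = local_model
    return json.dumps(payload).encode("utf-8")


# Example: what the openrouter path sends today, rewritten for a local bge-m3.
incoming = json.dumps(
    {"model": "openai/text-embedding-3-small", "input": ["hello world"]}
).encode("utf-8")
outgoing = rewrite_embedding_request(incoming, "bge-m3")
```

Running an extra service just for this one-field substitution is the overhead the proposed env vars would remove.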

Environment

Honcho 3.0.6 (image ghcr.io/plastic-labs/honcho:latest), self-hosted in K3s, with LLM_OPENAI_COMPATIBLE_BASE_URL=https://api.x.ai/v1 already set for the chat providers.
