Bug: VECTOR_STORE_DIMENSIONS not respected in models + Feature: separate custom embedding provider #564

@hxcn


Bug: VECTOR_STORE_DIMENSIONS setting has no effect

The config option VECTOR_STORE_DIMENSIONS (default: 1536) exists in src/config.py but is never used in src/models.py. Both MessageEmbedding.embedding and Document.embedding are hardcoded to Vector(1536), which means changing the setting has no effect on the actual database column or SQLAlchemy model.

Affected files:

  • src/models.py line 281: mapped_column(Vector(1536), nullable=True)
  • src/models.py line 389: mapped_column(Vector(1536), nullable=True)

Suggested fix:

Add from src.config import settings to src/models.py and replace the hardcoded values:

# Before
embedding: MappedColumn[Any] = mapped_column(Vector(1536), nullable=True)

# After
embedding: MappedColumn[Any] = mapped_column(Vector(settings.VECTOR_STORE.DIMENSIONS), nullable=True)
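One detail worth spelling out: `mapped_column(Vector(...))` is evaluated once, when `src/models.py` is imported, so the setting has to be loaded and valid by then. Below is a minimal stdlib-only sketch of that loading step. `VectorStoreSettings` and `load_vector_store_settings` are hypothetical names used for illustration; the real project presumably does this through its settings classes in src/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VectorStoreSettings:
    # Hypothetical stand-in for the VECTOR_STORE settings group in src/config.py.
    DIMENSIONS: int = 1536


def load_vector_store_settings(
    env: Optional[Mapping[str, str]] = None,
) -> VectorStoreSettings:
    """Read VECTOR_STORE_DIMENSIONS from the environment, defaulting to 1536."""
    source = os.environ if env is None else env
    raw = source.get("VECTOR_STORE_DIMENSIONS")
    if raw is None or raw.strip() == "":
        return VectorStoreSettings()
    dimensions = int(raw)  # raises ValueError on non-numeric input
    if dimensions <= 0:
        raise ValueError("VECTOR_STORE_DIMENSIONS must be a positive integer")
    return VectorStoreSettings(DIMENSIONS=dimensions)
```

Failing fast on a bad value here is a design choice: otherwise the error would only surface later, when the model classes are constructed.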

Feature Request: Separate custom embedding provider endpoint

Currently, LLM_EMBEDDING_PROVIDER=openrouter reuses LLM_OPENAI_COMPATIBLE_BASE_URL for both LLM inference and embeddings. This makes it impossible to use a self-hosted embedding service independently from the LLM inference endpoint.

Use case: Running a local embedding model (e.g. served by llama.cpp or Ollama) on a separate server while using a different endpoint for LLM inference.

Suggested addition to LLMSettings in src/config.py:

EMBEDDING_PROVIDER: Literal["openai", "gemini", "openrouter", "custom"] = "openai"
EMBEDDING_COMPATIBLE_BASE_URL: str | None = None
EMBEDDING_COMPATIBLE_API_KEY: str | None = None
EMBEDDING_MODEL: str | None = None

And a new branch in src/embedding_client.py:

elif self.provider == "custom":
    if api_key is None:
        api_key = settings.LLM.EMBEDDING_COMPATIBLE_API_KEY
    if not api_key:
        raise ValueError("LLM_EMBEDDING_COMPATIBLE_API_KEY is required")
    base_url = settings.LLM.EMBEDDING_COMPATIBLE_BASE_URL
    if not base_url:
        raise ValueError("LLM_EMBEDDING_COMPATIBLE_BASE_URL is required")
    self.client = AsyncOpenAI(api_key=api_key, base_url=base_url)
    self.model = settings.LLM.EMBEDDING_MODEL or "text-embedding-3-small"
    self.max_embedding_tokens = settings.MAX_EMBEDDING_TOKENS
    self.max_batch_size = 2048
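The validation order in that branch can be checked in isolation, without constructing a client. The following is a sketch only: `EmbeddingSettings`, `ResolvedEmbeddingConfig`, and `resolve_custom_embedding_config` are hypothetical names, and the dataclass merely stands in for the proposed `LLMSettings` fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmbeddingSettings:
    # Hypothetical stand-in for the proposed LLMSettings fields.
    EMBEDDING_COMPATIBLE_BASE_URL: Optional[str] = None
    EMBEDDING_COMPATIBLE_API_KEY: Optional[str] = None
    EMBEDDING_MODEL: Optional[str] = None


@dataclass(frozen=True)
class ResolvedEmbeddingConfig:
    api_key: str
    base_url: str
    model: str


def resolve_custom_embedding_config(
    settings: EmbeddingSettings, api_key: Optional[str] = None
) -> ResolvedEmbeddingConfig:
    """Mirror the proposed "custom" branch: explicit key first, then settings."""
    key = api_key if api_key is not None else settings.EMBEDDING_COMPATIBLE_API_KEY
    if not key:
        raise ValueError("LLM_EMBEDDING_COMPATIBLE_API_KEY is required")
    base_url = settings.EMBEDDING_COMPATIBLE_BASE_URL
    if not base_url:
        raise ValueError("LLM_EMBEDDING_COMPATIBLE_BASE_URL is required")
    # Same fallback model name as in the proposed branch.
    model = settings.EMBEDDING_MODEL or "text-embedding-3-small"
    return ResolvedEmbeddingConfig(api_key=key, base_url=base_url, model=model)
```

The resolved values would then be passed to `AsyncOpenAI(api_key=..., base_url=...)` as in the branch above.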

Example .env usage:

LLM_EMBEDDING_PROVIDER=custom
LLM_EMBEDDING_COMPATIBLE_BASE_URL=http://my-embedding-server:8080/v1
LLM_EMBEDDING_COMPATIBLE_API_KEY=my-key
LLM_EMBEDDING_MODEL=my-embed-model
VECTOR_STORE_DIMENSIONS=1024

This allows full decoupling of embedding and inference providers, which is especially useful for self-hosted deployments.
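Once the provider and the vector size are configured independently, they can drift apart: a pgvector `vector(n)` column rejects vectors of any other length, so a model that returns 768 values with VECTOR_STORE_DIMENSIONS=1024 would fail at insert time. A small guard could surface this earlier. This is a sketch under that assumption; `check_embedding_dimensions` is a hypothetical helper, not something that exists in the codebase.

```python
from typing import Sequence


def check_embedding_dimensions(embedding: Sequence[float], expected: int) -> None:
    """Raise if an embedding's length differs from the configured dimension."""
    actual = len(embedding)
    if actual != expected:
        raise ValueError(
            f"Embedding has {actual} dimensions, "
            f"but VECTOR_STORE_DIMENSIONS is {expected}"
        )
```

Calling it on the first embedding returned by the custom endpoint, with `settings.VECTOR_STORE.DIMENSIONS` as `expected`, would be one place to hook it in.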
