support custom OpenAI-compatible embedding servers and other models #516

Closed
vazir wants to merge 1 commit into garrytan:master from vazir:feat/env-overridable-embedding-model

vazir commented Apr 29, 2026

Currently embedding.ts hardcodes text-embedding-3-large at 1536 dims.
This patch makes the model and dimensions overridable via env vars, plus
adds a workaround for self-hosted endpoints that don't accept OpenAI's
dimensions request param.

The OpenAI SDK already reads OPENAI_BASE_URL and OPENAI_API_KEY from env,
so pointing gbrain at a self-hosted server was already half-possible.
The remaining gaps were the hardcoded model + dims.

New env vars added by this patch (all optional):

  • GBRAIN_EMBEDDING_MODEL — override the model name
  • GBRAIN_EMBEDDING_DIMENSIONS — override the target dim (default 1536)
  • GBRAIN_EMBEDDING_OMIT_DIMENSIONS — don't send the dimensions param

Existing env vars used by the OpenAI SDK (already supported, listed for
completeness so a self-hosted user has the full set in one place):

  • OPENAI_BASE_URL — point at your self-hosted endpoint
  • OPENAI_API_KEY — required non-empty by the SDK; any string works if your server has no auth
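
As a rough illustration of how the three GBRAIN_* variables above could be resolved and applied to a request, here is a minimal sketch. It is not the code from this patch: `resolveEmbeddingConfig`, `buildEmbeddingRequest`, and the rule that `"1"` or `"true"` enables GBRAIN_EMBEDDING_OMIT_DIMENSIONS are all assumptions made for the example.

```typescript
// Hypothetical helpers for illustration only; names and parsing rules are
// assumptions, not the actual embedding.ts implementation.
interface EmbeddingConfig {
  model: string;
  dimensions: number;
  omitDimensions: boolean;
}

type Env = Record<string, string | undefined>;

function resolveEmbeddingConfig(env: Env): EmbeddingConfig {
  // Fall back to the current defaults when a variable is unset or empty.
  const rawDims = env.GBRAIN_EMBEDDING_DIMENSIONS;
  const dimensions = rawDims === undefined || rawDims === "" ? 1536 : Number(rawDims);
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error(`GBRAIN_EMBEDDING_DIMENSIONS must be a positive integer, got "${rawDims}"`);
  }
  // Assumption: "1" or "true" (case-insensitive) turns the flag on.
  const omit = (env.GBRAIN_EMBEDDING_OMIT_DIMENSIONS ?? "").toLowerCase();
  return {
    model: env.GBRAIN_EMBEDDING_MODEL || "text-embedding-3-large",
    dimensions,
    omitDimensions: omit === "1" || omit === "true",
  };
}

// Build the embeddings request body, leaving out `dimensions` entirely
// (not sending it as undefined or null) when the server would reject it.
function buildEmbeddingRequest(
  cfg: EmbeddingConfig,
  input: string[],
): { model: string; input: string[]; dimensions?: number } {
  const base = { model: cfg.model, input };
  return cfg.omitDimensions ? base : { ...base, dimensions: cfg.dimensions };
}
```

In a real process the `env` argument would typically be `process.env`; it is passed in explicitly here so the sketch stays self-contained.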

If the server returns more dims than configured (e.g. a Matryoshka-trained
(MRL) model where the dimensions param is omitted or ignored), we keep the
first N dims, where N is the configured dimension, and L2-renormalize the
result. This preserves cosine retrieval quality on MRL models.
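
The slice-and-renormalize step can be sketched as follows. This is a minimal example, not the patch's actual code: the function name `truncateAndNormalize` is hypothetical, and throwing when the server returns fewer dims than configured is an assumption.

```typescript
// Hypothetical helper for illustration; not the actual embedding.ts code.
function truncateAndNormalize(embedding: number[], targetDims: number): number[] {
  if (embedding.length < targetDims) {
    // Assumption: a too-short vector is treated as a configuration error.
    throw new Error(`expected at least ${targetDims} dims, got ${embedding.length}`);
  }
  if (embedding.length === targetDims) {
    return embedding; // already the configured size, nothing to do
  }
  // Keep the leading targetDims components (the MRL prefix).
  const prefix = embedding.slice(0, targetDims);
  // Rescale to unit L2 norm so cosine and dot-product scores stay comparable.
  const norm = Math.sqrt(prefix.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? prefix : prefix.map((x) => x / norm);
}
```

For example, truncating `[3, 4, 12]` to 2 dims keeps `[3, 4]`, whose L2 norm is 5, so the result is approximately `[0.6, 0.8]`.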

Tested with vLLM 0.20 + Qwen/Qwen3-Embedding-4B (native 2560-dim)
truncated to 1536. Full re-index of 4098 chunks across 736 pages,
gbrain doctor reports 100% coverage. Default behavior unchanged for
users on OpenAI.


… truncation

Make embedding.ts backend-agnostic so gbrain can run against any
OpenAI-compatible embedding endpoint (vLLM, sentence-transformers, etc.),
not just OpenAI's text-embedding-3-large@1536.

Three new optional env vars (defaults preserve current behavior):
  - GBRAIN_EMBEDDING_MODEL          - model name override
  - GBRAIN_EMBEDDING_DIMENSIONS     - schema dimension override
  - GBRAIN_EMBEDDING_OMIT_DIMENSIONS - skip the OpenAI dimensions request
                                      param for servers that reject it
                                      (e.g. non-Matryoshka models)

Adds client-side dimension truncation + L2 renormalization for
Matryoshka-trained models that return more dims than configured.
Slicing the first N dims of an MRL embedding preserves cosine retrieval
quality (Qwen3-Embedding, OpenAI text-embedding-3, etc.).

Tested against vLLM 0.20 + Qwen/Qwen3-Embedding-4B (native 2560-dim,
truncated to 1536 to match existing schema): 4098 chunks across
736 pages, 100% coverage in gbrain doctor.

garrytan commented Jun 8, 2026

Owner

Thanks for this contribution — and apologies for the slow triage. We did a full pass over the entire PR backlog. gbrain has moved fast, and the maintainer's larger "cathedral" rewrites have superseded a big share of community PRs: the AI gateway + recipes + user_provided_models system replaced almost all individual provider PRs; #1805 fixed the whole Postgres module-singleton class; #1542 unified the type taxonomy; #1657 the retrieval path; #1802 the doctor; and so on.

We're closing this one in that cleanup — either the fix already landed on master, it duplicates another PR or merged change, or it's outside the current merge bar. Where a closed PR carried a genuinely valuable idea, we've recorded it in docs/designs/COMMUNITY_IDEAS.md so nothing good is lost (a few may graduate into TODOs).

Please don't read the close as a judgment of the work — thank you for contributing. If you believe the underlying issue is still live on the latest master, reopen with a quick note and we'll take another look. 🙏

@garrytan garrytan closed this Jun 8, 2026