support custom OpenAI-compatible embedding servers and other models#516
support custom OpenAI-compatible embedding servers and other models#516vazir wants to merge 1 commit into
Conversation
… truncation
Make embedding.ts backend-agnostic so gbrain can run against any
OpenAI-compatible embedding endpoint (vLLM, sentence-transformers, etc.),
not just OpenAI's text-embedding-3-large@1536.
Three new optional env vars (defaults preserve current behavior):
- GBRAIN_EMBEDDING_MODEL - model name override
- GBRAIN_EMBEDDING_DIMENSIONS - schema dimension override
- GBRAIN_EMBEDDING_OMIT_DIMENSIONS - skip the OpenAI dimensions request
param for servers that reject it
(e.g. non-Matryoshka models)
Adds client-side dimension truncation + L2 renormalization for
Matryoshka-trained models that return more dims than configured.
Slicing the first N dims of an MRL embedding preserves cosine retrieval
quality (Qwen3-Embedding, OpenAI text-embedding-3, etc.).
Tested against vLLM 0.20 + Qwen/Qwen3-Embedding-4B (native 2560-dim,
truncated to 1536 to match existing schema): 4098 chunks across
736 pages, 100% coverage in gbrain doctor.
|
Thanks for this contribution — and apologies for the slow triage. We did a full pass over the entire PR backlog. gbrain has moved fast, and the maintainer's larger "cathedral" rewrites have superseded a big share of community PRs: the AI gateway + recipes + user_provided_models system replaced almost all individual provider PRs; #1805 fixed the whole Postgres module-singleton class; #1542 unified the type taxonomy; #1657 the retrieval path; #1802 the doctor; and so on. We're closing this one in that cleanup — either the fix already landed on master, it duplicates another PR or merged change, or it's outside the current merge bar. Where a closed PR carried a genuinely valuable idea, we've recorded it in docs/designs/COMMUNITY_IDEAS.md so nothing good is lost (a few may graduate into TODOs). Please don't read the close as a judgment of the work — thank you for contributing. If you believe the underlying issue is still live on the latest master, reopen with a quick note and we'll take another look. 🙏 |
Currently embedding.ts hardcodes text-embedding-3-large at 1536 dims.
This patch makes the model and dimensions overridable via env vars, plus
adds a workaround for self-hosted endpoints that don't accept OpenAI's
dimensionsrequest param.The OpenAI SDK already reads OPENAI_BASE_URL and OPENAI_API_KEY from env,
so pointing gbrain at a self-hosted server was already half-possible.
The remaining gaps were the hardcoded model + dims.
New env vars added by this patch (all optional):
dimensionsparamExisting env vars used by the OpenAI SDK (already supported, listed for
completeness so a self-hosted user has the full set in one place):
If the server returns more dims than configured (e.g. a Matryoshka-trained
model where the param is omitted or ignored), we slice the prefix and
L2-renormalize. That keeps cosine retrieval quality on MRL models.
Tested with vLLM 0.20 + Qwen/Qwen3-Embedding-4B (native 2560-dim)
truncated to 1536. Full re-index of 4098 chunks across 736 pages,
gbrain doctor reports 100% coverage. Default behavior unchanged for
users on OpenAI.
Need help on this PR? Tag
@codesmithwith what you need.