add MiniMax embeddings provider support by cacity · Pull Request #148 · garrytan/gbrain

cacity · 2026-04-16T05:03:35Z

Summary

add MiniMax embo-01 as an embeddings provider
make embeddings and hybrid search provider-aware while preserving OpenAI as the default
persist the actual embedding model on chunks and update init, doctor, migration, docs, and tests accordingly

Scope

embeddings only
Anthropic query expansion remains unchanged
no embedding-dimension migration is required because MiniMax embo-01 uses 1536-dimensional vectors

Test plan

bun test
bun run test:e2e
bun run src/cli.ts --help
bun run src/cli.ts version

McoreD · 2026-04-19T07:45:26Z

I tested this from a fresh user perspective and followed the documented MiniMax setup as implemented in the code:

MINIMAX_EMBEDDING_MODEL=embo-01
MINIMAX_BASE_URL=https://api.minimax.io/v1
MINIMAX_API_KEY=...
provider set to MiniMax

The code defaults in provider-config.ts appear to match those values, so on the surface the setup looks correct.

The problem is that, in practice, this still does not feel reliably usable end to end. doctor can recognize the provider configuration, but embedding execution is still failing or not completing cleanly in real-world validation. From a user point of view, that makes it hard to tell whether the issue is:

incorrect MiniMax API expectations,
a request/response format mismatch,
rate limiting or retry behavior,
or something else in the embed pipeline.

So I think the PR direction is good, but I would not call the MiniMax path production-safe yet without a verified end-to-end success case and maybe a tighter validation test around the actual API response shape.
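
A tighter validation test around the response shape could start from a small parser that rejects anything other than the expected envelope. The sketch below is illustrative only: the `vectors` and `base_resp` field names are assumptions about the MiniMax response that this thread does not confirm, `parseEmbeddingResponse` is a hypothetical helper, and the 1536-dimension check comes from the PR description.

```typescript
// Hypothetical response envelope. The field names `vectors` and `base_resp`
// are assumptions; this thread does not confirm the real MiniMax shape.
interface EmbeddingEnvelope {
  vectors?: unknown;
  base_resp?: { status_code?: number; status_msg?: string };
}

type ParseResult =
  | { ok: true; vectors: number[][] }
  | { ok: false; reason: string };

// 1536 comes from the PR description for embo-01.
const EXPECTED_DIMS = 1536;

function isFiniteNumberArray(value: unknown, length: number): value is number[] {
  return (
    Array.isArray(value) &&
    value.length === length &&
    value.every((x) => typeof x === "number" && Number.isFinite(x))
  );
}

function parseEmbeddingResponse(body: unknown, expectedCount: number): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, reason: "response body is not a JSON object" };
  }
  const envelope = body as EmbeddingEnvelope;

  // Surface provider-level failures before looking at the vectors at all.
  const status = envelope.base_resp;
  if (status && typeof status.status_code === "number" && status.status_code !== 0) {
    const msg = status.status_msg ?? "unknown";
    return { ok: false, reason: `provider error ${status.status_code}: ${msg}` };
  }

  const vectors = envelope.vectors;
  if (!Array.isArray(vectors)) {
    return { ok: false, reason: "missing vectors array" };
  }
  if (vectors.length !== expectedCount) {
    return { ok: false, reason: `expected ${expectedCount} vectors, got ${vectors.length}` };
  }
  for (const vector of vectors) {
    if (!isFiniteNumberArray(vector, EXPECTED_DIMS)) {
      return { ok: false, reason: `vector is not ${EXPECTED_DIMS} finite numbers` };
    }
  }
  return { ok: true, vectors: vectors as number[][] };
}
```

Returning a reason string instead of throwing keeps the failure mode visible to `doctor`-style checks, which is the ambiguity described above.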

Happy to share logs or exact behavior if useful. What am I doing wrong?

cacity · 2026-04-20T04:15:34Z

Added the MiniMax follow-up fixes on top of this PR:

pass MINIMAX_GROUP_ID through config/init
switch to the api.minimax.chat endpoint
include GroupId in embedding requests
update credential checks, migration persistence, docs, and tests
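
The configuration pass-through and credential check in this follow-up can be pictured roughly as below. This is a minimal sketch, not the PR's code: `resolveMiniMaxConfig` is a hypothetical helper, treating `MINIMAX_GROUP_ID` as required is an assumption, and the defaults simply mirror values mentioned in this thread (`embo-01`, the `api.minimax.chat` host with a `/v1` path).

```typescript
type Env = Record<string, string | undefined>;

interface MiniMaxConfig {
  apiKey: string;
  groupId: string;
  baseUrl: string;
  model: string;
}

interface ConfigResult {
  config?: MiniMaxConfig; // present only when nothing is missing
  missing: string[];      // names of required env vars that are unset or blank
}

// Treat unset and whitespace-only values the same way.
const read = (env: Env, name: string): string => (env[name] ?? "").trim();

function resolveMiniMaxConfig(env: Env): ConfigResult {
  // Assumption: both the key and the group id are required for embeddings.
  const required = ["MINIMAX_API_KEY", "MINIMAX_GROUP_ID"];
  const missing = required.filter((name) => read(env, name) === "");
  if (missing.length > 0) return { missing };

  return {
    missing: [],
    config: {
      apiKey: read(env, "MINIMAX_API_KEY"),
      groupId: read(env, "MINIMAX_GROUP_ID"),
      // Defaults taken from this thread; override via env when needed.
      baseUrl: read(env, "MINIMAX_BASE_URL") || "https://api.minimax.chat/v1",
      model: read(env, "MINIMAX_EMBEDDING_MODEL") || "embo-01",
    },
  };
}
```

Reporting the missing variable names, rather than a boolean, lets a credential check tell the user exactly which value to set.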

Validated with:

bun test
bun run test:e2e

Tier 2 skill E2E was skipped locally because ANTHROPIC_API_KEY was not set.

McoreD · 2026-04-20T09:26:07Z

Heads-up from downstream integration testing in KovaForge/gbrain.

I merged this PR onto our fork and then did a real deploy smoke test from a fresh clone. Findings:

There were two unrelated runtime regressions we had to fix downstream before basic deploy worked:
- src/commands/init.ts: isMigrateOnly is not defined
- src/core/embedding.ts: exponentialDelay is not defined
After fixing those, basic deploy path is healthy:
- gbrain init --pglite works
- put, get, and search work

MiniMax embeddings are still not working end-to-end against the real API.
We consistently get:

MiniMax embeddings failed: login fail: Please carry the API secret key in the 'Authorization' field of the request header

What we tried downstream:

Authorization: Bearer <key> with GroupId in query string
Authorization: <key> with GroupId in query string
base URL https://api.minimax.chat/v1
endpoint /embeddings

payload shape:

{
  "model": "embo-01",
  "texts": ["hello world"],
  "type": "db"
}

The auth error stayed the same, so the remaining issue looks like the exact MiniMax auth contract or endpoint contract rather than merge logic.

If you have a known-good curl example for the embeddings endpoint, especially the exact required auth header format and whether GroupId belongs in query params vs body vs headers, that would unblock downstream verification quickly.
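
For anyone trying to reproduce the failure, the first variant listed above (Bearer token, `GroupId` in the query string, `/embeddings` under the `api.minimax.chat/v1` base URL) can be written down as a request builder. This is a sketch of the failing attempt, not a known-good request: `buildEmbeddingRequest` and `redactForLogs` are hypothetical helpers, and the `Content-Type` header is an assumption that the thread does not mention.

```typescript
interface EmbedRequestInput {
  baseUrl: string; // e.g. https://api.minimax.chat/v1, as tried above
  apiKey: string;
  groupId: string;
  texts: string[];
  type: "db" | "query";
}

interface BuiltRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request without sending it, so the exact shape can be inspected.
function buildEmbeddingRequest(input: EmbedRequestInput): BuiltRequest {
  const base = input.baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/embeddings?GroupId=${encodeURIComponent(input.groupId)}`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${input.apiKey}`,
      "Content-Type": "application/json", // assumption, not stated in the thread
    },
    body: JSON.stringify({ model: "embo-01", texts: input.texts, type: input.type }),
  };
}

// Returns a copy that is safe to paste into an issue: the key is masked and
// the original request object is left untouched.
function redactForLogs(req: BuiltRequest): BuiltRequest {
  return { ...req, headers: { ...req.headers, Authorization: "Bearer <redacted>" } };
}
```

Logging `redactForLogs(buildEmbeddingRequest(...))` next to the raw response body gives a shareable record of exactly what was sent, which is the missing piece for comparing against a known-good curl.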

cacity · 2026-04-20T23:55:59Z

Thanks — this is very helpful.

My understanding is:

the downstream smoke test is healthy after fixing the two runtime regressions
basic flows (gbrain init --pglite, put, get, search) are working
the remaining blocker is specifically the real MiniMax embeddings API contract, not the general merge path

The two runtime regressions you mentioned are also important:

isMigrateOnly is not defined
exponentialDelay is not defined

If you can share the exact downstream patch or the minimal diff for those two fixes, that would help me verify whether I missed something in the branch state versus what was actually deployed downstream.

And for the MiniMax issue, agreed — at this point the next step is to verify the exact auth / endpoint contract with a known-good real request, rather than assuming the current header/query format is correct.

If you have raw request/response logs or a failing curl repro, please share them — that would help narrow this down quickly.

McoreD · 2026-04-21T00:27:28Z

Thanks, that matches my understanding.

The exact downstream patch is in these fork commits:

merge of the MiniMax support into our fork master:
KovaForge@717d9f6
follow-up downstream fix after integration testing:
KovaForge@566d2aa

Context from the downstream smoke test:

gbrain init --pglite works
put, get, and search work
the remaining blocker is specifically real MiniMax embeddings end-to-end

The runtime regressions we had to fix downstream were:

isMigrateOnly is not defined
exponentialDelay is not defined

And the MiniMax failure we still reproduce is:

MiniMax embeddings failed: login fail: Please carry the API secret key in the 'Authorization' field of the request header

So at this point I agree the next useful step is not more guessing, it’s a known-good real request shape for the MiniMax embeddings API, especially:

exact Authorization header format
whether GroupId belongs in query params, headers, or body
exact endpoint/base URL expected for embeddings

I don’t yet have a known-good curl to share, but I do have a reproducible failing downstream path from a fresh deploy. If helpful, I can also paste the minimal local diff around the two runtime fixes directly into the thread.

cacity · 2026-04-21T01:09:17Z

Added another follow-up pass on the MiniMax path.

What changed in this round:

default gbrain embed --all to serial page work when the embedding provider is MiniMax
keep GBRAIN_EMBED_CONCURRENCY as an explicit override
add MiniMax-specific request spacing before embedding calls
add MiniMax-specific retry handling for rate-limit-style API responses
cover the new behavior with focused tests
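
The behavior listed above (serial by default for MiniMax, an explicit override, request spacing, and retries on rate-limit-style responses) can be sketched as follows. Only `GBRAIN_EMBED_CONCURRENCY` comes from this thread; the provider id `"minimax"`, the default concurrency of 4, the delay values, and every function name here are assumptions for illustration, not the PR's implementation.

```typescript
type Env = Record<string, string | undefined>;

// Serial by default for MiniMax; a valid GBRAIN_EMBED_CONCURRENCY always wins.
function resolveEmbedConcurrency(provider: string, env: Env, defaultConcurrency = 4): number {
  const raw = env["GBRAIN_EMBED_CONCURRENCY"];
  const override = raw === undefined ? NaN : Number.parseInt(raw, 10);
  if (Number.isInteger(override) && override >= 1) return override;
  return provider === "minimax" ? 1 : defaultConcurrency;
}

// Exponential backoff with a cap: base, 2x base, 4x base, ... up to capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Heuristic for "rate-limit-style" responses; the exact signals are assumptions.
function isRateLimitLike(httpStatus: number, message: string): boolean {
  return httpStatus === 429 || /rate limit|too many requests/i.test(message);
}

const realSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Spaces out every call, and retries only when shouldRetry says the error is transient.
async function callWithSpacingAndRetry<T>(
  call: () => Promise<T>,
  shouldRetry: (err: unknown) => boolean,
  opts: { spacingMs?: number; maxAttempts?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<T> {
  const { spacingMs = 250, maxAttempts = 4, sleep = realSleep } = opts;
  for (let attempt = 0; ; attempt++) {
    await sleep(spacingMs); // request spacing applies to the first attempt too
    try {
      return await call();
    } catch (err) {
      if (attempt + 1 >= maxAttempts || !shouldRetry(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Note that an auth failure like the one reported in this thread should not be retried: `isRateLimitLike(401, "login fail")` is false, so the error surfaces immediately instead of being hidden behind backoff.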

Validation:

bun test test/embed.test.ts
bun test test/config.test.ts test/embed.test.ts

Current status:

the local codepath is now more conservative and better aligned with downstream integration feedback
however, the real MiniMax auth / endpoint contract issue is still unresolved upstream
a real API smoke test with a valid key still reproduces the auth failure, so the remaining blocker still appears to be MiniMax API behavior rather than just local concurrency/retry logic

cacity · 2026-04-21T01:58:46Z

Merged the latest master into this branch and resolved the merge conflicts.

I kept the MiniMax changes on top of the current upstream branch state and re-ran the focused checks after resolving conflicts:

bun run src/cli.ts --help
bun test test/config.test.ts test/embed.test.ts test/doctor.test.ts

The PR diff is still the MiniMax-related change set relative to current master, not the full upstream merge.

11th recipe. embo-01 model, 1536 dims, $0.07/1M tokens. OpenAI-compatible at api.minimax.chat.

MiniMax requires a `type: 'db' | 'query'` field for asymmetric retrieval (documents indexed with type='db', queries embedded with type='query'). gbrain has no query/document signal at the embed-call site today, so v1 defaults to type='db' for both indexing and retrieval — same vector space, symmetric similarity. Asymmetric query support is a follow-up TODO that needs the embed seam to thread query/document context.

Plumbed via src/core/ai/dims.ts: dimsProviderOptions returns {openaiCompatible: {type: 'db'}} for modelId === 'embo-01'.

Conservative max_batch_tokens=4096 declared (MiniMax docs don't publish the limit). Recursive halving in the gateway catches token-limit errors at runtime.

Tests: bun test test/ai/ — 101/101 (6 new + 95 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 5 of 11). Reworked from #148.

Co-Authored-By: cacity <20351699+cacity@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
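
To make the `type` plumbing concrete, here is a minimal sketch of an options lookup shaped like the one this commit message describes. It is not the code in src/core/ai/dims.ts: `embedProviderOptions` is a hypothetical name, and the `role` parameter is an assumption showing how the follow-up TODO (threading query/document context) might look. With the default, it reproduces the v1 behavior of always sending `type: 'db'`.

```typescript
type EmbedRole = "db" | "query";

interface EmbedProviderOptions {
  openaiCompatible: { type: EmbedRole };
}

// Hypothetical variant of the lookup described above. Defaulting `role` to
// "db" matches the v1 behavior; passing "query" is the assumed follow-up.
function embedProviderOptions(
  modelId: string,
  role: EmbedRole = "db",
): EmbedProviderOptions | undefined {
  // This sketch only covers embo-01; other models are out of scope here.
  if (modelId !== "embo-01") return undefined;
  return { openaiCompatible: { type: role } };
}

// v1 behavior: indexing and retrieval both embed with type "db".
const indexingOptions = embedProviderOptions("embo-01");
const retrievalOptions = embedProviderOptions("embo-01");

// Assumed follow-up: the search path would pass "query" once the embed
// call site knows whether it is embedding a document or a query.
const asymmetricQueryOptions = embedProviderOptions("embo-01", "query");
```

Because indexing and retrieval currently resolve to the same options, vectors stay comparable; switching only the query side to `"query"` later would not require re-embedding stored chunks under this scheme, assuming MiniMax's two types share a vector space as the message states.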

…17-PR cluster) (#810)

* feat(ai/types): add resolveAuth + probe + user_provided_models fields

Foundation commit for the embedding-provider fix-wave (5 API-key recipes + discoverability pass). Three optional additions to the recipe contract:

- `EmbeddingTouchpoint.user_provided_models?: true` (D8=A): flag for recipes that ship without a fixed model list. Consumed by the contract test (permits empty `models[]`), gateway.ts:223 (replaces hardcoded `recipe.id === 'litellm'` check in a follow-up commit), and init.ts:resolveAIOptions (refuses implicit "first model" pick for shorthand `--model <provider>`).
- `Recipe.resolveAuth?(env): {headerName, token}` (D12=A): unified auth seam across embed / expansion / chat. Default behavior (returns `Authorization: Bearer <env-key>`) covers the existing 9 recipes unchanged. Recipes deviating (Azure with `api-key:`; future OAuth providers) override this single seam instead of adding parallel mechanisms in 3 places. Codex review caught that auth was triplicated at gateway.ts:281/728/931; D12=A unifies all three in one follow-up commit.
- `Recipe.probe?(): Promise<{ready, hint?}>` (D13=A): recipe-owned readiness check for local-server providers (ollama, llama-server). Replaces the hardcoded `recipe.id === 'ollama'` special case in providers.ts. Wrapped in 200ms timeout at the call sites.

Pure type additions — no behavior change. Typecheck green; existing 9 recipes work unchanged because all three fields are optional.

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (decisions D8=A, D11=C, D12=A, D13=A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai/gateway): unify openai-compatible auth via Recipe.resolveAuth (D12=A)

Pre-v0.32, openai-compatible auth was duplicated 3 times in gateway.ts at instantiateEmbedding, instantiateExpansion, instantiateChat — with subtle drift (embedding had a `${recipe.id.toUpperCase()}_API_KEY` fallback the other two lacked).
Codex outside-voice review caught this during /plan-eng-review. D12=A: unify all three through `Recipe.resolveAuth?(env)` (declared in the prior commit).

Two new module-level helpers:

- `defaultResolveAuth(recipe, env, touchpoint)` — applied when a recipe doesn't declare its own resolver. Returns Authorization Bearer with `auth_env.required[0]`, falling back to the first present `auth_env.optional` env var, or 'unauthenticated' for no-auth recipes like Ollama. Throws AIConfigError with the recipe's setup_hint when required env is missing.
- `applyResolveAuth(recipe, cfg, touchpoint)` — returns `createOpenAICompatible` options. Bearer-via-Authorization paths use the SDK's native `apiKey` field; custom-header paths (Azure: api-key) use `headers` and OMIT apiKey to avoid double-auth leaks.

The 3 `case 'openai-compatible':` branches in instantiateEmbedding (line ~281), instantiateExpansion (line ~728), instantiateChat (line ~931) each collapse from ~10 lines of bespoke auth handling to a single `applyResolveAuth(recipe, cfg, '<touchpoint>')` call.

Also: the litellm-template hardcode at gateway.ts:223 (`recipe.id === 'litellm'`) is replaced with a union check for `EmbeddingTouchpoint.user_provided_models === true` (D8=A wire-through per Codex finding #3). Pre-v0.32 builds keep working via back-compat `recipe.id === 'litellm'` clause; new recipes declaring user_provided_models pick up the same gating automatically.

Existing 9 recipes (openai, anthropic, google, deepseek, groq, ollama, litellm-proxy, together, voyage) gain zero per-recipe edits — the default resolver covers their existing behavior.

Behavior change for ollama expansion/chat only: now reads OLLAMA_API_KEY when set (pre-v0.32 silently passed 'unauthenticated' for those touchpoints; embedding already read it). Ollama servers ignore the header so no real-world impact; this aligns the 3 touchpoints.

Tests: bun test test/ai/ — 77/77 pass.
Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (D8=A, D12=A; addresses Codex findings #3, #4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ai): IRON RULE regression test for v0.32 resolveAuth refactor

Pins the contract that the v0.32 D2/D12=A resolveAuth refactor preserves auth behavior for the 9 existing recipes (openai, anthropic, google, deepseek, groq, ollama, litellm-proxy, together, voyage).

10 cases covering:

- the 9 expected recipe ids are still registered
- every recipe with non-empty required[] returns Authorization Bearer <key>
- missing required env throws AIConfigError naming recipe + touchpoint + env-var
- Ollama (empty required, optional set) reads first present optional env
- Ollama (no env) falls back to "Bearer unauthenticated"
- all 3 touchpoints (embedding/expansion/chat) produce identical auth shape for the same recipe + env (this is the core regression: pre-v0.32, embedding had a fallback the other two lacked)
- applyResolveAuth converts Authorization Bearer to {apiKey} (SDK-native)
- applyResolveAuth respects a custom-header override (Azure preview; the recipe ships in commit 8) and emits {headers} WITHOUT apiKey to avoid double-auth
- native-* recipes (openai, anthropic, google) intentionally have no resolveAuth declared (they use AI-SDK adapters directly)
- all openai-compatible recipes ship without resolveAuth in v0.32 (default applies); the first override is Azure in commit 8

Also: export `defaultResolveAuth` and `applyResolveAuth` as @internal gateway helpers so tests can pin them directly. Mirrors the pattern of `splitByTokenBudget` and `isTokenLimitError` already exported with the same @internal annotation.

Tests: bun test test/ai/ — 87/87 pass (10 new + 77 existing). Typecheck: clean.

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (IRON RULE per Section 3 test review).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add llama-server recipe (#702 reworked)

10th recipe in the registry; first to ship Recipe.probe (D13=A) and the second user_provided_models recipe (litellm-proxy is the first).

llama.cpp's llama-server exposes an OpenAI-compatible /v1/embeddings endpoint. Distinct from Ollama: different default port (8080), different model-management story (you launch it with --model <path>; the server serves whatever was passed). Recipe ships with `models: []`, `user_provided_models: true`, `default_dims: 0` so the wizard refuses implicit defaults and forces explicit --embedding-model + --embedding-dimensions.

Added:

- src/core/ai/recipes/llama-server.ts (61 lines)
- probeLlamaServer() in src/core/ai/probes.ts; reads LLAMA_SERVER_BASE_URL with default http://localhost:8080/v1
- Registered in src/core/ai/recipes/index.ts (10 recipes total now)
- test/ai/recipe-llama-server.test.ts (8 cases): registered + shape, user_provided_models flag, probe declared + reachability fail-with-hint, default-auth covering no-env / API_KEY / URL-shaped-only paths

Hardening: defaultResolveAuth in gateway.ts now skips URL-shaped optional env entries (names ending in _URL or _BASE_URL) when picking a fallback auth token. Pre-fix, OLLAMA_BASE_URL=http://my-ollama would have become the Bearer token; Ollama ignores it but llama-server (and future local-server recipes) shouldn't depend on the server tolerating garbage auth. The regression test (recipes-existing-regression) gains one case pinning this contract.

Per-recipe test file follows D7=B (per-recipe over DRY for readability).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 4 of 11). Reworked from #702 because the original PR didn't model the recipe-owned probe pattern (D13=A) or user_provided_models (D8=A).

Tests: bun test test/ai/ — 95/95 pass (8 new + 87 existing).
Co-Authored-By: SiyaoZheng <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add MiniMax recipe (#148 reworked)

11th recipe. embo-01 model, 1536 dims, $0.07/1M tokens. OpenAI-compatible at api.minimax.chat.

MiniMax requires a `type: 'db' | 'query'` field for asymmetric retrieval (documents indexed with type='db', queries embedded with type='query'). gbrain has no query/document signal at the embed-call site today, so v1 defaults to type='db' for both indexing and retrieval — same vector space, symmetric similarity. Asymmetric query support is a follow-up TODO that needs the embed seam to thread query/document context.

Plumbed via src/core/ai/dims.ts: dimsProviderOptions returns {openaiCompatible: {type: 'db'}} for modelId === 'embo-01'.

Conservative max_batch_tokens=4096 declared (MiniMax docs don't publish the limit). Recursive halving in the gateway catches token-limit errors at runtime.

Tests: bun test test/ai/ — 101/101 (6 new + 95 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 5 of 11). Reworked from #148.

Co-Authored-By: cacity <20351699+cacity@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add Alibaba DashScope recipe (#59 split, part 1/2)

12th recipe. text-embedding-v3 (current) + text-embedding-v2; 1024 default dims with Matryoshka options [64, 128, 256, 512, 768, 1024]. OpenAI-compatible at dashscope-intl.aliyuncs.com. China-region users override via cfg.base_urls['dashscope']; v0.32 ships with the international default.

Conservative max_batch_tokens=8192 + chars_per_token=2 declared because Alibaba doesn't publish a hard batch limit and text-embedding-v3 mixes English + CJK heavily (CJK density closer to Voyage than OpenAI tiktoken).

Tests: bun test test/ai/ — 106/106 (5 new + 101 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 6 of 11).
Reworked from #59 (DashScope+Zhipu split into 2 commits per the plan; Zhipu lands next).

Co-Authored-By: Magicray1217 <267836857+Magicray1217@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add Zhipu AI (BigModel) recipe (#59 split, part 2/2)

13th recipe. embedding-3 (current) + embedding-2; 1024 default dims with Matryoshka options [256, 512, 1024, 2048]. OpenAI-compatible at open.bigmodel.cn.

embedding-3 at 2048 dims exceeds pgvector's HNSW cap of 2000 — those brains fall back to exact vector scans via the existing chunkEmbeddingIndexSql policy at src/core/vector-index.ts. Default stays at 1024 (HNSW-fast); users who want maximum fidelity opt into 2048 via --embedding-dimensions and accept the slower retrieval. Tests pin the HNSW boundary: 1024 returns the index SQL, 2048 returns the skip-index/exact-scan SQL.

Tests: bun test test/ai/ — 112/112 (6 new + 106 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 7 of 11). Reworked from #59. Together with DashScope (commit 6), closes the China-region embedding gap users repeatedly reported (DashScope covers Alibaba, Zhipu covers BigModel; both ship with international endpoints by default).

Co-Authored-By: Magicray1217 <267836857+Magicray1217@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add Azure OpenAI recipe (#459 reworked)

14th recipe and the first to exercise both v0.32 architectural seams:

- resolveAuth (D12=A) returns `{headerName: 'api-key', token: <key>}` instead of the default Authorization Bearer. Azure rejects double-auth, so applyResolveAuth puts the key in `headers` and OMITS apiKey.
- A new `Recipe.resolveOpenAICompatConfig?(env)` seam (Recipe.ts) lets the recipe template the baseURL from env (Azure: ENDPOINT + DEPLOYMENT combine into a non-/v1 path) and inject a custom fetch wrapper that splices ?api-version= onto every request URL.
The fetch wrapper is type-safe via `as unknown as typeof fetch`; AI SDK never calls TS's strict `preconnect()` method on the wrapper so the cast is sound.

`applyOpenAICompatConfig` (new gateway helper) routes through the recipe override or falls back to the pre-v0.32 base_urls/base_url_default behavior — existing 13 recipes get zero behavior change.

API version defaults to `2024-10-21` (current stable as of 2026-05); override via AZURE_OPENAI_API_VERSION env. Endpoint trailing slash gets stripped during URL construction so users can copy-paste from the Azure portal.

Tests (12 cases in test/ai/recipe-azure-openai.test.ts):

- resolveAuth returns api-key NOT Authorization Bearer
- applyResolveAuth puts key in headers, NOT apiKey (no double-auth)
- baseURL templating from endpoint + deployment, with trailing-slash strip
- AIConfigError on missing endpoint OR deployment
- fetch wrapper splices api-version (default + AZURE_OPENAI_API_VERSION override)
- fetch wrapper does NOT double-add api-version when caller already set it
- applyOpenAICompatConfig honors recipe override

IRON RULE regression test updated: now asserts azure-openai is the documented exception that overrides resolveAuth; any future override needs review.

Tests: bun test test/ai/ — 124/124 (12 new + 112 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 8 of 11, plus the resolveOpenAICompatConfig seam discovered during fold-in). Reworked from #459. The original PR proposed a hardcoded AzureOpenAI client switch; this implementation routes through the unified seams so future Azure-shaped providers (other custom-URL services) can reuse them.
Co-Authored-By: JamesJZhang <32652444+JamesJZhang@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): adjacent fixes — no_batch_cap (#779) + config-key fallbacks (#121)

Two small ergonomics fixes folded together (#765 deferred — see TODOS.md follow-up; the CJK PGLite extraction was bigger than the plan estimated).

#779 reworked (alexandreroumieu-codeapprentice): silence the missing-max_batch_tokens startup warning for recipes with genuinely dynamic batch capacity. New `EmbeddingTouchpoint.no_batch_cap?: true` field. Set on ollama (capacity depends on locally loaded model + OLLAMA_NUM_PARALLEL), litellm-proxy (depends on backend), llama-server (set by --ctx-size at server launch). Three less stderr warnings on every gateway configure; google still warns (it's a real fixed-cap provider that ought to ship a max_batch_tokens declaration).

Bonus: litellm-proxy now declares `user_provided_models: true`, removing the last consumer of the legacy `recipe.id === 'litellm'` hardcode in gateway.ts:223 (D8=A wire-through completion).

#121 reworked (vinsew): self-contained API keys. Two parts:

1. config.ts: ANTHROPIC_API_KEY env merge was silently missing. loadConfig() merged OPENAI_API_KEY but not ANTHROPIC_API_KEY into the file-config-shape result. One-line addition.
2. cli.ts:buildGatewayConfig: when ~/.gbrain/config.json declares openai_api_key / anthropic_api_key but the process env doesn't have those env vars set (common for launchd-spawned daemons, agent subprocess tools, containers that don't propagate ~/.zshrc), fold the config-file values into the gateway env snapshot. Process env still wins (loaded last) so per-process overrides keep working.
Tests (4 cases in test/ai/no-batch-cap-suppression.test.ts):

- Ollama / LiteLLM / llama-server all declare no_batch_cap: true
- configureGateway does NOT warn for those three
- configureGateway STILL warns for google (regression guard)
- Cross-cutting invariant: empty-models recipes declare user_provided_models

Tests: bun test test/ai/ — 128/128 (4 new + 124 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 9 of 11).

#765 (Hunyuan PGLite + CJK keyword fallback) deferred to TODOS.md follow-up; the CJK extraction (~150 lines + scoring logic + tests) is larger than the wave's adjacent-fix lane should carry. Closes that PR with a deferral note.

Co-Authored-By: alexandreroumieu-codeapprentice <noreply@github.com>
Co-Authored-By: vinsew <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(discoverability): doctor alt-provider advisory + init user_provided_models refusal

Two small but high-leverage changes that address the discoverability problem the v0.32 wave is trying to fix.

src/commands/doctor.ts: new `alternative_providers` check (8c). After the existing embedding-provider smoke test, walks listRecipes() and surfaces any recipe whose required env vars are ALL present in the process env but is not the currently configured provider. Reports as status: 'ok' with an informational message — never errors. Helps users discover that, e.g., `OPENAI_API_KEY=x DASHSCOPE_API_KEY=y` configured for openai means they have a Chinese-region alternative ready without extra setup.

src/commands/init.ts: user_provided_models recipes (litellm, llama-server) now refuse the implicit "first model" pick from shorthand --model with a structured setup hint pointing the user at the explicit form `--embedding-model <provider>:<your-model-id> --embedding-dimensions <N>`. Pre-fix, shorthand --model litellm threw "no embedding models listed" which was technically correct but unhelpful.
The new error includes the recipe's setup_hint when available.

Tests: bun test test/ai/ — 128/128 pass; typecheck clean.

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 10 of 11). The full interactive provider chooser in init.ts (the bigger piece of the discoverability lane) is deferred to a v0.32.x follow-up; this commit ships the doctor advisory + cleaner refusal that close the 80% case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(v0.32.0): embedding-providers.md + README callout + CHANGELOG + TODOS.md

Final commit of the v0.32 wave. Closes the discoverability gap that generated the 17-PR community cluster.

- New docs/integrations/embedding-providers.md: capability matrix, decision tree, per-recipe one-pagers, OAuth provider notes, "my provider isn't listed" pointer to LiteLLM proxy. Voice: capability not marketing per CLAUDE.md voice rules.
- README.md: embedding-providers callout near the top, naming the count (14 recipes) and pointing at the new doc.
- CHANGELOG.md: v0.32.0 entry following the verdict-headline format from CLAUDE.md voice rules. Lead-with-numbers ("14 providers, 5 new"), what-this-means-for-users closer, "to take advantage" upgrade block, itemized changes, contributor credits, deferred-with-context list.
- VERSION + package.json: 0.31.1 → 0.32.0. Minor bump justified by the new public Recipe surface (resolveAuth, resolveOpenAICompatConfig, probe, user_provided_models, no_batch_cap fields), the new OAuth subsystem scaffold (deferred to v0.32.x but typed in v0.32.0), and the 5 new recipes.
- TODOS.md: 7 follow-up entries for the v0.32 wave's deferred work (Vertex ADC, Copilot OAuth, Codex OAuth, CJK PGLite, interactive wizard, real-credentials CI matrix, MiniMax asymmetric retrieval, multimodal hardcode un-stuck). Each entry has full context + the exact file paths + the spike work needed so a future contributor can pick up cleanly.

Tests: bun test test/ai/ — 128/128 pass; typecheck clean.
Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 11 of 11). Wave complete: 11 commits, ~1500 net lines, 5 new recipes, full docs, doctor advisory, IRON RULE regression test, 7 TODOS for the v0.32.x follow-up wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: regenerate llms.txt + llms-full.txt for v0.32.0

After commit c384fad added the embedding-providers callout to README.md, the committed llms-full.txt drifted from the generator output and the build-llms test failed. Running `bun run build:llms` regenerates both files. The single line addition is the README callout pointing at docs/integrations/embedding-providers.md.

Tests: bun test test/build-llms.test.ts — 7/7 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: hermetic GBRAIN_HOME for brain-registry serial flake + withEnv on recipe-llama-server

Two test-isolation cleanups uncovered while shipping v0.32.

test/brain-registry.serial.test.ts (the BrainRegistry "empty/null/undefined id routes to host" test): pre-existing flake on dev machines that have a real ~/.gbrain/config.json. The test asserts getBrain(null) REJECTS but on those machines the host-init path RESOLVES instead (it found the maintainer's actual brain). The fix pins GBRAIN_HOME to a guaranteed-empty tempdir for the test's duration so host-init has nothing to find and fails loudly with a non-UnknownBrainError — exactly what the assertion wants. File is .serial.test.ts so direct process.env mutation is allowed by the test-isolation linter (R1 quarantine).

test/ai/recipe-llama-server.test.ts: rewrites the manual beforeEach/afterEach env save/restore as withEnv() per the canonical pattern in test/helpers/with-env.ts. The original was correct in behavior but tripped the test-isolation linter (R1: process.env mutation). withEnv() is exactly the cross-test-safe save+try/finally+restore the manual code did, just factored out. No behavior change.
Tests: bun run test — 5217 pass / 0 fail (was 5027 / 1 pre-existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address 5 codex pre-merge findings (dim passthrough + URL routing + MiniMax host)

Codex adversarial review during /ship caught five real production bugs. All five fixed with regression test coverage.

1. **dimsProviderOptions on openai-compatible** (src/core/ai/dims.ts): text-embedding-3-* (Azure), text-embedding-v3 (DashScope), and embedding-3 (Zhipu) now thread `dimensions` to the wire. Without this, Azure-default 3072d hard-fails a 1536d brain on first embed; DashScope and Zhipu Matryoshka requests silently get the provider's default size instead of what the user asked for. New tests in recipe-azure-openai/dashscope/zhipu pin the contract.
2. **`gbrain init --embedding-model llama-server:foo` verbose path** (src/commands/init.ts): now refuses without `--embedding-dimensions` for user_provided_models recipes. Pre-fix, the shorthand `--model` path was guarded but the verbose `--embedding-model` path fell through to configureGateway's 1536d default and silently created the wrong-width schema; failure surfaced only at first real embed.
3. **MiniMax host correction** (src/core/ai/recipes/minimax.ts): `api.minimax.chat/v1` → `api.minimaxi.com/v1` matches MiniMax's current OpenAI-compatible docs. Default-config users would have hit the wrong endpoint before auth or model selection mattered.
4. **`LLAMA_SERVER_BASE_URL` reaches the gateway** (src/cli.ts: buildGatewayConfig): env-set local-server URLs (LLAMA_SERVER_BASE_URL, OLLAMA_BASE_URL, LMSTUDIO_BASE_URL, LITELLM_BASE_URL) now thread into `cfg.base_urls` so embed traffic hits the configured port. Pre-fix, the probe would succeed against a custom port while real embed calls went to localhost:8080. Caller-supplied `cfg.provider_base_urls` still wins over env.
5. **Recipe.probe(baseURL?)
accepts the resolved URL** (src/core/ai/types.ts, src/core/ai/probes.ts, src/core/ai/recipes/llama-server.ts): when the user configures `provider_base_urls.llama-server` in config but no env var is set, the probe and gateway no longer disagree. Callers with cfg pass the resolved URL; legacy callers fall back to env / recipe default.

CHANGELOG updated; llms-full.txt regenerated.

Tests: bun run test — 5220/5220 pass / 0 fail (was 5217 / 0; +3 new codex-finding regression tests).

Pre-merge codex adversarial: ran during /ship Step 11 against the v0.32 diff. All 5 findings addressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): isolate v0.32 no-batch-cap test from mock.module leak (closes 19 CI fails)

Three CI test-isolation fixes uncovered by yesterday's CI run on PR #810:

1. **`scripts/test-shard.sh` excludes `*.serial.test.ts`** (was running them in parallel shards). Without this, serial files race with non-serial files in the CI shard process. Mirrors `scripts/run-unit-shard.sh`'s exclusion set; 1-line `find` filter.
2. **`scripts/run-serial-tests.sh` runs each serial file in its own bun process**. Pre-fix, all serial files ran in ONE bun process with `--max-concurrency=1` — that limits intra-file concurrency but does NOT prevent module-registry leakage across files. When `eval-takes-quality-runner.serial.test.ts` does `mock.module('../src/core/ai/gateway.ts', () => ({chat, configureGateway}))` (a partial mock missing `resetGateway`, `defaultResolveAuth`, etc.), the next file in the same process gets the partial mock on import and `import { resetGateway }` fails with "Export named 'resetGateway' not found." Per-file processes give true isolation; cost is ~100ms × N files (negligible vs CI walltime).
3. **`test/ai/no-batch-cap-suppression.test.ts` → `.serial.test.ts`**. The test mutates `console.warn` globally (mock spy).
When other tests in the same shard process load `src/core/ai/gateway.ts` and call `configureGateway()` first, they populate the module-scoped `_warnedRecipes` Set; the test's `resetGateway()` clears it but races if other gateway-touching code runs concurrently in the same process. Renaming to `.serial.test.ts` quarantines it via fix #1 + #2. 4. **CI workflow gains a serial-tests step on shard 1**. Pre-fix, shard 1 ran `bun run verify` + the parallel shard, but no shard ran `*.serial.test.ts` files. After fix #1 excludes them from shards, they need explicit invocation. New step: `bash scripts/run-serial-tests.sh` (shard 1 only). Tests: bun run test — 5220 / 0 fail (matches local pre-CI run; was showing 19 fails on CI for PR #810 due to fixes #1-#3 missing). Failure analysis from .context/attachments/test__2__75236697976.log: - 18 multimodal failures: caused by mock.module leak from eval-takes-quality-runner.serial.test.ts being run alongside voyage-multimodal.test.ts in the same parallel shard process. After fix #1 + fix #3, eval-takes-quality only runs in serial pass; after fix #2, its mock.module doesn't leak to subsequent serial files. - 1 no-batch-cap failure: same root cause; fix #3 quarantines it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: SiyaoZheng <noreply@github.com> Co-authored-by: cacity <20351699+cacity@users.noreply.github.com> Co-authored-by: Magicray1217 <267836857+Magicray1217@users.noreply.github.com> Co-authored-by: JamesJZhang <32652444+JamesJZhang@users.noreply.github.com>
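The base-URL precedence described in codex findings 4 and 5 (caller-supplied `provider_base_urls` wins over env, env wins over the recipe default) could be sketched roughly as follows. This is a minimal illustration and not the code in src/cli.ts or src/core/ai/probes.ts: `resolveBaseUrl`, `ProviderId`, `GatewayConfig`, and `ENV_BASE_URL` are hypothetical names, and only the env var names and the `provider_base_urls` key come from the commit message.

```typescript
// Hypothetical sketch: these names are illustrative, not gbrain's actual API.
type ProviderId = "llama-server" | "ollama" | "lmstudio" | "litellm";

// Env var names as listed in the commit message (finding 4).
const ENV_BASE_URL: Record<ProviderId, string> = {
  "llama-server": "LLAMA_SERVER_BASE_URL",
  ollama: "OLLAMA_BASE_URL",
  lmstudio: "LMSTUDIO_BASE_URL",
  litellm: "LITELLM_BASE_URL",
};

// Assumed config shape: only the provider_base_urls key is taken from the source.
interface GatewayConfig {
  provider_base_urls?: Partial<Record<ProviderId, string>>;
}

// Resolve one URL per provider with the precedence:
// caller config > environment variable > recipe default.
function resolveBaseUrl(
  provider: ProviderId,
  cfg: GatewayConfig,
  env: Record<string, string | undefined>,
  recipeDefault: string,
): string {
  const fromConfig = cfg.provider_base_urls?.[provider];
  if (fromConfig) return fromConfig;

  const fromEnv = env[ENV_BASE_URL[provider]];
  if (fromEnv) return fromEnv;

  return recipeDefault;
}
```

Under this sketch, resolving the URL once and handing the same string to both the probe and the embed gateway is what would keep the two from disagreeing about the port; the recipe default itself is passed in by the caller, so nothing here assumes a particular value for it.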

feat: add MiniMax embeddings provider support

4c2ca1d

feat: complete MiniMax embeddings integration

9883813

McoreD pushed a commit to KovaForge/gbrain that referenced this pull request Apr 20, 2026

merge: apply upstream PR garrytan#148 with minimax support

41bcf15

garrytan mentioned this pull request Apr 20, 2026

feat: v0.28.9 pluggable embedding providers — Vercel AI SDK #257

Merged

9 tasks

fix: throttle and serialize MiniMax embedding requests

9a3ef6e

Merge origin/master into feat/minimax-embeddings

e8a6c45

garrytan mentioned this pull request May 10, 2026

v0.32.0 feat: 5 new embedding recipes + discoverability pass (closes 17-PR cluster) #810

Merged

8 tasks

garrytan closed this in #810 May 11, 2026

add MiniMax embeddings provider support #148

cacity wants to merge 4 commits into garrytan:master from cacity:feat/minimax-embeddings

cacity commented Apr 16, 2026

McoreD commented Apr 19, 2026

cacity commented Apr 20, 2026

McoreD commented Apr 20, 2026

cacity commented Apr 20, 2026

McoreD commented Apr 21, 2026

cacity commented Apr 21, 2026

cacity commented Apr 21, 2026

2 participants
