v0.40.8.0 fix: local embeddings as a first-class provider #1329

Closed
kohai-ut wants to merge 5 commits into garrytan:master from kohai-ut:feat/local-embeddings

Conversation

@kohai-ut

Contributor

Summary

Makes local embedding providers (Ollama, llama-server) behave exactly like hosted ones across every place gbrain inspects your setup. Three independent gaps, one root cause (a hosted-only key check copied into two producers):

  • doctor/autopilot remediation planner (fix: recipe-aware embedding-provider check): gbrain doctor --remediation-plan reported a blocked "missing embedding API key" for a brain on ollama: / llama-server:, contradicting gbrain doctor --json's 100%-coverage health. A new shared embeddingProviderConfigured() helper (recipe-aware: empty auth_env.required ⇒ no key needed; hosted ⇒ checks its OWN required key) replaces the hosted-only prefix ladder in doctor.ts and the parallel copy in autopilot.ts. The RecommendationContext.hasEmbeddingApiKey field is renamed embeddingProviderConfigured and the blocker reason broadened to "embedding provider not configured".
  • budget tracker (fix(budget): price local embed providers at $0): a --max-cost-bounded embed/reindex job for a local provider TX2 hard-failed with no_pricing. New FREE_LOCAL_EMBED_PROVIDERS = {ollama, llama-server} (sibling to the existing FREE_LOCAL_RERANK_PROVIDERS) returns $0 on a lookupEmbeddingPrice miss. lmstudio (no recipe) and litellm (can proxy paid) intentionally excluded.
  • models doctor (feat(models): embedding reachability probe): a down/misconfigured local embed server was invisible until first embed. New probeEmbeddingReachability() mirrors the reranker probe — a 1-input embed with a 5s abort timeout, new 'embedding_reachability' touchpoint, gated on the zero-network config probe returning ok first.
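
The reachability probe from the last bullet can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's code: the `EmbedFn` signature, the `ProbeResult` shape, and the `"ping"` input are hypothetical. Only the 1-input embed, the 5s abort timeout, the `'embedding_reachability'` touchpoint, and the gate on the config probe come from the description above.

```typescript
// Hypothetical result shape; the touchpoint name is the one the PR introduces.
type ProbeResult =
  | { touchpoint: "embedding_reachability"; status: "ok" }
  | { touchpoint: "embedding_reachability"; status: "skipped" | "fail"; reason: string };

// Hypothetical embed signature: a batch of inputs plus an AbortSignal.
type EmbedFn = (inputs: string[], signal: AbortSignal) => Promise<number[][]>;

async function probeEmbeddingReachability(
  configProbeOk: boolean,
  embed: EmbedFn,
  timeoutMs = 5000,
): Promise<ProbeResult> {
  const touchpoint = "embedding_reachability" as const;

  // Gate: never touch the network unless the zero-network config probe passed.
  if (!configProbeOk) {
    return { touchpoint, status: "skipped", reason: "config probe did not return ok" };
  }

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // One input only: the probe checks reachability, not throughput.
    const vectors = await embed(["ping"], controller.signal);
    if (vectors.length !== 1) {
      return { touchpoint, status: "fail", reason: "unexpected embedding count" };
    }
    return { touchpoint, status: "ok" };
  } catch (err) {
    // The PR classifies errors via classifyError; this sketch only reports a message.
    const reason = controller.signal.aborted
      ? `timed out after ${timeoutMs}ms`
      : err instanceof Error
        ? err.message
        : String(err);
    return { touchpoint, status: "fail", reason };
  } finally {
    clearTimeout(timer);
  }
}
```

Injecting `embed` keeps the sketch testable without a running Ollama or llama-server instance; the real probe presumably calls the gateway directly.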

Intended behavior change (hosted providers)

The remediation planner now judges each hosted provider by its OWN required key. Pre-fix, every non-OpenAI/non-ZE provider fell back to "any OpenAI/ZE key present", so a Voyage brain looked configured if an unrelated OpenAI key existed. Now a Voyage brain is judged by VOYAGE_API_KEY. Strictly more correct, but a behavior change for Voyage/Google brains that relied on the old fallback. Hosted brains with no key still block, as before.
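
A minimal sketch of the recipe-aware check, including the Voyage behavior change, might look like this. The recipe table, the `provider:model` id format, `parseProvider`, and the `resolveKey` callback are assumptions made for illustration; the real helper lives in brain-score-recommendations.ts and may be shaped differently.

```typescript
// Hypothetical recipe shape; only auth_env.required is taken from the PR text.
interface Recipe {
  auth_env: { required: string[] };
}

// Illustrative registry, not the project's actual recipe list.
const RECIPES = new Map<string, Recipe>([
  ["ollama", { auth_env: { required: [] } }],
  ["llama-server", { auth_env: { required: [] } }],
  ["openai", { auth_env: { required: ["OPENAI_API_KEY"] } }],
  ["voyage", { auth_env: { required: ["VOYAGE_API_KEY"] } }],
]);

// Split an assumed "provider:model" id into its provider part; throws if malformed.
function parseProvider(modelId: string): string {
  const idx = modelId.indexOf(":");
  if (idx <= 0 || idx === modelId.length - 1) {
    throw new Error(`malformed model id: ${modelId}`);
  }
  return modelId.slice(0, idx);
}

function embeddingProviderConfigured(
  modelId: string | undefined,
  resolveKey: (envName: string) => string | undefined,
): boolean {
  if (!modelId) return false; // empty or undefined model
  let provider: string;
  try {
    provider = parseProvider(modelId);
  } catch {
    return false; // malformed id is treated as "not configured", never thrown
  }
  const recipe = RECIPES.get(provider);
  if (!recipe) return false; // unknown provider
  // Empty required list (local providers) passes vacuously;
  // hosted providers need every one of their OWN keys to resolve.
  return recipe.auth_env.required.every((name) => Boolean(resolveKey(name)));
}
```

With only `OPENAI_API_KEY` set, `embeddingProviderConfigured("voyage:some-model", resolveKey)` returns `false` in this sketch, which is the intended change from the old "any OpenAI/ZE key" fallback.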

Test Coverage

  • test/brain-score-recommendations.test.ts: 6 new helper cases (empty/undefined → false, local → true regardless of keys, hosted iff key resolves, the Voyage behavior change, unknown provider → false, malformed model id → false) + renamed consumer assertions.
  • test/core/budget/budget-tracker.test.ts: local embed → $0 (no TX2); regression unknown-hosted embed still TX2 hard-fails; regression known-hosted (openai) still trips a real cost gate.
  • test/models-doctor-embed.test.ts (new): pins the three reachability-probe invariants (uses embed not embedQuery, distinct touchpoint member, gated on config-probe-ok).
  • test/v0_37_gap_fill.serial.test.ts: source-grep updated to the new helper names.

64/64 tests pass across the 4 affected files. bun run verify and bun run check:all green. (The full parallel PGLite suite is intentionally left to CI — it OOMs the 16GB dev box; targeted-file runs + verify + check:all are the local gate.)

Pre-Landing Review

Clean — 0 findings. No SQL/LLM trust-boundary surface; the only added runtime cost is 4 sequential engine.getConfig awaits per autopilot tick (negligible). No frontend files (design review N/A), no prompt files (evals N/A).

Plan Completion

All planned items shipped (gaps A/B/C) plus the codex outside-voice refinements from the plan-eng-review (recipe-aware helper ≠ isAvailable, parseModelId try/catch, stated hosted behavior change + tests, autopilot model resolution, HOSTED_EMBED_KEY_CONFIG sync-closure map, embed not embedQuery, distinct touchpoint member, gating on config-probe-ok).

TODOS

Added a follow-up to unify FREE_LOCAL_EMBED_PROVIDERS + FREE_LOCAL_RERANK_PROVIDERS into one FREE_LOCAL_PROVIDERS keyed by kind, and to evaluate recipe-cost-driven resolution, once the rerank (#1326) and embed sides both land.
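
One possible shape for that follow-up, sketched under assumptions: the `ProviderKind` type, the `lookupPrice` name, the price table, and its values are all hypothetical, and the rerank set below is a placeholder because this PR does not list the members of FREE_LOCAL_RERANK_PROVIDERS.

```typescript
type ProviderKind = "embed" | "rerank";

// Embed members come from this PR; the rerank member is a placeholder assumption.
const FREE_LOCAL_PROVIDERS: Record<ProviderKind, ReadonlySet<string>> = {
  embed: new Set(["ollama", "llama-server"]),
  rerank: new Set(["llama-server"]),
};

// Hypothetical price table (USD per 1M tokens); the entry is illustrative only.
const PRICE_TABLE = new Map<string, number>([
  ["openai:example-embed-model", 0.1],
]);

// Returns a price, 0 for local-inference providers on a table miss,
// or null when no pricing is known (the case the budget tracker hard-fails on).
function lookupPrice(kind: ProviderKind, modelId: string): number | null {
  const known = PRICE_TABLE.get(modelId);
  if (known !== undefined) return known;
  // Provider is the text before the first ":"; empty when there is no colon.
  const provider = modelId.slice(0, Math.max(0, modelId.indexOf(":")));
  return FREE_LOCAL_PROVIDERS[kind].has(provider) ? 0 : null;
}
```

Because `lmstudio` and `litellm` are deliberately absent from the embed set, a pricing miss for them still yields `null` here, so a cost-bounded job would keep failing closed rather than silently running unpriced.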

Test plan

  • bun run verify (typecheck + prechecks)
  • bun run check:all
  • Targeted: bun test test/brain-score-recommendations.test.ts test/core/budget/budget-tracker.test.ts test/models-doctor-embed.test.ts test/v0_37_gap_fill.serial.test.ts → 64 pass
  • Full CI suite (runs on PR)

🤖 Generated with Claude Code

kohai-ut and others added 5 commits May 23, 2026 14:39
doctor --remediation-plan and autopilot both judged the embedding
provider with a hosted-only key check, so a brain on ollama: or
llama-server: was reported "blocked" on a missing API key it never
needed, contradicting doctor --json's 100%-coverage health.

Extract a shared embeddingProviderConfigured() helper into
brain-score-recommendations.ts: empty auth_env.required (local
providers) is configured with no key; hosted providers check their
OWN required key. Both producers (doctor, autopilot) call it,
killing the DRY violation that caused the bug. Hosted brains with a
missing key still block.

A --max-cost-bounded embed/reindex job configured for ollama: or
llama-server: TX2 hard-failed with no_pricing because
lookupEmbeddingPrice has no entry for local models. Add
FREE_LOCAL_EMBED_PROVIDERS (sibling to FREE_LOCAL_RERANK_PROVIDERS)
so a pricing miss on a local-inference provider returns $0 instead
of null. lmstudio/litellm intentionally excluded.

A down/misconfigured local embed server was invisible until first
embed. Add probeEmbeddingReachability() (mirrors the reranker probe):
a 1-input embed with a 5s abort timeout, classified via classifyError,
under a new 'embedding_reachability' touchpoint, gated on the
zero-network config probe returning ok first.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

codex review caught a false positive: HOSTED_EMBED_KEY_CONFIG mapped
VOYAGE_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY to config fields, but
buildGatewayConfig only threads openai/anthropic/zeroentropy config
keys into the gateway env. A Voyage/Google brain with the key only in
config.json would be judged "configured" and dispatch an embed.stale
job that then fails auth at the gateway. Drop those two from the map so
the producer closures resolve them by env var only, matching what the
gateway can actually use. Pinned by a regression test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
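
The env-only resolution described in the last commit message could look roughly like the following. This is a sketch, not the PR's closure: `makeKeyResolver`, the config field names, and the ZeroEntropy env var name are assumptions; the point is only that a key absent from the map is never read from config.

```typescript
// Env var -> config field, limited to keys the gateway is said to thread from config.
// Field names and the ZeroEntropy variable name are hypothetical.
const HOSTED_EMBED_KEY_CONFIG = new Map<string, string>([
  ["OPENAI_API_KEY", "openai_api_key"],
  ["ZEROENTROPY_API_KEY", "zeroentropy_api_key"],
]);

type StringBag = Record<string, string | undefined>;

// Builds a synchronous resolver: env var first, then config only for mapped keys.
function makeKeyResolver(
  env: StringBag,
  config: StringBag,
): (envName: string) => string | undefined {
  return (envName) => {
    const fromEnv = env[envName];
    if (fromEnv) return fromEnv;
    const field = HOSTED_EMBED_KEY_CONFIG.get(envName);
    if (field === undefined) return undefined; // e.g. VOYAGE_API_KEY: env var only
    return config[field] || undefined;
  };
}
```

A Voyage key stored only in config.json therefore resolves to `undefined`, so the brain is reported as not configured instead of dispatching a job that would fail auth at the gateway.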
@garrytan

garrytan commented Jun 8, 2026

Owner

Thanks for this contribution — and apologies for the slow triage. We did a full pass over the entire PR backlog. gbrain has moved fast, and the maintainer's larger "cathedral" rewrites have superseded a big share of community PRs: the AI gateway + recipes + user_provided_models system replaced almost all individual provider PRs; #1805 fixed the whole Postgres module-singleton class; #1542 unified the type taxonomy; #1657 the retrieval path; #1802 the doctor; and so on.

We're closing this one in that cleanup — either the fix already landed on master, it duplicates another PR or merged change, or it's outside the current merge bar. Where a closed PR carried a genuinely valuable idea, we've recorded it in docs/designs/COMMUNITY_IDEAS.md so nothing good is lost (a few may graduate into TODOs).

Please don't read the close as a judgment of the work — thank you for contributing. If you believe the underlying issue is still live on the latest master, reopen with a quick note and we'll take another look. 🙏

@garrytan garrytan closed this Jun 8, 2026