v0.40.8.0 fix: local embeddings as a first-class provider #1329
kohai-ut wants to merge 5 commits into
doctor --remediation-plan and autopilot both judged the embedding provider with a hosted-only key check, so a brain on ollama: or llama-server: was reported "blocked" on a missing API key it never needed, contradicting doctor --json's 100%-coverage health. Extract a shared embeddingProviderConfigured() helper into brain-score-recommendations.ts: empty auth_env.required (local providers) is configured with no key; hosted providers check their OWN required key. Both producers (doctor, autopilot) call it, killing the DRY violation that caused the bug. Hosted brains with a missing key still block.
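A minimal sketch of the recipe-aware check described above, under stated assumptions. Only the helper name and the `auth_env.required` field come from the commit message; the `Recipe` shape, the `RECIPES` table, the `KeyResolver` callback, and the model ids are hypothetical stand-ins for whatever gbrain actually uses.

```typescript
// Hypothetical recipe shape: only auth_env.required is named in the PR text.
interface Recipe {
  auth_env: { required: string[] };
}

// Illustrative recipe table (assumption); the real recipes live elsewhere in gbrain.
const RECIPES = new Map<string, Recipe>([
  ["ollama", { auth_env: { required: [] } }],
  ["llama-server", { auth_env: { required: [] } }],
  ["openai", { auth_env: { required: ["OPENAI_API_KEY"] } }],
  ["voyage", { auth_env: { required: ["VOYAGE_API_KEY"] } }],
]);

// Hypothetical callback: returns the key value for an env var name, if any.
type KeyResolver = (envName: string) => string | undefined;

function embeddingProviderConfigured(
  modelId: string | undefined,
  resolveKey: KeyResolver,
): boolean {
  if (!modelId) return false; // empty or undefined model id
  const sep = modelId.indexOf(":");
  if (sep <= 0) return false; // malformed id: no "provider:" prefix
  const recipe = RECIPES.get(modelId.slice(0, sep));
  if (!recipe) return false; // unknown provider
  // An empty required list (local providers) passes with no key at all;
  // a hosted provider must resolve every one of its OWN required keys.
  return recipe.auth_env.required.every((name) => Boolean(resolveKey(name)));
}
```

Because both producers would call the same function, a local provider and a hosted provider cannot drift apart the way two copied key checks can.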
A --max-cost-bounded embed/reindex job configured for ollama: or llama-server: TX2 hard-failed with no_pricing because lookupEmbeddingPrice has no entry for local models. Add FREE_LOCAL_EMBED_PROVIDERS (sibling to FREE_LOCAL_RERANK_PROVIDERS) so a pricing miss on a local-inference provider returns $0 instead of null. lmstudio/litellm intentionally excluded.
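The pricing fallback can be sketched roughly as follows. `FREE_LOCAL_EMBED_PROVIDERS`, `lookupEmbeddingPrice`, and the `no_pricing` reason are named in the commit message; the price table, its placeholder value, the per-million-token unit, the `over_budget` reason, and the `resolveEmbedPrice`/`checkEmbedBudget` wrappers are assumptions made for illustration.

```typescript
// Named in the PR; lmstudio and litellm are deliberately absent.
const FREE_LOCAL_EMBED_PROVIDERS = new Set<string>(["ollama", "llama-server"]);

// Hypothetical price table (USD per 1M tokens); the value is a placeholder.
const EMBED_PRICES = new Map<string, number>([["hosted:example-embed", 1.0]]);

// Stand-in for the real lookupEmbeddingPrice, whose signature is not shown in the PR.
function lookupEmbeddingPrice(modelId: string): number | null {
  return EMBED_PRICES.get(modelId) ?? null;
}

// On a pricing miss, local-inference providers cost $0; everything else stays unknown.
function resolveEmbedPrice(modelId: string): number | null {
  const known = lookupEmbeddingPrice(modelId);
  if (known !== null) return known;
  const provider = modelId.split(":", 1)[0] ?? "";
  return FREE_LOCAL_EMBED_PROVIDERS.has(provider) ? 0 : null;
}

type BudgetCheck =
  | { ok: true; costUsd: number }
  | { ok: false; reason: "no_pricing" | "over_budget" }; // "over_budget" is hypothetical

// Hypothetical gate for a --max-cost-bounded job.
function checkEmbedBudget(modelId: string, tokens: number, maxCostUsd: number): BudgetCheck {
  const pricePerMillion = resolveEmbedPrice(modelId);
  if (pricePerMillion === null) return { ok: false, reason: "no_pricing" };
  const costUsd = (tokens / 1_000_000) * pricePerMillion;
  return costUsd <= maxCostUsd ? { ok: true, costUsd } : { ok: false, reason: "over_budget" };
}
```

Keeping the free list separate from the price table means an unknown hosted model still fails closed with `no_pricing` rather than silently being treated as free.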
A down/misconfigured local embed server was invisible until first embed. Add probeEmbeddingReachability() (mirrors the reranker probe): a 1-input embed with a 5s abort timeout, classified via classifyError, under a new 'embedding_reachability' touchpoint, gated on the zero-network config probe returning ok first.
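A rough sketch of such a probe, assuming the embed call accepts an `AbortSignal`. The function name, the 5s timeout, the single-input embed, the `'embedding_reachability'` touchpoint, and the config-probe gate come from the commit message; the `EmbedFn` and `ProbeResult` types, the `skipped` flag, and the simplified `classifyProbeError` (a stand-in for the real `classifyError`) are hypothetical.

```typescript
// Hypothetical embed signature; the real gateway call may differ.
type EmbedFn = (inputs: string[], signal: AbortSignal) => Promise<number[][]>;

interface ProbeResult {
  touchpoint: "embedding_reachability";
  ok: boolean;
  skipped?: boolean; // true when the config probe did not pass first
  error?: string;
}

// Simplified stand-in for classifyError (assumption).
function classifyProbeError(err: unknown, timedOut: boolean): string {
  if (timedOut) return "timeout";
  return err instanceof Error ? err.message : String(err);
}

async function probeEmbeddingReachability(
  configProbeOk: boolean,
  embed: EmbedFn,
  timeoutMs = 5000,
): Promise<ProbeResult> {
  const touchpoint = "embedding_reachability" as const;
  // Gate: never touch the network unless the zero-network config probe passed.
  if (!configProbeOk) return { touchpoint, ok: false, skipped: true };

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    await embed(["ping"], controller.signal); // exactly one input
    return { touchpoint, ok: true };
  } catch (err) {
    return { touchpoint, ok: false, error: classifyProbeError(err, controller.signal.aborted) };
  } finally {
    clearTimeout(timer); // do not leave the timer running after a fast answer
  }
}
```

The gate keeps a misconfigured brain from producing two overlapping failures: the config probe reports the configuration problem, and the reachability probe only runs when there is a server worth contacting.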
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
codex review caught a false positive: HOSTED_EMBED_KEY_CONFIG mapped VOYAGE_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY to config fields, but buildGatewayConfig only threads openai/anthropic/zeroentropy config keys into the gateway env. A Voyage/Google brain with the key only in config.json would be judged "configured" and dispatch an embed.stale job that then fails auth at the gateway. Drop those two from the map so the producer closures resolve them by env var only, matching what the gateway can actually use. Pinned by a regression test.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
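The resolution rule after this fix might look like the sketch below. `HOSTED_EMBED_KEY_CONFIG` and the Voyage/Google env var names come from the commit message; the config field names, the `ZEROENTROPY_API_KEY` env var name, and the `makeKeyResolver` factory are assumptions for illustration only.

```typescript
// Env var name -> config.json field, listed only for keys the gateway can
// actually receive from config. Field names here are hypothetical.
const HOSTED_EMBED_KEY_CONFIG = new Map<string, string>([
  ["OPENAI_API_KEY", "openai_api_key"],
  ["ZEROENTROPY_API_KEY", "zeroentropy_api_key"], // env var name assumed
  // VOYAGE_API_KEY and GOOGLE_GENERATIVE_AI_API_KEY are intentionally absent:
  // a config-only value for them would never reach the gateway.
]);

type StringMap = Record<string, string | undefined>;

// Hypothetical factory for the producer closure that resolves a required key.
function makeKeyResolver(env: StringMap, config: StringMap) {
  return (envName: string): string | undefined => {
    const fromEnv = env[envName];
    if (fromEnv) return fromEnv; // the env var always counts
    const field = HOSTED_EMBED_KEY_CONFIG.get(envName);
    // Fall back to config only when the gateway threads that field through.
    return field ? (config[field] || undefined) : undefined;
  };
}
```

With this shape, a Voyage key stored only in config.json resolves to nothing, so the planner reports the brain as not configured instead of dispatching a job that would fail auth later.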
Thanks for this contribution, and apologies for the slow triage. We did a full pass over the entire PR backlog. gbrain has moved fast, and the maintainer's larger "cathedral" rewrites have superseded a big share of community PRs: the AI gateway + recipes + user_provided_models system replaced almost all individual provider PRs; #1805 fixed the whole Postgres module-singleton class; #1542 unified the type taxonomy; #1657 the retrieval path; #1802 the doctor; and so on. We're closing this one in that cleanup: either the fix already landed on master, it duplicates another PR or merged change, or it's outside the current merge bar. Where a closed PR carried a genuinely valuable idea, we've recorded it in docs/designs/COMMUNITY_IDEAS.md so nothing good is lost (a few may graduate into TODOs). Please don't read the close as a judgment of the work, and thank you for contributing. If you believe the underlying issue is still live on the latest master, reopen with a quick note and we'll take another look. 🙏
Summary
Makes local embedding providers (Ollama, llama-server) behave exactly like hosted ones across every place gbrain inspects your setup. Three independent gaps, one root cause (a hosted-only key check copied into two producers):
A (fix: recipe-aware embedding-provider check): gbrain doctor --remediation-plan reported a blocked "missing embedding API key" for a brain on ollama:/llama-server:, contradicting gbrain doctor --json's 100%-coverage health. A new shared embeddingProviderConfigured() helper (recipe-aware: empty auth_env.required ⇒ no key needed; hosted ⇒ checks its OWN required key) replaces the hosted-only prefix ladder in doctor.ts and the parallel copy in autopilot.ts. The RecommendationContext.hasEmbeddingApiKey field is renamed embeddingProviderConfigured and the blocker reason broadened to "embedding provider not configured".
B (fix(budget): price local embed providers at $0): a --max-cost-bounded embed/reindex job for a local provider TX2 hard-failed with no_pricing. New FREE_LOCAL_EMBED_PROVIDERS = {ollama, llama-server} (sibling to the existing FREE_LOCAL_RERANK_PROVIDERS) returns $0 on a lookupEmbeddingPrice miss. lmstudio (no recipe) and litellm (can proxy paid) intentionally excluded.
C (feat(models): embedding reachability probe): a down/misconfigured local embed server was invisible until first embed. New probeEmbeddingReachability() mirrors the reranker probe: a 1-input embed with a 5s abort timeout, a new 'embedding_reachability' touchpoint, gated on the zero-network config probe returning ok first.
Intended behavior change (hosted providers)
The remediation planner now judges each hosted provider by its OWN required key. Pre-fix, every non-OpenAI/non-ZE provider fell back to "any OpenAI/ZE key present", so a Voyage brain looked configured if an unrelated OpenAI key existed. Now a Voyage brain is judged by VOYAGE_API_KEY. Strictly more correct, but a behavior change for Voyage/Google brains that relied on the old fallback. Hosted brains with no key still block, as before.
Test Coverage
test/brain-score-recommendations.test.ts: 6 new helper cases (empty/undefined → false, local → true regardless of keys, hosted iff key resolves, the Voyage behavior change, unknown provider → false, malformed model id → false) + renamed consumer assertions.
test/core/budget/budget-tracker.test.ts: local embed → $0 (no TX2); regression: unknown-hosted embed still TX2 hard-fails; regression: known-hosted (openai) still trips a real cost gate.
test/models-doctor-embed.test.ts (new): pins the three reachability-probe invariants (uses embed, not embedQuery; distinct touchpoint member; gated on config-probe-ok).
test/v0_37_gap_fill.serial.test.ts: source-grep updated to the new helper names.
64/64 tests pass across the 4 affected files. bun run verify and bun run check:all green. (The full parallel PGLite suite is intentionally left to CI because it OOMs the 16GB dev box; targeted-file runs + verify + check:all are the local gate.)
Pre-Landing Review
Clean: 0 findings. No SQL/LLM trust-boundary surface; the only added runtime cost is 4 sequential engine.getConfig awaits per autopilot tick (negligible). No frontend files (design review N/A), no prompt files (evals N/A).
Plan Completion
All planned items shipped (gaps A/B/C) plus the codex outside-voice refinements from the plan-eng-review (recipe-aware helper ≠ isAvailable, parseModelId try/catch, stated hosted behavior change + tests, autopilot model resolution, HOSTED_EMBED_KEY_CONFIG sync-closure map, embed not embedQuery, distinct touchpoint member, gating on config-probe-ok).
TODOS
Added a follow-up to unify FREE_LOCAL_EMBED_PROVIDERS + FREE_LOCAL_RERANK_PROVIDERS into one FREE_LOCAL_PROVIDERS keyed by kind, and to evaluate recipe-cost-driven resolution, once the rerank (#1326) and embed sides both land.
Test plan
bun run verify (typecheck + prechecks)
bun run check:all
bun test test/brain-score-recommendations.test.ts test/core/budget/budget-tracker.test.ts test/models-doctor-embed.test.ts test/v0_37_gap_fill.serial.test.ts → 64 pass
🤖 Generated with Claude Code