v0.35.0.0 feat: ZeroEntropy zembed-1 + zerank-2 reranker by garrytan · Pull Request #1008 · garrytan/gbrain

garrytan · 2026-05-15T04:50:31Z

Summary

ZeroEntropy in the box: zembed-1 embeddings + zerank-2 cross-encoder reranking, on by default for tokenmax mode.

ZeroEntropy ships two specialized small models that target the two weakest retrieval moments in a gbrain pipeline: zembed-1 (32K-context embedding, flexible Matryoshka dims at 2560/1280/640/320/160/80/40, asymmetric input_type: query|document encoding) and zerank-2 (multilingual cross-encoder reranker, $0.025/1M tokens, ~50% cheaper than Cohere/Voyage rerankers). Both land as a new openai-compatible recipe alongside OpenAI/Voyage. The reranker is the bigger story: search had no reranker stage before this release. When the reranker is enabled, hybrid search now ends with RRF → dedup → reranker → token-budget. It is on by default in tokenmax, and every other mode opts in with a single config flip (search.reranker.enabled).
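A minimal sketch of the stage order described above, assuming simplified candidate objects. `rrf`, `dedupByPage`, `fitTokenBudget` and `hybridPipeline` are hypothetical stand-ins (not gbrain's `dedupResults()` or `enforceTokenBudget()`), and the RRF constant k = 60 is the value commonly used in the RRF literature, not one stated in this PR. It shows why the reranker sits before the token budget: the budget trims from the tail, so the reranker's order decides which candidates survive.

```typescript
// Simplified candidate; gbrain's real result rows carry far more fields.
interface Candidate {
  id: string;
  page: string; // hypothetical dedup key: one chunk per page survives
  tokens: number;
  score: number;
}

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank 1-based.
function rrf(lists: Candidate[][], k = 60): Candidate[] {
  const fused = new Map<string, Candidate>();
  for (const list of lists) {
    list.forEach((c, i) => {
      const prev = fused.get(c.id)?.score ?? 0;
      fused.set(c.id, { ...c, score: prev + 1 / (k + i + 1) });
    });
  }
  return Array.from(fused.values()).sort((a, b) => b.score - a.score);
}

// Keep the best-ranked candidate per page.
function dedupByPage(items: Candidate[]): Candidate[] {
  const seen = new Set<string>();
  return items.filter((c) => {
    if (seen.has(c.page)) return false;
    seen.add(c.page);
    return true;
  });
}

// Take candidates in order until the next one would overflow the budget.
function fitTokenBudget(items: Candidate[], budget: number): Candidate[] {
  const out: Candidate[] = [];
  let used = 0;
  for (const c of items) {
    if (used + c.tokens > budget) break;
    used += c.tokens;
    out.push(c);
  }
  return out;
}

// RRF -> dedup -> (optional) reranker -> token budget.
function hybridPipeline(
  lists: Candidate[][],
  budget: number,
  rerank?: (items: Candidate[]) => Candidate[],
): Candidate[] {
  const deduped = dedupByPage(rrf(lists));
  const ordered = rerank ? rerank(deduped) : deduped; // stage skipped when disabled
  return fitTokenBudget(ordered, budget);
}
```

With no reranker the budget keeps the top of the RRF order; with one, the same budget keeps whatever the reranker promoted.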

Provider:

src/core/ai/recipes/zeroentropyai.ts — new recipe declaring both embedding (zembed-1) + reranker (zerank-2/1/1-small) touchpoints with implementation: 'openai-compatible'.
src/core/ai/gateway.ts — zeroEntropyCompatFetch shim rewrites URL (/embeddings → /models/embed), injects input_type + encoding_format: 'float', rewrites response (results → data + usage.prompt_tokens), L1/L2 OOM caps via ZeroEntropyResponseTooLargeError. New gateway.rerank() native HTTP path with fail-open posture + 5s timeout + payload pre-flight guard.
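The three rewrites the shim performs can be sketched as pure functions, assuming the ZE response shape the PR describes (`{results: [{embedding}], usage: {total_tokens}}`). `rewriteUrl`, `rewriteBody` and `rewriteResponse` are hypothetical helper names; the real zeroEntropyCompatFetch wraps `fetch` and also applies the OOM caps, which are omitted here.

```typescript
// Assumed ZE /models/embed response shape, as described in the PR.
interface ZeEmbedResponse {
  results: { embedding: number[] }[];
  usage?: { total_tokens?: number };
}

// The OpenAI-compatible shape the AI SDK validates against.
interface OpenAIEmbedResponse {
  data: { embedding: number[]; index: number }[];
  usage: { prompt_tokens: number; total_tokens: number };
}

type InputType = "query" | "document";

// 1) URL: the SDK always posts to `${base_url}/embeddings`; ZE serves /models/embed.
function rewriteUrl(url: string): string {
  const suffix = "/embeddings";
  return url.endsWith(suffix) ? url.slice(0, -suffix.length) + "/models/embed" : url;
}

// 2) Body: add the asymmetric hint (document side by default) and force float vectors.
function rewriteBody(body: Record<string, unknown>, inputType: InputType = "document") {
  return { ...body, input_type: inputType, encoding_format: "float" };
}

// 3) Response: results -> data (with index), and copy total_tokens into prompt_tokens.
function rewriteResponse(ze: ZeEmbedResponse): OpenAIEmbedResponse {
  const total = ze.usage?.total_tokens ?? 0;
  return {
    data: ze.results.map((r, index) => ({ embedding: r.embedding, index })),
    usage: { prompt_tokens: total, total_tokens: total },
  };
}
```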

Asymmetric encoding:

src/core/ai/dims.ts — dimsProviderOptions gains a 4th inputType param with per-model filtering (CDX2-F6: OpenAI text-3 + DashScope + Zhipu drop it, ZE + Voyage v3+ accept it).
src/core/ai/gateway.ts — embedQuery() companion threads inputType: 'query'. src/core/search/hybrid.ts flips two query-side embed sites (cache lookup + vector seed) to embedQuery.
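A sketch of the per-model `input_type` filtering, assuming a flat options object. `acceptsInputType` and `buildProviderOptions` are hypothetical names, the `dimensions` key is an assumption, and the Voyage rule (model id `voyage-N` with N >= 3) is a rough heuristic standing in for whatever check gbrain actually uses. The point: the query/document hint only reaches models known to accept it, so symmetric providers never see a field they would reject with a 400.

```typescript
type InputType = "query" | "document";

// Which models accept an asymmetric input_type. Per the PR: ZE zembed-1 and
// Voyage v3+ do; OpenAI text-3, DashScope and Zhipu 400 on it.
// The Voyage rule below is a rough heuristic, not gbrain's actual check.
function acceptsInputType(provider: string, modelId: string): boolean {
  if (provider === "zeroentropyai") return modelId === "zembed-1";
  if (provider === "voyage") {
    const m = /^voyage-(\d+)/.exec(modelId);
    return m !== null && Number(m[1]) >= 3;
  }
  return false; // symmetric or unknown providers never receive the field
}

// Hypothetical builder mirroring dimsProviderOptions' new optional inputType.
function buildProviderOptions(
  provider: string,
  modelId: string,
  dims?: number,
  inputType?: InputType,
): Record<string, unknown> {
  const opts: Record<string, unknown> = {};
  if (dims !== undefined) opts.dimensions = dims; // key name is an assumption
  if (inputType !== undefined && acceptsInputType(provider, modelId)) {
    opts.input_type = inputType;
  }
  return opts;
}
```

Omitting the last argument behaves exactly as before, which mirrors why existing call sites compile unchanged.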

Reranker integration:

src/core/search/rerank.ts — applyReranker slots between dedupResults() and enforceTokenBudget(). Fail-open on every RerankError.reason; stamps rerank_score; topNOut: null is the explicit "don't truncate" signal.
src/core/rerank-audit.ts — failure-only JSONL audit at ~/.gbrain/audit/rerank-failures-YYYY-Www.jsonl. Per CDX2-F22, no success logging (hot-path I/O + query-volume leak).
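A minimal sketch of the applyReranker contract described above, assuming an injected async `rerank(query, docs)` that returns `{index, relevanceScore}` pairs. The `Hit` shape, the `onFailure` callback and this `RerankError` class are simplified stand-ins, and `opts` is already resolved here (the real function also treats `topNOut: undefined` as "use the mode bundle"). It shows the behaviors the PR pins: the head is reordered and stamped with `rerank_score`, the tail keeps RRF order, and any failure returns the input untouched.

```typescript
interface Hit {
  id: string;
  text: string;
  rerank_score?: number;
}
interface RerankResult {
  index: number; // position in the docs array that was sent
  relevanceScore: number;
}
type RerankReason = "auth" | "rate_limit" | "network" | "timeout" | "payload_too_large" | "unknown";

class RerankError extends Error {
  readonly reason: RerankReason;
  constructor(reason: RerankReason, message: string) {
    super(message);
    this.reason = reason;
  }
}

async function applyReranker(
  query: string,
  hits: Hit[],
  opts: { enabled: boolean; topNIn: number; topNOut: number | null },
  rerank: (query: string, docs: string[]) => Promise<RerankResult[]>,
  onFailure: (reason: RerankReason) => void = () => {},
): Promise<Hit[]> {
  if (!opts.enabled || hits.length === 0) return hits;
  const head = hits.slice(0, opts.topNIn);
  const tail = hits.slice(opts.topNIn); // never sent; keeps RRF order (recall protection)
  let results: RerankResult[];
  try {
    results = await rerank(query, head.map((h) => h.text));
  } catch (err) {
    onFailure(err instanceof RerankError ? err.reason : "unknown");
    return hits; // fail open: the input, unchanged
  }
  const scored = results
    .slice()
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .map((r) => ({ ...head[r.index], rerank_score: r.relevanceScore }));
  // Head items the reranker did not score stay in their original order.
  const seen = new Set(results.map((r) => r.index));
  const unscored = head.filter((_, i) => !seen.has(i));
  const out: Hit[] = [...scored, ...unscored, ...tail];
  return opts.topNOut === null ? out : out.slice(0, opts.topNOut); // null = don't truncate
}
```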

Mode bundles:

src/core/search/mode.ts — ModeBundle extended with 5 reranker fields. tokenmax defaults reranker_enabled=true (~$0.0003/query at 30 docs); conservative + balanced default off. KNOBS_HASH_VERSION bumps 1→2 (append-only) to fold reranker config into the query_cache.knobs_hash column.
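A sketch of an append-only, versioned knobs hash, assuming SHA-256 over `|`-joined `key=value` parts. The five reranker field names mirror the ModeBundle fields from the PR; `search_limit`/`token_budget` (standing in for the existing v1 knobs), the `rr_*` labels and the 16-hex truncation are hypothetical. Because the version is itself one of the parts, a v=1 and a v=2 process hash the same knobs to different cache rows, which is the rolling-deploy hit-rate dip the PR warns about.

```typescript
import { createHash } from "node:crypto";

const KNOBS_HASH_VERSION = 2;

interface Knobs {
  search_limit: number; // stand-in for the pre-existing v1 knobs
  token_budget: number; // stand-in for the pre-existing v1 knobs
  reranker_enabled: boolean;
  reranker_model: string;
  reranker_top_n_in: number;
  reranker_top_n_out: number | null;
  reranker_timeout_ms: number;
}

function knobsHash(k: Knobs, version: number = KNOBS_HASH_VERSION): string {
  // Existing entries keep their positions; the version is part of the input.
  const parts: string[] = [`v=${version}`, `limit=${k.search_limit}`, `budget=${k.token_budget}`];
  if (version >= 2) {
    // v2 entries are appended at the end, never inserted between v1 entries.
    parts.push(
      `rr_enabled=${k.reranker_enabled}`,
      `rr_model=${k.reranker_model}`,
      `rr_top_n_in=${k.reranker_top_n_in}`,
      `rr_top_n_out=${k.reranker_top_n_out === null ? "null" : k.reranker_top_n_out}`,
      `rr_timeout_ms=${k.reranker_timeout_ms}`,
    );
  }
  return createHash("sha256").update(parts.join("|")).digest("hex").slice(0, 16);
}
```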

Doctor + observability:

src/commands/models.ts — new probeRerankerConfig (zero-network allowlist check) + reachability probe. ZE branch added to probeEmbeddingConfig.
src/commands/doctor.ts — checkRerankerHealth reads search.reranker.enabled first, warns on any auth failure or ≥5 transient failures.
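The zero-network allowlist check can be sketched as a pure lookup, assuming a `provider:model` config string (matching the `zeroentropyai:zerank-2` default) and an in-memory recipe table. `RECIPES`, `ProbeOutcome` and this `probeRerankerConfig` signature are simplified stand-ins for gbrain's registry and ProbeResult; the messages are illustrative.

```typescript
interface RecipeEntry {
  id: string;
  touchpoints: { reranker?: { models: string[] } };
}

// In-memory stand-in for the recipe registry.
const RECIPES = new Map<string, RecipeEntry>([
  ["zeroentropyai", { id: "zeroentropyai", touchpoints: { reranker: { models: ["zerank-2", "zerank-1", "zerank-1-small"] } } }],
  ["openai", { id: "openai", touchpoints: {} }],
]);

type ProbeOutcome = { ok: true } | { ok: false; message: string };

// Zero-network: parse, resolve the recipe, and check the allowlist directly.
function probeRerankerConfig(configured: string): ProbeOutcome {
  const sep = configured.indexOf(":");
  if (sep <= 0) return { ok: false, message: `expected provider:model, got "${configured}"` };
  const provider = configured.slice(0, sep);
  const model = configured.slice(sep + 1);
  const recipe = RECIPES.get(provider);
  if (recipe === undefined) return { ok: false, message: `unknown provider "${provider}"` };
  const tp = recipe.touchpoints.reranker;
  if (tp === undefined) return { ok: false, message: `${provider} declares no reranker touchpoint` };
  if (!tp.models.includes(model)) {
    return { ok: false, message: `"${model}" is not one of: ${tp.models.join(", ")}` };
  }
  return { ok: true };
}
```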

CI:

.github/workflows/e2e.yml Tier 2 step runs test/e2e/zeroentropy-live.test.ts and exposes ZEROENTROPY_API_KEY (already set as a repo secret).

Test Coverage

110 new tests across 10 files covering: recipe shape (F1+F2 regressions), dim allowlist + 4th-arg inputType plumbing, gateway.rerank() HTTP path (URL, body, auth, error classification, payload pre-flight, allowlist), applyReranker reorder + fail-open + null/undefined semantics, JSONL audit round-trip + ISO-week rotation, hybrid+reranker PGLite integration, knobsHash v=2 + 5-field separation, structural source assertions for zeroEntropyCompatFetch, and 6 live HTTP round-trips against api.zeroentropy.dev (env-gated).

Tests: 6360 → 6507 (+147 master + 110 ZE-specific). E2E: 91 files / 617 tests / 0 fail on real Postgres including ZE live API tests.

Pre-Landing Review

Two /codex rounds, 47 source-grounded findings. Round 1 (consult) caught type-level, URL, and config-merge contradictions in the draft plan (the 'openai-compat' implementation literal and the /v1/v1/ URL doubling). Round 2 (adversarial challenge) caught wire-shape completeness gaps (missing usage.prompt_tokens, instantiateEmbedding wiring, AIGatewayConfig extension). All must-fix findings were folded in; 9 deferred bugs are documented in the plan with source-line citations.

Plan Completion

PASS — every Phase 1-7 item from the planning round shipped. Plan file: ~/.claude/plans/system-instruction-you-are-working-linked-moonbeam.md with full GSTACK REVIEW REPORT footer (CODEX CLEARED post-patches; Eng review skipped on a solo wave).

Documentation

docs/ai-providers/zeroentropy.md — one-pager with setup, knob reference, failure observability, troubleshooting.
skills/migrations/v0.35.0.0.md — operator-facing migration notes (no required action; opt-in everywhere).
CLAUDE.md Key Files section: new recipe, rerank.ts, rerank-audit.ts, gateway extensions.
CHANGELOG.md release-summary block in GStack voice.
llms-full.txt regenerated.

Test plan

bun run verify — clean (12 pre-checks + typecheck)
bun run test — 6507 unit + 19 serial, 0 fail
bun run test:e2e — 91 files, 617 tests, 0 fail on fresh Postgres (incl. 6 ZE live API round-trips)
gbrain models doctor — embedding_config + reranker_config probes pass against real ZE API

🤖 Generated with Claude Code

Widens `TouchpointKind` with `'reranker'`, adds `RerankerTouchpoint` interface, extends `Recipe.touchpoints` and `AIGatewayConfig` to carry reranker model state. Registers `zeroentropyai` recipe (zembed-1 embeddings + zerank-{2,1,1-small} rerankers) in the recipe registry. Recipe declares the 7 Matryoshka dims (2560/1280/640/320/160/80/40), Voyage-style dense-payload hedge (chars_per_token=1, safety_factor=0.5), and 5MB rerank payload cap. Pinned by test/ai/zeroentropy-recipe.test.ts including F1 regression (implementation literal is 'openai-compatible') and F2 regression (base_url_default ends with /v1, no doubling). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
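A trimmed-down sketch of the recipe declaration and the two regressions the recipe test pins, assuming a hypothetical `RecipeSketch` type and `joinUrl` helper. The base URL `https://api.zeroentropy.dev/v1` is an assumption assembled from the API host named in the e2e test and the `/v1` suffix pinned by F2, and the exact byte count behind "5MB" is also assumed. Typing `implementation` as a string-literal union (hypothetical here) turns the F1 typo ('openai-compat') into a compile-time error instead of a runtime mismatch.

```typescript
// Hypothetical, trimmed-down recipe shape. Field names follow the PR text;
// the real Recipe type carries more.
interface RecipeSketch {
  id: string;
  implementation: "openai-compatible" | "native"; // literal union: 'openai-compat' won't compile
  base_url_default: string;
  max_payload_bytes: number;
  touchpoints: {
    embedding: { models: string[]; dims: number[]; chars_per_token: number; safety_factor: number };
    reranker: { models: string[] };
  };
}

const zeroentropyai: RecipeSketch = {
  id: "zeroentropyai",
  implementation: "openai-compatible",
  base_url_default: "https://api.zeroentropy.dev/v1", // assumed; the PR only pins the /v1 suffix
  max_payload_bytes: 5 * 1024 * 1024, // "5MB"; exact byte count assumed
  touchpoints: {
    embedding: {
      models: ["zembed-1"],
      dims: [2560, 1280, 640, 320, 160, 80, 40],
      chars_per_token: 1, // Voyage-style dense-payload hedge
      safety_factor: 0.5,
    },
    reranker: { models: ["zerank-2", "zerank-1", "zerank-1-small"] },
  },
};

// Join without doubling slashes; callers append paths below /v1, never /v1 again.
function joinUrl(base: string, path: string): string {
  return base.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}
```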

`dimsProviderOptions` gains an optional `inputType?: 'query' | 'document'` 4th param so asymmetric providers (ZE zembed-1, Voyage v3+) can route query-side vs document-side encoding. Per-model filtering inside the openai-compatible branch keeps `input_type` from leaking to symmetric providers (OpenAI text-3, DashScope, Zhipu) that would 400 on it. Adds `ZEROENTROPY_VALID_DIMS` allowlist (2560/1280/640/320/160/80/40), `supportsZeroEntropyDimension(modelId)`, and `isValidZeroEntropyDim(dims)`. Throws `AIConfigError` with paste-ready fix hint when zembed-1 is configured with an invalid dim (most common: defaulting to 1536 from DEFAULT_EMBEDDING_DIMENSIONS). The 4th-arg is optional; existing call sites (1 production + N tests across Voyage/OpenAI/DashScope/Zhipu/MiniMax) compile unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
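A sketch of the dims guard, assuming a hypothetical `assertZeroEntropyDims` wrapper around the `isValidZeroEntropyDim` check the commit names. This `AIConfigError` is a minimal stand-in and its `fix` text is illustrative, not gbrain's paste-ready hint. The allowlist is the Matryoshka halving series 2560 / 2^k for k = 0..6, which is why the common 1536 fallback fails loudly instead of silently producing mis-sized vectors.

```typescript
const ZEROENTROPY_VALID_DIMS: readonly number[] = [2560, 1280, 640, 320, 160, 80, 40];

class AIConfigError extends Error {
  readonly fix: string;
  constructor(message: string, fix: string) {
    super(message);
    this.name = "AIConfigError";
    this.fix = fix;
  }
}

// Matryoshka prefixes: each valid size is 2560 / 2^k for k = 0..6.
function isValidZeroEntropyDim(dims: number): boolean {
  return ZEROENTROPY_VALID_DIMS.includes(dims);
}

// Hypothetical guard name; only zembed-1 is subject to the allowlist.
function assertZeroEntropyDims(modelId: string, dims: number): void {
  if (modelId !== "zembed-1" || isValidZeroEntropyDim(dims)) return;
  throw new AIConfigError(
    `zembed-1 does not support ${dims} dimensions`,
    `choose one of ${ZEROENTROPY_VALID_DIMS.join("/")}`,
  );
}
```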

Two seams land together because they share the same recipe + auth path.

zeroEntropyCompatFetch handles ZE's non-OpenAI-compatible wire shape:
- URL rewrite: SDK's `${base_url}/embeddings` -> `${base_url}/models/embed`
- Body inject: `input_type` (default 'document'; 'query' when threaded via providerOptions) + explicit `encoding_format: 'float'`
- Response rewrite: `{results: [{embedding}]}` -> `{data: [{embedding, index}]}` so the AI SDK's openai-compat schema validates
- `usage.prompt_tokens` injected from `total_tokens` (Voyage hit the same SDK schema requirement at :655)
- Layer 1 (Content-Length) + Layer 2 (per-embedding size) OOM caps via tagged `ZeroEntropyResponseTooLargeError` (kept separate from `VoyageResponseTooLargeError` because the Voyage cap tests do structural source-text greps pinning the Voyage name)
- Wired in `instantiateEmbedding()` via the existing `recipe.id === 'voyage' ? voyageCompatFetch : ...` ternary pattern

embedQuery(text) routes `inputType: 'query'` through dimsProviderOptions for the search hot path. Companion to embed(texts), which now takes an optional 2nd-arg inputType (defaults to undefined -> 'document' for asymmetric providers).

gateway.rerank() is the new native HTTP path (no AI-SDK reranking abstraction). Resolves the configured reranker model via `getRerankerModel()` (new accessor), parses + asserts the model is in the recipe's touchpoint.reranker.models allowlist (CDX2-F11: assertTouchpoint does not enforce allowlists for openai-compatible recipes; rerank() does it directly). Posts to `${recipe.base_url}/models/rerank` with bearer auth. Returns `RerankResult[]` sorted by `relevanceScore`. Errors classify into `RerankError.reason: 'auth' | 'rate_limit' | 'network' | 'timeout' | 'payload_too_large' | 'unknown'`. 5s default timeout. Pre-flight payload guard rejects bodies over `recipe.max_payload_bytes` BEFORE any HTTP call so applyReranker can fail-open without burning a round-trip.

`_rerankTransport` + `__setRerankTransportForTests` mirror the embed test seam. `AIGatewayConfig.reranker_model` + isAvailable('reranker') branch + configureGateway / reconfigureGatewayWithEngine extensions thread the reranker model through the same state path as embedding/expansion/chat. `applyResolveAuth` + `defaultResolveAuth` widen the touchpoint param to include `'reranker'`. `KnownTouchpointKey` + `getTouchpoint()` in model-resolver widen to cover `'reranker'`.

Pinned by:
- test/ai/embedQuery.test.ts (8): returns single Float32Array, threads input_type='query' for ZE, drops field for OpenAI text-3, back-compat: legacy embed() callers without 4th arg keep their previous Voyage no-input_type shape
- test/ai/rerank.test.ts (21): URL (F2 regression: no /v1/v1/), body shape, bearer header, response parsing, error classification across 6 HTTP shapes, payload pre-flight (no transport call), allowlist enforcement
- test/ai/zeroentropy-compat-fetch.test.ts (14): structural source assertions for the shim that mirror test/voyage-response-cap.test.ts: URL rewrite path, body injection, response rewrite, usage.prompt_tokens injection, OOM caps Layer 1 + Layer 2 + instanceof rethrow, instantiateEmbedding wiring branch

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
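A sketch of the error classification and the payload pre-flight guard, under stated assumptions: the HTTP status mapping (401/403, 429, 413) is hypothetical since the PR lists the reasons but not the rules, the request body is reduced to the `{query, documents}` shape used by the doctor probe, and `classifyStatus`, `classifyThrown` and `preflight` are illustrative names. The thrown-error branches rely on standard `fetch` behavior: `AbortSignal.timeout()` aborts with a `TimeoutError`, and network failures reject with a `TypeError`.

```typescript
type RerankReason = "auth" | "rate_limit" | "network" | "timeout" | "payload_too_large" | "unknown";

class RerankError extends Error {
  readonly reason: RerankReason;
  constructor(reason: RerankReason, message: string) {
    super(message);
    this.name = "RerankError";
    this.reason = reason;
  }
}

// Hypothetical status mapping: the PR names the reasons, not the exact rules.
function classifyStatus(status: number): RerankReason {
  if (status === 401 || status === 403) return "auth";
  if (status === 429) return "rate_limit";
  if (status === 413) return "payload_too_large";
  return "unknown";
}

// Errors thrown before any HTTP status exists.
function classifyThrown(err: unknown): RerankReason {
  if (err instanceof RerankError) return err.reason;
  // AbortSignal.timeout() aborts with "TimeoutError"; a manual abort() uses "AbortError".
  if (err instanceof Error && (err.name === "TimeoutError" || err.name === "AbortError")) return "timeout";
  // fetch() rejects with a TypeError on DNS/connection failures.
  if (err instanceof TypeError) return "network";
  return "unknown";
}

// Pre-flight guard: measure the serialized body before any network call.
function preflight(body: { query: string; documents: string[] }, maxPayloadBytes: number): string {
  const json = JSON.stringify(body);
  const bytes = new TextEncoder().encode(json).length;
  if (bytes > maxPayloadBytes) {
    throw new RerankError("payload_too_large", `${bytes} bytes > ${maxPayloadBytes}`);
  }
  return json;
}
```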

src/core/search/rerank.ts: the call-site abstraction. Slices the top `opts.topNIn` deduped candidates, sends to gateway.rerank(), reorders by relevanceScore desc, appends the un-reranked tail in its original RRF order (recall protection). Fail-open on every RerankError.reason: logs via `logRerankFailure` and returns the input array unchanged. Stamps `rerank_score` onto reordered items. `topNOut: null` is the explicit "don't truncate" signal, distinct from `undefined` (fall through to mode bundle); pin in test (CDX2-F16).

src/core/rerank-audit.ts: failure-only JSONL audit at `~/.gbrain/audit/rerank-failures-YYYY-Www.jsonl` (ISO-week rotation; mirrors `src/core/audit-slug-fallback.ts`). Exports `logRerankFailure` + `readRecentRerankFailures(days)`. **No `logRerankSuccess`.** CDX2-F22 deliberately drops success-event logging: writing once per tokenmax search is hot-path I/O churn AND success events leak query volume + timing into a local audit. The doctor check reads `search.reranker.enabled` first so "no events in window" gets interpreted correctly (disabled -> healthy by definition; enabled -> healthy because nothing failed). Query text is SHA-256-prefix-hashed (8 hex chars) for privacy. Honors `GBRAIN_AUDIT_DIR`.

src/core/search/hybrid.ts: slots `applyReranker` between `dedupResults()` and `enforceTokenBudget()` in the main RRF path. Resolution: per-call `opts.reranker` overrides; otherwise pulled from the resolved mode bundle (tokenmax -> enabled, others -> disabled in commit 5). Cache rows store final reranked results; the bumped knobsHash (commit 5) ensures rows can't leak across reranker configs.

src/core/types.ts: adds `SearchOpts.reranker` as a structural type so callers can pass per-call overrides; runtime type lives in src/core/search/rerank.ts (avoids circular import).

Tests:
- test/search/rerank.test.ts (14): reorder, tail preserve, fail-open on every error class, topNOut null vs number, score stamping, empty + enabled=false pass-through
- test/rerank-audit.test.ts (10): JSONL round-trip, error_summary truncated to 200, corrupt rows skipped, missing dir -> [], ISO-week rotation walks current + previous week, no logRerankSuccess export (CDX2-F22 contract)
- test/search/hybrid-reranker-integration.test.ts (6): reranker fires when enabled, doesn't when disabled, reorders correctly, preserves tail, stamps rerank_score, fail-opens on rerankerFn throw; uses PGLite + stubbed embed transport, no API keys

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
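A sketch of the audit file naming and row handling, assuming the JSONL row fields `ts`, `reason` and `query_hash` (only `error_summary` and its 200-char cap come from the PR) and hypothetical helper names. It covers the ISO-week file name, the current + previous week walk, the 8-hex SHA-256 query prefix, and the skip-corrupt-rows read path.

```typescript
import { createHash } from "node:crypto";

type RerankReason = "auth" | "rate_limit" | "network" | "timeout" | "payload_too_large" | "unknown";

// Row shape is illustrative; only error_summary (capped at 200 chars) is named in the PR.
interface FailureRow {
  ts: string;
  reason: RerankReason;
  query_hash: string;
  error_summary: string;
}

const DAY_MS = 86400000;

// ISO-8601 week: weeks start Monday; week 1 contains the year's first Thursday.
function isoWeek(d: Date): { year: number; week: number } {
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const day = t.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  t.setUTCDate(t.getUTCDate() + 4 - day); // Thursday of the same ISO week
  const jan1 = Date.UTC(t.getUTCFullYear(), 0, 1);
  return { year: t.getUTCFullYear(), week: Math.ceil(((t.getTime() - jan1) / DAY_MS + 1) / 7) };
}

function auditFileName(d: Date): string {
  const { year, week } = isoWeek(d);
  return `rerank-failures-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}

// The reader walks the current and the previous ISO week.
function filesToRead(now: Date): string[] {
  return [auditFileName(now), auditFileName(new Date(now.getTime() - 7 * DAY_MS))];
}

// Privacy: only an 8-hex-char SHA-256 prefix of the query is stored.
function hashQuery(query: string): string {
  return createHash("sha256").update(query, "utf8").digest("hex").slice(0, 8);
}

function formatFailureLine(now: Date, reason: RerankReason, query: string, err: unknown): string {
  const summary = err instanceof Error ? err.message : String(err);
  const row: FailureRow = {
    ts: now.toISOString(),
    reason,
    query_hash: hashQuery(query),
    error_summary: summary.slice(0, 200),
  };
  return JSON.stringify(row) + "\n";
}

// Corrupt lines are skipped rather than failing the whole read.
function parseFailureLines(text: string): FailureRow[] {
  const rows: FailureRow[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      rows.push(JSON.parse(line) as FailureRow);
    } catch {
      // skip corrupt row
    }
  }
  return rows;
}
```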

Extends `ModeBundle` with five reranker fields: `reranker_enabled`, `reranker_model`, `reranker_top_n_in`, `reranker_top_n_out`, `reranker_timeout_ms`.

Per-mode defaults:
- conservative -> enabled=false (cost-sensitive)
- balanced -> enabled=false (opt-in via search.reranker.enabled)
- tokenmax -> enabled=true (the high-cost-tolerant tier; ~$0.0003/query)

Defaults model to `zeroentropyai:zerank-2`, topNIn=30, topNOut=null (no truncate by default; preserves tokenmax's searchLimit=50 end-to-end per CDX2-F16), timeout_ms=5000.

`SearchKeyOverrides` + `SearchPerCallOpts` + `resolveSearchMode.pick` all extend to thread the new fields through the resolution chain (per-call -> per-key config -> mode bundle -> default). `loadOverridesFromConfig` adds parsers for the five new `search.reranker.*` config keys. `top_n_out` parsing distinguishes three input shapes (CDX2-F15):
- key absent -> undefined (fall through to mode bundle)
- 'null'|'none'|empty -> explicit null (no truncate)
- positive integer -> that number

`SEARCH_MODE_CONFIG_KEYS` extends so `gbrain search modes --reset` clears the reranker overrides too.

**KNOBS_HASH_VERSION bumps 1 -> 2** (CDX1-F14). Five new entries appended to `parts[]` (append-only convention CDX2-F13; reordering existing fields would silently rebuild every existing cache row). Includes `reranker_timeout_ms` so a 5s -> 100ms change invalidates stale rows (CDX2-F14: more fail-opens = different search behavior).

Mid-rolling-deploy note (CDX2-F12): v=1 and v=2 processes produce distinct cacheRowIds for the same (source_id, query_text). Expect a temporary hit-rate dip + cache-row doubling for hot queries. Clears naturally within `cache.ttl_seconds` (default 3600s).

src/commands/search.ts extends `KNOB_DESCRIPTIONS` with five new entries so `gbrain search modes` renders them. test/search-mode.test.ts extends the three bundle fixtures and bumps the KNOBS_HASH_VERSION expectation to 2.

Pinned by test/search/knobs-hash-reranker.test.ts (13): each of the 5 reranker fields independently flips the hash, top_n_out=null renders stable, append-only convention enforced via source-position assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
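A sketch of the three-shape `top_n_out` parser and the resolution walk, assuming string-valued config input. `parseTopNOut` and `firstDefined` are hypothetical names, and throwing on other inputs is this sketch's choice. It shows why `null` and `undefined` must stay distinct: only `undefined` falls through to the next layer, so an explicit "none" in config overrides a mode bundle that would otherwise truncate.

```typescript
// Three input shapes for search.reranker.top_n_out (CDX2-F15):
//   absent -> undefined (fall through), 'null' | 'none' | '' -> null, positive int -> n.
function parseTopNOut(raw: string | undefined): number | null | undefined {
  if (raw === undefined) return undefined;
  const v = raw.trim().toLowerCase();
  if (v === "" || v === "null" || v === "none") return null;
  const n = Number(v);
  if (Number.isInteger(n) && n > 0) return n;
  // Rejecting other values is this sketch's choice; the PR does not say.
  throw new Error(`search.reranker.top_n_out: expected a positive integer, 'null' or 'none', got "${raw}"`);
}

// Resolution chain: per-call -> per-key config -> mode bundle -> default.
// Only undefined falls through; an explicit null is a value and stops the walk.
function firstDefined<T>(...layers: (T | undefined)[]): T | undefined {
  for (const v of layers) {
    if (v !== undefined) return v;
  }
  return undefined;
}
```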

`gbrain models doctor` gains two new probes:
- `probeRerankerConfig` (zero-network) validates that the configured reranker model resolves through the recipe registry, that the recipe declares a `reranker` touchpoint, and that the model is in `touchpoint.models[]`. Direct allowlist check here: assertTouchpoint does not enforce allowlists for openai-compatible recipes (CDX2-F11). Surfaces paste-ready `gbrain config set search.reranker.model <zerank-2|zerank-1|zerank-1-small>` fix hint.
- `probeRerankerReachability` (1-token-equivalent) sends a minimal `{query: "probe", documents: ["probe"]}` rerank to verify auth + URL. Failures classify via `classifyError` into auth/rate_limit/network/unknown. Skipped silently when reranker is unconfigured.

Also extends `probeEmbeddingConfig` with a `providerId === 'zeroentropyai'` branch that catches the silent-1536-default bug class for zembed-1 configurations (same posture as the existing Voyage branch). `ProbeResult.touchpoint` widens to include `'reranker_config'`.

`gbrain doctor` adds `checkRerankerHealth` to both the abbreviated (doctorReportRemote) and full (runDoctor) check sets. Logic:
1) Read `search.reranker.enabled` first. Disabled + no failures => 'reranker disabled'. Enabled + no failures => healthy.
2) Walk last 7 days of ~/.gbrain/audit/rerank-failures-*.jsonl.
3) ANY auth failure warns (config-time problem the probe should have caught; surface it).
4) ANY payload_too_large failure warns (workload mismatch).
5) Transient (network/timeout/rate_limit) warns at >=5 in window. Below that they're noise; reranker fails open anyway.

CDX2-F21 blind-spot fix: reading enabled state first means "no events" gets interpreted correctly; it never confuses "never-used" with "success logging broken" (the latter is impossible because there is no success logging by design, CDX2-F22). Engine-agnostic; file-based + one config-key read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
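The doctor logic above reduces to a pure function over already-loaded failure events, assuming step 2 (walking the JSONL files) has produced them. `FailureEvent`, `Health` and this signature are hypothetical, and `unknown` failures are not escalated here because the PR does not list a rule for them.

```typescript
type RerankReason = "auth" | "rate_limit" | "network" | "timeout" | "payload_too_large" | "unknown";

interface FailureEvent {
  ts: string; // ISO timestamp
  reason: RerankReason;
}
interface Health {
  status: "ok" | "warn";
  message: string;
}

const DAY_MS = 86400000;
const TRANSIENT: RerankReason[] = ["network", "timeout", "rate_limit"];

function checkRerankerHealth(
  enabled: boolean,
  events: FailureEvent[],
  now: Date,
  windowDays = 7,
  transientThreshold = 5,
): Health {
  const cutoff = now.getTime() - windowDays * DAY_MS;
  const recent = events.filter((e) => Date.parse(e.ts) >= cutoff);
  // 1) Enabled state decides how "no events" reads (there is no success log).
  if (recent.length === 0) {
    return { status: "ok", message: enabled ? "reranker healthy (no failures)" : "reranker disabled" };
  }
  const count = (pred: (r: RerankReason) => boolean) => recent.filter((e) => pred(e.reason)).length;
  // 3) Any auth failure is a config problem.
  const auth = count((r) => r === "auth");
  if (auth > 0) return { status: "warn", message: `${auth} auth failure(s) in ${windowDays}d` };
  // 4) Any payload_too_large means the workload outgrew the payload cap.
  const big = count((r) => r === "payload_too_large");
  if (big > 0) return { status: "warn", message: `${big} payload_too_large failure(s)` };
  // 5) Transient failures only warn in volume; the reranker fails open anyway.
  const transient = count((r) => TRANSIENT.includes(r));
  if (transient >= transientThreshold) return { status: "warn", message: `${transient} transient failures` };
  return { status: "ok", message: `${transient} transient failure(s), below threshold` };
}
```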

test/e2e/zeroentropy-live.test.ts exercises the full stack against the real api.zeroentropy.dev: embed (default 2560-dim + flexible 1280), embedQuery (asymmetric query side), batch embed (3 distinct vectors), rerank (3 docs sorted by relevance score, photosynthesis-relevant docs beat the irrelevant cat doc), rerank with topN truncation. Gated on `ZEROENTROPY_API_KEY`: every test prints `[skip]` and returns early without assertions when the env var is unset, so fork PRs and contributor machines without a ZE account stay green. CI wire-up: `.github/workflows/e2e.yml` Tier 2 step adds `test/e2e/zeroentropy-live.test.ts` to its `bun test` invocation and exposes `ZEROENTROPY_API_KEY: ${{ secrets.ZEROENTROPY_API_KEY }}` to the runner. The secret is set on garrytan/gbrain at the repo scope (separately from this commit — set via `gh secret set` so the value never lands in source). Tier 1 stays mechanical (no API keys); Tier 2 is the natural home for provider-live tests because it's already the API-keyed lane. Cost: each full run fires ~6 small HTTP calls totaling well under a cent at the published $0.025/1M-token rate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Release notes for the ZeroEntropy support wave: zembed-1 embeddings (flexible-dim 2560/1280/640/320/160/80/40, asymmetric input_type) and zerank-2 cross-encoder reranking land as a new openai-compatible recipe alongside OpenAI/Voyage. Reranker defaults ON for tokenmax mode, OFF for conservative/balanced (~$0.0003/query at tokenmax topNIn=30; rounding error vs the tier's $700/mo Opus pairing per the CLAUDE.md cost matrix). Search now ends with `RRF -> dedup -> reranker -> token-budget` when reranker is enabled; fails open to RRF order on any error class (audit-logged at ~/.gbrain/audit/rerank-failures-*.jsonl). `KNOBS_HASH_VERSION` bumps 1 -> 2 to fold reranker config into the query_cache row key. Rolling-deploy operators should expect a temporary cache hit-rate dip + cache-row doubling for hot queries (clears naturally within `cache.ttl_seconds`, default 3600s).

Files in this commit are pure docs / version bump:
- VERSION + package.json bump to 0.33.3.0
- CHANGELOG.md release-summary entry with "How to take advantage" block
- CLAUDE.md Key Files annotations for the new recipe + rerank.ts + rerank-audit.ts + gateway extensions
- docs/ai-providers/zeroentropy.md one-pager (setup, knob reference, failure observability, troubleshooting table)
- skills/migrations/v0.33.3.md (purely informational: no required user action; reranker is opt-in everywhere, ZE embedding is opt-in)
- llms-full.txt regenerated to match CLAUDE.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
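The per-query figure can be sanity-checked against the published $0.025/1M-token rate. The arithmetic below only inverts the two numbers quoted in this PR; how ZE counts tokens per query/document pair is not stated here, and the 1,000 queries/day volume in the second line is a hypothetical illustration.

```latex
\[
\text{tokens/query} \approx \frac{\$0.0003}{\$0.025 / 10^{6}\ \text{tokens}} = 12{,}000\ \text{tokens},
\qquad \frac{12{,}000}{30\ \text{candidates}} = 400\ \text{tokens/candidate}
\]
\[
1{,}000\ \tfrac{\text{queries}}{\text{day}} \times 30\ \text{days} \times \$0.0003 = \$9/\text{month} \approx 1.3\%\ \text{of}\ \$700/\text{month}
\]
```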

# Conflicts: # CHANGELOG.md # CLAUDE.md # VERSION # llms-full.txt # package.json

# Conflicts: # CHANGELOG.md # VERSION # package.json # src/core/ai/gateway.ts

* upstream/master:
  v0.35.1.0: embedder shootout prereqs (pricing + gateway export + --resume-from) (garrytan#1055)
  v0.35.0.0 feat: ZeroEntropy zembed-1 + zerank-2 reranker (garrytan#1008)
  v0.34.4.0 fix(embed): cursor-paginated --stale hardening wave (D2/D3/D4/D6/D7/D8 + regression test) (garrytan#991)
  v0.34.3.0 fix: supervisor treats code=0 watchdog exits as crashes (garrytan#1003)
  v0.34.2.0 fix(import): path-based checkpoint resume — kills parallel-drop + failed-file-skip + sort-flip bugs (garrytan#988)
  v0.34.1.0 fix(mcp): MCP fix wave — source-isolation P0 + PKCE DCR + federated_read + 3 more (garrytan#996)
  v0.34.0.0 feat: Cathedral III — recursive code intelligence + Leiden clusters + eval gate (garrytan#994)
  v0.33.3.0 feat(v0.33.3): code intelligence MCP foundation (v0.34 W0a-c + W3) (garrytan#934)
  v0.33.2.1 docs: fork-PR workflow for garrytan-agents (garrytan#992)
  fix(sync): raise maxBuffer to 100 MiB to prevent silent ENOBUFS crash (garrytan#982)
  v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting (garrytan#897)
  v0.33.1.1 fix: Voyage output_dimension + flexible-dim guard + OOM-cap rethrow (garrytan#962)

garrytan and others added 10 commits May 14, 2026 20:02

Merge remote-tracking branch 'origin/master' into garrytan/spokane-v1

52abe8e


Merge remote-tracking branch 'origin/master' into garrytan/spokane-v1

f986d7f


garrytan merged commit baf1a47 into master May 15, 2026
7 checks passed

100yenadmin mentioned this pull request May 17, 2026

Merge upstream GBrain v0.35.1.1 while preserving Eva OpenClaw defaults electricsheephq/eva-brain#101

Closed

garrytan merged 10 commits into master from garrytan/spokane-v1
