v0.35.0.0 feat: ZeroEntropy zembed-1 + zerank-2 reranker #1008
Merged
Widens `TouchpointKind` with `'reranker'`, adds `RerankerTouchpoint`
interface, extends `Recipe.touchpoints` and `AIGatewayConfig` to carry
reranker model state. Registers `zeroentropyai` recipe (zembed-1
embeddings + zerank-{2,1,1-small} rerankers) in the recipe registry.
Recipe declares the 7 Matryoshka dims (2560/1280/640/320/160/80/40),
Voyage-style dense-payload hedge (chars_per_token=1, safety_factor=0.5),
and 5MB rerank payload cap. Pinned by test/ai/zeroentropy-recipe.test.ts
including F1 regression (implementation literal is 'openai-compatible')
and F2 regression (base_url_default ends with /v1, no doubling).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`dimsProviderOptions` gains an optional `inputType?: 'query' | 'document'` 4th param so asymmetric providers (ZE zembed-1, Voyage v3+) can route query-side vs document-side encoding. Per-model filtering inside the openai-compatible branch keeps `input_type` from leaking to symmetric providers (OpenAI text-3, DashScope, Zhipu) that would 400 on it.

Adds `ZEROENTROPY_VALID_DIMS` allowlist (2560/1280/640/320/160/80/40), `supportsZeroEntropyDimension(modelId)`, and `isValidZeroEntropyDim(dims)`. Throws `AIConfigError` with paste-ready fix hint when zembed-1 is configured with an invalid dim (most common: defaulting to 1536 from DEFAULT_EMBEDDING_DIMENSIONS).

The 4th arg is optional; existing call sites (1 production + N tests across Voyage/OpenAI/DashScope/Zhipu/MiniMax) compile unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
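The dim allowlist and the per-model `input_type` filtering described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in `src/core/ai/dims.ts`: `acceptsInputType` and `sketchProviderOptions` are hypothetical names, the model-id prefix checks and the `dimensions` key are assumptions, and a plain `Error` stands in for `AIConfigError`.

```typescript
// Hypothetical sketch; only the constant's values and the two
// "isValid"/"input_type" ideas come from the commit message.
const ZEROENTROPY_VALID_DIMS = [2560, 1280, 640, 320, 160, 80, 40] as const;

type InputType = "query" | "document";

function isValidZeroEntropyDim(dims: number): boolean {
  return (ZEROENTROPY_VALID_DIMS as readonly number[]).includes(dims);
}

// Assumption: asymmetric models are recognised by a model-id prefix.
function acceptsInputType(modelId: string): boolean {
  return modelId.startsWith("zembed-") || modelId.startsWith("voyage-3");
}

function sketchProviderOptions(
  modelId: string,
  dims: number,
  inputType?: InputType,
): Record<string, unknown> {
  if (modelId.startsWith("zembed-") && !isValidZeroEntropyDim(dims)) {
    // The real code throws AIConfigError with a paste-ready fix hint.
    throw new Error(
      `${modelId} does not support ${dims} dims; use one of ${ZEROENTROPY_VALID_DIMS.join("/")}`,
    );
  }
  // Assumption: the dimension travels under a "dimensions" key.
  const options: Record<string, unknown> = { dimensions: dims };
  // Symmetric providers never see input_type, so they cannot 400 on it.
  if (inputType !== undefined && acceptsInputType(modelId)) {
    options.input_type = inputType;
  }
  return options;
}
```

Because the third argument of the sketch is optional, a caller that omits it gets the same options object it would have received before the parameter existed, which mirrors the back-compat claim in the commit.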
Two seams land together because they share the same recipe + auth path.
zeroEntropyCompatFetch handles ZE's non-OpenAI-compatible wire shape:
- URL rewrite: SDK's `${base_url}/embeddings` -> `${base_url}/models/embed`
- Body inject: `input_type` (default 'document'; 'query' when threaded
via providerOptions) + explicit `encoding_format: 'float'`
- Response rewrite: `{results: [{embedding}]}` -> `{data: [{embedding,
index}]}` so the AI SDK's openai-compat schema validates
- `usage.prompt_tokens` injected from `total_tokens` (Voyage hit the
same SDK schema requirement at :655)
- Layer 1 (Content-Length) + Layer 2 (per-embedding size) OOM caps
via tagged `ZeroEntropyResponseTooLargeError` (kept separate from
`VoyageResponseTooLargeError` because the Voyage cap tests do
structural source-text greps pinning the Voyage name)
- Wired in `instantiateEmbedding()` via the existing
`recipe.id === 'voyage' ? voyageCompatFetch : ...` ternary pattern
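The three rewrites in the list above (URL, request body, response) can be sketched as pure functions. This is an illustration under stated assumptions, not the shim itself: the function names are hypothetical, and the location of `total_tokens` inside the ZeroEntropy response (`usage.total_tokens`) is assumed rather than taken from the source.

```typescript
// Hypothetical sketch of the wire-shape translation; OOM caps are omitted.
interface ZeEmbedResponse {
  results: { embedding: number[] }[];
  usage?: { total_tokens?: number }; // assumed location of total_tokens
}

interface OpenAICompatEmbedResponse {
  data: { embedding: number[]; index: number }[];
  usage: { prompt_tokens: number; total_tokens: number };
}

// `${base_url}/embeddings` -> `${base_url}/models/embed`
function rewriteEmbedUrl(url: string): string {
  const suffix = "/embeddings";
  return url.endsWith(suffix)
    ? url.slice(0, -suffix.length) + "/models/embed"
    : url;
}

// input_type defaults to 'document'; encoding_format is always explicit.
function injectEmbedBody(
  body: Record<string, unknown>,
  inputType: "query" | "document" = "document",
): Record<string, unknown> {
  return { ...body, input_type: inputType, encoding_format: "float" };
}

// {results: [{embedding}]} -> {data: [{embedding, index}]} plus usage.prompt_tokens
function rewriteEmbedResponse(ze: ZeEmbedResponse): OpenAICompatEmbedResponse {
  const total = ze.usage?.total_tokens ?? 0;
  return {
    data: ze.results.map((r, index) => ({ embedding: r.embedding, index })),
    usage: { prompt_tokens: total, total_tokens: total },
  };
}
```

Keeping the translation in pure functions like these would make each step checkable without a network call; the actual tests in this commit instead use structural source assertions, as noted in the "Pinned by" list.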
embedQuery(text) routes `inputType: 'query'` through dimsProviderOptions
for the search hot path. Companion to embed(texts) which now takes an
optional 2nd-arg inputType (defaults to undefined -> 'document' for
asymmetric providers).
gateway.rerank() is the new native HTTP path (no AI-SDK reranking
abstraction). Resolves the configured reranker model via
`getRerankerModel()` (new accessor), parses + asserts the model is in
the recipe's touchpoint.reranker.models allowlist (CDX2-F11:
assertTouchpoint does not enforce allowlists for openai-compatible
recipes — rerank() does it directly). Posts to
`${recipe.base_url}/models/rerank` with bearer auth. Returns
`RerankResult[]` sorted by `relevanceScore`. Errors classify into
`RerankError.reason: 'auth' | 'rate_limit' | 'network' | 'timeout' |
'payload_too_large' | 'unknown'`. 5s default timeout. Pre-flight payload
guard rejects bodies over `recipe.max_payload_bytes` BEFORE any HTTP
call so applyReranker can fail-open without burning a round-trip.
`_rerankTransport` + `__setRerankTransportForTests` mirror the embed
test seam.
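The pre-flight payload guard and the error classification described above might look roughly like this. It is a sketch, not the gateway code: `classifyStatus` and `buildRerankBody` are hypothetical helpers, the status-code mapping is an assumption, and the request body shape (`model`, `query`, `documents`) is inferred from the probe payload mentioned elsewhere in this PR.

```typescript
// Hypothetical sketch; the reason union comes from the commit message.
type RerankReason =
  | "auth"
  | "rate_limit"
  | "network"
  | "timeout"
  | "payload_too_large"
  | "unknown";

class RerankError extends Error {
  readonly reason: RerankReason;
  constructor(reason: RerankReason, message: string) {
    super(message);
    this.name = "RerankError";
    this.reason = reason;
  }
}

// Assumed mapping from HTTP status to reason; network and timeout
// failures would be classified from thrown errors, not status codes.
function classifyStatus(status: number): RerankReason {
  if (status === 401 || status === 403) return "auth";
  if (status === 429) return "rate_limit";
  if (status === 413) return "payload_too_large";
  return "unknown";
}

// Serialises the request and rejects oversized bodies before any HTTP call.
function buildRerankBody(
  model: string,
  query: string,
  documents: string[],
  maxPayloadBytes: number,
): string {
  const body = JSON.stringify({ model, query, documents });
  const bytes = new TextEncoder().encode(body).length;
  if (bytes > maxPayloadBytes) {
    throw new RerankError(
      "payload_too_large",
      `rerank body is ${bytes} bytes, cap is ${maxPayloadBytes}`,
    );
  }
  return body;
}
```

Measuring the encoded byte length rather than the string length matters for non-ASCII documents, where one character can occupy several bytes on the wire.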
`AIGatewayConfig.reranker_model` + isAvailable('reranker') branch +
configureGateway / reconfigureGatewayWithEngine extensions thread the
reranker model through the same state path as embedding/expansion/chat.
`applyResolveAuth` + `defaultResolveAuth` widen the touchpoint param to
include `'reranker'`. `KnownTouchpointKey` + `getTouchpoint()` in
model-resolver widen to cover `'reranker'`.
Pinned by:
- test/ai/embedQuery.test.ts (8): returns single Float32Array, threads
input_type='query' for ZE, drops field for OpenAI text-3,
back-compat: legacy embed() callers without 4th arg keep their
previous Voyage no-input_type shape
- test/ai/rerank.test.ts (21): URL (F2 regression — no /v1/v1/), body
shape, bearer header, response parsing, error classification across
6 HTTP shapes, payload pre-flight (no transport call), allowlist
enforcement
- test/ai/zeroentropy-compat-fetch.test.ts (14): structural source
assertions for the shim that mirror test/voyage-response-cap.test.ts —
URL rewrite path, body injection, response rewrite, usage.prompt_tokens
injection, OOM caps Layer 1 + Layer 2 + instanceof rethrow,
instantiateEmbedding wiring branch
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/core/search/rerank.ts: the call-site abstraction. Slices the top `opts.topNIn` deduped candidates, sends them to gateway.rerank(), reorders by relevanceScore desc, and appends the un-reranked tail in its original RRF order (recall protection). Fail-open on every RerankError.reason: logs via `logRerankFailure` and returns the input array unchanged. Stamps `rerank_score` onto reordered items. `topNOut: null` is the explicit "don't truncate" signal, distinct from `undefined` (fall through to mode bundle); pinned in test (CDX2-F16).

src/core/rerank-audit.ts: failure-only JSONL audit at `~/.gbrain/audit/rerank-failures-YYYY-Www.jsonl` (ISO-week rotation; mirrors `src/core/audit-slug-fallback.ts`). Exports `logRerankFailure` + `readRecentRerankFailures(days)`. **No `logRerankSuccess`**: CDX2-F22 deliberately drops success-event logging, because writing once per tokenmax search is hot-path I/O churn AND success events leak query volume + timing into a local audit. The doctor check reads `search.reranker.enabled` first so "no events in window" gets interpreted correctly (disabled -> healthy by definition; enabled -> healthy because nothing failed). Query text is SHA-256-prefix-hashed (8 hex chars) for privacy. Honors `GBRAIN_AUDIT_DIR`.

src/core/search/hybrid.ts: slots `applyReranker` between `dedupResults()` and `enforceTokenBudget()` in the main RRF path. Resolution: per-call `opts.reranker` overrides; otherwise pulled from the resolved mode bundle (tokenmax -> enabled, others -> disabled in commit 5). Cache rows store final reranked results; the bumped knobsHash (commit 5) ensures rows can't leak across reranker configs.

src/core/types.ts: adds `SearchOpts.reranker` as a structural type so callers can pass per-call overrides; runtime type lives in src/core/search/rerank.ts (avoids circular import).
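The reorder, tail-preserve, and fail-open behavior described for `applyReranker` can be sketched as below. This is an illustration only: `Hit`, `Scored`, `mergeReranked`, and `applyRerankerSketch` are hypothetical names, the sketch assumes `topNOut` has already been resolved to a number or `null`, and it assumes the rerank function returns scores keyed by the index of each submitted document.

```typescript
// Hypothetical sketch of the call-site logic; not the code in rerank.ts.
interface Hit { id: string; text: string; rerank_score?: number }
interface Scored { index: number; relevanceScore: number }
type RerankFn = (query: string, docs: string[]) => Promise<Scored[]>;

// Reorders the first topNIn hits by score and appends the untouched tail.
function mergeReranked(
  items: Hit[],
  scored: Scored[],
  topNIn: number,
  topNOut: number | null,
): Hit[] {
  const head = items.slice(0, topNIn);
  const tail = items.slice(topNIn); // keeps its original RRF order
  const reordered = [...scored]
    .filter((s) => s.index >= 0 && s.index < head.length)
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .map((s) => ({ ...(head[s.index] as Hit), rerank_score: s.relevanceScore }));
  const merged = [...reordered, ...tail];
  return topNOut === null ? merged : merged.slice(0, topNOut);
}

async function applyRerankerSketch(
  query: string,
  items: Hit[],
  opts: { enabled: boolean; topNIn: number; topNOut: number | null },
  rerankFn: RerankFn,
  onFailure: (err: unknown) => void,
): Promise<Hit[]> {
  if (!opts.enabled || items.length === 0) return items;
  try {
    const docs = items.slice(0, opts.topNIn).map((h) => h.text);
    const scored = await rerankFn(query, docs);
    return mergeReranked(items, scored, opts.topNIn, opts.topNOut);
  } catch (err) {
    onFailure(err); // the real code logs via logRerankFailure
    return items; // fail-open: input array returned unchanged
  }
}
```

With four hits, `topNIn = 3`, and scores of 0.1, 0.9, and 0.5 for the first three, the sketch returns the order second, third, first, fourth: the fourth hit was never sent to the reranker and keeps its place at the end.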
Tests:
- test/search/rerank.test.ts (14): reorder, tail preserve, fail-open on every error class, topNOut null vs number, score stamping, empty + enabled=false pass-through
- test/rerank-audit.test.ts (10): JSONL round-trip, error_summary truncated to 200, corrupt rows skipped, missing dir -> [], ISO-week rotation walks current + previous week, no logRerankSuccess export (CDX2-F22 contract)
- test/search/hybrid-reranker-integration.test.ts (6): reranker fires when enabled, doesn't when disabled, reorders correctly, preserves tail, stamps rerank_score, fail-opens on rerankerFn throw; uses PGLite + stubbed embed transport, no API keys
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
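The audit file naming (`rerank-failures-YYYY-Www.jsonl`) and the 8-hex-char query hash from this commit can be illustrated with a small sketch. The helper names are hypothetical, and computing the ISO week in UTC is an assumption; the source does not say which time zone the rotation uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helpers; only the file-name pattern and the
// "SHA-256 prefix, 8 hex chars" rule come from the commit message.

// ISO 8601 week: the week belongs to the year that contains its Thursday.
function isoWeekStamp(date: Date): string {
  const t = new Date(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()),
  );
  const weekday = t.getUTCDay() || 7; // Monday=1 ... Sunday=7
  t.setUTCDate(t.getUTCDate() + 4 - weekday); // move to this week's Thursday
  const yearStart = Date.UTC(t.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((t.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return `${t.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}

function auditFileName(date: Date): string {
  return `rerank-failures-${isoWeekStamp(date)}.jsonl`;
}

// Only a short hash prefix of the query text is written to the audit.
function hashQuery(queryText: string): string {
  return createHash("sha256").update(queryText, "utf8").digest("hex").slice(0, 8);
}
```

Note that ISO weeks can straddle calendar years: Monday 30 December 2024 falls in week 1 of 2025, so a failure logged that day would land in the `2025-W01` file under this scheme.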
Extends `ModeBundle` with five reranker fields: `reranker_enabled`, `reranker_model`, `reranker_top_n_in`, `reranker_top_n_out`, `reranker_timeout_ms`. Per-mode defaults:
- conservative -> enabled=false (cost-sensitive)
- balanced -> enabled=false (opt-in via search.reranker.enabled)
- tokenmax -> enabled=true (the high-cost-tolerant tier; ~$0.0003/query)

Defaults model to `zeroentropyai:zerank-2`, topNIn=30, topNOut=null (no truncate by default; preserves tokenmax's searchLimit=50 end-to-end per CDX2-F16), timeout_ms=5000.

`SearchKeyOverrides` + `SearchPerCallOpts` + `resolveSearchMode.pick` all extend to thread the new fields through the resolution chain (per-call -> per-key config -> mode bundle -> default).

`loadOverridesFromConfig` adds parsers for the five new `search.reranker.*` config keys. `top_n_out` parsing distinguishes three input shapes (CDX2-F15):
- key absent -> undefined (fall through to mode bundle)
- 'null'|'none'|empty -> explicit null (no truncate)
- positive integer -> that number

`SEARCH_MODE_CONFIG_KEYS` extends so `gbrain search modes --reset` clears the reranker overrides too.

**KNOBS_HASH_VERSION bumps 1 -> 2** (CDX1-F14). Five new entries appended to `parts[]` (append-only convention CDX2-F13; reordering existing fields would silently rebuild every existing cache row). Includes `reranker_timeout_ms` so a 5s -> 100ms change invalidates stale rows (CDX2-F14: more fail-opens = different search behavior).

Mid-rolling-deploy note (CDX2-F12): v=1 and v=2 processes produce distinct cacheRowIds for the same (source_id, query_text). Expect a temporary hit-rate dip + cache-row doubling for hot queries. Clears naturally within `cache.ttl_seconds` (default 3600s).

src/commands/search.ts extends `KNOB_DESCRIPTIONS` with five new entries so `gbrain search modes` renders them. test/search-mode.test.ts extends the three bundle fixtures and bumps the KNOBS_HASH_VERSION expectation to 2.
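The three-shape `top_n_out` parsing and the resolution chain can be sketched together, since the `null` versus `undefined` distinction only matters once the layers are merged. `parseTopNOut` and `pickFirstDefined` are hypothetical names, and throwing on an unparseable value is an assumption; the commit message does not say how invalid input is handled.

```typescript
// Hypothetical sketch of the CDX2-F15 parsing rules.
type TopNOut = number | null | undefined;

function parseTopNOut(raw: string | undefined): TopNOut {
  if (raw === undefined) return undefined; // key absent: fall through
  const v = raw.trim().toLowerCase();
  if (v === "" || v === "null" || v === "none") return null; // no truncate
  if (/^\d+$/.test(v) && Number(v) > 0) return Number(v);
  // Assumption: anything else is rejected rather than silently ignored.
  throw new Error(`invalid search.reranker.top_n_out: ${raw}`);
}

// First layer that is not undefined wins; an explicit null stops the chain.
function pickFirstDefined<T>(...layers: (T | undefined)[]): T | undefined {
  for (const layer of layers) {
    if (layer !== undefined) return layer;
  }
  return undefined;
}
```

For example, `pickFirstDefined(undefined, parseTopNOut("none"), 50)` yields `null`: the per-call layer is silent, the per-key config explicitly asks for no truncation, and the mode-bundle value of 50 is never consulted.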
Pinned by test/search/knobs-hash-reranker.test.ts (13): each of the 5 reranker fields independently flips the hash, top_n_out=null renders stable, append-only convention enforced via source-position assertion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`gbrain models doctor` gains two new probes:
- `probeRerankerConfig` (zero-network) validates that the configured
reranker model resolves through the recipe registry, that the recipe
declares a `reranker` touchpoint, and that the model is in
`touchpoint.models[]`. Direct allowlist check here — assertTouchpoint
does not enforce allowlists for openai-compatible recipes (CDX2-F11).
Surfaces paste-ready `gbrain config set search.reranker.model
<zerank-2|zerank-1|zerank-1-small>` fix hint.
- `probeRerankerReachability` (1-token-equivalent) sends a minimal
`{query: "probe", documents: ["probe"]}` rerank to verify auth + URL.
Failures classify via `classifyError` into auth/rate_limit/network/
unknown. Skipped silently when reranker is unconfigured.
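The zero-network config probe in the first bullet amounts to three lookups followed by an allowlist check. A minimal sketch, assuming a `provider:model` id format (as in the `zeroentropyai:zerank-2` default) and a hypothetical registry shape; `probeRerankerConfigSketch` and `SketchRecipe` are illustrative names, not the exports of `src/commands/models.ts`.

```typescript
// Hypothetical sketch of the direct allowlist check (CDX2-F11).
interface SketchRecipe {
  id: string;
  touchpoints: { reranker?: { models: string[] } };
}

type ProbeOutcome = { ok: true } | { ok: false; hint: string };

function probeRerankerConfigSketch(
  configured: string,
  registry: Map<string, SketchRecipe>,
): ProbeOutcome {
  const sep = configured.indexOf(":");
  if (sep <= 0) {
    return { ok: false, hint: `expected provider:model, got "${configured}"` };
  }
  const providerId = configured.slice(0, sep);
  const modelId = configured.slice(sep + 1);

  const recipe = registry.get(providerId);
  if (!recipe) return { ok: false, hint: `unknown provider "${providerId}"` };

  const reranker = recipe.touchpoints.reranker;
  if (!reranker) {
    return { ok: false, hint: `recipe "${providerId}" declares no reranker touchpoint` };
  }
  if (!reranker.models.includes(modelId)) {
    // Paste-ready fix hint in the shape quoted in the commit message.
    return {
      ok: false,
      hint: `gbrain config set search.reranker.model <${reranker.models.join("|")}>`,
    };
  }
  return { ok: true };
}
```

Because the check touches only in-memory registry data, it can run on every `gbrain models doctor` invocation without spending a request, leaving the reachability probe to cover auth and URL problems.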
Also extends `probeEmbeddingConfig` with a `providerId === 'zeroentropyai'`
branch that catches the silent-1536-default bug class for zembed-1
configurations (same posture as the existing Voyage branch).
`ProbeResult.touchpoint` widens to include `'reranker_config'`.
`gbrain doctor` adds `checkRerankerHealth` to both the abbreviated
(doctorReportRemote) and full (runDoctor) check sets. Logic:
1) Read `search.reranker.enabled` first. Disabled + no failures =>
'reranker disabled'. Enabled + no failures => healthy.
2) Walk last 7 days of ~/.gbrain/audit/rerank-failures-*.jsonl.
3) ANY auth failure warns (config-time problem the probe should have
caught — surface it).
4) ANY payload_too_large failure warns (workload mismatch).
5) Transient (network/timeout/rate_limit) warns at >=5 in window.
Below that they're noise; reranker fails open anyway.
CDX2-F21 blind-spot fix: reading enabled state first means "no events"
gets interpreted correctly — never confuses "never-used" with "success
logging broken" (the latter is impossible because there is no success
logging by design, CDX2-F22).
Engine-agnostic; file-based + one config-key read.
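The five-step `checkRerankerHealth` logic above reduces to a small decision function once the audit rows have been read. The sketch below takes the enabled flag and the failure reasons from the last 7 days as plain inputs; the function name, the result shape, and the message strings are hypothetical, and treating `unknown` failures as non-counting noise is an assumption the source does not confirm.

```typescript
// Hypothetical sketch of the doctor decision logic; file reading is omitted.
type FailureReason =
  | "auth"
  | "rate_limit"
  | "network"
  | "timeout"
  | "payload_too_large"
  | "unknown";

interface HealthResult { status: "ok" | "warn"; message: string }

const TRANSIENT: readonly FailureReason[] = ["network", "timeout", "rate_limit"];
const TRANSIENT_WARN_THRESHOLD = 5;

function checkRerankerHealthSketch(
  enabled: boolean,
  failures: FailureReason[],
): HealthResult {
  // Step 1: the enabled flag decides how "no events" is interpreted.
  if (failures.length === 0) {
    return { status: "ok", message: enabled ? "reranker healthy" : "reranker disabled" };
  }
  // Step 3: any auth failure is a config-time problem worth surfacing.
  if (failures.includes("auth")) {
    return { status: "warn", message: "reranker auth failure in window" };
  }
  // Step 4: any oversized payload signals a workload mismatch.
  if (failures.includes("payload_too_large")) {
    return { status: "warn", message: "reranker payload too large in window" };
  }
  // Step 5: transient failures only matter at or above the threshold.
  const transient = failures.filter((r) => TRANSIENT.includes(r)).length;
  if (transient >= TRANSIENT_WARN_THRESHOLD) {
    return { status: "warn", message: `${transient} transient reranker failures in window` };
  }
  return { status: "ok", message: "reranker healthy (failures below threshold)" };
}
```

Four timeouts in a week therefore report healthy, while a fifth flips the check to a warning; a single auth failure warns immediately regardless of volume.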
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test/e2e/zeroentropy-live.test.ts exercises the full stack against the
real api.zeroentropy.dev: embed (default 2560-dim + flexible 1280),
embedQuery (asymmetric query side), batch embed (3 distinct vectors),
rerank (3 docs sorted by relevance score, photosynthesis-relevant docs
beat the irrelevant cat doc), rerank with topN truncation.
Gated on `ZEROENTROPY_API_KEY`: every test prints `[skip]` and returns
early without assertions when the env var is unset, so fork PRs and
contributor machines without a ZE account stay green.
CI wire-up: `.github/workflows/e2e.yml` Tier 2 step adds
`test/e2e/zeroentropy-live.test.ts` to its `bun test` invocation and
exposes `ZEROENTROPY_API_KEY: ${{ secrets.ZEROENTROPY_API_KEY }}` to
the runner. The secret is set on garrytan/gbrain at the repo scope
(separately from this commit — set via `gh secret set` so the value
never lands in source).
Tier 1 stays mechanical (no API keys); Tier 2 is the natural home for
provider-live tests because it's already the API-keyed lane.
Cost: each full run fires ~6 small HTTP calls totaling well under a
cent at the published $0.025/1M-token rate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Release notes for the ZeroEntropy support wave: zembed-1 embeddings (flexible-dim 2560/1280/640/320/160/80/40, asymmetric input_type) and zerank-2 cross-encoder reranking land as a new openai-compatible recipe alongside OpenAI/Voyage. Reranker defaults ON for tokenmax mode, OFF for conservative/balanced (~$0.0003/query at tokenmax topNIn=30; rounding error vs the tier's $700/mo Opus pairing per the CLAUDE.md cost matrix).

Search now ends with `RRF -> dedup -> reranker -> token-budget` when reranker is enabled; fails open to RRF order on any error class (audit-logged at ~/.gbrain/audit/rerank-failures-*.jsonl).

`KNOBS_HASH_VERSION` bumps 1 -> 2 to fold reranker config into the query_cache row key. Rolling-deploy operators should expect a temporary cache hit-rate dip + cache-row doubling for hot queries (clears naturally within `cache.ttl_seconds`, default 3600s).

Files in this commit are pure docs / version bump:
- VERSION + package.json bump to 0.33.3.0
- CHANGELOG.md release-summary entry with "How to take advantage" block
- CLAUDE.md Key Files annotations for the new recipe + rerank.ts + rerank-audit.ts + gateway extensions
- docs/ai-providers/zeroentropy.md one-pager (setup, knob reference, failure observability, troubleshooting table)
- skills/migrations/v0.33.3.md (purely informational: no required user action; reranker is opt-in everywhere, ZE embedding is opt-in)
- llms-full.txt regenerated to match CLAUDE.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # CHANGELOG.md # CLAUDE.md # VERSION # llms-full.txt # package.json
# Conflicts: # CHANGELOG.md # VERSION # package.json # src/core/ai/gateway.ts
brandonlipman added a commit to brandonlipman/gbrain that referenced this pull request on May 29, 2026:
* upstream/master:
  - v0.35.1.0: embedder shootout prereqs (pricing + gateway export + --resume-from) (garrytan#1055)
  - v0.35.0.0 feat: ZeroEntropy zembed-1 + zerank-2 reranker (garrytan#1008)
  - v0.34.4.0 fix(embed): cursor-paginated --stale hardening wave (D2/D3/D4/D6/D7/D8 + regression test) (garrytan#991)
  - v0.34.3.0 fix: supervisor treats code=0 watchdog exits as crashes (garrytan#1003)
  - v0.34.2.0 fix(import): path-based checkpoint resume - kills parallel-drop + failed-file-skip + sort-flip bugs (garrytan#988)
  - v0.34.1.0 fix(mcp): MCP fix wave - source-isolation P0 + PKCE DCR + federated_read + 3 more (garrytan#996)
  - v0.34.0.0 feat: Cathedral III - recursive code intelligence + Leiden clusters + eval gate (garrytan#994)
  - v0.33.3.0 feat(v0.33.3): code intelligence MCP foundation (v0.34 W0a-c + W3) (garrytan#934)
  - v0.33.2.1 docs: fork-PR workflow for garrytan-agents (garrytan#992)
  - fix(sync): raise maxBuffer to 100 MiB to prevent silent ENOBUFS crash (garrytan#982)
  - v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting (garrytan#897)
  - v0.33.1.1 fix: Voyage output_dimension + flexible-dim guard + OOM-cap rethrow (garrytan#962)
Summary
ZeroEntropy in the box: zembed-1 embeddings + zerank-2 cross-encoder reranking, on by default for tokenmax mode.
ZeroEntropy ships two specialized small models that target the two weakest retrieval moments in a gbrain pipeline: `zembed-1` (32K-context embedding, flexible Matryoshka dims at 2560/1280/640/320/160/80/40, asymmetric `input_type: query|document` encoding) and `zerank-2` (multilingual cross-encoder reranker, $0.025/1M tokens, ~50% cheaper than Cohere/Voyage rerankers). Both land as a new openai-compatible recipe alongside OpenAI/Voyage. The reranker is the bigger story: search had no reranker stage before this release. Hybrid search now ends with `RRF → dedup → reranker → token-budget` when reranker is enabled, with one configuration flip to opt in.

Provider:
- `src/core/ai/recipes/zeroentropyai.ts`: new recipe declaring both `embedding` (zembed-1) + `reranker` (zerank-2/1/1-small) touchpoints with `implementation: 'openai-compatible'`.
- `src/core/ai/gateway.ts`: `zeroEntropyCompatFetch` shim rewrites URL (`/embeddings → /models/embed`), injects `input_type` + `encoding_format: 'float'`, rewrites response (`results → data` + `usage.prompt_tokens`), L1/L2 OOM caps via `ZeroEntropyResponseTooLargeError`. New `gateway.rerank()` native HTTP path with fail-open posture + 5s timeout + payload pre-flight guard.

Asymmetric encoding:
- `src/core/ai/dims.ts`: `dimsProviderOptions` gains a 4th `inputType` param with per-model filtering (CDX2-F6: OpenAI text-3 + DashScope + Zhipu drop it, ZE + Voyage v3+ accept it).
- `src/core/ai/gateway.ts`: `embedQuery()` companion threads `inputType: 'query'`.
- `src/core/search/hybrid.ts` flips two query-side embed sites (cache lookup + vector seed) to `embedQuery`.

Reranker integration:
- `src/core/search/rerank.ts`: `applyReranker` slots between `dedupResults()` and `enforceTokenBudget()`. Fail-open on every `RerankError.reason`; stamps `rerank_score`; `topNOut: null` is the explicit "don't truncate" signal.
- `src/core/rerank-audit.ts`: failure-only JSONL audit at `~/.gbrain/audit/rerank-failures-YYYY-Www.jsonl`. Per CDX2-F22, no success logging (hot-path I/O + query-volume leak).

Mode bundles:
- `src/core/search/mode.ts`: `ModeBundle` extended with 5 reranker fields. `tokenmax` defaults `reranker_enabled=true` (~$0.0003/query at 30 docs); `conservative` + `balanced` default off. `KNOBS_HASH_VERSION` bumps 1→2 (append-only) to fold reranker config into the `query_cache.knobs_hash` column.

Doctor + observability:
- `src/commands/models.ts`: new `probeRerankerConfig` (zero-network allowlist check) + reachability probe. ZE branch added to `probeEmbeddingConfig`.
- `src/commands/doctor.ts`: `checkRerankerHealth` reads `search.reranker.enabled` first, warns on any auth failure or ≥5 transient failures.

CI:
- `.github/workflows/e2e.yml` Tier 2 step runs `test/e2e/zeroentropy-live.test.ts` and exposes `ZEROENTROPY_API_KEY` (already set as a repo secret).

Test Coverage
110 new tests across 10 files covering: recipe shape (F1+F2 regressions), dim allowlist + 4th-arg `inputType` plumbing, `gateway.rerank()` HTTP path (URL, body, auth, error classification, payload pre-flight, allowlist), `applyReranker` reorder + fail-open + null/undefined semantics, JSONL audit round-trip + ISO-week rotation, hybrid+reranker PGLite integration, knobsHash v=2 + 5-field separation, structural source assertions for `zeroEntropyCompatFetch`, and 6 live HTTP round-trips against `api.zeroentropy.dev` (env-gated).

Tests: 6360 → 6507 (+147 master + 110 ZE-specific). E2E: 91 files / 617 tests / 0 fail on real Postgres including ZE live API tests.
Pre-Landing Review
Two `/codex` rounds, 47 source-grounded findings. Round 1 (consult) caught type-level + URL + config-merge contradictions in the draft plan (`'openai-compat'` / `/v1/v1/` bugs). Round 2 (adversarial challenge) caught wire-shape completeness gaps (`usage.prompt_tokens` missing, `instantiateEmbedding` wiring, `AIGatewayConfig` extension). All must-fix findings folded in; 9 deferred bugs documented in the plan with source-line citations.

Plan Completion
PASS: every Phase 1-7 item from the planning round shipped. Plan file: `~/.claude/plans/system-instruction-you-are-working-linked-moonbeam.md` with full GSTACK REVIEW REPORT footer (CODEX CLEARED post-patches; Eng review skipped on a solo wave).

Documentation
- `docs/ai-providers/zeroentropy.md`: one-pager with setup, knob reference, failure observability, troubleshooting.
- `skills/migrations/v0.35.0.0.md`: operator-facing migration notes (no required action; opt-in everywhere).
- `CLAUDE.md` Key Files section: new recipe, `rerank.ts`, `rerank-audit.ts`, gateway extensions.
- `CHANGELOG.md` release-summary block in GStack voice.
- `llms-full.txt` regenerated.

Test plan
- `bun run verify`: clean (12 pre-checks + typecheck)
- `bun run test`: 6507 unit + 19 serial, 0 fail
- `bun run test:e2e`: 91 files, 617 tests, 0 fail on fresh Postgres (incl. 6 ZE live API round-trips)
- `gbrain models doctor`: embedding_config + reranker_config probes pass against real ZE API

🤖 Generated with Claude Code