
v0.35.0.0 feat: ZeroEntropy zembed-1 + zerank-2 reranker#1008

Merged
garrytan merged 10 commits into master from garrytan/spokane-v1
May 15, 2026

@garrytan (Owner)

Summary

ZeroEntropy in the box: zembed-1 embeddings + zerank-2 cross-encoder reranking, with the reranker on by default in tokenmax mode.

ZeroEntropy ships two specialized small models that target the two weakest retrieval moments in a gbrain pipeline: zembed-1 (32K-context embedding, flexible Matryoshka dims at 2560/1280/640/320/160/80/40, asymmetric input_type: query|document encoding) and zerank-2 (multilingual cross-encoder reranker, $0.025/1M tokens, ~50% cheaper than Cohere/Voyage rerankers). Both land as a new openai-compatible recipe alongside OpenAI/Voyage. The reranker is the bigger story: search had no reranker stage before this release. When the reranker is enabled, hybrid search now ends with RRF → dedup → reranker → token-budget; outside tokenmax, opting in is a single config flip (search.reranker.enabled).

Provider:

  • src/core/ai/recipes/zeroentropyai.ts — new recipe declaring both embedding (zembed-1) + reranker (zerank-2/1/1-small) touchpoints with implementation: 'openai-compatible'.
  • src/core/ai/gateway.ts: zeroEntropyCompatFetch shim rewrites the URL (/embeddings → /models/embed), injects input_type + encoding_format: 'float', rewrites the response (results → data, plus usage.prompt_tokens), and applies Layer 1/Layer 2 OOM caps via ZeroEntropyResponseTooLargeError. New gateway.rerank() native HTTP path with a fail-open posture, a 5s timeout, and a payload pre-flight guard.

Asymmetric encoding:

  • src/core/ai/dims.ts: dimsProviderOptions gains a 4th inputType param with per-model filtering (CDX2-F6: OpenAI text-3, DashScope, and Zhipu drop it; ZE and Voyage v3+ accept it).
  • src/core/ai/gateway.ts: new embedQuery() companion threads inputType: 'query'. src/core/search/hybrid.ts flips its two query-side embed sites (cache lookup + vector seed) to embedQuery.

Reranker integration:

  • src/core/search/rerank.ts: applyReranker slots between dedupResults() and enforceTokenBudget(). It fails open on every RerankError.reason, stamps rerank_score, and treats topNOut: null as the explicit "don't truncate" signal.
  • src/core/rerank-audit.ts — failure-only JSONL audit at ~/.gbrain/audit/rerank-failures-YYYY-Www.jsonl. Per CDX2-F22, no success logging (hot-path I/O + query-volume leak).

Mode bundles:

  • src/core/search/mode.ts: ModeBundle extended with 5 reranker fields. tokenmax defaults to reranker_enabled=true (~$0.0003/query at 30 docs); conservative and balanced default to off. KNOBS_HASH_VERSION bumps 1→2 (append-only) to fold reranker config into the query_cache.knobs_hash column.
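As a back-of-envelope check on the ~$0.0003/query figure, the arithmetic is sketched below. It assumes zerank-2 billing is linear in total billed tokens at the published $0.025/1M rate; the ~400 tokens per candidate is an illustrative guess, chosen because it is what the quoted cost implies at 30 documents (0.0003 / 0.025 × 1,000,000 = 12,000 tokens).

```typescript
// Back-of-envelope zerank-2 cost model. Assumes cost is linear in total
// billed tokens at the published $0.025/1M rate; tokens-per-document is an
// illustrative guess, not a measured value.
const ZERANK2_USD_PER_MILLION_TOKENS = 0.025;

/** Estimated USD cost of one rerank call over `docs` candidates. */
function rerankCostUsd(docs: number, tokensPerDoc: number): number {
  const totalTokens = docs * tokensPerDoc;
  return (totalTokens / 1_000_000) * ZERANK2_USD_PER_MILLION_TOKENS;
}

/** How many billed tokens a per-query dollar budget buys. */
function tokensForBudget(usd: number): number {
  return (usd / ZERANK2_USD_PER_MILLION_TOKENS) * 1_000_000;
}

// tokenmax reranks topNIn=30 candidates; at an assumed ~400 tokens each:
console.log(rerankCostUsd(30, 400).toFixed(4)); // prints 0.0003
console.log(Math.round(tokensForBudget(0.0003))); // prints 12000
```

Longer chunks scale the estimate linearly, so doubling average chunk length roughly doubles the per-query cost.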

Doctor + observability:

  • src/commands/models.ts: new probeRerankerConfig (zero-network allowlist check) + probeRerankerReachability (live auth/URL probe). ZE branch added to probeEmbeddingConfig.
  • src/commands/doctor.ts: checkRerankerHealth reads search.reranker.enabled first, then warns on any auth or payload_too_large failure, or on ≥5 transient failures in the last 7 days.

CI:

  • .github/workflows/e2e.yml Tier 2 step runs test/e2e/zeroentropy-live.test.ts and exposes ZEROENTROPY_API_KEY (already set as a repo secret).

Test Coverage

110 new tests across 10 files covering: recipe shape (F1+F2 regressions), dim allowlist + 4th-arg inputType plumbing, gateway.rerank() HTTP path (URL, body, auth, error classification, payload pre-flight, allowlist), applyReranker reorder + fail-open + null/undefined semantics, JSONL audit round-trip + ISO-week rotation, hybrid+reranker PGLite integration, knobsHash v=2 + 5-field separation, structural source assertions for zeroEntropyCompatFetch, and 6 live HTTP round-trips against api.zeroentropy.dev (env-gated).

Tests: 6360 → 6507 (+147 total, including the 110 ZE-specific tests; the remainder came in from master). E2E: 91 files / 617 tests / 0 fail on real Postgres including ZE live API tests.

Pre-Landing Review

Two /codex rounds, 47 source-grounded findings. Round 1 (consult) caught type-level, URL, and config-merge contradictions in the draft plan (the 'openai-compat' implementation literal and /v1/v1/ URL-doubling bugs). Round 2 (adversarial challenge) caught wire-shape completeness gaps (usage.prompt_tokens missing, instantiateEmbedding wiring, AIGatewayConfig extension). All must-fix findings were folded in; 9 deferred bugs are documented in the plan with source-line citations.

Plan Completion

PASS — every Phase 1-7 item from the planning round shipped. Plan file: ~/.claude/plans/system-instruction-you-are-working-linked-moonbeam.md with full GSTACK REVIEW REPORT footer (CODEX CLEARED post-patches; Eng review skipped on a solo wave).

Documentation

  • docs/ai-providers/zeroentropy.md — one-pager with setup, knob reference, failure observability, troubleshooting.
  • skills/migrations/v0.35.0.0.md — operator-facing migration notes (no required action; opt-in everywhere).
  • CLAUDE.md Key Files section: new recipe, rerank.ts, rerank-audit.ts, gateway extensions.
  • CHANGELOG.md release-summary block in GStack voice.
  • llms-full.txt regenerated.

Test plan

  • bun run verify — clean (12 pre-checks + typecheck)
  • bun run test — 6507 unit + 19 serial, 0 fail
  • bun run test:e2e — 91 files, 617 tests, 0 fail on fresh Postgres (incl. 6 ZE live API round-trips)
  • gbrain models doctor — embedding_config + reranker_config probes pass against real ZE API

🤖 Generated with Claude Code

garrytan and others added 10 commits May 14, 2026 20:02
Widens `TouchpointKind` with `'reranker'`, adds `RerankerTouchpoint`
interface, extends `Recipe.touchpoints` and `AIGatewayConfig` to carry
reranker model state. Registers `zeroentropyai` recipe (zembed-1
embeddings + zerank-{2,1,1-small} rerankers) in the recipe registry.

Recipe declares the 7 Matryoshka dims (2560/1280/640/320/160/80/40),
Voyage-style dense-payload hedge (chars_per_token=1, safety_factor=0.5),
and 5MB rerank payload cap. Pinned by test/ai/zeroentropy-recipe.test.ts
including F1 regression (implementation literal is 'openai-compatible')
and F2 regression (base_url_default ends with /v1, no doubling).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`dimsProviderOptions` gains an optional `inputType?: 'query' | 'document'`
4th param so asymmetric providers (ZE zembed-1, Voyage v3+) can route
query-side vs document-side encoding. Per-model filtering inside the
openai-compatible branch keeps `input_type` from leaking to symmetric
providers (OpenAI text-3, DashScope, Zhipu) that would 400 on it.
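A minimal sketch of the per-model filtering idea follows. It is not the real dimsProviderOptions signature: `withInputType` and `acceptsInputType` are hypothetical helpers, the `'openai'` provider id is assumed, and treating the first integer in a Voyage model id as its generation (so `voyage-3.5` counts as v3) is an assumption made for illustration.

```typescript
// Sketch of per-model input_type filtering for asymmetric embedders.
// Provider ids other than 'voyage' and 'zeroentropyai', and the
// "first integer in the model id is the generation" rule, are assumptions.
type InputType = "query" | "document";

function acceptsInputType(providerId: string, modelId: string): boolean {
  if (providerId === "zeroentropyai") return modelId.startsWith("zembed-");
  if (providerId === "voyage") {
    const generation = /\d+/.exec(modelId); // e.g. "voyage-3.5" -> "3"
    return generation !== null && Number(generation[0]) >= 3;
  }
  // Symmetric providers (OpenAI text-3, DashScope, Zhipu, ...) would 400.
  return false;
}

/** Adds input_type only when the model is known to accept it. */
function withInputType(
  base: Record<string, unknown>,
  providerId: string,
  modelId: string,
  inputType?: InputType,
): Record<string, unknown> {
  if (inputType === undefined || !acceptsInputType(providerId, modelId)) {
    return { ...base };
  }
  return { ...base, input_type: inputType };
}
```

Filtering by model rather than by provider matters for Voyage, where older generations share the provider id but would reject the field.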

Adds `ZEROENTROPY_VALID_DIMS` allowlist (2560/1280/640/320/160/80/40),
`supportsZeroEntropyDimension(modelId)`, and `isValidZeroEntropyDim(dims)`.
Throws `AIConfigError` with paste-ready fix hint when zembed-1 is
configured with an invalid dim (most common: defaulting to 1536 from
DEFAULT_EMBEDDING_DIMENSIONS).
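The guard can be sketched as below. The allowlist values and the two helper names come from the commit message; `ConfigErrorSketch` is a hypothetical stand-in for the real `AIConfigError`, and the hint wording is illustrative rather than gbrain's actual message. Each allowed size is half the previous one, consistent with Matryoshka-style prefix truncation.

```typescript
// Sketch of a zembed-1 dimension guard. The allowlist values and the two
// helper names come from the commit message; ConfigErrorSketch stands in
// for the real AIConfigError, and the hint wording is illustrative.
const ZEROENTROPY_VALID_DIMS = [2560, 1280, 640, 320, 160, 80, 40] as const;

function isValidZeroEntropyDim(dims: number): boolean {
  return (ZEROENTROPY_VALID_DIMS as readonly number[]).includes(dims);
}

class ConfigErrorSketch extends Error {
  readonly fixHint: string;
  constructor(message: string, fixHint: string) {
    super(message);
    this.name = "ConfigErrorSketch";
    this.fixHint = fixHint;
  }
}

/** Throws before any API call when zembed-1 is paired with an unsupported dim. */
function assertZembedDims(modelId: string, dims: number): void {
  if (modelId !== "zembed-1" || isValidZeroEntropyDim(dims)) return;
  throw new ConfigErrorSketch(
    `zembed-1 does not support ${dims} dimensions`,
    `set the embedding dimension to one of ${ZEROENTROPY_VALID_DIMS.join("/")}`,
  );
}
```

Failing at config time surfaces the common 1536 default as an actionable error instead of a provider-side 400 on the first embed call.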

The 4th arg is optional; existing call sites (1 production + N tests
across Voyage/OpenAI/DashScope/Zhipu/MiniMax) compile unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two seams land together because they share the same recipe + auth path.

zeroEntropyCompatFetch handles ZE's non-OpenAI-compatible wire shape:
  - URL rewrite: SDK's `${base_url}/embeddings` -> `${base_url}/models/embed`
  - Body inject: `input_type` (default 'document'; 'query' when threaded
    via providerOptions) + explicit `encoding_format: 'float'`
  - Response rewrite: `{results: [{embedding}]}` -> `{data: [{embedding,
    index}]}` so the AI SDK's openai-compat schema validates
  - `usage.prompt_tokens` injected from `total_tokens` (Voyage hit the
    same SDK schema requirement at :655)
  - Layer 1 (Content-Length) + Layer 2 (per-embedding size) OOM caps
    via tagged `ZeroEntropyResponseTooLargeError` (kept separate from
    `VoyageResponseTooLargeError` because the Voyage cap tests do
    structural source-text greps pinning the Voyage name)
  - Wired in `instantiateEmbedding()` via the existing
    `recipe.id === 'voyage' ? voyageCompatFetch : ...` ternary pattern
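The shim's three rewrites can be sketched as a fetch wrapper. This is a simplified illustration, not gbrain's zeroEntropyCompatFetch: the 64MB Layer 1 cap is an assumed value, the Layer 2 per-embedding cap is omitted, the error class is a local stand-in, and the ZE field names (`results`, `total_tokens`) follow the commit text. A base URL such as `https://api.zeroentropy.dev/v1` is assumed in the usage check.

```typescript
// Sketch of an OpenAI-compat fetch shim for ZeroEntropy embeddings.
// ZE field names (results, total_tokens) follow the commit text; the
// response-size cap value is an illustrative assumption.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;
type InputType = "query" | "document";

const MAX_RESPONSE_BYTES = 64 * 1024 * 1024; // hypothetical Layer 1 cap

class ResponseTooLargeErrorSketch extends Error {}

/** `${base_url}/embeddings` -> `${base_url}/models/embed`. */
function rewriteUrl(url: string): string {
  const suffix = "/embeddings";
  return url.endsWith(suffix) ? url.slice(0, -suffix.length) + "/models/embed" : url;
}

/** Injects input_type (default 'document') and encoding_format: 'float'. */
function rewriteRequestBody(body: string, fallback: InputType = "document"): string {
  const parsed = JSON.parse(body) as Record<string, unknown>;
  return JSON.stringify({
    ...parsed,
    input_type: parsed.input_type ?? fallback,
    encoding_format: "float",
  });
}

interface ZeEmbedResponse {
  results: { embedding: number[] }[];
  usage?: { total_tokens?: number };
}

/** Reshapes ZE's response into the OpenAI embeddings shape the SDK validates. */
function rewriteResponseJson(ze: ZeEmbedResponse) {
  const total = ze.usage?.total_tokens ?? 0;
  return {
    object: "list",
    data: ze.results.map((r, index) => ({ object: "embedding", embedding: r.embedding, index })),
    usage: { prompt_tokens: total, total_tokens: total },
  };
}

function zeroEntropyCompatFetchSketch(inner: FetchLike): FetchLike {
  return async (url, init) => {
    const raw = init?.body;
    const body = typeof raw === "string" ? rewriteRequestBody(raw) : raw;
    const res = await inner(rewriteUrl(url), { ...init, body });
    // Layer 1: refuse oversized responses before buffering them.
    const declared = Number(res.headers.get("content-length") ?? "0");
    if (declared > MAX_RESPONSE_BYTES) {
      throw new ResponseTooLargeErrorSketch(`response declares ${declared} bytes`);
    }
    if (!res.ok) return res; // let the SDK surface HTTP errors as usual
    const json = rewriteResponseJson((await res.json()) as ZeEmbedResponse);
    return new Response(JSON.stringify(json), {
      status: res.status,
      headers: { "content-type": "application/json" },
    });
  };
}
```

Keeping the rewrites as pure functions lets them be unit-tested without any network, with the wrapper only composing them around the injected transport.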

embedQuery(text) routes `inputType: 'query'` through dimsProviderOptions
for the search hot path. It is the companion to embed(texts), which now
takes an optional 2nd-arg inputType (undefined by default, which
asymmetric providers treat as 'document').

gateway.rerank() is the new native HTTP path (no AI-SDK reranking
abstraction). Resolves the configured reranker model via
`getRerankerModel()` (new accessor), parses + asserts the model is in
the recipe's touchpoint.reranker.models allowlist (CDX2-F11:
assertTouchpoint does not enforce allowlists for openai-compatible
recipes — rerank() does it directly). Posts to
`${recipe.base_url}/models/rerank` with bearer auth. Returns
`RerankResult[]` sorted by `relevanceScore`. Errors classify into
`RerankError.reason: 'auth' | 'rate_limit' | 'network' | 'timeout' |
'payload_too_large' | 'unknown'`. 5s default timeout. Pre-flight payload
guard rejects bodies over `recipe.max_payload_bytes` BEFORE any HTTP
call so applyReranker can fail-open without burning a round-trip.
`_rerankTransport` + `__setRerankTransportForTests` mirror the embed
test seam.
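The error classification and payload pre-flight described above can be sketched as follows. The HTTP-status mapping and the `{ model, query, documents }` request shape are assumptions (only `query` and `documents` appear in this PR, in the doctor probe); `RerankError` here is a local stand-in named after the commit's type, not the real class.

```typescript
// Sketch of rerank error classification plus a payload pre-flight guard.
// The status-code mapping, request field names, and cap are assumptions;
// RerankError here is a local stand-in named after the commit's type.
type RerankErrorReason =
  | "auth" | "rate_limit" | "network" | "timeout" | "payload_too_large" | "unknown";

class RerankError extends Error {
  readonly reason: RerankErrorReason;
  constructor(reason: RerankErrorReason, message: string) {
    super(message);
    this.name = "RerankError";
    this.reason = reason;
  }
}

/** Maps a non-2xx HTTP status to a failure reason. */
function classifyRerankStatus(status: number): RerankErrorReason {
  if (status === 401 || status === 403) return "auth";
  if (status === 429) return "rate_limit";
  if (status === 413) return "payload_too_large";
  if (status === 408 || status === 504) return "timeout";
  return "unknown";
}

/** Maps a thrown value (e.g. from fetch or an AbortSignal timeout) to a reason. */
function classifyThrown(err: unknown): RerankErrorReason {
  if (err instanceof RerankError) return err.reason;
  if (err instanceof Error && (err.name === "AbortError" || err.name === "TimeoutError")) {
    return "timeout";
  }
  if (err instanceof TypeError) return "network"; // fetch rejects with TypeError on network failure
  return "unknown";
}

/** Serializes the request and rejects it before any HTTP call if it is too big. */
function preflightRerankBody(
  model: string,
  query: string,
  documents: string[],
  maxPayloadBytes: number,
): string {
  const body = JSON.stringify({ model, query, documents });
  const bytes = new TextEncoder().encode(body).length; // UTF-8 byte length, not string length
  if (bytes > maxPayloadBytes) {
    throw new RerankError("payload_too_large", `${bytes} bytes exceeds cap of ${maxPayloadBytes}`);
  }
  return body;
}
```

In this sketch the caller would pass the recipe's `max_payload_bytes` (5MB for the zeroentropyai recipe, per the recipe commit) as `maxPayloadBytes`, so an oversized candidate set fails open without a round-trip.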

`AIGatewayConfig.reranker_model` + isAvailable('reranker') branch +
configureGateway / reconfigureGatewayWithEngine extensions thread the
reranker model through the same state path as embedding/expansion/chat.
`applyResolveAuth` + `defaultResolveAuth` widen the touchpoint param to
include `'reranker'`. `KnownTouchpointKey` + `getTouchpoint()` in
model-resolver widen to cover `'reranker'`.

Pinned by:
- test/ai/embedQuery.test.ts (8): returns single Float32Array, threads
  input_type='query' for ZE, drops field for OpenAI text-3,
  back-compat: legacy embed() callers without 4th arg keep their
  previous Voyage no-input_type shape
- test/ai/rerank.test.ts (21): URL (F2 regression — no /v1/v1/), body
  shape, bearer header, response parsing, error classification across
  6 HTTP shapes, payload pre-flight (no transport call), allowlist
  enforcement
- test/ai/zeroentropy-compat-fetch.test.ts (14): structural source
  assertions for the shim that mirror test/voyage-response-cap.test.ts —
  URL rewrite path, body injection, response rewrite, usage.prompt_tokens
  injection, OOM caps Layer 1 + Layer 2 + instanceof rethrow,
  instantiateEmbedding wiring branch

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/core/search/rerank.ts — the call-site abstraction. Slices the top
`opts.topNIn` deduped candidates, sends to gateway.rerank(), reorders by
relevanceScore desc, appends the un-reranked tail in its original RRF
order (recall protection). Fail-open on every RerankError.reason: logs
via `logRerankFailure` and returns the input array unchanged. Stamps
`rerank_score` onto reordered items. `topNOut: null` is the explicit
"don't truncate" signal — distinct from `undefined` (fall through to
mode bundle); pin in test (CDX2-F16).
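The call-site contract reads naturally as code. The sketch below takes already-resolved options (so `topNOut` is `number | null`; the `undefined` fall-through is resolved upstream), and its types and `onFailure` callback are hypothetical stand-ins for the real module and `logRerankFailure`.

```typescript
// Sketch of the applyReranker contract: rerank the top `topNIn`, reorder by
// relevance, keep the un-reranked tail in RRF order, and fail open. Types and
// the onFailure callback are hypothetical stand-ins for the real module.
interface SearchResult {
  id: string;
  text: string;
  rerank_score?: number;
}
interface RerankResult {
  index: number; // position within the documents array that was sent
  relevanceScore: number;
}
type RerankFn = (query: string, documents: string[]) => Promise<RerankResult[]>;
interface ResolvedRerankOpts {
  enabled: boolean;
  topNIn: number;
  topNOut: number | null; // null = explicit "don't truncate"
}

async function applyRerankerSketch(
  query: string,
  results: SearchResult[],
  opts: ResolvedRerankOpts,
  rerank: RerankFn,
  onFailure: (err: unknown) => void,
): Promise<SearchResult[]> {
  if (!opts.enabled || results.length === 0) return results;
  const head = results.slice(0, opts.topNIn);
  const tail = results.slice(opts.topNIn);

  let scored: RerankResult[];
  try {
    scored = await rerank(query, head.map((r) => r.text));
  } catch (err) {
    onFailure(err); // e.g. append to the failure audit log
    return results; // fail open: original RRF order, untouched
  }

  const valid = scored.filter((s) => s.index >= 0 && s.index < head.length);
  const reordered = [...valid]
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .map((s) => ({ ...head[s.index], rerank_score: s.relevanceScore }));
  // Head items the reranker did not score keep their RRF order after the scored ones.
  const scoredIdx = new Set(valid.map((s) => s.index));
  const unscored = head.filter((_, i) => !scoredIdx.has(i));

  const merged = [...reordered, ...unscored, ...tail];
  return opts.topNOut === null ? merged : merged.slice(0, opts.topNOut);
}
```

Appending the tail rather than dropping it is what keeps recall intact when `topNIn` is smaller than the search limit.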

src/core/rerank-audit.ts — failure-only JSONL audit at
`~/.gbrain/audit/rerank-failures-YYYY-Www.jsonl` (ISO-week rotation;
mirrors `src/core/audit-slug-fallback.ts`). Exports `logRerankFailure`
+ `readRecentRerankFailures(days)`. **No `logRerankSuccess`** — CDX2-F22
deliberately drops success-event logging: writing once per tokenmax
search is hot-path I/O churn AND success events leak query
volume + timing into a local audit. The doctor check reads
`search.reranker.enabled` first so "no events in window" gets
interpreted correctly (disabled -> healthy by definition; enabled ->
healthy because nothing failed). Query text is SHA-256-prefix-hashed
(8 hex chars) for privacy. Honors `GBRAIN_AUDIT_DIR`.
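The rotation and hashing rules can be sketched with Node's standard library. The ISO-8601 week computation is the usual "jump to this week's Thursday" method; record fields other than `error_summary` (and its 200-char truncation) are assumptions, and `logRerankFailureSketch` is a hypothetical name, not the module's export.

```typescript
import { appendFileSync, mkdirSync } from "node:fs";
import { createHash } from "node:crypto";
import { homedir } from "node:os";
import { join } from "node:path";

// Sketch of the failure-only rerank audit: ISO-week file rotation plus a
// privacy-preserving query hash. Record fields other than error_summary are
// illustrative assumptions.

/** ISO-8601 week-numbering year and week for a date (UTC). */
function isoWeek(date: Date): { year: number; week: number } {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const weekday = d.getUTCDay() || 7; // Monday=1 ... Sunday=7
  d.setUTCDate(d.getUTCDate() + 4 - weekday); // jump to this week's Thursday
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return { year: d.getUTCFullYear(), week };
}

function auditFileName(date: Date): string {
  const { year, week } = isoWeek(date);
  return `rerank-failures-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}

/** First 8 hex chars of SHA-256(query): enough to correlate, not to recover. */
function hashQuery(query: string): string {
  return createHash("sha256").update(query).digest("hex").slice(0, 8);
}

function buildFailureRecord(query: string, reason: string, err: unknown, now = new Date()) {
  const message = err instanceof Error ? err.message : String(err);
  return {
    ts: now.toISOString(),
    reason,
    query_hash: hashQuery(query),
    error_summary: message.slice(0, 200),
  };
}

function logRerankFailureSketch(query: string, reason: string, err: unknown): void {
  const dir = process.env.GBRAIN_AUDIT_DIR ?? join(homedir(), ".gbrain", "audit");
  mkdirSync(dir, { recursive: true });
  const now = new Date();
  const record = buildFailureRecord(query, reason, err, now);
  appendFileSync(join(dir, auditFileName(now)), JSON.stringify(record) + "\n");
}
```

Using the ISO week-numbering year (not the calendar year) matters at year boundaries: 1 January 2021 belongs to 2020-W53, so its events land in the previous year's last file.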

src/core/search/hybrid.ts — slots `applyReranker` between
`dedupResults()` and `enforceTokenBudget()` in the main RRF path.
Resolution: per-call `opts.reranker` overrides; otherwise pulled from
the resolved mode bundle (tokenmax -> enabled, others -> disabled in
commit 5). Cache rows store final reranked results; the bumped
knobsHash (commit 5) ensures rows can't leak across reranker configs.
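For orientation, the stage order can be sketched end to end. This is not gbrain's hybrid.ts: the RRF constant k=60, the dedup key (normalized text), and the chars/4 token estimate are all assumptions, and the reranker is modeled as an injected async stage (identity when disabled) so that fail-open stays inside that stage.

```typescript
// Sketch of the hybrid tail: RRF fusion -> dedup -> reranker -> token budget.
// k=60, the dedup key, and the chars/4 token estimate are assumptions, not
// gbrain's actual parameters.
interface Hit {
  id: string;
  text: string;
  score?: number;
}

/** Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank). */
function rrfFuse(lists: Hit[][], k = 60): Hit[] {
  const acc = new Map<string, { hit: Hit; score: number }>();
  for (const list of lists) {
    list.forEach((hit, i) => {
      const add = 1 / (k + i + 1); // ranks are 1-based
      const prev = acc.get(hit.id);
      if (prev) prev.score += add;
      else acc.set(hit.id, { hit, score: add });
    });
  }
  return [...acc.values()]
    .sort((a, b) => b.score - a.score)
    .map(({ hit, score }) => ({ ...hit, score }));
}

/** Keeps the first (highest-ranked) hit per key. */
function dedupBy(hits: Hit[], key: (h: Hit) => string): Hit[] {
  const seen = new Set<string>();
  return hits.filter((h) => {
    const k = key(h);
    if (seen.has(k)) return false;
    seen.add(k);
    return true;
  });
}

/** Takes hits in order until the next one would exceed the budget. */
function enforceBudget(hits: Hit[], maxTokens: number): Hit[] {
  const out: Hit[] = [];
  let used = 0;
  for (const h of hits) {
    const cost = Math.ceil(h.text.length / 4);
    if (used + cost > maxTokens) break;
    used += cost;
    out.push(h);
  }
  return out;
}

async function hybridTailSketch(
  lists: Hit[][],
  rerank: (hits: Hit[]) => Promise<Hit[]>, // identity when the reranker is off
  maxTokens: number,
): Promise<Hit[]> {
  const fused = rrfFuse(lists);
  const deduped = dedupBy(fused, (h) => h.text.trim().toLowerCase());
  const reranked = await rerank(deduped); // fail-open lives inside this stage
  return enforceBudget(reranked, maxTokens);
}
```

Placing the reranker after dedup avoids paying to score duplicates, and placing it before the budget lets the budget keep the most relevant hits rather than the highest RRF ranks.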

src/core/types.ts — adds `SearchOpts.reranker` as a structural type so
callers can pass per-call overrides; runtime type lives in
src/core/search/rerank.ts (avoids circular import).

Tests:
- test/search/rerank.test.ts (14): reorder, tail preserve, fail-open on
  every error class, topNOut null vs number, score stamping, empty +
  enabled=false pass-through
- test/rerank-audit.test.ts (10): JSONL round-trip, error_summary
  truncated to 200, corrupt rows skipped, missing dir -> [], ISO-week
  rotation walks current + previous week, no logRerankSuccess export
  (CDX2-F22 contract)
- test/search/hybrid-reranker-integration.test.ts (6): reranker fires
  when enabled, doesn't when disabled, reorders correctly, preserves
  tail, stamps rerank_score, fail-opens on rerankerFn throw — uses
  PGLite + stubbed embed transport, no API keys

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends `ModeBundle` with five reranker fields: `reranker_enabled`,
`reranker_model`, `reranker_top_n_in`, `reranker_top_n_out`,
`reranker_timeout_ms`. Per-mode defaults:

  - conservative -> enabled=false (cost-sensitive)
  - balanced     -> enabled=false (opt-in via search.reranker.enabled)
  - tokenmax     -> enabled=true  (the high-cost-tolerant tier; ~$0.0003/query)

Other defaults: model `zeroentropyai:zerank-2`, topNIn=30, topNOut=null
(no truncation by default, which preserves tokenmax's searchLimit=50
end-to-end per CDX2-F16), timeout_ms=5000.

`SearchKeyOverrides` + `SearchPerCallOpts` + `resolveSearchMode.pick`
all extend to thread the new fields through the resolution chain
(per-call -> per-key config -> mode bundle -> default).
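The resolution chain can be sketched as a layered lookup. Field names and default values follow the commit message; `pick` and `resolveRerankerKnobs` are hypothetical names, and this sketch assumes the non-`enabled` defaults are shared by all three modes. Because each bundle is complete here, no separate final "default" layer is needed.

```typescript
// Sketch of reranker knob resolution: per-call -> per-key config -> mode
// bundle. Field names and defaults follow the commit message; the function
// names are hypothetical.
type Mode = "conservative" | "balanced" | "tokenmax";

interface RerankerKnobs {
  reranker_enabled: boolean;
  reranker_model: string;
  reranker_top_n_in: number;
  reranker_top_n_out: number | null; // null = don't truncate
  reranker_timeout_ms: number;
}

const SHARED_DEFAULTS: Omit<RerankerKnobs, "reranker_enabled"> = {
  reranker_model: "zeroentropyai:zerank-2",
  reranker_top_n_in: 30,
  reranker_top_n_out: null,
  reranker_timeout_ms: 5000,
};

const MODE_BUNDLES: Record<Mode, RerankerKnobs> = {
  conservative: { ...SHARED_DEFAULTS, reranker_enabled: false },
  balanced: { ...SHARED_DEFAULTS, reranker_enabled: false },
  tokenmax: { ...SHARED_DEFAULTS, reranker_enabled: true },
};

function pick<K extends keyof RerankerKnobs>(
  key: K,
  layers: Partial<RerankerKnobs>[],
  bundle: RerankerKnobs,
): RerankerKnobs[K] {
  for (const layer of layers) {
    const v = layer[key];
    // `!== undefined` (not `??`) so an explicit null still wins.
    if (v !== undefined) return v as RerankerKnobs[K];
  }
  return bundle[key];
}

function resolveRerankerKnobs(
  mode: Mode,
  perCall: Partial<RerankerKnobs> = {},
  perKey: Partial<RerankerKnobs> = {},
): RerankerKnobs {
  const bundle = MODE_BUNDLES[mode];
  const layers = [perCall, perKey];
  return {
    reranker_enabled: pick("reranker_enabled", layers, bundle),
    reranker_model: pick("reranker_model", layers, bundle),
    reranker_top_n_in: pick("reranker_top_n_in", layers, bundle),
    reranker_top_n_out: pick("reranker_top_n_out", layers, bundle),
    reranker_timeout_ms: pick("reranker_timeout_ms", layers, bundle),
  };
}
```

The `!== undefined` test is the crux: with `??`, a per-call `null` (don't truncate) would be silently replaced by a lower layer's number.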

`loadOverridesFromConfig` adds parsers for the five new
`search.reranker.*` config keys. `top_n_out` parsing distinguishes
three input shapes (CDX2-F15):
  key absent           -> undefined (fall through to mode bundle)
  'null'|'none'|empty  -> explicit null (no truncate)
  positive integer     -> that number
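A minimal parser for those three shapes might look like this. Case-insensitive matching and throwing on malformed input are assumptions of the sketch; the real `loadOverridesFromConfig` may handle bad values differently.

```typescript
// Sketch of the three-way search.reranker.top_n_out parser. Case-insensitive
// matching and throwing on malformed input are assumptions of this sketch.
function parseTopNOut(raw: string | undefined): number | null | undefined {
  if (raw === undefined) return undefined; // key absent: fall through to the mode bundle
  const v = raw.trim().toLowerCase();
  if (v === "" || v === "null" || v === "none") return null; // explicit "don't truncate"
  if (/^[1-9]\d*$/.test(v)) return Number(v); // positive integer
  throw new Error(
    `search.reranker.top_n_out must be a positive integer, 'null', or 'none' (got ${JSON.stringify(raw)})`,
  );
}
```

The strict regex rejects inputs such as `0`, `-3`, `1e2`, or `0x10` that `Number()` alone would happily coerce.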

`SEARCH_MODE_CONFIG_KEYS` extends so `gbrain search modes --reset`
clears the reranker overrides too.

**KNOBS_HASH_VERSION bumps 1 -> 2** (CDX1-F14). Five new entries
appended to `parts[]` (append-only convention CDX2-F13; reordering
existing fields would silently rebuild every existing cache row).
Includes `reranker_timeout_ms` so a 5s -> 100ms change invalidates
stale rows (CDX2-F14: more fail-opens = different search behavior).
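The append-only convention can be illustrated with a small versioned hash. Only the version number and the five reranker fields come from the commit; `search_limit` is a placeholder for the pre-existing v1 fields, and the `|` separator, `key=value` encoding, and 16-hex truncation are assumptions.

```typescript
import { createHash } from "node:crypto";

// Sketch of an append-only, versioned knobs hash. Only the reranker fields
// and the version number come from the commit; search_limit stands in for
// the existing v1 fields, and the separator/truncation are assumptions.
const KNOBS_HASH_VERSION = 2;

interface Knobs {
  search_limit: number; // placeholder for the pre-existing v1 fields
  reranker_enabled: boolean;
  reranker_model: string;
  reranker_top_n_in: number;
  reranker_top_n_out: number | null;
  reranker_timeout_ms: number;
}

function knobsHash(k: Knobs): string {
  const parts: string[] = [
    `v=${KNOBS_HASH_VERSION}`,
    `search_limit=${k.search_limit}`,
    // v2 additions: appended after every v1 entry, never interleaved.
    `reranker_enabled=${k.reranker_enabled}`,
    `reranker_model=${k.reranker_model}`,
    `reranker_top_n_in=${k.reranker_top_n_in}`,
    `reranker_top_n_out=${k.reranker_top_n_out === null ? "null" : k.reranker_top_n_out}`,
    `reranker_timeout_ms=${k.reranker_timeout_ms}`,
  ];
  return createHash("sha256").update(parts.join("|")).digest("hex").slice(0, 16);
}
```

Encoding `null` as the literal string `null` keeps the "don't truncate" setting stable across runs and distinct from any numeric `top_n_out`.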

Mid-rolling-deploy note (CDX2-F12): v=1 and v=2 processes produce
distinct cacheRowIds for the same (source_id, query_text). Expect a
temporary hit-rate dip + cache-row doubling for hot queries. Clears
naturally within `cache.ttl_seconds` (default 3600s).

src/commands/search.ts extends `KNOB_DESCRIPTIONS` with five new
entries so `gbrain search modes` renders them. test/search-mode.test.ts
extends the three bundle fixtures and bumps the KNOBS_HASH_VERSION
expectation to 2.

Pinned by test/search/knobs-hash-reranker.test.ts (13): each of the 5
reranker fields independently flips the hash, top_n_out=null renders
stable, append-only convention enforced via source-position assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`gbrain models doctor` gains two new probes:

- `probeRerankerConfig` (zero-network) validates that the configured
  reranker model resolves through the recipe registry, that the recipe
  declares a `reranker` touchpoint, and that the model is in
  `touchpoint.models[]`. Direct allowlist check here — assertTouchpoint
  does not enforce allowlists for openai-compatible recipes (CDX2-F11).
  Surfaces paste-ready `gbrain config set search.reranker.model
  <zerank-2|zerank-1|zerank-1-small>` fix hint.

- `probeRerankerReachability` (1-token-equivalent) sends a minimal
  `{query: "probe", documents: ["probe"]}` rerank to verify auth + URL.
  Failures classify via `classifyError` into auth/rate_limit/network/
  unknown. Skipped silently when reranker is unconfigured.
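The zero-network config probe reduces to three lookups. In the sketch below the recipe table is a trimmed, hypothetical stand-in for the real registry (the `openai` entry in particular is illustrative), `probeRerankerConfigSketch` is not the real function, and only the fix-hint text mirrors the one quoted above.

```typescript
// Sketch of a zero-network reranker config probe. The recipe table is a
// trimmed, hypothetical stand-in for the real registry; the fix-hint text
// mirrors the one quoted in the commit message.
interface RecipeSketch {
  id: string;
  touchpoints: { reranker?: { models: string[] } };
}

const RECIPES: Record<string, RecipeSketch> = {
  zeroentropyai: {
    id: "zeroentropyai",
    touchpoints: { reranker: { models: ["zerank-2", "zerank-1", "zerank-1-small"] } },
  },
  openai: { id: "openai", touchpoints: {} },
};

type ProbeOutcome = { ok: true } | { ok: false; message: string; fix?: string };

function probeRerankerConfigSketch(configured: string): ProbeOutcome {
  const sep = configured.indexOf(":");
  if (sep <= 0) {
    return { ok: false, message: `expected "provider:model", got "${configured}"` };
  }
  const providerId = configured.slice(0, sep);
  const modelId = configured.slice(sep + 1);
  const recipe = RECIPES[providerId];
  if (recipe === undefined) return { ok: false, message: `unknown provider "${providerId}"` };
  const touchpoint = recipe.touchpoints.reranker;
  if (touchpoint === undefined) {
    return { ok: false, message: `recipe "${recipe.id}" declares no reranker touchpoint` };
  }
  // Explicit allowlist check: do not rely on generic touchpoint assertions.
  if (!touchpoint.models.includes(modelId)) {
    return {
      ok: false,
      message: `"${modelId}" is not in ${recipe.id}'s reranker models`,
      fix: `gbrain config set search.reranker.model <${touchpoint.models.join("|")}>`,
    };
  }
  return { ok: true };
}
```

Since the check never touches the network, it can run on every `gbrain models doctor` invocation before the reachability probe spends a request.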

Also extends `probeEmbeddingConfig` with a `providerId === 'zeroentropyai'`
branch that catches the silent-1536-default bug class for zembed-1
configurations (same posture as the existing Voyage branch).

`ProbeResult.touchpoint` widens to include `'reranker_config'`.

`gbrain doctor` adds `checkRerankerHealth` to both the abbreviated
(doctorReportRemote) and full (runDoctor) check sets. Logic:

  1) Read `search.reranker.enabled` first. Disabled + no failures =>
     'reranker disabled'. Enabled + no failures => healthy.
  2) Walk last 7 days of ~/.gbrain/audit/rerank-failures-*.jsonl.
  3) ANY auth failure warns (config-time problem the probe should have
     caught — surface it).
  4) ANY payload_too_large failure warns (workload mismatch).
  5) Transient (network/timeout/rate_limit) warns at >=5 in window.
     Below that they're noise; reranker fails open anyway.
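The five rules above map onto a small pure function. The event shape and messages are illustrative; reading the audit files is left out, and ignoring `unknown` failures is an assumption because the rules do not say how they are scored.

```typescript
// Sketch of the doctor's reranker health rules over a 7-day window of failure
// events. Event shape and messages are illustrative; 'unknown' failures are
// ignored here because the rules do not say how they are scored.
type FailureReason = "auth" | "rate_limit" | "network" | "timeout" | "payload_too_large" | "unknown";

interface FailureEvent {
  ts: string; // ISO-8601 timestamp
  reason: FailureReason;
}

interface Health {
  status: "ok" | "warn";
  message: string;
}

const TRANSIENT = new Set<FailureReason>(["network", "timeout", "rate_limit"]);
const WINDOW_MS = 7 * 86_400_000;
const TRANSIENT_WARN_AT = 5;

function rerankerHealthSketch(enabled: boolean, events: FailureEvent[], now = new Date()): Health {
  const cutoff = now.getTime() - WINDOW_MS;
  const recent = events.filter((e) => Date.parse(e.ts) >= cutoff);
  if (recent.length === 0) {
    // No success logging exists, so "no events" is read against the enabled flag.
    return { status: "ok", message: enabled ? "reranker healthy" : "reranker disabled" };
  }
  const count = (pred: (e: FailureEvent) => boolean) => recent.filter(pred).length;
  const auth = count((e) => e.reason === "auth");
  if (auth > 0) return { status: "warn", message: `${auth} auth failure(s): check the API key` };
  const payload = count((e) => e.reason === "payload_too_large");
  if (payload > 0) {
    return { status: "warn", message: `${payload} payload_too_large failure(s): workload mismatch` };
  }
  const transient = count((e) => TRANSIENT.has(e.reason));
  if (transient >= TRANSIENT_WARN_AT) {
    return { status: "warn", message: `${transient} transient failures in 7 days` };
  }
  return { status: "ok", message: `${transient} transient failure(s), below threshold (fails open)` };
}
```

Keeping the rules pure (events in, verdict out) makes the thresholds easy to test without touching `~/.gbrain/audit`.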

CDX2-F21 blind-spot fix: reading enabled state first means "no events"
gets interpreted correctly — never confuses "never-used" with "success
logging broken" (the latter is impossible because there is no success
logging by design, CDX2-F22).

Engine-agnostic; file-based + one config-key read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test/e2e/zeroentropy-live.test.ts exercises the full stack against the
real api.zeroentropy.dev: embed (default 2560-dim + flexible 1280),
embedQuery (asymmetric query side), batch embed (3 distinct vectors),
rerank (3 docs sorted by relevance score, photosynthesis-relevant docs
beat the irrelevant cat doc), rerank with topN truncation.

Gated on `ZEROENTROPY_API_KEY`: every test prints `[skip]` and returns
early without assertions when the env var is unset, so fork PRs and
contributor machines without a ZE account stay green.

CI wire-up: `.github/workflows/e2e.yml` Tier 2 step adds
`test/e2e/zeroentropy-live.test.ts` to its `bun test` invocation and
exposes `ZEROENTROPY_API_KEY: ${{ secrets.ZEROENTROPY_API_KEY }}` to
the runner. The secret is set on garrytan/gbrain at the repo scope
(separately from this commit — set via `gh secret set` so the value
never lands in source).

Tier 1 stays mechanical (no API keys); Tier 2 is the natural home for
provider-live tests because it's already the API-keyed lane.

Cost: each full run fires ~6 small HTTP calls totaling well under a
cent at the published $0.025/1M-token rate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Release notes for the ZeroEntropy support wave: zembed-1 embeddings
(flexible-dim 2560/1280/640/320/160/80/40, asymmetric input_type) and
zerank-2 cross-encoder reranking land as a new openai-compatible recipe
alongside OpenAI/Voyage. Reranker defaults ON for tokenmax mode, OFF
for conservative/balanced (~$0.0003/query at tokenmax topNIn=30; rounding
error vs the tier's $700/mo Opus pairing per the CLAUDE.md cost matrix).

Search now ends with `RRF -> dedup -> reranker -> token-budget` when
reranker is enabled; fails open to RRF order on any error class
(audit-logged at ~/.gbrain/audit/rerank-failures-*.jsonl).

`KNOBS_HASH_VERSION` bumps 1 -> 2 to fold reranker config into the
query_cache row key. Rolling-deploy operators should expect a temporary
cache hit-rate dip + cache-row doubling for hot queries (clears
naturally within `cache.ttl_seconds`, default 3600s).

Files in this commit are pure docs / version bump:
- VERSION + package.json bump to 0.33.3.0
- CHANGELOG.md release-summary entry with "How to take advantage" block
- CLAUDE.md Key Files annotations for the new recipe + rerank.ts +
  rerank-audit.ts + gateway extensions
- docs/ai-providers/zeroentropy.md one-pager (setup, knob reference,
  failure observability, troubleshooting table)
- skills/migrations/v0.33.3.md (purely informational: no required user
  action; reranker is opt-in everywhere, ZE embedding is opt-in)
- llms-full.txt regenerated to match CLAUDE.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#	CHANGELOG.md
#	CLAUDE.md
#	VERSION
#	llms-full.txt
#	package.json
# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
#	src/core/ai/gateway.ts
garrytan merged commit baf1a47 into master on May 15, 2026
7 checks passed
brandonlipman added a commit to brandonlipman/gbrain that referenced this pull request May 29, 2026
* upstream/master:
  v0.35.1.0: embedder shootout prereqs (pricing + gateway export + --resume-from) (garrytan#1055)
  v0.35.0.0 feat: ZeroEntropy zembed-1 + zerank-2 reranker (garrytan#1008)
  v0.34.4.0 fix(embed): cursor-paginated --stale hardening wave (D2/D3/D4/D6/D7/D8 + regression test) (garrytan#991)
  v0.34.3.0 fix: supervisor treats code=0 watchdog exits as crashes (garrytan#1003)
  v0.34.2.0 fix(import): path-based checkpoint resume — kills parallel-drop + failed-file-skip + sort-flip bugs (garrytan#988)
  v0.34.1.0 fix(mcp): MCP fix wave — source-isolation P0 + PKCE DCR + federated_read + 3 more (garrytan#996)
  v0.34.0.0 feat: Cathedral III — recursive code intelligence + Leiden clusters + eval gate (garrytan#994)
  v0.33.3.0 feat(v0.33.3): code intelligence MCP foundation (v0.34 W0a-c + W3) (garrytan#934)
  v0.33.2.1 docs: fork-PR workflow for garrytan-agents (garrytan#992)
  fix(sync): raise maxBuffer to 100 MiB to prevent silent ENOBUFS crash (garrytan#982)
  v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting (garrytan#897)
  v0.33.1.1 fix: Voyage output_dimension + flexible-dim guard + OOM-cap rethrow (garrytan#962)