
v0.41.31.0 feat(embed): delta-aware sync --all cost gate + real stale-embedding semantics#1632

Merged: garrytan merged 13 commits into master from garrytan/embedding-cost-model-fix on May 30, 2026

garrytan commented May 30, 2026


Summary

Three things, shipped as bisectable commits on top of the v0.41.29 base.

1. gbrain sync --all stops blocking crons. The pre-existing cost gate emitted ConfirmationRequired + exit 2 on every non-TTY run without --yes, based on a whole-corpus estimate — so a nightly cron over an already-synced brain failed every night. The gate is now mode-aware: when embedding is deferred to backfill jobs (the federated_v2 default), it prints an informational notice and never exits 2 (the backfill's own $25/source/24h cap is the real money gate). The blocking confirmation fires only when sync embeds inline, and only when the new-content estimate exceeds a configurable floor sync.cost_gate_min_usd (default $0.50). Unchanged sources (git HEAD == last_commit, clean tree, current chunker) contribute 0.

2. Real stale-embedding semantics (migration v108). New pages.embedding_signature column records <provider:model>:<dims> at every embed-write site. Swap your embedding model and gbrain embed --stale now finds and re-embeds the drifted pages. NULL signature is grandfathered (never stale) so upgrading does not mass-re-embed the corpus.

3. Embed-backfill visibility. gbrain sources status gains a BACKFILL column (active/queued/idle per source) and the deferred sync notice appends a queued-job count, so a cron operator can see deferred embedding work.

Commit walk (git log master..HEAD): model-aware cost rate → mode-aware/delta-aware gate → stale semantics + v108 → backfill visibility → test mock fixes → R-3 + stamp-wiring tests → F1/F2 adversarial fix → version bump → P0a/P0b adversarial fix → docs.

Test Coverage

Coverage audit: 13/17 paths covered on the first pass (76%); the two flagged gaps have since been closed, so R-3 (a mandatory regression) and the embed-signature stamp-call wiring now have dedicated tests. New/extended test files: sync-cost-preview (shouldBlockSync/willEmbedSynchronously matrix, incl. R-1/R-2), sum-stale-chunk-chars, embedding-signature-stale (R-4 grandfather + mismatch + scoped invalidate + stamp), sync-cost-gate.serial (R-1/R-2/R-3 + control), import-signature-stamp.serial (inline stamp), source-health (BACKFILL counts), embed.serial (stamp-call assertion), e2e/sync-status-pglite (BACKFILL).

4 mandatory regressions pinned: R-1 deferred non-TTY never exit 2 · R-2 inline above-floor still exit 2 · R-3 chunker drift still estimates (git-unchanged ≠ free) · R-4 migration never mass-marks-stale (NULL grandfathered).

Pre-Landing & Adversarial Review

Two adversarial rounds (Claude subagent + Codex), both found real bugs that are now fixed:

  • F1 — the inline-import/per-slug embed paths never stamped the signature, so the feature was inert for non-federated brains. Fixed: all embed-write paths (embed/--all/--stale, embed-backfill, sync inline import, gbrain import) now stamp.
  • F2 — the inline gate counted stale backlog that sync doesn't clear; after F1 that would block a post-swap cron. Fixed: inline blocking cost = new-content only; backlog shown informationally.
  • P0a — the BACKFILL column was added to a helper the CLI sources status doesn't call. Fixed: wired into the real computeAllSourceMetrics path.
  • P0b — partial-page embeds could falsely stamp a mixed page as current. Fixed: stamp only when the whole page's chunk set was embedded this pass.

Accepted as documented limitations (per-page signature tradeoff): P1b (a >100-file --serial sync can over-estimate; non-default mode, conservative-high bias) and P1a (model-swap invalidate NULLs drifted vectors before re-embed; deliberate rare op).

Plan Completion

All 6 implementation tasks (T1-T6) are DONE; T4 is recorded as CHANGED, a documented deviation: the signature format is model:dims, with chunker drift tracked separately via the existing pages.chunker_version. Plan + reviews at ~/.claude/plans/system-instruction-you-are-working-humming-kettle.md.

Migration

v108 pages_embedding_signature — additive nullable TEXT column, metadata-only on both engines, bootstrap probe added to both engines + REQUIRED_BOOTSTRAP_COVERAGE. Verified on real Postgres (schema-drift parity) + PGLite.

Documentation

Synced for v0.41.31.0: CLAUDE.md Key Files + Commands annotations, a "same-dimension model swaps (automatic)" section in docs/embedding-migrations.md, and regenerated llms-full.txt. CHANGELOG written. Trio audit clean: VERSION = package.json = CHANGELOG = 0.41.31.0.

Test plan

  • Full unit suite: 12027 parallel pass / 0 fail + 53/53 serial files
  • bun run verify: 29/29 checks green
  • bun run typecheck: clean
  • E2E on real Postgres (Docker): schema-drift parity (v108 column matches both engines), bootstrap, engine-parity, sync, source-isolation, multi-source, sync-status
  • Inline Postgres validation of new engine methods (R-4 grandfather confirmed on real PG)

🤖 Generated with Claude Code

…dcoded OpenAI

The sync --all cost gate computed spend from a hardcoded
EMBEDDING_COST_PER_1K_TOKENS = 0.00013 (OpenAI text-embedding-3-large)
and labeled the preview with the back-compat EMBEDDING_MODEL constant,
regardless of the actually-configured embedding model. A brain running a
cheaper model (e.g. zeroentropyai:zembed-1 @ $0.05/Mtok) saw a preview
that named the wrong provider and over-stated spend ~2.6x ($337 vs $130
on a 2.6B-token corpus).

estimateEmbeddingCostUsd now resolves the live model via the gateway and
prices it through embedding-pricing.ts (the existing per-provider:model
table), falling back to the OpenAI rate only when the gateway is
unconfigured (unit-test context) or the model is unknown. sync.ts surfaces
the real model name in the preview message and JSON.
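A minimal sketch of the model-aware lookup described above, assuming a simple map from `provider:model` to USD per million tokens. The map, its shape, and the exact function signature are illustrative, not the contents of embedding-pricing.ts; only the two rates quoted in this commit are used. The ~2.6x gap falls straight out of the two rates ($0.13 / $0.05).

```typescript
// Hypothetical sketch of model-aware embedding pricing (not the real table).
const PRICE_PER_MTOK_USD = new Map<string, number>([
  ["openai:text-embedding-3-large", 0.13], // $0.00013 per 1K tokens
  ["zeroentropyai:zembed-1", 0.05],
]);

// Legacy constant: $0.00013 per 1K tokens, i.e. $0.13 per 1M tokens.
const EMBEDDING_COST_PER_1K_TOKENS = 0.00013;

// `model` is null when no gateway is configured (e.g. unit-test context).
function estimateEmbeddingCostUsd(tokens: number, model: string | null): number {
  const perMTok =
    (model === null ? undefined : PRICE_PER_MTOK_USD.get(model)) ??
    EMBEDDING_COST_PER_1K_TOKENS * 1000; // unknown/unconfigured: OpenAI rate
  return (tokens / 1_000_000) * perMTok;
}

const tokens = 2.6e9;
const openai = estimateEmbeddingCostUsd(tokens, "openai:text-embedding-3-large");
const zembed = estimateEmbeddingCostUsd(tokens, "zeroentropyai:zembed-1");
console.log((openai / zembed).toFixed(1)); // "2.6"
```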

Regression test pins model-aware pricing: openai 3-large vs zembed-1 must
produce materially different previews; collapsing both to the OpenAI number
fails the assertion.
garrytan and others added 9 commits May 29, 2026 21:14
…elta-aware inline gate

Under federated_v2 (default), sync --all DEFERS embedding to per-source
embed-backfill jobs that already cap spend at $25/source/24h. The v0.20
cost gate predated that cap and fired ConfirmationRequired + exit 2 on
EVERY non-TTY sync --all without --yes, regardless of cost — blocking
nightly crons over already-synced corpora and forcing permanent --yes.

The gate is now mode-aware:
  - Deferred embed (v2 default): print an FYI deferred notice (cap-aware,
    "not charged by this sync") + the stale-chunk backlog estimate, and
    NEVER exit 2. The backfill cap is the real money gate.
  - Inline embed (v2 off, or --serial without --no-embed): keep the
    blocking gate, but estimate the actual delta — full-tree ceiling for
    changed sources (unchanged sources contribute 0 via the same git +
    chunker_version "do work?" gate doctor/sync use) + stale backlog — and
    block only when it exceeds the new configurable floor
    sync.cost_gate_min_usd (default $0.50).
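The decision logic can be sketched as two pure functions. This is a hypothetical shape: the `SyncMode` fields, the parameter order, the `nonTtyExitCode` helper, and the assumption that `--no-embed` disables embedding in every mode are illustrative, not the real signatures in embedding.ts.

```typescript
// Hypothetical sketch of the mode-aware gate (not the real embedding.ts API).
interface SyncMode {
  federatedV2: boolean; // federated_v2 on (default): embedding deferred to backfill
  serial: boolean;      // --serial
  noEmbed: boolean;     // --no-embed (assumed to disable embedding in every mode)
  yes: boolean;         // --yes
}

function willEmbedSynchronously(m: SyncMode): boolean {
  if (m.noEmbed) return false;
  return !m.federatedV2 || m.serial;
}

// newContentUsd: estimated inline spend; minUsd: sync.cost_gate_min_usd.
function shouldBlockSync(m: SyncMode, newContentUsd: number, minUsd = 0.5): boolean {
  if (!willEmbedSynchronously(m)) return false; // deferred: FYI notice, never block
  if (m.yes) return false;                      // explicit confirmation given
  return newContentUsd > minUsd;                // block only above the floor
}

// On a non-TTY run, a block surfaces as ConfirmationRequired + exit 2.
function nonTtyExitCode(m: SyncMode, newContentUsd: number, minUsd = 0.5): 0 | 2 {
  return shouldBlockSync(m, newContentUsd, minUsd) ? 2 : 0;
}
```

Keeping the decision pure means the R-1/R-2 matrix can be exercised without a TTY or a database.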

New pure helpers in embedding.ts (willEmbedSynchronously, shouldBlockSync)
keep the decision logic hermetically testable. New engine method
sumStaleChunkChars (both engines) prices the embedding backlog via
estimateCostFromChars. estimateSyncAllCost's per-source walk extracted to
estimateSourceTreeTokens (reused by the inline estimator).
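Pricing a backlog from stored character counts avoids re-tokenizing anything. A sketch, assuming roughly 4 characters per token and that `embed_skip` chunks are excluded from the sum; both the ratio and the `ChunkRow` shape are assumptions, and the signature-aware widening is omitted.

```typescript
// Hypothetical sketch: price the stale-chunk backlog from character counts.
// ASSUMPTION: ~4 characters per token (not taken from estimateCostFromChars).
const CHARS_PER_TOKEN = 4;

interface ChunkRow {
  chars: number;
  hasEmbedding: boolean;
  embedSkip: boolean; // assumed: embed_skip chunks are never embedded, so never priced
}

function sumStaleChunkChars(chunks: ChunkRow[]): number {
  return chunks
    .filter((c) => !c.hasEmbedding && !c.embedSkip)
    .reduce((sum, c) => sum + c.chars, 0);
}

function estimateCostFromChars(chars: number, pricePerMTokUsd: number): number {
  const tokens = Math.ceil(chars / CHARS_PER_TOKEN);
  return (tokens / 1_000_000) * pricePerMTokUsd;
}
```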

Regressions pinned: R-1 deferred non-TTY never exit 2 (headline), R-2
inline above-floor still exit 2 (protection), plus the willEmbedSynchronously
/ shouldBlockSync matrix and sumStaleChunkChars engine + scope + embed_skip
coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ation v108)

Pre-v0.41.30 "stale" meant only `embedding IS NULL`, so swapping the
embedding model or dimensions left the whole corpus silently embedded under
the OLD model — `embed --stale` ignored it and search quality quietly
degraded.

New `pages.embedding_signature` (TEXT, migration v108) stamps the embedding
provenance (`<provider:model>:<dims>`) whenever a page's chunks are embedded.
A later model/dims swap makes the stored signature differ from the current
one, which the embed paths now detect and re-embed.

GRANDFATHER (critical): the stale predicate is
  `embedding IS NULL OR (embedding_signature IS NOT NULL AND <> $current)`
so a NULL signature is NEVER stale. After the migration every existing page
has NULL → none flagged → the next `embed --stale` does NOT re-embed the
whole corpus. Signatures are stamped going forward only.

Surface:
  - countStaleChunks / sumStaleChunkChars gain an optional `signature` opt
    that widens staleness (read-only; used by the dry-run preview + the
    sync cost preview, which is now signature-aware).
  - invalidateStaleSignatureEmbeddings(signature, sourceId?) NULLs the
    embeddings of signature-mismatched pages so the EXISTING NULL-embedding
    cursor (listStaleChunks, untouched) re-embeds them — keeps the keyset
    pagination logic intact.
  - setPageEmbeddingSignature stamps after a page's chunks land.
  - Both embed loops wired: `gbrain embed --stale`/`--all` (embed.ts) and the
    embed-backfill minion (embed-stale.ts) invalidate-then-stamp.
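An in-memory model of the invalidate-then-stamp flow shows why NULLing vectors lets the untouched NULL-embedding cursor do the re-embed. The types and the `embed` callback are hypothetical; the "stamp only when every chunk was re-embedded" check reflects the P0b guard that landed later in this PR, not this commit.

```typescript
// Hypothetical in-memory model; names echo the engine methods, types are illustrative.
interface Chunk { text: string; embedding: number[] | null }
interface Page { slug: string; signature: string | null; chunks: Chunk[] }

// Step 1: NULL vectors on signature-mismatched pages (NULL signature = grandfathered).
function invalidateStaleSignatureEmbeddings(pages: Page[], current: string): number {
  let invalidated = 0;
  for (const page of pages) {
    if (page.signature !== null && page.signature !== current) {
      for (const c of page.chunks) c.embedding = null;
      invalidated++;
    }
  }
  return invalidated;
}

// Steps 2-3: the existing NULL-embedding pass re-embeds, then stamps the page.
function embedStale(pages: Page[], current: string, embed: (text: string) => number[]): void {
  for (const page of pages) {
    const stale = page.chunks.filter((c) => c.embedding === null);
    if (stale.length === 0) continue;
    for (const c of stale) c.embedding = embed(c.text);
    // Stamp only if the whole chunk set was embedded this pass.
    if (stale.length === page.chunks.length) page.signature = current;
  }
}
```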

Migration v108 + bootstrap probe (both engines) + REQUIRED_BOOTSTRAP_COVERAGE
entry. Pinned by test/embedding-signature-stale.test.ts (R-4 grandfather,
mismatch detection, matching no-op, scoped invalidate, stamp) + the
bootstrap-coverage gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rred notice

Under federated_v2, `sync --all` exits 0 and embedding lags behind in
embed-backfill jobs (subject to cooldown + the per-source 24h cap). Pre-fix
an operator had no signal those jobs were queued or lagging — the sync looked
"done" while embeddings trickled in later.

`gbrain sources status` now shows a BACKFILL column per source
(active(N)/queued(N)/idle) plus the last completion timestamp, read from
minion_jobs. The deferred-sync notice appends "N backfill job(s) queued" so a
cron operator sees work is enqueued, not lost. Both reads are best-effort —
a brain that never ran a worker (no minion_jobs table) reports idle/0 instead
of crashing the dashboard.
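A sketch of how the BACKFILL cell and the notice suffix could be derived from per-source counts. The counts shape is hypothetical, as is showing `active` ahead of `queued` when both are non-zero; `null` stands in for a brain with no minion_jobs table.

```typescript
// Hypothetical sketch of the BACKFILL cell and deferred-notice suffix.
interface BackfillCounts {
  active: number;
  queued: number;
  lastCompletedAt: string | null; // shown alongside the cell; formatting not sketched
}

function formatBackfillCell(counts: BackfillCounts | null): string {
  if (counts === null) return "idle"; // best-effort: never crash the dashboard
  if (counts.active > 0) return `active(${counts.active})`;
  if (counts.queued > 0) return `queued(${counts.queued})`;
  return "idle";
}

function deferredNoticeSuffix(counts: BackfillCounts | null): string {
  const queued = counts?.queued ?? 0;
  return queued > 0 ? `${queued} backfill job(s) queued` : "";
}
```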

SyncStatusReportSource gains backfill_queued / backfill_active /
backfill_last_completed_at (additive; JSON envelope schema_version unchanged).
Pinned by a new case in test/e2e/sync-status-pglite.test.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e EXPECTED_PHASES

Commit 3 made embed.ts import currentEmbeddingSignature from embedding.ts.
Four tests mock.module the whole embedding.ts and omitted the new export, so
embed.ts (imported transitively) failed at load with "Export named
'currentEmbeddingSignature' not found". Add the export to each mock:
embed.serial.test.ts, e2e/cycle.test.ts, e2e/dream.test.ts,
e2e/dream-cycle-phase-order-pglite.test.ts.

Also sync the stale EXPECTED_PHASES in dream-cycle-phase-order-pglite.test.ts
to match cycle.ts ALL_PHASES — extract_atoms, synthesize_concepts, and
conversation_facts_backfill drifted in after the test was last touched
(v0.41.0.0) and were never added, so both phase-order assertions were failing
on the branch before this wave (confirmed against 0906ab0). The dry-run cycle
emits all 20 phases, so mirroring the constant makes both assertions pass.

Pre-existing, unrelated: cycle.test.ts / dream.test.ts have 5 runCycle
failures via direct `bun test` (the conversation_facts_backfill phase uses the
module-singleton getConnection) — present identically at 0906ab0, not touched
by this wave.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iring

Ship-workflow coverage audit flagged two gaps:
- R-3 (mandatory regression) had no dedicated test: the inline unchanged-source
  short-circuit requires git-unchanged AND chunker_version match, but nothing
  pinned the chunker half. Add a case where git is unchanged (HEAD==last_commit,
  clean) but chunker_version is stale → estimate still fires (exit 2), plus a
  control where chunker matches → short-circuits to $0 (no block).
- The embed loops' setPageEmbeddingSignature call-site was only kept green by the
  mock, never asserted. Add a test that runs `embed --all` and asserts the stamp
  fires once per page with the current signature.
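The unchanged-source short-circuit that R-3 pins can be sketched as follows. The state shape is hypothetical (in particular, a single chunker_version per source simplifies the per-page pages.chunker_version column), and `treeTokens` stands in for the full-tree ceiling estimate.

```typescript
// Hypothetical sketch of the "do work?" gate behind the inline estimate.
interface SourceSyncState {
  id: string;
  head: string;                  // current git HEAD
  lastCommit: string | null;     // last synced commit
  cleanTree: boolean;            // no uncommitted changes
  chunkerVersion: number | null; // stored chunker version (simplified)
}

function isUnchanged(s: SourceSyncState, currentChunker: number): boolean {
  return (
    s.lastCommit !== null &&
    s.head === s.lastCommit &&
    s.cleanTree &&
    s.chunkerVersion === currentChunker // R-3: git-unchanged alone is not enough
  );
}

// Changed sources contribute a full-tree ceiling; unchanged ones contribute 0.
function inlineNewContentTokens(
  sources: SourceSyncState[],
  currentChunker: number,
  treeTokens: (s: SourceSyncState) => number,
): number {
  return sources
    .filter((s) => !isUnchanged(s, currentChunker))
    .reduce((sum, s) => sum + treeTokens(s), 0);
}
```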

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… paths (F1); inline cost gate counts new-content only (F2)

Adversarial review caught that the stale-detection feature was inert for
non-federated/inline brains: the embed-write paths that DON'T go through
embed.ts/embed-stale.ts never stamped pages.embedding_signature.

F1 — stamp at the remaining write sites:
  - embedPage (gbrain embed <slug> + sync's post-import runEmbedCore({slugs}))
  - importFromContent markdown branch (inline import/sync embed + gbrain import)
  - importCodeFile (only when EVERY chunk was freshly embedded this call —
    reuse-by-hash carries old-model vectors, so a mixed page stays unstamped
    rather than falsely marked current)
Without this, inline-synced pages kept NULL signatures → grandfathered → never
re-embedded on a model/dims swap. Now all embed-write paths stamp.

F2 — coupled regression the F1 fix would otherwise introduce: the inline cost
gate added the stale backlog (NULL + signature drift) into the BLOCKING cost,
but `gbrain sync` inline only embeds new/changed content — the backlog is
`gbrain embed --stale`'s job. Once F1 gives inline brains real signatures, a
model swap would inflate the inline gate and block the next cron for cost the
sync never incurs. Inline blocking cost is now new-content only; the stale
backlog is shown informationally ("pending gbrain embed --stale"). Deferred
path keeps the signature-aware backlog FYI (the backfill does clear it).
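A sketch of the post-F2 split: only new content feeds the blocking decision, while the backlog is rendered as an informational line. The function shape and all message wording other than "pending gbrain embed --stale" are hypothetical.

```typescript
// Hypothetical sketch of the inline preview after F2.
interface InlinePreview {
  block: boolean;
  lines: string[];
}

function inlineCostPreview(
  newContentUsd: number,   // what this sync will actually embed
  staleBacklogUsd: number, // NULL + signature drift; not cleared by sync
  minUsd = 0.5,            // sync.cost_gate_min_usd
  yes = false,             // --yes
): InlinePreview {
  const lines = [`sync will embed new/changed content: ~$${newContentUsd.toFixed(2)}`];
  if (staleBacklogUsd > 0) {
    lines.push(`~$${staleBacklogUsd.toFixed(2)} pending gbrain embed --stale (not charged by this sync)`);
  }
  // The backlog never contributes to the blocking cost.
  return { block: !yes && newContentUsd > minUsd, lines };
}
```

A post-swap cron with a large backlog but little new content therefore passes, which is exactly the regression F2 guards against.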

Pinned by test/import-signature-stamp.serial.test.ts (inline stamp + --no-embed
NULL) and the existing R-2/R-3 inline-gate tests (still exit 2 on new-content).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…; guard partial-page signature stamping (P0b)

Codex adversarial review caught two issues in the v0.41.30 wave:

P0a — `gbrain sources status` routes through computeAllSourceMetrics
(source-health.ts), not the buildSyncStatusReport helper where the BACKFILL
column was added, so the CLI never showed it. Add per-source embed-backfill
active/queued counts to computeAllSourceMetrics (one extra FILTER on the
existing minion_jobs query) and render a BACKFILL column in `sources status`.
The deferred-sync notice's queued-job count (live sync path) already worked.

P0b — embedPage / embedAllStale / embed-stale stamped embedding_signature
unconditionally after embedding only the STALE subset of a page's chunks. A
partially-embedded page (some chunks preserved from a prior embed under
unknown/old provenance) would be falsely marked current, hiding the old
vectors from future stale detection. Now stamp only when EVERY chunk of the
page was (re)embedded this pass (toEmbed === chunks / stale === existing).
importFromContent embeds the full chunk set so it stays unconditional;
importCodeFile already had the equivalent guard. `gbrain embed --all` fully
re-embeds and stamps mixed pages.

Accepted as documented limitations (not fixed): the inline cost gate can
over-estimate a >100-file `--serial` sync that performSync will defer
(non-default mode, conservative-high bias), and model-swap invalidation NULLs
drifted vectors before re-embed (a deliberate, rare operation).

Pinned by a new backfill-counts case in test/source-health.test.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Refresh CLAUDE.md Key Files + Commands for the embedding cost-model + stale-semantics wave: model-aware cost helpers in embedding.ts (currentEmbeddingPricePerMTok / currentEmbeddingSignature / willEmbedSynchronously / shouldBlockSync), the embedding-signature stale-detection engine quartet (sumStaleChunkChars / setPageEmbeddingSignature / invalidateStaleSignatureEmbeddings + widened countStaleChunks), migration v108, signature stamping across embed.ts / import-file.ts, the mode-aware sync --all cost gate + sync.cost_gate_min_usd config key, and the sources status BACKFILL column. Add a same-dimension-swap auto-reembed note to docs/embedding-migrations.md. Regenerate llms-full.txt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
garrytan changed the title from "fix(cost): embedding cost preview uses configured model rate, not hardcoded OpenAI" to "v0.41.30.0 feat(embed): delta-aware sync --all cost gate + real stale-embedding semantics" on May 30, 2026
garrytan changed the title from "v0.41.30.0 feat(embed): delta-aware sync --all cost gate + real stale-embedding semantics" to "v0.41.31.0 feat(embed): delta-aware sync --all cost gate + real stale-embedding semantics" on May 30, 2026
garrytan and others added 3 commits May 30, 2026 09:16
Mechanical version-string sweep across VERSION, package.json, CHANGELOG,
CLAUDE.md, docs, source/test comments, and regenerated llms bundles. No logic
change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cost-model-fix

# Conflicts:
#	CHANGELOG.md
#	CLAUDE.md
#	VERSION
#	llms-full.txt
#	package.json
…5 contamination)

CI shard 5 failed deterministically with `expected 1280 dimensions, not 1536`.
Root cause: cosine-rescore-column.test.ts hardcodes 1536-dim `embedding`
vectors and asserts length 1536, but its beforeAll ran `initSchema()` with no
gateway config. initSchema sizes the `embedding` column from
getEmbeddingDimensions(), whose default is 1280 (zeroentropyai:zembed-1). The
test only passed by inheriting a leaked 1536 gateway config from an earlier
test (or, locally, from ~/.gbrain). When the v0.41.31 merge shifted the
weight-aware shard bin-packing, the file order changed so the 1280 default won
in CI → vector(1280) column → 1536 insert rejected. (Passed locally because
the dev machine's ~/.gbrain resolves 1536.)

Fix: configureGateway({ openai:text-embedding-3-large, 1536 }) in beforeAll
BEFORE connect/initSchema so the column is deterministically vector(1536)
regardless of ambient/leaked state, and resetGateway() in afterAll for
hygiene. Proven: under a forced-1280 gateway preload the old test reproduces
the exact CI error and the fixed test passes (4/4).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
garrytan merged commit 146a8f1 into master on May 30, 2026
21 checks passed
mgunnin added a commit to mgunnin/gbrain that referenced this pull request Jun 3, 2026
* upstream/master:
  v0.41.36.0 feat(mcp): publish agent skills (list_skills / get_skill) for thin clients (garrytan#1661)
  v0.41.35.0 feat(guardrails): vendor-neutral content guardrail seams (supersedes garrytan#1652) (garrytan#1660)
  v0.41.34.0 feat(search): retrieval cathedral — max-pool + title + alias + evidence (garrytan#1657)
  v0.41.33.0 feat(search): intent-aware adaptive return-sizing + agent-facing query param (garrytan#1640)
  v0.41.32.0 fix(staleness): commit-relative sync staleness (supersedes garrytan#1623) (garrytan#1656)
  v0.41.31.0 feat(embed): delta-aware sync --all cost gate + real stale-embedding semantics (garrytan#1632)
  v0.41.30.0 fix(brainstorm/lsd): --save writes the advertised .md file via canonical ingestion path (garrytan#1655)

# Conflicts:
#	src/core/operations.ts