fix: support Voyage 4 Large 2048d schema setup #641

100yenadmin wants to merge 2 commits into garrytan:master
Conversation
Pull request overview
This PR fixes schema initialization for Voyage’s 2048-dimensional embeddings by preventing pgvector HNSW index creation when dimensions exceed the 2000-dimension HNSW limit, while still allowing vector(2048) storage and exact-scan queries. It also updates Voyage model support and ensures Voyage’s OpenAI-compatible embeddings receive output_dimension where supported.
Changes:
- Add a centralized pgvector HNSW dimension cap policy and apply it during schema generation (skip only the chunk HNSW index when dims > 2000); a sketch follows this list.
- Extend the Voyage embedding model listings and pass `output_dimension` for Voyage models that support flexible dimensions.
- Improve PGLite schema templating to seed `embedding_model`/`embedding_dimensions` into the config table and validate/sanitize inputs; add tests for 1024d (keeps HNSW) and 2048d (skips HNSW).
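The cap policy reduces to a small predicate plus a SQL rewrite. Here is a minimal sketch, assuming helper names (`PGVECTOR_HNSW_MAX_DIMS`, `canCreateHnswIndex`, `applyHnswIndexPolicy`) and a naive `;` statement split — illustrative, not the shipped src/core/vector-index.ts code:

```ts
// Minimal sketch of the dimension-cap policy described above. Names and the
// naive `;` split are illustrative assumptions, not the shipped utilities.

/** pgvector's documented HNSW limit: index creation is rejected above 2000 dims. */
const PGVECTOR_HNSW_MAX_DIMS = 2000;

/** Storage (`vector(N)`) is unaffected; only HNSW index creation is gated. */
export function canCreateHnswIndex(dims: number): boolean {
  return Number.isInteger(dims) && dims > 0 && dims <= PGVECTOR_HNSW_MAX_DIMS;
}

/**
 * Drop HNSW index statements from a schema blob when the configured
 * dimensions exceed the cap, leaving every other statement intact.
 */
export function applyHnswIndexPolicy(schemaSql: string, dims: number): string {
  if (canCreateHnswIndex(dims)) return schemaSql;
  // Naive statement split — adequate for a sketch, not production SQL parsing.
  return schemaSql
    .split(';')
    .filter((stmt) => !/USING\s+hnsw/i.test(stmt))
    .join(';');
}
```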
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/ai/schema-templating.test.ts | Adds coverage for embedding config seeding and HNSW index behavior at 1024d vs 2048d. |
| test/ai/gateway.test.ts | Adds coverage for Voyage-specific output_dimension providerOptions behavior. |
| src/core/vector-index.ts | Introduces pgvector HNSW max-dimension policy utilities and SQL rewriting helper. |
| src/core/postgres-engine.ts | Applies the HNSW index policy when generating SCHEMA_SQL for Postgres initialization. |
| src/core/pglite-schema.ts | Seeds embedding model/dims into config insert, validates dims, sanitizes model, applies HNSW index policy. |
| src/core/ai/recipes/voyage.ts | Updates Voyage embedding model list to include newer Voyage models. |
| src/core/ai/dims.ts | Adds Voyage model allowlist to send output_dimension for OpenAI-compatible Voyage embeddings. |
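For orientation, the src/core/ai/dims.ts change plausibly reduces to a membership check like the sketch below. The model set comes from this PR's commit text; the helper name and return shape are assumptions:

```ts
// Sketch of the allowlist gate described for src/core/ai/dims.ts. The model
// set comes from this PR's commit text; helper name and shape are assumed.
const VOYAGE_FLEXIBLE_DIM_MODELS = new Set([
  'voyage-4-large',
  'voyage-4',
  'voyage-4-lite',
  'voyage-4-nano',
]);

/** Only send output_dimension for models documented to accept it. */
export function voyageOutputDimension(
  model: string,
  requestedDims: number,
): { output_dimension?: number } {
  return VOYAGE_FLEXIBLE_DIM_MODELS.has(model)
    ? { output_dimension: requestedDims }
    : {};
}
```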
@copilot please re-review this PR with the latest changes. @codesmith please re-check this PR against the latest commits and refreshed PR description.
Hi @100yenadmin! Codesmith requires write access to this repository. You currently have read-only access to garrytan/gbrain.
…bun-link foot-gun (#697)

* fix(engines): pre-add v0.20 + v0.26.3 forward-reference columns in bootstrap

  The forward-reference bootstrap (PostgresEngine + PGLiteEngine applyForwardReferenceBootstrap) covered v0.18 + v0.19 + v0.26.5 columns but missed two later groups. Brains upgrading from v0.14-era to current master crash before the migration ladder runs:

  1. v0.20 Cathedral II — content_chunks.search_vector, parent_symbol_path, doc_comment, symbol_name_qualified. `CREATE INDEX idx_chunks_search_vector` and `CREATE INDEX idx_chunks_symbol_qualified` in schema.sql/PGLITE_SCHEMA_SQL crash with "column search_vector does not exist" / "column symbol_name_qualified does not exist".
  2. v0.26.3 — mcp_request_log.agent_name, params, error_message. `CREATE INDEX idx_mcp_log_agent_time ON mcp_request_log(agent_name, ...)` crashes with "column agent_name does not exist".

  Reproduces deterministically on a v0.13/v0.14 brain upgraded straight to current master. The user hits the wall before any of v15–v36 can run.

  Both engines now probe for these columns and pre-add them via `ALTER TABLE ADD COLUMN IF NOT EXISTS` before SCHEMA_SQL runs. Migrations v26, v27, v33 still run later via runMigrations and remain idempotent (they handle backfill on top of the bootstrap-added columns).

  Test coverage extended in test/schema-bootstrap-coverage.test.ts: REQUIRED_BOOTSTRAP_COVERAGE now lists 6 new forward references; the strip-and-rebuild block drops the corresponding indexes/triggers so the test exercises a brain that pre-dates the v0.20 + v0.26.3 migrations.

  Repro: brain on schema v13/v14, run `gbrain init --migrate-only` against current master → fails. With this patch → succeeds; the ladder runs to v36.

* fix(engines): pre-add v0.27 subagent_messages.provider_id in bootstrap

  PR #682 covered v0.20 (chunks) + v0.26.3 (mcp_request_log) but missed v0.27's subagent_messages.provider_id. The composite index `idx_subagent_messages_provider ON subagent_messages (job_id, provider_id)` in PGLITE_SCHEMA_SQL crashes on brains pinned at v0.18–v0.26 because provider_id is the SECOND column in the composite — array-extraction patterns that scan only first-column references miss it entirely. This is the wedge surfaced by issue #670 (v0.22.0 → v0.27.0 init --migrate-only crashes with "column 'provider_id' does not exist") and a contributor to #661/#657.

  Both engines now probe for subagent_messages.provider_id and pre-add the column via ALTER TABLE ADD COLUMN IF NOT EXISTS before SCHEMA_SQL runs. Migration v36 (subagent_provider_neutral_persistence_v0_27) still runs later via runMigrations and remains idempotent.

  Note on the test side: REQUIRED_BOOTSTRAP_COVERAGE is hand-maintained and just gained a v0.27 entry. v0.28.5's Step 3 replaces this array with a SQL parser that auto-derives coverage from PGLITE_SCHEMA_SQL, including composite-index columns. This commit is the targeted follow-up to PR #682's cherry-pick; A2's parser closes the class permanently.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cli): conditional schema-init on connect (closes #651)

  Adds `hasPendingMigrations(engine)` next to `runMigrations` in migrate.ts: a single getConfig('version') probe that returns true when current < LATEST_VERSION, and defensively returns true on getConfig failure (treats a wedged config as pending). `connectEngine` in cli.ts now wraps `engine.initSchema()` in a probe gate: short-lived CLI calls (gbrain stats, query, doctor, etc.) on already-migrated brains skip the bootstrap probe + SCHEMA_SQL replay + ledger check entirely. Wedged brains still auto-heal — the probe says "yes, pending" and initSchema runs as before.

  Building on oyi77's investigation in PR #652. Same correctness as #652's unconditional initSchema-on-every-connect, but no perf regression on the hot path. Failure is non-fatal: if the probe or init throws, log a hint and let subsequent operations surface the real error in context.

  Test coverage in test/migrate.test.ts: 3 cases covering fully-migrated (false), version-rewound (true), and missing-version-config (defensive true). Pairs with v0.28.5's X1 (post-upgrade auto-apply) — the upgrade path runs initSchema explicitly while every other code path that goes through connectEngine gets the cheap probe.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(upgrade): post-upgrade auto-applies pending schema migrations (X1)

  Prior behavior: `gbrain upgrade` → `gbrain post-upgrade` → `apply-migrations` only WARNed at apply-migrations.ts:296-302 when the schema version was behind LATEST_VERSION, telling the user to run `gbrain init --migrate-only`. 11 wedge incidents over 2 years have proven users don't read that WARN — they file an issue instead.

  This commit makes `runPostUpgrade` explicitly call `engine.initSchema()` after the orchestrator migration pass, mirroring `init --migrate-only`'s flow. Side effect: `gbrain upgrade` now walks away with a healthy brain in the cluster A wedge case (#670, #661, #657, #651, #625, #615, #609). Defensive: wrapped in try/catch so a connection or DDL failure falls back to the existing user-facing WARN. The hint to run `gbrain init --migrate-only` is preserved as the manual escape hatch.

  Pairs with v0.28.5's A1 (hasPendingMigrations probe in connectEngine): the upgrade path runs initSchema explicitly here, while every other code path that goes through connectEngine gets the cheap probe. Codex outside-voice review caught this gap during plan review: "the plan still does not prove `upgrade` will actually run schema migrations." This is the load-bearing fix that makes v0.28.5's headline outcome ("run upgrade, brain works") literally true for cluster A.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bootstrap): auto-derive coverage from PGLITE_SCHEMA_SQL (A2)

  Replaces the hand-maintained REQUIRED_BOOTSTRAP_COVERAGE assertion with a SQL-parser-backed structural check. The new test:

  1. parseIndexColumnReferences(PGLITE_SCHEMA_SQL) extracts every column referenced by every CREATE INDEX — including composite-index second and third columns. Codex outside-voice review caught that earlier first-column-only patterns missed v0.27's `idx_subagent_messages_provider ON subagent_messages (job_id, provider_id)`, which is exactly how the v0.28.5 wedge happened.
  2. parseBaseTableColumns(PGLITE_SCHEMA_SQL) extracts every column declared in CREATE TABLE bodies (including via ALTER TABLE ADD COLUMN inside the schema blob).
  3. parseAlterAddColumns(pglite-engine.ts source) extracts every column that applyForwardReferenceBootstrap adds.
  4. Static contract: every (table, column) pair from step 1 must appear in either step 2 or step 3. Otherwise the test fails loud, names every uncovered pair, and points at the bootstrap function for the fix.

  Self-updating: any future CREATE INDEX added to PGLITE_SCHEMA_SQL on a column that bootstrap doesn't yet provide fails this test at PR time. No human is required to remember to update an array. Closes the 11-incident wedge class identified in CLAUDE.md (#239, #243, #266, #357, #366, #374, #375, #378, #395, #396).

  Helper parsers also have their own unit tests covering composite-index second columns, function-wrapped columns (lower(col)), HNSW operator-class suffixes (vector_cosine_ops), and ALTER TABLE column extraction. Existing REQUIRED_BOOTSTRAP_COVERAGE-based tests are preserved as a coarse-grained lower bound; the new parser-based test is the load-bearing structural gate going forward.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: support Voyage 2048d schema setup

* fix: harden Voyage schema templating

* feat: Voyage 4 embedding support + doctor eval

  - Add voyage-4-large/4/4-lite/4-nano + domain models to the Voyage recipe
  - Fix AI SDK compatibility: strip encoding_format (Voyage rejects 'float'), patch the response to add prompt_tokens from total_tokens
  - Add an embedding_provider doctor check: a live smoke test verifying model, API key, dimensions, and DB column alignment
  - Add embedding-provider eval qrels for post-migration quality testing

  Closes: Voyage AI integration for the gbrain embedding pipeline

* fix: adaptive embed batch sizing for Voyage token limits

  Voyage's tokenizer is 3-4x denser than OpenAI tiktoken, causing batches of 50+ texts to exceed the 120K token-per-batch limit even when DB token counts (from tiktoken) suggest they'd fit.

  Changes:
  - Add max_batch_tokens to the EmbeddingTouchpoint type (provider-declared limit)
  - Set the Voyage recipe to a 120K token limit
  - Gateway embed() now auto-splits batches using a conservative char-to-token estimate (1:1 ratio, 80% budget utilization)
  - On token-limit errors, embedSubBatch recursively halves and retries (down to single-text batches before giving up)
  - Reduce embedding.ts BATCH_SIZE from 100 to 50 as a secondary guard
  - Add tests for the batch-splitting logic and error-pattern matching

  Fixes infinite retry loops where the same oversized batch would fail repeatedly because WHERE embedding IS NULL re-fetches identical rows.

* fix(init): error on existing-brain dim mismatch + embedding-migration recipe

  Adds the A4 hard-error path: when `gbrain init --embedding-dimensions N` is run against an existing brain whose `content_chunks.embedding` column is a different `vector(M)`, init exits 1 with an inline four-step ALTER recipe and a pointer to docs/embedding-migrations.md. This kills the silent-corruption pattern surfaced by issue #673: the v0.27 schema seeded `('embedding_dimensions', '1536')` regardless of the flag, so users got a config saying 768 but a column at 1536 — the first sync write blew up with "expected 1536, got 768."

  A4's contract:
  1. Connect to the engine BEFORE saveConfig so we can read the live column type
  2. If the column exists AND dim != requested, exit 1 (loud failure)
  3. If the column doesn't exist (fresh init) OR the dim matches, proceed normally

  The recipe in docs/embedding-migrations.md (and inlined in init's error output) covers all four destructive steps codex's plan-review caught:
  1. DROP INDEX IF EXISTS idx_chunks_embedding (HNSW won't survive the ALTER)
  2. ALTER TABLE content_chunks ALTER COLUMN embedding TYPE vector(N)
  3. UPDATE content_chunks SET embedding = NULL, embedded_at = NULL
  4. CREATE INDEX HNSW *only if N <= 2000* (pgvector cap)

  Step 4 is conditional: dims > 2000 (e.g. Voyage 4 Large 2048d) cannot be HNSW-indexed in pgvector; the recipe explicitly says "Skip reindex" in that case so the user doesn't paste a CREATE INDEX that crashes.

  Helper `readContentChunksEmbeddingDim` and message builder `embeddingMismatchMessage` live in src/core/embedding-dim-check.ts so doctor 8b (next commit) can reuse the same source of truth.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): correct dim-mismatch error to point at manual ALTER recipe (#672)

  The previous error message recommended running `gbrain migrate --embedding-model … --embedding-dimensions …`, but `gbrain migrate` only handles engine migration (postgres ↔ pglite), not embedding reconfiguration. Following that hint produced a different error and confused users further.

  New message:
  - Names the actual options: change models OR migrate the existing brain
  - Inlines a one-line quick recipe (DROP INDEX → ALTER → UPDATE NULL → config set → embed --stale)
  - Points at docs/embedding-migrations.md (added in commit 306fc0e) for the full four-step recipe with conditional HNSW handling

  Closes #672. Note: #671 (config show hides embedding_model / dimensions) appears to be already fixed on master — `Object.entries(loadConfig())` in config.ts:24 correctly enumerates all keys, including embedding_*. Will close #671 with that note when shipping v0.28.5.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(types): doctor 8b uses portable executeRaw + Voyage fetch-shim cast

  #665's doctor 8b dim-probe used `engine.sql\`...\`` directly (a Postgres template literal), which doesn't typecheck against the BrainEngine interface (only PostgresEngine has the .sql getter; PGLite does not). Refactored to use `readContentChunksEmbeddingDim` from src/core/embedding-dim-check.ts — the same helper init's A4 hard-error path uses; it runs portably on both engines.

  #680's Voyage fetch shim passes a custom fetch handler to `createOpenAICompatible` for the encoding_format + prompt_tokens normalization. The SDK accepts the field at runtime, but the typed parameter on the pinned version doesn't expose it. Cast to the parameter type so the shim ships without a type error.

  Both fixes are mechanical cleanup of cherry-picked PRs that didn't typecheck against current master's stricter shape. No behavior change.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cli): mark cli.ts executable so bun-linked installs work

  `package.json` declares `"bin": { "gbrain": "src/cli.ts" }`, and bun's linker creates `~/.bun/bin/gbrain` as a symlink to the file. The shebang `#!/usr/bin/env bun` works only when the target file is executable — otherwise bun runs it as a script (because it sees the script via the shebang interpreter), but executing the symlinked target itself fails:

      $ ls -la ~/.bun/bin/gbrain
      lrwxrwxrwx ... -> ../install/global/node_modules/gbrain/src/cli.ts
      $ ~/.bun/bin/gbrain --version
      /opt/homebrew/bin/bash: line 1: /Users/brandon/.bun/bin/gbrain: Permission denied

  This bites the postinstall hook that calls `gbrain apply-migrations` (masked by the `||` fallback) and any subprocess that invokes the binary by absolute path (e.g., subagent_messages migration v0.16's `execSync('gbrain init --migrate-only', ...)`). Setting the mode in-tree to 755 fixes both. No content change.

* test(ci): guard against src/cli.ts mode-bit regression (cluster C)

  The cluster C cherry-pick (#683) restored the executable bit on src/cli.ts. This commit adds scripts/check-cli-executable.sh, which asserts that the git index mode is 100755, and wires it into `bun run verify` (and check:all).

  Why a CI guard: bun-link installs symlink to src/cli.ts directly. If the mode bit ever regresses to 100644, the very first `gbrain --version` fails with `permission denied` — the exact symptom that motivated #683. This guard runs in <100ms, fast enough for the inner verify loop. Failure mode: clear instructions on what command to run to fix it (`chmod +x src/cli.ts && git add --chmod=+x src/cli.ts`), plus a pointer back to issue #683 so future maintainers know why the guard exists.

  Note: darwin and linux only. Windows preserves the git-stored mode regardless of filesystem chmod, so the index-mode check works the same on every platform CI uses.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(upgrade): detect bun-link, warn on npm squatter (#656, #658)

  Rewrites detectInstallMethod() in src/commands/upgrade.ts:247 with three layered signals per the v0.28.5 plan's cluster D + codex finding C1:

  1. bun-link signal (closes #656): when argv[1] is a symlink, walk up from realpath(argv[1]) up to 6 levels looking for a .git/config whose contents include `garrytan/gbrain` (case-insensitive substring). Returns 'bun-link'. Best-effort: forks, tarballs, and detached source trees fall through to the existing chain.
  2. Canonical bun authenticity check (closes #658's detection half): when the install lives in node_modules, read package.json and verify repository.url contains `garrytan/gbrain` OR src/cli.ts coexists (a squatter ships a compiled binary, not source). On a 'suspect' verdict, print printSquatterRecovery() — it names both the git-clone AND release-binary recovery paths so users without a local clone can still recover.
  3. Source-marker fallback inside (2). Codex flagged this as spoofable by a determined squatter; accepted — it is a best-effort warning, not an assertion. The structural fix is publishing under @garrytan/gbrain (tracked as a v0.29 follow-up).

  The squatter's `name: gbrain` field doesn't disambiguate (codex caught this in plan review of my original heuristic). repository.url is the field a careless squatter is least likely to set correctly; src/cli.ts presence is the secondary signal.

  bun-link installs return 'bun-link' from the switch in runUpgrade, which prints the source-clone upgrade path (`git pull && bun install && bun link`) instead of trying `bun update gbrain`, which doesn't apply. The README gains the corresponding "DO NOT use `bun add -g gbrain`" callout naming both #658 and the v0.29 scoped-name plan.

  Tests in test/upgrade.test.ts cover the return-type extension, the bun-link signal shape, classifyBunInstall's two-signal check, and the recovery message contents.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28.5 release: PGLite upgrade wedge + embedding dim corruption + bun-link foot-gun

  Fix wave bundling 9 community PRs to unwedge users stuck since v0.27.

  Cluster A — PGLite upgrade wedge (#670, #661, #657, #651, #625, #615, #609):
  - Bootstrap now covers v0.20 + v0.26.3 + v0.27 forward references (both engines)
  - hasPendingMigrations() probe gates initSchema() in connectEngine
  - Post-upgrade auto-applies pending schema migrations (X1)
  - SQL-parser-backed bootstrap coverage replaces the hand-maintained array (A2)

  Cluster B — Embedding dim corruption (#673, #672, #666, #640):
  - Schema templating cascade fixed end-to-end (#641 from @100yenadmin)
  - gbrain doctor 8b live embedding-provider probe (#665)
  - Voyage adaptive batch sizing for the 120K-token cap (#680)
  - gbrain init A4 hard-error on existing-brain dim mismatch
  - docs/embedding-migrations.md with the conditional-HNSW four-step recipe
  - #672's misleading migrate-suggestion error replaced with an inline recipe

  Cluster C — CLI exec bit (#683, dupe of #655):
  - src/cli.ts mode 100644 → 100755 (#683 from @brandonlipman)
  - scripts/check-cli-executable.sh CI guard against future regression

  Cluster D — bun add -g foot-gun (#656, #658):
  - 3-signal detectInstallMethod rewrite (bun-link, repo.url, source-marker)
  - Loud-red recovery message names both source-clone AND release-binary paths
  - README "DO NOT use bun add -g gbrain" callout

  Contributors: @brandonlipman (#682, #683), @mdcruz88 (#668), @ChenyqThu (#627), @alan-mathison-enigma (#610), @oyi77 (#652 building block), @abkrim (#655), @100yenadmin (#641).

  VERSION 0.27.0 → 0.28.5
  package.json 0.27.0 → 0.28.5
  schema-embedded.ts regenerated via bun run build:schema
  llms-full.txt regenerated via bun run build:llms

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): v0.28.5 fix-wave end-to-end coverage

  PGLite-only E2E covering the three regression scenarios v0.28.5 was shipped to fix:

  1. Cluster A — a pre-v0.20 brain (missing v0.20 + v0.26.3 + v0.27 columns) re-runs initSchema cleanly. Strips the column set v0.28.5's bootstrap claims to restore (search_vector, parent_symbol_path, doc_comment, symbol_name_qualified, agent_name, params, error_message, provider_id), resets the version row to 13, then re-runs initSchema. Asserts every column comes back AND version reaches LATEST_VERSION. Closes the gap that pre-v0.28.5 produced 11 wedge incidents.
  2. Cluster B — a fresh init at non-default dims templates the column correctly (768d AND 2048d cases). The 2048d case explicitly verifies idx_chunks_embedding is NOT created (codex finding #8 — pgvector's HNSW cap is 2000).
  3. A4 — the existing-brain dim-mismatch helper produces a recipe that inlines all four steps (DROP INDEX, ALTER TYPE, NULL, conditional reindex). Validates the conditional CREATE INDEX HNSW for dims <= 2000 AND its omission for dims > 2000. The recipe a user copy-pastes won't crash them on Voyage 4 Large.

  Plus a hasPendingMigrations() lifecycle test covering the four states (fresh / migrated / rewound / re-applied) — it pairs with the unit test in test/migrate.test.ts but exercises the engine end to end.

  PGLite-only because none of these cases need real Postgres. Postgres-side bootstrap is covered by test/e2e/postgres-bootstrap.test.ts. Run: bun test test/e2e/v0_28_5-fix-wave.test.ts (no DATABASE_URL needed).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: refactor embedding-dim-check.test.ts to canonical PGLite pattern

  The test-isolation lint (R3+R4) requires PGLiteEngine in a beforeAll() context with an afterAll() disconnect. Refactored to the single-engine-per-file pattern; the fresh-brain test uses a one-off engine inside its own try/finally so the file-level engine stays at the LATEST schema for the migrated-brain test. No behavior change to the assertions.

  `bun run verify` now passes clean (privacy + jsonb + progress + test-isolation + wasm + admin-build + cli-exec + typecheck).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(doctor): make 8b embedding-provider probe non-fatal (CI green)

  CI Tier 1 was failing on `gbrain doctor exits 0 on healthy DB` because the v0.28.5 doctor 8b check (cherry-picked from #665) pushed `status: 'fail'` in two non-fatal scenarios:

  1. No API key configured (`isAvailable('embedding')` returns false)
  2. The probe throws (network blip, transient 5xx, DNS, rate limit)

  Both are noise in CI and on offline workstations — the brain is healthy, the provider just isn't reachable from this environment. The v0.28.5 plan's P1 decision called for non-fatal-on-offline behavior:

  > Doctor 8b probes live every run (taken as-is). Non-fatal on network failure (warns rather than errors); silently skipped when no API key configured.

  This commit aligns the implementation with that decision:
  - !available → status 'ok' with a "Skipped (no provider credentials)" message so the run is visible in --json output without failing the exit code
  - catch block → status 'warn' (was 'fail') so probe failures surface informationally without crashing CI / autopilot's periodic doctor runs

  The mismatch slipped past plan-time review because #665 was cherry-picked before P1 was finalized; the type-fix pass in 4c26e48 only adjusted the DB-column probe shape, not the API-availability gate. CI Tier 1 (Mechanical) — `test/e2e/mechanical.test.ts:1220` — "gbrain doctor exits 0 on healthy DB" now passes against a fresh Postgres without `OPENAI_API_KEY` / `VOYAGE_API_KEY` set.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Brandon Lipman <brandon@offdeck.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
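The adaptive-batch commit above is the most algorithmic piece of the wave. Here is a minimal sketch of the splitter and recursive-halving retry it describes — the 120K cap, 1:1 char-to-token estimate, 80% utilization, and the `embedSubBatch` name come from the commit message, while the other names and the token-limit error pattern are assumptions:

```ts
// Sketch of the conservative splitter + recursive-halving retry described in
// the adaptive-batch commit. Constants come from the commit message; the
// error matcher below is an assumed placeholder for the real pattern check.
const VOYAGE_MAX_BATCH_TOKENS = 120_000;
const BUDGET_UTILIZATION = 0.8;

export function splitByTokenBudget(
  texts: string[],
  maxBatchTokens: number = VOYAGE_MAX_BATCH_TOKENS,
): string[][] {
  const budget = Math.floor(maxBatchTokens * BUDGET_UTILIZATION);
  const batches: string[][] = [];
  let current: string[] = [];
  let estimate = 0;
  for (const text of texts) {
    const cost = text.length; // conservative 1:1 char-to-token ratio
    if (current.length > 0 && estimate + cost > budget) {
      batches.push(current);
      current = [];
      estimate = 0;
    }
    current.push(text);
    estimate += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Assumed error-pattern check; the shipped matcher is more specific.
const isTokenLimitError = (err: unknown): boolean =>
  /token/i.test(err instanceof Error ? err.message : String(err));

/** On a token-limit error, halve and retry down to single-text batches. */
export async function embedSubBatch(
  batch: string[],
  embed: (texts: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  try {
    return await embed(batch);
  } catch (err) {
    if (batch.length <= 1 || !isTokenLimitError(err)) throw err;
    const mid = Math.ceil(batch.length / 2);
    const left = await embedSubBatch(batch.slice(0, mid), embed);
    const right = await embedSubBatch(batch.slice(mid), embed);
    return [...left, ...right];
  }
}
```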
Fixes #640.
Plain-English Summary
Voyage 4 Large can return 2048-dimensional embeddings. That is useful for technical docs because the vectors have more room to represent fine-grained concepts, but pgvector's HNSW index has a 2000-dimensional limit.
This PR lets GBrain store and query Voyage 4 Large 2048d vectors safely: storage stays `vector(2048)`, exact vector scans still work, and only the unsupported HNSW index is skipped.

```mermaid
flowchart LR
  A["GBRAIN_EMBEDDING_MODEL=voyage:voyage-4-large"] --> B["AI gateway"]
  B --> C["Voyage output_dimension=2048"]
  C --> D["schema helper"]
  D --> E["content_chunks.embedding vector(2048)"]
  D --> F{"dims > 2000?"}
  F -->|yes| G["skip HNSW index"]
  F -->|no| H["create HNSW index"]
  E --> I["exact scans remain valid"]
```

What Changed
- Add `voyage-4-large` to the supported Voyage embedding models.
- Pass the `output_dimension` provider option for modern Voyage models that support flexible dimensions.
- Keep `vector(2048)` storage while skipping only the unsupported pgvector HNSW chunk embedding index above 2000 dims.

Voyage docs list 2048/1024/512/256 as supported `output_dimension` values for modern Voyage models, with 1024 as the default: https://docs.voyageai.com/reference/embeddings-api
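Against that documented request shape, a direct Voyage call with `output_dimension` looks roughly like the sketch below (simplified typing and error handling; this is not GBrain's gateway code):

```ts
// Hedged sketch of a raw Voyage embeddings request with output_dimension,
// following the request shape documented at the URL above. Response typing
// and error handling are simplified for illustration.
async function voyageEmbed(texts: string[], apiKey: string): Promise<number[][]> {
  const res = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: 'voyage-4-large',
      input: texts,
      output_dimension: 2048, // one of the documented 2048/1024/512/256
    }),
  });
  if (!res.ok) throw new Error(`Voyage embeddings request failed: ${res.status}`);
  const json = (await res.json()) as { data: Array<{ embedding: number[] }> };
  return json.data.map((d) => d.embedding);
}
```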
pgvector can store `vector(2048)`, but HNSW vector indexes have a 2000-dimension cap. GBrain's schema templating replaced the vector column dimension but still emitted the chunk HNSW index (`idx_chunks_embedding`) unconditionally. A Voyage 4 Large 2048d brain could therefore fail during schema/index creation even though exact vector scans were still valid.
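Concretely, the fix makes the index DDL conditional on the configured dimensions. A paraphrased sketch — `content_chunks`, `idx_chunks_embedding`, and `vector_cosine_ops` match names used elsewhere in this PR, but the surrounding SCHEMA_SQL text is illustrative:

```ts
// Paraphrased sketch of the conditional DDL this PR makes the templating emit.
// Table/index/opclass names match ones used elsewhere in this PR; the exact
// SCHEMA_SQL wording is illustrative.
function chunkEmbeddingDdl(dims: number): string {
  const column = `ALTER TABLE content_chunks
  ADD COLUMN IF NOT EXISTS embedding vector(${dims});`;
  // pgvector rejects HNSW indexes above 2000 dims; exact (sequential-scan)
  // vector queries keep working either way, so only the index is skipped.
  const index =
    dims <= 2000
      ? `CREATE INDEX IF NOT EXISTS idx_chunks_embedding
  ON content_chunks USING hnsw (embedding vector_cosine_ops);`
      : `-- dims > 2000: HNSW index intentionally skipped (pgvector cap)`;
  return `${column}\n${index}`;
}
```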
Human Walkthrough
Think of this as letting GBrain use a sharper map for documents without forcing Postgres to build an index type it cannot legally build. Search still works; the one impossible optimization is skipped.
Review-Fix Pass
Latest review-fix commit: 169a79c. Fixed Copilot review findings around:
- `embedding_dimensions` still reporting `1536`;
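For context, the seeding/validation behavior those findings target reduces to something like the sketch below; the config keys come from this PR, while the bounds check and character allowlist are illustrative assumptions, not the shipped pglite-schema.ts code:

```ts
// Sketch of the embedding-config seeding the review fixes target. Config keys
// come from this PR; validation and sanitization rules are assumptions.
function seedEmbeddingConfigSql(model: string, dims: number): string {
  if (!Number.isInteger(dims) || dims <= 0) {
    throw new Error(`invalid embedding_dimensions: ${dims}`);
  }
  // Allow only characters safe to inline into a SQL string literal.
  const safeModel = model.replace(/[^a-zA-Z0-9:._\/-]/g, '');
  return `INSERT INTO config (key, value) VALUES
  ('embedding_model', '${safeModel}'),
  ('embedding_dimensions', '${dims}')
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;`;
}
```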
- `git diff --check`
- `bun run verify`
- `bun test test/ai/gateway.test.ts test/ai/schema-templating.test.ts`
- `bun test test/ai/schema-templating.test.ts test/ai/gateway.test.ts` → 29 pass, 0 fail
- `HOME=$(mktemp -d) bun run test` → 3832 pass, 0 fail

Need help on this PR? Tag @codesmith with what you need.