Summary
Follow-up to closed #640. The schema-side fix (applyChunkEmbeddingIndexPolicy + getPostgresSchema(dims, model)) works for fresh installs, but upgrading an existing brain still trips the same "column cannot have more than 2000 dimensions for hnsw index" error, even when ~/.gbrain/config.json correctly reports embedding_dimensions: 2560.
The fix didn't reach the upgrade path. Migration v67 (and presumably any future migration shipped on master) is gated behind a bootstrap stage that still emits the HNSW index.
Repro
Pre-conditions:
- Brain on Postgres (Supabase Session pooler, port 5432).
- content_chunks.embedding was ALTER COLUMN ... TYPE vector(2560) for ZeroEntropy zembed-1.
- ~/.gbrain/config.json:
  {
    "engine": "postgres",
    "embedding_model": "zeroentropyai:zembed-1",
    "embedding_dimensions": 2560
  }
- idx_chunks_embedding HNSW dropped at switch-time (pgvector refused to keep HNSW on a 2560-dim column; that part is well understood).
- idx_takes_embedding_hnsw likewise dropped.
- Schema version: 66 (latest on v0.35.1.1).
Steps:
cd ~/gbrain
git pull origin master # → v0.35.7.0 (1dadd9ed)
bun install # postinstall runs `gbrain apply-migrations` (prints `All migrations up to date.` — misleading, see below)
gbrain init --migrate-only
Output (last 30 lines, after ~25 expected 42P07 NOTICEs):
{ severity: "NOTICE", code: "42P07",
message: 'relation "idx_chunks_page" already exists, skipping', ... }
column cannot have more than 2000 dimensions for hnsw index
Exit code: 1. Migration v67 never runs.
Expected
getPostgresSchema(dims=2560, model='zembed-1') should be invoked on the upgrade path (same way it's invoked on fresh-install). applyChunkEmbeddingIndexPolicy should replace the HNSW CREATE INDEX with the skip-comment, and conn.unsafe(sqlText) should pass cleanly.
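To make the expected behavior concrete, the policy step can be pictured as a pure text transform over the schema SQL. The sketch below is an illustration only and is not the repository's applyChunkEmbeddingIndexPolicy: the function name skipHnswForWideVectors, the constant name, the regex, and the wording of the skip-comment are all assumptions.

```typescript
// Hypothetical sketch: NOT the real applyChunkEmbeddingIndexPolicy from gbrain.
// pgvector rejects HNSW indexes on `vector` columns above 2000 dimensions.
const HNSW_MAX_DIMS = 2000;

// Matches a single `CREATE INDEX ... USING hnsw ...;` statement.
// `[^;]` keeps the match from spilling across statement boundaries.
const HNSW_INDEX_RE = /CREATE\s+INDEX\b[^;]*?\bUSING\s+hnsw\b[^;]*;/gi;

function skipHnswForWideVectors(schemaSql: string, dims: number): string {
  // The index is legal at or below the limit, so leave the SQL untouched.
  if (dims <= HNSW_MAX_DIMS) return schemaSql;
  // Otherwise swap every HNSW index statement for a SQL comment.
  return schemaSql.replace(
    HNSW_INDEX_RE,
    `-- HNSW index skipped: ${dims} dims exceeds the ${HNSW_MAX_DIMS}-dim limit`,
  );
}
```

With dims = 2560 the HNSW statement becomes a comment while unrelated indexes such as idx_chunks_page pass through unchanged; with dims = 1536 the input is returned as-is.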
Hypothesis (for the maintainer to confirm / falsify)
Looking at src/core/postgres-engine.ts on master:
let dims = 1536;
let model = 'text-embedding-3-large';
try {
  const gw = await import('./ai/gateway.ts');
  dims = gw.getEmbeddingDimensions();
  model = gw.getEmbeddingModel().split(':').slice(1).join(':') || model;
} catch { /* gateway not yet configured — use defaults */ }
const sqlText = getPostgresSchema(dims, model);
// ...
await this.applyForwardReferenceBootstrap(conn);
await conn.unsafe(sqlText);
Two candidate roots:
(a) gw.getEmbeddingDimensions() returns the default 1536 instead of reading embedding_dimensions: 2560 from ~/.gbrain/config.json. The gateway likely sources from a separate config (env var? a different JSON file? a DB-resident setting?) that wasn't synced when the user switched embedders via direct SQL ALTER COLUMN.
(b) applyForwardReferenceBootstrap (which runs before conn.unsafe(sqlText)) emits an HNSW CREATE INDEX somewhere that the policy doesn't touch. Source-text scan against master doesn't show it in the bootstrap method, but the order-of-operations means the error throws before sqlText even reaches the connection, which is consistent with bootstrap being the culprit rather than the policy-applied SCHEMA_SQL.
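One way to separate (a) from (b) is to compare three values side by side: embedding_dimensions from the config file, the value gw.getEmbeddingDimensions() returns at runtime, and the declared type of the column as reported by `SELECT format_type(atttypid, atttypmod) FROM pg_attribute WHERE attrelid = 'content_chunks'::regclass AND attname = 'embedding'`. The helpers below are a hypothetical diagnostic sketch; none of the names exist in gbrain, and the verdict labels are invented for illustration.

```typescript
// Hypothetical diagnostic helpers; none of these names exist in gbrain.

// Parses the text returned by Postgres's format_type(), e.g. "vector(2560)".
// Returns null when the text is not a dimensioned vector type.
function parseVectorDims(formattedType: string): number | null {
  const m = /^vector\((\d+)\)$/.exec(formattedType.trim());
  return m ? Number(m[1]) : null;
}

interface DimsSnapshot {
  configDims: number;  // embedding_dimensions from ~/.gbrain/config.json
  gatewayDims: number; // value returned by gw.getEmbeddingDimensions()
  columnType: string;  // format_type() of content_chunks.embedding
}

type Verdict = "column-mismatch" | "gateway-out-of-sync" | "gateway-ok-look-elsewhere";

function classify(s: DimsSnapshot): Verdict {
  const columnDims = parseVectorDims(s.columnType);
  // The config file and the column disagree: the switch itself was incomplete.
  if (columnDims !== s.configDims) return "column-mismatch";
  // Config and column agree, so a differing gateway value supports hypothesis (a).
  return s.gatewayDims === s.configDims
    ? "gateway-ok-look-elsewhere" // points toward hypothesis (b) or something else
    : "gateway-out-of-sync";
}
```

For the setup in this report, a gateway value of 1536 against a config and column of 2560 would yield "gateway-out-of-sync"; a gateway value of 2560 would shift suspicion to the bootstrap.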
The misleading "All migrations up to date." line from the postinstall path is a separate UX bug worth fixing too: the apply-migrations --yes --non-interactive shell command appears to swallow the bootstrap failure silently, whereas gbrain init --migrate-only reproduces it deterministically.
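The reporting behavior asked for here amounts to printing the success line only when every stage actually succeeded and surfacing a non-zero exit code otherwise. A minimal sketch follows, assuming the postinstall path can be modeled as two async stages; runPostinstall, Step, and the failure message format are hypothetical and do not reflect how gbrain structures its CLI.

```typescript
// Hypothetical sketch of the reporting fix; the step functions are stand-ins
// for whatever gbrain actually runs during bootstrap and migration.
type Step = () => Promise<void>;

async function runPostinstall(
  bootstrap: Step,
  migrate: Step,
  log: (line: string) => void = console.log,
): Promise<number> {
  try {
    await bootstrap();
    await migrate();
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    log(`Migration failed: ${msg}`);
    return 1; // the caller would hand this to process.exit()
  }
  // Reached only when both stages completed without throwing.
  log("All migrations up to date.");
  return 0;
}
```

The point of the design is that the success message sits after the try block, so a thrown bootstrap error can never be followed by it.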
Suggested fixes
1. Trace and patch the gap. If hypothesis (a) is correct, make the gateway read embedding_dimensions from ~/.gbrain/config.json as the authoritative source on the init/migrate path (or add an explicit gbrain config set embedding_dimensions setter that updates both gateway state and the on-disk JSON). If (b), apply the policy inside applyForwardReferenceBootstrap too.
2. Defensive: drop existing HNSW before re-emit. Even with (1) fixed, future contributors will reintroduce this class of bug whenever they touch the bootstrap. Add a guard: any time SCHEMA_SQL is about to emit CREATE INDEX ... USING hnsw on a vector(N) column, introspect pg_attribute.atttypmod first and skip when the declared dimension exceeds 2000. (For pgvector's vector type, atttypmod should hold the dimension directly, so the -4 offset used for varchar does not apply; format_type(atttypid, atttypmod) is a safe cross-check.) That makes the bootstrap dim-aware at the SQL layer instead of relying on the gateway value being threaded correctly all the way down.
3. Postinstall reporting. Make gbrain apply-migrations --yes --non-interactive exit non-zero when bootstrap fails, instead of printing "All migrations up to date." over a hidden failure.
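The SQL-layer guard from the second fix could look roughly like the following: a small generator that wraps the index creation in a PL/pgSQL DO block which checks the column's declared dimension first. This is a sketch under assumptions, not a patch: guardedHnswIndexSql and HnswTarget are hypothetical names, the vector_cosine_ops opclass in the usage below is assumed rather than taken from the repository, and the block relies on atttypmod holding the pgvector dimension directly.

```typescript
// Hypothetical helper; identifiers and opclass are supplied by the caller.
interface HnswTarget {
  table: string;
  column: string;
  index: string;
  opclass: string; // e.g. "vector_cosine_ops" (an assumption, not confirmed from the repo)
}

// Only plain lowercase identifiers are accepted, since they are spliced into SQL text.
const IDENT_RE = /^[a-z_][a-z0-9_]*$/;

function guardedHnswIndexSql(t: HnswTarget, maxDims = 2000): string {
  for (const id of [t.table, t.column, t.index, t.opclass]) {
    if (!IDENT_RE.test(id)) throw new Error(`unsafe identifier: ${id}`);
  }
  return `
DO $guard$
DECLARE
  dims integer;
BEGIN
  -- Assumption: for pgvector's vector type, atttypmod is the declared dimension.
  -- to_regclass() yields NULL for a missing table, which leaves dims NULL.
  SELECT a.atttypmod INTO dims
  FROM pg_attribute a
  WHERE a.attrelid = to_regclass('${t.table}')
    AND a.attname = '${t.column}'
    AND NOT a.attisdropped;

  IF dims IS NOT NULL AND dims > 0 AND dims <= ${maxDims} THEN
    EXECUTE 'CREATE INDEX IF NOT EXISTS ${t.index} ON ${t.table} USING hnsw (${t.column} ${t.opclass})';
  ELSE
    RAISE NOTICE 'skipping HNSW index ${t.index}: column has % dimensions', dims;
  END IF;
END
$guard$;`.trim();
}
```

Calling it with { table: "content_chunks", column: "embedding", index: "idx_chunks_embedding", opclass: "vector_cosine_ops" } produces a statement that can replace the bare CREATE INDEX, so a 2560-dim column yields a NOTICE rather than an error regardless of what the gateway reports.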
I'm happy to draft a PR with fix (2) + a regression test that:
- Creates a fresh test brain with embedding vector(2560),
- Drops idx_chunks_embedding,
- Runs initSchema(),
- Asserts no HNSW is created and runMigrations() reaches the latest version.
Why this matters now
zembed-1 (2560-dim) and zembed-large (3072-dim) from ZeroEntropy beat OpenAI's text-embedding-3-large on long-document retrieval benchmarks; Voyage voyage-3-large can emit up to 2048 dimensions. The era where "1536 is the default and everything is HNSW-friendly" is ending. The #640 fix was the right shape but didn't cover the upgrade path: any user who switched embedders before v0.35.2 (when #640 landed) and now tries to upgrade past v0.35.1.1 will hit this.
Workaround for affected users
Stay on v0.35.1.1 (DB schema v66 — latest reachable without tripping bootstrap):
cd ~/gbrain && git reset --hard f004a274 && bun install && bun link
v67 (typed-claim columns on facts) is the only post-66 migration today, and it's purely additive, so rollback loses no data. But all future migrations are blocked on this getting fixed.
Environment
- gbrain master HEAD = 1dadd9ed (v0.35.7.0)
- bun 1.3.13, macOS Darwin 25.3.0 (Apple Silicon)
- Postgres: Supabase Session pooler (aws-1-ap-northeast-1.pooler.supabase.com:5432), pgvector 0.7+
- GBRAIN_DISABLE_DIRECT_POOL=1 (IPv6 workaround; immaterial to this bug)
- Install method: git clone + bun link
- Brain stats: 250 pages / 4887 chunks, all embedded at 2560-dim