bug: HNSW 2000-dim cap still hit on upgrade path despite #640 fix (zembed-1 2560d, schema v66 → v67) #1141

## Summary

Follow-up to closed #640 — the schema-side fix (`applyChunkEmbeddingIndexPolicy` + `getPostgresSchema(dims, model)`) works for fresh installs, but **upgrading an existing brain still trips the same `column cannot have more than 2000 dimensions for hnsw index` error**, even when `~/.gbrain/config.json` correctly reports `embedding_dimensions: 2560`.

The fix didn't reach the upgrade path. Migration v67 (and presumably any future migration shipped on `master`) is gated behind a schema-bootstrap stage that still appears to emit the HNSW index, so the migration never gets a chance to run.

## Repro

Pre-conditions:
- Brain on Postgres (Supabase Session pooler, port 5432).
- `content_chunks.embedding` was widened with `ALTER COLUMN ... TYPE vector(2560)` for ZeroEntropy `zembed-1`.
- `~/.gbrain/config.json`:
  ```json
  {
    "engine": "postgres",
    "embedding_model": "zeroentropyai:zembed-1",
    "embedding_dimensions": 2560
  }
  ```
- `idx_chunks_embedding` (HNSW) was dropped at switch time, because pgvector refuses to keep an HNSW index on a 2560-dim column (that part is well understood).
- `idx_takes_embedding_hnsw` likewise dropped.
- Schema version: 66 (latest on v0.35.1.1).

Steps:
```bash
cd ~/gbrain
git pull origin master   # → v0.35.7.0 (1dadd9ed)
bun install              # postinstall runs `gbrain apply-migrations` (prints `All migrations up to date.` — misleading, see below)
gbrain init --migrate-only
```

Output (last 30 lines, after ~25 expected `42P07` NOTICEs):
```
{ severity: "NOTICE", code: "42P07",
  message: 'relation "idx_chunks_page" already exists, skipping', ... }
column cannot have more than 2000 dimensions for hnsw index
```
Exit code: 1. Migration v67 never runs.

## Expected

`getPostgresSchema(dims=2560, model='zembed-1')` should be invoked on the upgrade path, the same way it is on a fresh install. `applyChunkEmbeddingIndexPolicy` should then replace the HNSW `CREATE INDEX` with the skip-comment, and `conn.unsafe(sqlText)` should pass cleanly.
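To make the expected behaviour concrete, here is a minimal sketch of a dimension-aware rewrite of schema SQL. It is not gbrain's actual `applyChunkEmbeddingIndexPolicy`; the helper name `skipHnswWhenTooWide`, the comment text it emits, and the sample statements are all assumptions for illustration.

```typescript
// Hypothetical helper, NOT gbrain's real policy function.
// pgvector rejects hnsw indexes on `vector` columns wider than 2000 dims.
const HNSW_MAX_DIMS = 2000;

function skipHnswWhenTooWide(sqlText: string, dims: number): string {
  if (dims <= HNSW_MAX_DIMS) return sqlText; // narrow enough: leave SQL untouched

  // Match one `CREATE INDEX ... USING hnsw ...;` statement. `[^;]` keeps the
  // match inside a single statement, so btree indexes are never swallowed.
  const hnswStmt =
    /CREATE\s+INDEX\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)[^;]*?USING\s+hnsw[^;]*;/gi;

  return sqlText.replace(
    hnswStmt,
    (_match: string, indexName: string) =>
      `-- skipped ${indexName}: ${dims} dims exceeds the ${HNSW_MAX_DIMS}-dim limit`,
  );
}

// Illustrative input only; the real SCHEMA_SQL may be shaped differently.
const sampleSchema = [
  "CREATE INDEX IF NOT EXISTS idx_chunks_page ON content_chunks(page_id);",
  "CREATE INDEX IF NOT EXISTS idx_chunks_embedding ON content_chunks USING hnsw (embedding vector_cosine_ops);",
].join("\n");

console.log(skipHnswWhenTooWide(sampleSchema, 2560));
```

With `dims = 2560` the second statement becomes a SQL comment and the btree index is left alone; with `dims = 1536` the text is returned unchanged.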

## Hypothesis (for the maintainer to confirm / falsify)

Looking at `src/core/postgres-engine.ts` on `master`:

```ts
let dims = 1536;
let model = 'text-embedding-3-large';
try {
  const gw = await import('./ai/gateway.ts');
  dims = gw.getEmbeddingDimensions();
  model = gw.getEmbeddingModel().split(':').slice(1).join(':') || model;
} catch { /* gateway not yet configured — use defaults */ }

const sqlText = getPostgresSchema(dims, model);
// ...
await this.applyForwardReferenceBootstrap(conn);
await conn.unsafe(sqlText);
```

Two candidate roots:

**(a)** `gw.getEmbeddingDimensions()` returns the default `1536` instead of reading `embedding_dimensions: 2560` from `~/.gbrain/config.json`. The gateway likely sources from a separate config (env var? a different JSON file? a DB-resident setting?) that wasn't synced when the user switched embedders via direct SQL `ALTER COLUMN`.
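If (a) turns out to be the cause, the resolution order could look roughly like the sketch below. The function name `resolveEmbeddingDims` and its precedence rules are assumptions of mine, not gbrain's gateway API; the only thing taken from the report is the `embedding_dimensions` key and the `1536` default.

```typescript
// Hypothetical resolution order: on-disk config wins, then whatever the
// gateway reports, then the hard-coded default from the snippet above.
interface BrainConfig {
  embedding_model?: string;
  embedding_dimensions?: number;
}

const DEFAULT_DIMS = 1536;

function resolveEmbeddingDims(configJson: string, gatewayDims?: number): number {
  // Accept only positive integers as a usable dimension count.
  const isValid = (n: unknown): n is number =>
    typeof n === "number" && Number.isInteger(n) && n > 0;

  let fromFile: unknown;
  try {
    fromFile = (JSON.parse(configJson) as BrainConfig).embedding_dimensions;
  } catch {
    fromFile = undefined; // unreadable config: fall through to the gateway value
  }

  if (isValid(fromFile)) return fromFile;
  if (isValid(gatewayDims)) return gatewayDims;
  return DEFAULT_DIMS;
}

// The config from the repro, with a gateway that still reports the default.
console.log(resolveEmbeddingDims('{"embedding_dimensions": 2560}', 1536));
```

Treating the file as authoritative is a design choice: it matches what the user can inspect and edit, at the cost of ignoring gateway state when the two disagree.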

**(b)** `applyForwardReferenceBootstrap` (which runs *before* `conn.unsafe(sqlText)`) emits an HNSW `CREATE INDEX` somewhere that the policy doesn't touch. A source-text scan against `master` doesn't show one in the bootstrap method, but because of the order of operations, any such statement would throw **before `sqlText` even reaches the connection**, which would be consistent with bootstrap being the culprit rather than the policy-applied SCHEMA_SQL.

The misleading `All migrations up to date.` line from the postinstall path is a separate UX bug worth fixing too — the `apply-migrations --yes --non-interactive` shell command appears to swallow the bootstrap failure silently, but `gbrain init --migrate-only` reproduces it deterministically.
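The reporting problem can be illustrated without touching gbrain internals. The sketch below is a generic step runner that only prints the success line when every step completed and otherwise returns a non-zero code; `runStepsForExitCode` and the `Step` shape are hypothetical names, not the real `apply-migrations` implementation.

```typescript
// Hypothetical step runner: the first failing step decides the exit code,
// and the success message is never printed over a failure.
type Step = { name: string; run: () => Promise<void> };

async function runStepsForExitCode(
  steps: Step[],
  log: (line: string) => void = console.log,
): Promise<number> {
  for (const step of steps) {
    try {
      await step.run();
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      log(`${step.name} failed: ${reason}`);
      return 1; // surface the failure to the calling shell
    }
  }
  log("All migrations up to date.");
  return 0;
}
```

A caller would pass the bootstrap and migration phases as steps and hand the returned number to the process exit call, so a postinstall hook sees the failure instead of a clean exit.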

## Suggested fixes

1. **Trace and patch the gap.** If hypothesis (a) is correct, make the gateway read `embedding_dimensions` from `~/.gbrain/config.json` as the authoritative source on the init/migrate path (or add an explicit `gbrain config set embedding_dimensions` setter that updates both gateway state and the on-disk JSON). If (b), apply the policy inside `applyForwardReferenceBootstrap` too.

2. **Defensive: make HNSW creation dimension-aware at the SQL layer.** Even with (1) fixed, this class of bug is easy to reintroduce whenever someone touches the bootstrap. Add a guard: any time SCHEMA_SQL is about to emit `CREATE INDEX ... USING hnsw` on a `vector(N)` column, introspect `pg_attribute.atttypmod` first and skip when it exceeds 2000. (For pgvector's `vector` type, `atttypmod` holds the dimension count directly; the `typmod - 4` adjustment applies to `varchar`, not to `vector`.) That makes the bootstrap dim-aware **at the SQL layer** instead of relying on the gateway value being threaded correctly all the way down.

3. **Postinstall reporting.** Make `gbrain apply-migrations --yes --non-interactive` exit non-zero when bootstrap fails, instead of printing `All migrations up to date.` over a hidden failure.
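A rough sketch of the guard from fix (2), assuming pgvector reports the declared dimension directly in `atttypmod` (worth confirming on the target pgvector version before relying on it). The `QueryFn` type, `canUseHnsw`, and the constant names are hypothetical; the query function is injected so the sketch does not depend on any particular Postgres client.

```typescript
// Hypothetical guard: ask the catalog how wide the column really is before
// emitting CREATE INDEX ... USING hnsw.
const HNSW_DIM_CAP = 2000;

// ASSUMPTION: for pgvector's `vector` type, atttypmod is the declared
// dimension (or -1 when the column was declared without one).
const VECTOR_DIMS_SQL = `
  SELECT a.atttypmod AS dims
  FROM pg_attribute a
  WHERE a.attrelid = $1::regclass
    AND a.attname = $2
    AND NOT a.attisdropped`;

type QueryFn = (sql: string, params: string[]) => Promise<Array<{ dims: number }>>;

async function canUseHnsw(query: QueryFn, table: string, column: string): Promise<boolean> {
  const rows = await query(VECTOR_DIMS_SQL, [table, column]);
  if (rows.length === 0) return false; // column not found: nothing to index
  const dims = rows[0].dims;
  // hnsw needs a declared dimension, and it must not exceed the cap.
  return dims > 0 && dims <= HNSW_DIM_CAP;
}
```

The bootstrap would call `canUseHnsw(query, "content_chunks", "embedding")` and emit the index statement only when it resolves to `true`, which keeps the decision tied to the live column rather than to configuration.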

I'm happy to draft a PR with fix (2) + a regression test that:
1. Creates a fresh test brain with `embedding vector(2560)`,
2. Drops `idx_chunks_embedding`,
3. Runs `initSchema()`,
4. Asserts no HNSW is created and `runMigrations()` reaches the latest version.

## Why this matters now

`zembed-1` (2560-dim) and `zembed-large` (3072-dim) from ZeroEntropy reportedly beat OpenAI's `text-embedding-3-large` on long-document retrieval benchmarks, and Voyage `voyage-3-large` can emit up to 2048 dims, which is also over the HNSW cap. The era where "1536 is the default and everything is HNSW-friendly" is ending. The #640 fix was the right shape but didn't cover the upgrade path: any user who switched embedders before v0.35.2 (when #640 landed) and now tries to upgrade past v0.35.1.1 will hit this.

## Workaround for affected users

Stay on v0.35.1.1 (DB schema v66 — latest reachable without tripping bootstrap):
```bash
cd ~/gbrain && git reset --hard f004a274 && bun install && bun link
```
v67 (typed-claim columns on `facts`) is the only post-66 migration today, and it's purely additive, so rollback loses no data. But all future migrations are blocked on this getting fixed.

## Environment

- gbrain `master` HEAD = `1dadd9ed` (v0.35.7.0)
- bun 1.3.13, macOS Darwin 25.3.0 (Apple Silicon)
- Postgres: Supabase Session pooler (`aws-1-ap-northeast-1.pooler.supabase.com:5432`), pgvector 0.7+
- `GBRAIN_DISABLE_DIRECT_POOL=1` (IPv6 workaround; immaterial to this bug)
- Install method: `git clone` + `bun link`
- Brain stats: 250 pages / 4887 chunks, all embedded at 2560-dim

