Fresh PGLite init can mismatch ZeroEntropy 1280 runtime with 1536 schema #1221

@MumuTW


Summary

Fresh PGLite installs can initialize the schema with the legacy OpenAI 1536-dim embedding column while the runtime gateway defaults to ZeroEntropy zembed-1 at 1280 dimensions. The first write/embed then fails with a dimension mismatch.

Observed behavior

On a fresh-ish local PGLite brain with no pages:

gbrain v0.37.1.0
config.json:
{
  "engine": "pglite",
  "database_path": "/home/opc/.gbrain/brain.pglite"
}

gbrain doctor --json reports the effective embedding provider as ZeroEntropy:

embedding_provider: ok / skipped
Model: zeroentropyai:zembed-1
connection: Connected, 0 pages

But a first gbrain put personal/goals.md fails when embedding:

expected 1536 dimensions, not 1280

The brain is empty, so this is not an old-data migration problem. The schema itself appears to have been initialized with vector(1536), while the active embedding path is producing 1280-dim ZeroEntropy vectors.

Why this looks like a default mismatch

The current runtime gateway defaults appear to be ZeroEntropy 1280:

// src/core/ai/gateway.ts
const DEFAULT_EMBEDDING_MODEL = 'zeroentropyai:zembed-1';
const DEFAULT_EMBEDDING_DIMENSIONS = 1280;

But several schema/init paths still appear to have legacy OpenAI defaults:

// src/core/pglite-schema.ts
export function getPGLiteSchema(dims: number = 1536, model: string = 'text-embedding-3-large'): string

Also found similar legacy fallbacks in PGLite/Postgres initialization paths:

src/core/pglite-engine.ts: let dims = 1536
src/core/postgres-engine.ts: let dims = 1536
src/core/schema-embedded.ts: embedding vector(1536)

This creates a split-brain default:

schema default: 1536 / OpenAI-style
runtime gateway default: 1280 / ZeroEntropy

Expected behavior

A new PGLite brain should be internally consistent.

Either:

  1. The schema initializes to ZeroEntropy's default dimensions:
embedding_model = zeroentropyai:zembed-1
embedding_dimensions = 1280
content_chunks.embedding = vector(1280)

or:

  2. If the schema intentionally remains 1536 for compatibility, the effective runtime embedding model/dimensions should also resolve to 1536 until the user explicitly runs a provider switch.

Suggested fix

Centralize the embedding defaults used by:

  • gateway runtime config
  • PGLite schema creation
  • Postgres schema creation
  • config seed rows
  • doctor/schema dimension checks

At minimum, gbrain init --pglite should seed embedding_model and embedding_dimensions consistently with the schema it creates.
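As a rough illustration of what centralizing could look like, here is a minimal sketch of a single shared source of defaults that the gateway, both engines, and the config seed rows could all read from. Every name here (`EmbeddingConfig`, `EMBEDDING_DEFAULTS`, `resolveEmbeddingConfig`, `embeddingColumnSql`, `embeddingSeedRows`) is hypothetical and not existing gbrain code. Only the two default values come from the `gateway.ts` snippet quoted earlier in this issue.

```typescript
// Hypothetical shared module (names and location are assumptions, not existing gbrain code).
// In a real module these would be exported and imported by the gateway and both engines.

interface EmbeddingConfig {
  model: string;
  dimensions: number;
}

// Values copied from the gateway defaults quoted in this issue.
const EMBEDDING_DEFAULTS: Readonly<EmbeddingConfig> = Object.freeze({
  model: 'zeroentropyai:zembed-1',
  dimensions: 1280,
});

// Apply explicit overrides (e.g. CLI flags) on top of the single shared default.
function resolveEmbeddingConfig(overrides: Partial<EmbeddingConfig> = {}): EmbeddingConfig {
  return {
    model: overrides.model ?? EMBEDDING_DEFAULTS.model,
    dimensions: overrides.dimensions ?? EMBEDDING_DEFAULTS.dimensions,
  };
}

// Build the embedding column definition from the same resolved config.
function embeddingColumnSql(cfg: EmbeddingConfig): string {
  if (!Number.isInteger(cfg.dimensions) || cfg.dimensions <= 0) {
    throw new Error(`invalid embedding dimensions: ${cfg.dimensions}`);
  }
  return `embedding vector(${cfg.dimensions})`;
}

// Build the config seed rows from the same resolved config, so they cannot drift
// from the column width created above.
function embeddingSeedRows(cfg: EmbeddingConfig): Array<[string, string]> {
  return [
    ['embedding_model', cfg.model],
    ['embedding_dimensions', String(cfg.dimensions)],
  ];
}

const cfg = resolveEmbeddingConfig();
console.log(embeddingColumnSql(cfg)); // embedding vector(1280)
console.log(embeddingSeedRows(cfg));
```

The point is only that the schema width and the seed rows are derived from one resolved value, rather than each call site carrying its own `1536` fallback.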

doctor should also catch this before first embed/write by comparing:

actual schema vector width
configured/effective embedding_dimensions
configured/effective embedding_model
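A minimal sketch of that comparison, assuming doctor can obtain the column type of `content_chunks.embedding` as a string such as `vector(1536)`. On Postgres I believe `format_type` over `pg_attribute` returns it in that shape for pgvector columns, but I have not checked how PGLite exposes it. The names `parseVectorWidth`, `checkEmbeddingDimensions`, and `DimensionCheck` are hypothetical and not part of gbrain.

```typescript
// Hypothetical doctor check (not existing gbrain code).

interface DimensionCheck {
  ok: boolean;
  schemaDimensions: number | null;
  configuredDimensions: number;
  message: string;
}

// Extract N from a type string of the form "vector(N)"; returns null for anything else.
function parseVectorWidth(columnType: string): number | null {
  const match = /^vector\((\d+)\)$/i.exec(columnType.trim());
  return match ? Number(match[1]) : null;
}

// Compare the actual schema width against the effective embedding config.
function checkEmbeddingDimensions(
  columnType: string,
  configuredDimensions: number,
  model: string,
): DimensionCheck {
  const schemaDimensions = parseVectorWidth(columnType);
  if (schemaDimensions === null) {
    return {
      ok: false,
      schemaDimensions,
      configuredDimensions,
      message: `could not read a vector width from column type "${columnType}"`,
    };
  }
  const ok = schemaDimensions === configuredDimensions;
  return {
    ok,
    schemaDimensions,
    configuredDimensions,
    message: ok
      ? `schema and ${model} agree on ${schemaDimensions} dimensions`
      : `schema is vector(${schemaDimensions}) but ${model} produces ${configuredDimensions} dimensions`,
  };
}

// The situation described in this issue: 1536-wide schema, 1280-dim runtime model.
const result = checkEmbeddingDimensions('vector(1536)', 1280, 'zeroentropyai:zembed-1');
console.log(result.ok, result.message);
```

With a check like this, the mismatch would surface in `gbrain doctor` on an empty brain instead of on the first `gbrain put`.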

Reproduction outline

gbrain init --pglite
# with no explicit --model / --embedding-model args
# configure ZEROENTROPY_API_KEY or otherwise let runtime resolve to zeroentropyai:zembed-1

gbrain put personal/goals.md < some-file.md

Actual:

expected 1536 dimensions, not 1280

Expected:

put succeeds, or doctor/init catches and fixes the mismatch before embedding

Impact

This breaks the first write for a new PGLite brain even when the brain has 0 pages, because the schema width and runtime embedding provider dimensions disagree.
