Bug
gbrain serve --http running on a stateless container (Railway, Fly, Docker, etc.) where the brain's content_chunks.embedding column is vector(1536) from a pre-v0.36.2.0 OpenAI install will silently fall back to DEFAULT_EMBEDDING_MODEL = 'zeroentropyai:zembed-1' (1280d) after upgrade. Every put_page then fails with:
{"error":"internal_error","message":"expected 1536 dimensions, not 1280"}
Reads continue to work because searchVector doesn't embed at query time when the cache is warm — so the issue surfaces only when writes start failing silently (signal-detector, ingestion skills, MCP put_page callers).
Root cause
src/core/config.ts:236-238:
"fields (embedding_model, etc.) keep their file/env-only loading because they..."
embedding_model and embedding_dimensions are NOT read from the DB config table — only from ~/.gbrain/config.json (file plane) or GBRAIN_EMBEDDING_MODEL / GBRAIN_EMBEDDING_DIMENSIONS env vars.
In a Railway / Fly / Docker deployment:
- ~/.gbrain/config.json doesn't exist (stateless container, no persistent home dir).
- No env var is set by default.
- → configureGateway resolves both fields to undefined and applies DEFAULT_EMBEDDING_MODEL (which v0.36.2.0 flipped to zeroentropyai:zembed-1 / 1280d).
- The vector emitted at write time is 1280d; the column is vector(1536); pgvector rejects the insert.
The v0.36.2.0 release notes describe the TTY-only ze-switch prompt and gbrain ze-switch --resume for recovery, but the failure mode for stateless server deployments isn't mentioned in the release notes or migration skill (skills/migrations/v0.36.2.0.md). The prompt skips in non-TTY by design, which is correct, but there's no fallback path that pins the existing model when the prompt is skipped.
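The fallback chain described above can be sketched as follows. This is a minimal illustration, not gbrain's actual code: resolveEmbedding, the EmbeddingConfig shape, the env-over-file ordering, and the all-or-nothing fallback are assumptions made for the example. Only the env var names, the default model, and the two widths come from this report.

```typescript
// Hypothetical sketch of file/env-only resolution; not gbrain's real implementation.
interface EmbeddingConfig {
  model: string;
  dimensions: number;
}

// Default after v0.36.2.0, as described in this issue.
const DEFAULT_EMBEDDING: EmbeddingConfig = {
  model: "zeroentropyai:zembed-1",
  dimensions: 1280,
};

// Assumed precedence: env var, then config file. The DB config plane is
// never consulted for these two keys, which is the gap this issue reports.
function resolveEmbedding(
  env: Record<string, string | undefined>,
  file: Partial<EmbeddingConfig> | undefined,
): EmbeddingConfig {
  const envDims = env["GBRAIN_EMBEDDING_DIMENSIONS"];
  const model = env["GBRAIN_EMBEDDING_MODEL"] ?? file?.model;
  const dimensions = envDims !== undefined ? Number(envDims) : file?.dimensions;
  // Simplification: if either field is missing, use the default pair.
  if (model === undefined || dimensions === undefined) {
    return DEFAULT_EMBEDDING;
  }
  return { model, dimensions };
}

// Stateless container: no env vars, no ~/.gbrain/config.json.
const columnWidth = 1536; // existing content_chunks.embedding width
const resolved = resolveEmbedding({}, undefined);
const writeWouldFail = resolved.dimensions !== columnWidth;
console.log(resolved.model, resolved.dimensions, writeWouldFail);
// prints: zeroentropyai:zembed-1 1280 true
```

With nothing on the file or env planes, the only remaining source is the default, so the mismatch against the existing column is guaranteed rather than incidental.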
Reproducer
- Provision a Supabase brain on v0.35 with default OpenAI text-embedding-3-large (1536d).
- Deploy gbrain serve --http on Railway / Fly / Docker without ~/.gbrain/config.json and without GBRAIN_EMBEDDING_MODEL.
- Upgrade the deployed binary to v0.36.2.0+.
- Call put_page via the MCP. Observe "expected 1536 dimensions, not 1280" on every write.
Suggested fixes
Three options, not mutually exclusive:
- In loadConfigWithEngine(), also read embedding_model and embedding_dimensions from the DB config plane (with file/env still winning by precedence). The existing comment justifying the file/env-only path for these keys is from a different era and isn't obviously load-bearing now. Stateless hosts can then set the values once via gbrain config set against the remote DB.
- In gbrain serve --http startup, refuse to start when the brain's content_chunks.embedding column width doesn't match the resolved embedding_dimensions (the existing embedding_width_consistency doctor check has the logic; fire it at startup instead of waiting for gbrain doctor to be invoked). Fail loud, paste-ready fix hint.
- In the v0.36.2.0 migration skill, add a section for stateless host deployments explaining that they need to set GBRAIN_EMBEDDING_MODEL + GBRAIN_EMBEDDING_DIMENSIONS env vars OR run gbrain ze-switch against the host before the upgrade, or writes will silently break.
The first option is the structural fix; the second is the defense-in-depth; the third is the documentation patch.
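A rough sketch of how the first two options could fit together, to make the proposal concrete. All names here (resolveWithDbPlane, assertEmbeddingWidth, the Plane type) are hypothetical and not taken from the gbrain codebase; the env plane is assumed to be already parsed into the same shape as the others, and the error wording is only an example of a paste-ready hint.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not gbrain's real API.
interface EmbeddingConfig {
  model: string;
  dimensions: number;
}
type Plane = Partial<EmbeddingConfig> | undefined;

// Returns the first value that is not undefined, in argument order.
function firstDefined<T>(...values: (T | undefined)[]): T | undefined {
  return values.find((v) => v !== undefined);
}

// Option 1: add the DB config plane below env and file, above the default.
function resolveWithDbPlane(
  env: Plane,
  file: Plane,
  db: Plane,
  fallback: EmbeddingConfig,
): EmbeddingConfig {
  return {
    model: firstDefined(env?.model, file?.model, db?.model) ?? fallback.model,
    dimensions:
      firstDefined(env?.dimensions, file?.dimensions, db?.dimensions) ??
      fallback.dimensions,
  };
}

// Option 2: refuse to start when the column width and resolved width differ.
function assertEmbeddingWidth(columnWidth: number, resolved: EmbeddingConfig): void {
  if (columnWidth !== resolved.dimensions) {
    throw new Error(
      `content_chunks.embedding is vector(${columnWidth}) but ` +
        `${resolved.model} emits ${resolved.dimensions}d. ` +
        `Set GBRAIN_EMBEDDING_MODEL and GBRAIN_EMBEDDING_DIMENSIONS ` +
        `to match the existing column, then restart.`,
    );
  }
}

// Stateless host whose values were stored once in the remote DB.
const fallback: EmbeddingConfig = { model: "zeroentropyai:zembed-1", dimensions: 1280 };
const dbPlane: Plane = { model: "openai:text-embedding-3-large", dimensions: 1536 };
const resolved = resolveWithDbPlane(undefined, undefined, dbPlane, fallback);
assertEmbeddingWidth(1536, resolved); // passes: 1536 matches the column
```

The guard needs the column width from the database. One plausible source is PostgreSQL's format_type(atttypid, atttypmod) on the pg_attribute row for content_chunks.embedding, which reports the type as vector(1536); the existing embedding_width_consistency check presumably already has an equivalent query.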
Workaround (for anyone hitting this now)
Set both env vars on the host service and redeploy:
# Railway example
railway variables --set GBRAIN_EMBEDDING_MODEL=openai:text-embedding-3-large \
--set GBRAIN_EMBEDDING_DIMENSIONS=1536 \
--service gbrain-http
# Railway auto-redeploys when env vars change
After redeploy, put_page works again. No re-embed, no data loss, no schema change.
Environment
- gbrain v0.36.3.0 (also reproducible on v0.36.2.0)
- Topology 2 (cross-machine thin-client + remote gbrain serve --http)
- Host: Railway with Supabase backend (pre-v0.36.2.0 brain, vector(1536) column)
- Client: macOS thin-client, no local engine
Happy to test a candidate fix or open a PR for any of the three suggestions if useful.