Summary
bun run src/cli.ts extract links --source brain --stale crashes partway through (~page 2200/353K) with a Postgres malformed array literal error. The payload in the error is the serialized context array from addLinksBatch, full of calendar-event text: Zoom URLs (?pwd=...), commas, em-dashes, and brace-looking fragments.
This blocks the entire --stale re-extraction sweep — one bad batch kills the run, so no edges get reconciled. It surfaced today when LINK_EXTRACTOR_VERSION_TS was bumped to 2026-05-31, marking everything stale and forcing a full re-extract.
Crash signature
malformed array literal: "{"(YC) [pin] https://ycombinator.zoom.us/j/95178948505?pwd=YmdFRWxXbWZadlNkaG9iNC9CYW12QT09, YC-SF-560-2-3 (15) [Z] — with [Mark Thurman](../../people/mark-thurman.md), [Doug Duhaime](../../people/doug-duhaime.md), [Kat Bernstein](../../people/..."
at _addLinksBatchOnce -> sql` ... unnest(${contexts}::text[]) ... `
The braces in the error are postgres-js's own serialization of the JS contexts: string[] into a Postgres text[] literal. One or more context strings contain characters (embedded quotes, backslashes, the , delimiters inside long calendar event descriptions) that break the ::text[] cast — Postgres parses the serialized literal and rejects it as malformed.
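For reference, the quoting rules that a text[] literal has to satisfy can be sketched as below. This is a minimal illustration of the PostgreSQL array-literal text format, not postgres-js's actual serializer; quoteArrayElement and toTextArrayLiteral are hypothetical names used only here. It always double-quotes each element, which the format permits, and backslash-escapes embedded backslashes and double quotes.

```typescript
// Hypothetical helpers illustrating the Postgres array-literal text format.
// They are NOT taken from postgres-js or from the codebase.

// Wrap one element in double quotes. Inside a quoted element only
// backslash and double quote need escaping; commas and braces are then safe.
function quoteArrayElement(value: string): string {
  const escaped = value
    .replace(/\\/g, '\\\\') // escape backslashes first
    .replace(/"/g, '\\"'); // then escape double quotes
  return `"${escaped}"`;
}

// Build a literal such as {"a","b"} that a ::text[] cast can parse.
function toTextArrayLiteral(values: string[]): string {
  return `{${values.map(quoteArrayElement).join(',')}}`;
}

// Example input resembling the calendar contexts described above.
const literal = toTextArrayLiteral(['YC-SF-560-2-3 (15), [Z]', 'say "hi"', '{x}']);
console.log(literal);
```

If the literal in the crash deviates from this shape (for example an unescaped quote or backslash inside an element), that would be consistent with the malformed-literal error, though this has not been confirmed against the driver's source.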
Location
src/core/postgres-engine.ts -> _addLinksBatchOnce() (~line 2528):
const contexts = links.map(l => l.context || '');
...
FROM unnest(
${fromSlugs}::text[], ${toSlugs}::text[], ${linkTypes}::text[],
${contexts}::text[], ...
)
The context field for calendar/meeting edges carries the full raw event line (location + Zoom URL + attendee link list), which is exactly the kind of string that trips array-literal escaping.
Why it's a real bug
- One poisoned batch aborts the whole --stale sweep. No partial progress, no skip: the run dies. Today it meant a graph re-sync (needed to drop stale edges) could not complete at all.
- It's data-dependent and silent until a calendar page lands in a batch, so it recurs on every future --stale run / version bump.
Repro
cd /data/gbrain
LINK_EXTRACTOR_VERSION_TS=2026-05-31 bun run src/cli.ts extract links --source brain --stale
# dies ~2200 pages in with: malformed array literal: "{...calendar text...}"
Suspected root cause
postgres-js array-literal serialization of context strings containing embedded double-quotes / backslashes / braces is not being escaped to a form the ::text[] cast accepts. Likely a specific char combo (a quote next to a backslash, or a literal brace inside the text) the serializer doesn't quote correctly for the explicit-cast path.
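One way to narrow down the offending character combination is to scan a failing batch for the character classes suspected above before it reaches the driver. The sketch below is a hypothetical diagnostic (findSuspectContexts and the Suspect type are illustrative names, not existing code); it only flags candidates and does not prove which character actually breaks the cast.

```typescript
// Hypothetical diagnostic helper; names are illustrative, not from the codebase.
type Suspect = { index: number; reasons: string[] };

// Report which context strings contain characters that need special
// handling in a Postgres array literal (or cannot be stored as text at all).
function findSuspectContexts(contexts: string[]): Suspect[] {
  const checks: Array<[string, RegExp]> = [
    ['double quote', /"/],
    ['backslash', /\\/],
    ['brace', /[{}]/],
    ['NUL byte', /\u0000/],
  ];
  const suspects: Suspect[] = [];
  contexts.forEach((context, index) => {
    const reasons = checks
      .filter(([, pattern]) => pattern.test(context))
      .map(([name]) => name);
    if (reasons.length > 0) suspects.push({ index, reasons });
  });
  return suspects;
}
```

Logging the result for the batch around page 2200 would show whether the failing rows share a quote, backslash, brace, or NUL byte.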
Suggested fixes (pick one)
- Stop hand-casting to ::text[]. Bind arrays via postgres-js native array binding (driver binds each element as a parameter) instead of the fragile literal-string ${arr}::text[] cast.
- Per-batch fallback + isolation: on malformed array literal, retry the batch element-by-element so one bad row cannot kill 353K pages, and log the offending (from_slug, context) instead of aborting.
- Sanitize context before binding: strip/escape NULs, normalize embedded quotes/backslashes, cap length (these contexts are huge raw calendar lines anyway).
Option 1 is the durable fix; option 2 makes the sweep resilient regardless.
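The fallback and sanitizing options could look roughly like the sketch below. Everything in it is an assumption for illustration: sanitizeContext, insertWithIsolation, isMalformedArrayLiteral, and the 500-character cap are hypothetical, and insertBatch stands in for whatever wraps _addLinksBatchOnce. The sanitizer only strips NUL bytes and caps length; it deliberately does not rewrite quotes or backslashes.

```typescript
// Hypothetical sketch; none of these names exist in the codebase or in postgres-js.
const MAX_CONTEXT_LENGTH = 500; // assumed cap, not a value from the source

// Strip NUL bytes (Postgres text cannot store them) and cap the length.
// Array.from splits by code point, so a surrogate pair is never cut in half.
function sanitizeContext(raw: string): string {
  const cleaned = raw.replace(/\u0000/g, '');
  const codePoints = Array.from(cleaned);
  return codePoints.length > MAX_CONTEXT_LENGTH
    ? codePoints.slice(0, MAX_CONTEXT_LENGTH).join('')
    : cleaned;
}

// Match on the message text seen in the crash signature.
function isMalformedArrayLiteral(err: unknown): boolean {
  return err instanceof Error && err.message.includes('malformed array literal');
}

// Try the whole batch; on a malformed-literal error, retry row by row so a
// single bad row is reported and skipped instead of aborting the sweep.
async function insertWithIsolation<T>(
  rows: T[],
  insertBatch: (rows: T[]) => Promise<void>,
  onBadRow: (row: T, err: unknown) => void,
): Promise<{ inserted: number; skipped: number }> {
  try {
    await insertBatch(rows);
    return { inserted: rows.length, skipped: 0 };
  } catch (err) {
    if (!isMalformedArrayLiteral(err)) throw err; // unrelated failures still abort
  }
  let inserted = 0;
  let skipped = 0;
  for (const row of rows) {
    try {
      await insertBatch([row]);
      inserted++;
    } catch (err) {
      if (!isMalformedArrayLiteral(err)) throw err;
      onBadRow(row, err); // e.g. log (from_slug, context) here
      skipped++;
    }
  }
  return { inserted, skipped };
}
```

If the batch insert runs inside a transaction, a failed statement aborts that transaction in Postgres, so the per-row retry would need savepoints or a fresh transaction per attempt; the sketch assumes each insertBatch call is independent.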
Impact right now
Source data is clean (legacy Perplexity phantom-edge + repeated-string corruption already purged in brain main), but the graph DB cannot be re-synced to drop stale edges until this crash is fixed or the sweep is made batch-resilient. A scoped run over only people/ is the current workaround to avoid the calendar pages.
Filed by Wintermute on Garry's behalf.