extract links --stale crashes with malformed array literal on calendar/meeting pages (addLinksBatch text[] serialization) #1861

@garrytan

Summary

bun run src/cli.ts extract links --source brain --stale crashes partway through (~page 2200/353K) with a Postgres malformed array literal error. The payload in the error is the serialized context array from addLinksBatch, full of calendar-event text: Zoom URLs (?pwd=...), commas, em-dashes, and brace-looking fragments.

This blocks the entire --stale re-extraction sweep — one bad batch kills the run, so no edges get reconciled. It surfaced today when LINK_EXTRACTOR_VERSION_TS was bumped to 2026-05-31, marking everything stale and forcing a full re-extract.

Crash signature

malformed array literal: "{"(YC) [pin] https://ycombinator.zoom.us/j/95178948505?pwd=YmdFRWxXbWZadlNkaG9iNC9CYW12QT09, YC-SF-560-2-3 (15) [Z] — with [Mark Thurman](../../people/mark-thurman.md), [Doug Duhaime](../../people/doug-duhaime.md), [Kat Bernstein](../../people/..."
  at _addLinksBatchOnce -> sql` ... unnest(${contexts}::text[]) ... `

The braces in the error come from postgres-js serializing the JS contexts: string[] into a Postgres text[] literal. One or more context strings contain characters (embedded quotes, backslashes, or the commas inside long calendar event descriptions) that break the ::text[] cast: Postgres parses the serialized literal and rejects it as malformed.
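
For reference while narrowing this down, here is a minimal sketch of the quoting rules PostgreSQL documents for array input: each element is wrapped in double quotes, and embedded backslashes and double quotes are backslash-escaped. Under those rules, commas and braces inside a quoted element are harmless, so an unescaped quote or backslash (or a NUL byte) is the more likely culprit. This is not postgres-js's actual serializer; the names quoteArrayElement and toTextArrayLiteral are hypothetical and exist only for illustration.

```typescript
// Hypothetical helper: quote one element for a Postgres array literal.
// Backslashes are escaped first so the escapes added for quotes are not doubled.
function quoteArrayElement(value: string): string {
  if (value.includes("\u0000")) {
    // Postgres text values cannot contain NUL bytes at all.
    throw new Error("NUL byte in array element");
  }
  return '"' + value.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Hypothetical helper: build a text[] literal such as {"a","b"} from JS strings.
function toTextArrayLiteral(values: string[]): string {
  return "{" + values.map(quoteArrayElement).join(",") + "}";
}
```

Comparing the output of a helper like this against the literal in the error message, for the failing batch, may show which character the driver path leaves unescaped.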

Location

src/core/postgres-engine.ts -> _addLinksBatchOnce() (~line 2528):

const contexts = links.map(l => l.context || '');
...
FROM unnest(
  ${fromSlugs}::text[], ${toSlugs}::text[], ${linkTypes}::text[],
  ${contexts}::text[], ...
)

The context field for calendar/meeting edges carries the full raw event line (location + Zoom URL + attendee link list), which is exactly the kind of string that trips array-literal escaping.

Why it's a real bug

  1. One poisoned batch aborts the whole --stale sweep. No partial progress, no skip — the run dies. Today it meant a graph re-sync (needed to drop stale edges) could not complete at all.
  2. It's data-dependent and silent until a calendar page lands in a batch, so it recurs on every future --stale run / version bump.

Repro

cd /data/gbrain
LINK_EXTRACTOR_VERSION_TS=2026-05-31 bun run src/cli.ts extract links --source brain --stale
# dies ~2200 pages in with: malformed array literal: "{...calendar text...}"

Suspected root cause

postgres-js appears to serialize context strings containing embedded double quotes, backslashes, or braces into an array literal that the ::text[] cast does not accept. The likely trigger is a specific character combination (a quote next to a backslash, or a literal brace inside the text) that the serializer does not quote correctly on the explicit-cast path.

Suggested fixes (pick one)

  1. Stop hand-casting to ::text[]. Bind arrays via postgres-js native array binding (driver binds each element as a parameter) instead of the fragile literal-string ${arr}::text[] cast.
  2. Per-batch fallback + isolation: on malformed array literal, retry the batch element-by-element so one bad row cannot kill 353K pages, and log the offending (from_slug, context) instead of aborting.
  3. Sanitize context before binding: strip/escape NULs, normalize embedded quotes/backslashes, cap length (these contexts are huge raw calendar lines anyway).
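
Option 3 could look roughly like the sketch below. It strips NUL bytes and caps the length, and it deliberately leaves quotes and backslashes untouched on the assumption that escaping them is the driver's job once the binding is fixed. The name sanitizeContext and the 500-character default are assumptions, not values taken from the codebase.

```typescript
// Hypothetical cap; the right limit for link contexts is not stated in this issue.
const MAX_CONTEXT_CHARS = 500;

// Hypothetical helper: make a link context safe to store as Postgres text.
function sanitizeContext(
  raw: string | null | undefined,
  maxChars: number = MAX_CONTEXT_CHARS,
): string {
  // Postgres text cannot hold NUL bytes, so drop them outright.
  const text = (raw ?? "").replace(/\u0000/g, "");
  // Array.from splits by code point, so truncation never cuts a surrogate pair in half.
  const chars = Array.from(text);
  return chars.length <= maxChars ? text : chars.slice(0, maxChars).join("");
}
```

It would slot in where the contexts array is built, for example links.map(l => sanitizeContext(l.context)).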

Option 1 is the durable fix; option 2 makes the sweep resilient regardless.
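
A minimal sketch of the option 2 fallback, written against a generic insert callback rather than the real _addLinksBatchOnce signature (which is not shown in this issue). The names insertWithIsolation, BatchInsert, and isMalformedArrayError are hypothetical. It assumes a failed batch inserts nothing, which holds when the batch is a single statement.

```typescript
// Hypothetical callback type standing in for the real batch insert.
type BatchInsert<T> = (rows: T[]) => Promise<void>;

// Match only the error this issue is about; anything else should still abort.
function isMalformedArrayError(err: unknown): boolean {
  return err instanceof Error && /malformed array literal/i.test(err.message);
}

// Try the whole batch first; on a malformed-array error, retry row by row,
// report the rows that still fail, and return how many rows were inserted.
async function insertWithIsolation<T>(
  rows: T[],
  insertBatch: BatchInsert<T>,
  onBadRow: (row: T, err: unknown) => void,
): Promise<number> {
  if (rows.length === 0) return 0;
  try {
    await insertBatch(rows);
    return rows.length;
  } catch (err) {
    if (!isMalformedArrayError(err)) throw err;
  }
  let inserted = 0;
  for (const row of rows) {
    try {
      await insertBatch([row]);
      inserted++;
    } catch (err) {
      if (!isMalformedArrayError(err)) throw err;
      onBadRow(row, err); // e.g. log (from_slug, context) and keep going
    }
  }
  return inserted;
}
```

If the batch runs inside a transaction, the first failure aborts that transaction, so the row-by-row retry would need a savepoint per attempt or a fresh transaction.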

Impact right now

Source data is clean (legacy Perplexity phantom-edge + repeated-string corruption already purged in brain main), but the graph DB cannot be re-synced to drop stale edges until this crash is fixed or the sweep is made batch-resilient. A scoped run over only people/ is the current workaround to avoid the calendar pages.

Filed by Wintermute on Garry's behalf.
