Sync can never complete: DB is cross-region (us-east-1) from compute (us-west-2) — 71ms/query × ~90 queries/file = 6.5s per 2KB page → full sync ~14 days → last_sync never commits → permanent staleness

## Symptom

`gbrain doctor` perpetually reports `sync_freshness` FAIL for the `default` source. `last_sync` has been frozen at `2026-06-04 03:39 UTC` for 4+ days. Running `gbrain sync --source default` (the doctor's suggested fix) never clears it. The staleness alarm fires every doctor cycle, gets 'fixed', and immediately re-fails. It is not flaky — it is structurally impossible to satisfy under the current topology.

## Root cause (measured, not guessed)

**Compute and database are in different AWS regions.**
- Compute (Render): `AWS_REGION=us-west-2`
- Supabase Postgres: `aws-1-us-east-1.pooler.supabase.com`
- Measured RTT, `SELECT 1` over the pooler (prepare=false, transaction mode): **71.7ms average** (connect 602ms cold).

The importer processes one file per transaction and issues many sequential queries per file. In the live sync log, every `people/*.md` page (avg **2,217 bytes**) logs `import.process_file slow ~6500ms`. A 2KB file taking 6.5s is not content cost: it is ~90 sequential cross-region round trips × ~71ms ≈ 6.4s.

### The math that proves it can never finish
- Files in `default`: **185,879**
- Per-file cost: ~6.5s (cross-region serial queries)
- Full sync wall time: **~14 days**
- But `sync-cron.sh` wraps sync in `timeout 1500` (25 min) and `timeout 1800` (30 min).
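
The figures above can be cross-checked with a few lines of arithmetic. This is only a back-of-envelope sketch: every constant is copied from this issue, and the variable names are made up for illustration.

```python
# Back-of-envelope check using only the figures quoted in this issue.
FILES = 185_879            # files in the `default` source
PER_FILE_S = 6.5           # observed import.process_file time per page
RTT_S = 0.0717             # measured SELECT 1 round trip over the pooler
TIMEOUTS_S = (1500, 1800)  # the two `timeout` wrappers in sync-cron.sh

# How many serial round trips would explain the per-file time?
round_trips_per_file = PER_FILE_S / RTT_S

# Wall time for one uninterrupted pass over the whole corpus.
full_sync_days = FILES * PER_FILE_S / 86_400

# Files a single cron run can import before `timeout` kills it.
files_per_run = {t: int(t // PER_FILE_S) for t in TIMEOUTS_S}

print(f"implied round trips per file: {round_trips_per_file:.1f}")
print(f"full sync wall time: {full_sync_days:.1f} days")
for t, n in files_per_run.items():
    print(f"timeout {t}s -> ~{n} files per run ({n / FILES:.2%} of corpus)")
```

Each run therefore covers well under 1% of the corpus before it is terminated.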

Each run imports ~230 files, gets SIGTERM'd, resumes from checkpoint next run, and **never reaches the terminal commit that writes `last_sync`.** So the timestamp never advances and the source is permanently 'stale' regardless of how much work actually happens. The downstream noise (`worker_oom_loop`, stalled autopilot-cycle jobs hitting `max_stalled=3`, cleared locks) is all secondary to this one fact.
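
The stall can be illustrated with a toy model. This is a sketch, not gbrain's code: `simulate`, its parameters, and the choice of 100 runs are hypothetical, and it assumes checkpoints always survive SIGTERM and the corpus does not grow.

```python
def simulate(total_files, files_per_run, runs, commit_per_checkpoint):
    """Model `runs` cron invocations that are each cut off by `timeout`.

    Returns (files_imported, last_sync_run), where last_sync_run is the
    index of the last run that wrote last_sync, or None if none did.
    """
    done = 0
    last_sync_run = None
    for run in range(runs):
        # Resume from the checkpoint and import until the timeout fires.
        done = min(total_files, done + files_per_run)
        finished = done == total_files
        # Terminal-only policy writes last_sync only when the corpus is done;
        # the per-checkpoint policy writes it on every run that made progress.
        if finished or commit_per_checkpoint:
            last_sync_run = run
        if finished:
            break
    return done, last_sync_run


# 100 timed-out runs at ~230 files each, against 185,879 files.
print(simulate(185_879, 230, 100, commit_per_checkpoint=False))  # (23000, None)
print(simulate(185_879, 230, 100, commit_per_checkpoint=True))   # (23000, 99)
```

Both policies import the same 23,000 files; only the terminal-only policy leaves `last_sync` untouched, which matches the frozen timestamp doctor keeps reporting.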

## Fix (in order of impact)

1. **Colocate the DB with compute.** Move the Supabase project to `us-west-2` (or move the Render service to `us-east-1`). RTT drops from ~71ms to an expected <2ms, so at ~90 queries per file the full sync drops from ~14 days to hours. This is the actual fix.
2. **Batch the importer.** One transaction per file with N sequential queries is pathological at any non-trivial RTT. Use multi-row `COPY`/batched upserts per N files, and overlap roundtrips (pipeline / higher real parallelism) so latency stops serializing. This makes sync survivable even cross-region and is worth doing regardless.
3. **Decouple `last_sync` from full-corpus completion** OR raise/remove the `timeout` wrapper so a sync can actually reach its terminal commit. Right now a 14-day job under a 25-min timeout can never record progress as 'fresh'. At minimum, commit `last_sync` incrementally per checkpoint, not only at full completion.
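
To compare fixes 1 and 2 roughly, here is a latency-only cost model. It is a sketch under stated assumptions: `sync_hours` and its parameters are hypothetical, the 2ms colocated RTT is an estimate rather than a measurement, and it idealizes batching by assuming a batch of N files costs the same ~90 round trips that one file costs today. CPU, parsing, and server-side time are ignored.

```python
def sync_hours(files, rtt_s, round_trips_per_file=90,
               files_per_batch=1, in_flight=1):
    """Latency-only estimate of a full sync, in hours.

    files_per_batch: files sharing one set of round trips (idealized).
    in_flight: round trips overlapped via pipelining or parallel workers.
    """
    batches = -(-files // files_per_batch)  # ceiling division
    serialized_round_trips = batches * round_trips_per_file / in_flight
    return serialized_round_trips * rtt_s / 3600


FILES = 185_879
today = sync_hours(FILES, rtt_s=0.0717)                         # cross-region, per-file
colocated = sync_hours(FILES, rtt_s=0.002)                      # fix 1, assumed 2ms RTT
batched = sync_hours(FILES, rtt_s=0.0717, files_per_batch=100)  # fix 2, cross-region

print(f"today:     {today:7.1f} h")
print(f"colocated: {colocated:7.1f} h")
print(f"batched:   {batched:7.1f} h")
```

Under these assumptions either change alone brings the sync from hundreds of hours down to single-digit hours, though even that still exceeds the 25 to 30 minute `timeout` wrappers, which is why fix 3 matters independently.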

## Evidence
- RTT 71.7ms measured via `postgres` client against `GBRAIN_DATABASE_URL`.
- `AWS_REGION=us-west-2` in process env; DB host `aws-1-us-east-1`.
- Live log: repeated `[gbrain phase] import.process_file slow 6xxx-7xxxms people/*.md`, every file >5s.
- `sources status`: `default` LAST SYNC stuck at `2026-06-04 03:39:13` across 4 days of doctor runs.
- `sync-cron.sh:41` `timeout 1800 ...`, `:58` `timeout 1500 ...`.

## Not the cause (ruled out)
- Worker OOM / 16GB max-rss: box has 93GB free; OOM-restarts are a *consequence* of the cycle running for hours, not the staleness cause.
- Stale locks / wedged queue: doctor auto-clears these; timestamp still never moves.
- Corrupt files: straylight had 24 null-byte (0x00) files (`--skip-failed` cleared them); separate issue, already handled.
