Skip to content

Autopilot minion_jobs UPDATE can leave pgbouncer backend in ClientRead forever, holding RowExclusiveLock for days #361

@orendi84

Description

@orendi84

Summary

On Supabase Micro with the pooler (transaction mode, port 5432), an autopilot cycle's UPDATE minion_jobs SET lock_until = now() + ($1::double precision * interval '1 millisecond'), updated_at = now() occasionally leaves a backend session stuck in state='active', wait_event='ClientRead', holding a RowExclusiveLock on minion_jobs indefinitely. Seen twice on my instance: first at 22h age, then again at 23h51m age, blocking any subsequent ALTER TABLE minion_jobs (e.g. migration 15).

Observed state (from pg_stat_activity)

pid=213932 state=active wait=Client/ClientRead age=23:51:40
query=UPDATE minion_jobs SET lock_until = now() + ($1::double precision * interval '1 millisecond'), updated_at = now()

Held RowExclusiveLock on minion_jobs. Not naturally released; had to pg_terminate_backend() manually.

Suspected pattern

  1. Autopilot cycle (or a worker) issues a lock-refresh UPDATE on minion_jobs.
  2. The postgres.js client dies (launchctl reload, SIGKILL, crash) before COMMIT/ROLLBACK.
  3. Pgbouncer keeps the upstream Postgres backend alive but the client side is gone, so the backend sits in ClientRead waiting for the next command that never arrives.
  4. On Supabase Micro there's no default idle_in_transaction_session_timeout for pooler-held sessions, so these never time out on their own.
  5. Autovacuum can't clean them up because the session holds active locks, not idle.

Impact

  • Blocks schema migrations on the same table (reproduced: migration 15's ALTER hung until I terminated the backend).
  • Burns a connection slot on Micro (default 10 total from the pooler, smaller budget than it looks).
  • Dead weight that accumulates over time if autopilot restarts enough.

Suggested mitigations (any one helps; ideally #1)

  1. Set session-level timeouts in postgres.js on connect. idle_in_transaction_session_timeout = '2min' + statement_timeout = '5min' as the first statements after establishing a connection. Postgres.js exposes this via the connection.application_name / connection hook. This ensures an abandoned in-flight transaction gets killed on the server side even when the client vanishes.
  2. Wrap autopilot's lock-refresh UPDATE in sql.begin(async tx => ...). Not sure if it's currently a single-statement autocommit; if yes, this wouldn't change the situation, but if the statement is part of a longer implicit transaction, wrapping ensures ROLLBACK on client failure.
  3. Document for Supabase users: set ALTER DATABASE postgres SET idle_in_transaction_session_timeout = '5min'; at the DB level (requires direct connection, not pooler). Belt-and-suspenders even if feat: GBrain v0.1.0 — Postgres-native personal knowledge brain #1 is implemented.

Environment

  • gbrain 0.16.4 (autopilot via launchctl, ~150s cycle)
  • Supabase Micro (no PITR, no paid-tier restore-to-project) in eu-west-1
  • postgres.js 3.x via the session pooler (port 5432)
  • 15,580 pages, ~130k chunks in the brain

Happy to test a candidate fix on my instance if that helps.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions