
supervisor: max_crashes_exceeded gives up permanently on transient DB outages — needs retry backoff instead of hard stop #1994

@stevewandler


Problem

When the Supabase database has a transient outage (connections to port 5432 fail with ECONNREFUSED), the worker supervisor reaches its crash limit, sets max_crashes_exceeded, and stops permanently. It does not recover on its own, even after the database comes back online.

This is a poor failure mode for a transient network or database issue. The supervisor should distinguish application crashes (a legitimate reason to stop retrying) from connection failures (which should be retried indefinitely with exponential backoff).
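Making that distinction requires the supervisor to classify each failure. Below is a minimal TypeScript sketch. It assumes the worker's failure surfaces as a Node.js-style system error with a string `code` property; the `connect ECONNREFUSED <ip>:5432` message in the error log has that shape. The helper name `isTransientConnectionError` and the exact list of codes are hypothetical, not part of gbrain.

```typescript
// Hypothetical helper (not gbrain's API): decide whether a worker failure looks
// like a transient network/DB connectivity problem rather than an application crash.
// The code list is an assumption; adjust it to what the worker actually throws.
const TRANSIENT_CONNECTION_CODES = new Set<string>([
  "ECONNREFUSED", // DB port closed / server not accepting connections
  "ETIMEDOUT", // connection attempt timed out
  "ECONNRESET", // connection dropped by the peer
  "EHOSTUNREACH", // no route to host
  "ENOTFOUND", // DNS lookup failed
]);

function isTransientConnectionError(err: unknown): boolean {
  // Node.js system errors expose a string `code`, e.g. "ECONNREFUSED".
  if (typeof err === "object" && err !== null && "code" in err) {
    const code = (err as { code?: unknown }).code;
    if (typeof code === "string" && TRANSIENT_CONNECTION_CODES.has(code)) {
      return true;
    }
  }
  // Fallback: match the message text, for errors that were re-wrapped
  // (e.g. "Cannot connect to database: ...") and lost their `code`.
  const message = err instanceof Error ? err.message : String(err);
  return /\b(ECONNREFUSED|ETIMEDOUT|ECONNRESET|EHOSTUNREACH|ENOTFOUND)\b/.test(message);
}
```

Anything this returns false for would still count as an application crash, so an unknown error type keeps today's behavior.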

Observed behavior

~/.gbrain/worker-supervisor-error.log fills with:

Cannot connect to database: connect ECONNREFUSED <ip>:5432

After N crashes, the supervisor sets max_crashes_exceeded, and running gbrain jobs supervisor status shows:

Supervisor: not running
⚠ Max crashes exceeded at <timestamp>

Recovery requires manual intervention: gbrain jobs supervisor start --detach

Environment

  • gbrain 0.42.26.0
  • Supabase-hosted Postgres (direct connection, port 5432)
  • macOS, launchd-managed supervisor

Suggested fix

Two options (either would solve it):

  1. Distinguish crash types: Don't count DB connection failures toward max_crashes; count only application-level crashes. Connection failures should be retried with exponential backoff indefinitely (or within a configurable, longer window).

  2. Configurable crash backoff: Add a supervisor.connection_error_retry_indefinitely config flag (default: true) so the supervisor keeps retrying on ECONNREFUSED/ETIMEDOUT without hitting the crash ceiling.
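Both options can be combined in one restart loop. The rough TypeScript sketch below assumes the supervisor runs under Node.js or Bun. The names `superviseWorker` and `backoffDelayMs`, the option fields, and the `runWorker` callback shape are all hypothetical. `retryConnectionErrorsIndefinitely` only stands in for the proposed supervisor.connection_error_retry_indefinitely flag. `isConnectionError` can be any classifier, for example a check on the error's `code`.

```typescript
// Hypothetical restart-policy sketch; not gbrain's real supervisor.
interface SupervisorOptions {
  maxCrashes: number; // ceiling for application-level crashes only
  baseDelayMs: number; // first backoff delay
  maxDelayMs: number; // cap on any single backoff delay
  healthyRunMs: number; // a run at least this long resets the connection backoff
  // Stand-in for the proposed supervisor.connection_error_retry_indefinitely flag.
  retryConnectionErrorsIndefinitely: boolean;
  isConnectionError: (err: unknown) => boolean;
}

// Exponential backoff with "equal jitter": exp = min(cap, base * 2^attempt),
// and the delay falls roughly in [exp/2, exp). Jitter spreads out reconnects.
function backoffDelayMs(attempt: number, baseMs: number, capMs: number, random: () => number = Math.random): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(exp / 2 + random() * (exp / 2));
}

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// runWorker resolves on a clean exit and rejects when the worker dies.
async function superviseWorker(
  runWorker: () => Promise<void>,
  opts: SupervisorOptions,
): Promise<"stopped" | "max_crashes_exceeded"> {
  let appCrashes = 0; // only these count toward maxCrashes
  let connAttempt = 0; // connection failures since the last healthy run
  for (;;) {
    const startedAt = Date.now();
    try {
      await runWorker();
      return "stopped"; // clean exit
    } catch (err) {
      if (Date.now() - startedAt >= opts.healthyRunMs) connAttempt = 0;
      if (opts.retryConnectionErrorsIndefinitely && opts.isConnectionError(err)) {
        // Transient DB outage (e.g. ECONNREFUSED on :5432): back off, retry forever.
        await sleep(backoffDelayMs(connAttempt++, opts.baseDelayMs, opts.maxDelayMs));
        continue;
      }
      // Application crash (or flag disabled): counts toward the ceiling.
      appCrashes++;
      if (appCrashes >= opts.maxCrashes) return "max_crashes_exceeded";
      await sleep(backoffDelayMs(appCrashes - 1, opts.baseDelayMs, opts.maxDelayMs));
    }
  }
}
```

In this sketch, the connection backoff only resets after a run that lasted at least healthyRunMs. A database that flaps every few seconds therefore gets progressively longer delays, up to maxDelayMs, instead of a tight restart loop. Meanwhile, maxCrashes keeps its meaning for genuine application crashes. With the flag set to false, connection errors count as crashes again, which reproduces the current behavior.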

The current behavior turns a 10-minute Supabase hiccup into a permanently dead supervisor requiring manual intervention.
