supervisor: max_crashes_exceeded gives up permanently on transient DB outages — needs retry backoff instead of hard stop

## Problem

When the Supabase DB has a transient outage (port 5432 ECONNREFUSED), the worker supervisor hits its crash limit and sets `max_crashes_exceeded`, then stops permanently. It will not recover on its own even after the DB comes back online.

This is a poor failure mode for a transient network/DB issue. The supervisor should distinguish between application crashes (legitimate reason to stop retrying) and connection failures (should retry indefinitely with exponential backoff).

## Observed behavior

`~/.gbrain/worker-supervisor-error.log` fills with:
```
Cannot connect to database: connect ECONNREFUSED <ip>:5432
```

After N crashes the supervisor sets `max_crashes_exceeded`, and `gbrain jobs supervisor status` shows:
```
Supervisor: not running
⚠ Max crashes exceeded at <timestamp>
```

Recovery requires manual intervention: `gbrain jobs supervisor start --detach`

## Environment

- gbrain 0.42.26.0
- Supabase-hosted Postgres (direct connection, port 5432)
- macOS, launchd-managed supervisor

## Suggested fix

Two options (either would solve it):

1. **Distinguish crash types**: Don't count DB connection failures toward `max_crashes`. Only count application-level crashes. Connection failures should retry with exponential backoff indefinitely (or with a configurable longer window).

2. **Configurable connection retry**: Add a `supervisor.connection_error_retry_indefinitely` config flag (default: true) so the supervisor keeps retrying on ECONNREFUSED/ETIMEDOUT without hitting the crash ceiling.
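
To make option 1 concrete, here is a rough sketch of how the restart decision could separate the two crash types. I have not looked at how the supervisor is actually implemented, so every name here (`classifyCrash`, `backoffMs`, `onWorkerExit`, `SupervisorState`, the list of error codes, and the 1s/60s backoff bounds) is hypothetical and only meant to illustrate the idea:

```typescript
// Hypothetical sketch; none of these names come from the gbrain codebase.

type CrashKind = "connection" | "application";

interface CrashInfo {
  code?: string;
  message?: string;
}

// Error codes treated as transient connection failures (assumed list).
const TRANSIENT_CODES = ["ECONNREFUSED", "ETIMEDOUT", "ECONNRESET"];

function classifyCrash(err: CrashInfo): CrashKind {
  const msg = err.message || "";
  for (let i = 0; i < TRANSIENT_CODES.length; i++) {
    const code = TRANSIENT_CODES[i];
    // Check the structured code first, then fall back to the message text,
    // since the log line in this issue embeds the code in the message.
    if (err.code === code || msg.indexOf(code) !== -1) return "connection";
  }
  return "application";
}

// Capped exponential backoff: baseMs * 2^attempt, never above maxMs.
function backoffMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

interface SupervisorState {
  appCrashes: number;         // counts toward max_crashes
  connectionAttempts: number; // only drives the backoff delay
}

type Decision =
  | { action: "stop"; reason: "max_crashes_exceeded" }
  | { action: "restart"; delayMs: number };

function onWorkerExit(
  state: SupervisorState,
  err: CrashInfo,
  maxCrashes: number
): Decision {
  if (classifyCrash(err) === "connection") {
    // Connection failures never touch appCrashes, so they cannot
    // trip the crash ceiling; they only lengthen the retry delay.
    const delayMs = backoffMs(state.connectionAttempts);
    state.connectionAttempts += 1;
    return { action: "restart", delayMs };
  }
  // Application-level crash: keep the existing hard-stop behavior.
  state.connectionAttempts = 0;
  state.appCrashes += 1;
  if (state.appCrashes >= maxCrashes) {
    return { action: "stop", reason: "max_crashes_exceeded" };
  }
  return { action: "restart", delayMs: backoffMs(0) };
}
```

With something like this, the `Cannot connect to database: connect ECONNREFUSED <ip>:5432` error would always produce a `restart` decision with a growing delay, while repeated application crashes would still end in `max_crashes_exceeded`. A real implementation would presumably also reset `connectionAttempts` once a worker connects successfully.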

The current behavior turns a 10-minute Supabase hiccup into a permanently dead supervisor requiring manual intervention.

supervisor: max_crashes_exceeded gives up permanently on transient DB outages — needs retry backoff instead of hard stop #1994
