
Matrix provider connection failure causes rapid gateway process crash loop #62376

@janstenpickle

Description

When the Matrix homeserver becomes unreachable (e.g., host server goes down), the gateway process enters a rapid crash loop — spawning a new process every ~2 seconds — rather than gracefully retrying or backing off.

Steps to Reproduce

  1. Configure OpenClaw with a Matrix channel pointing to a self-hosted Synapse instance
  2. Take the Matrix homeserver offline (power off the host)
  3. Observe gateway logs

Expected Behavior

The Matrix provider should:

  • Catch connection errors gracefully
  • Use exponential backoff for reconnection attempts (see the sketch after this list)
  • Keep the gateway process alive (other channels like webchat should remain functional)
  • Respect channelMaxRestartsPerHour for health-monitor-initiated restarts
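
A minimal sketch of the desired reconnect behavior (`connectWithBackoff` and its wiring are hypothetical stand-ins, not OpenClaw APIs): retry with capped exponential backoff instead of letting the failure escape and kill the process.

```typescript
import { setTimeout as sleep } from "node:timers/promises";

// Retry a connect function with capped exponential backoff. The function
// name and signature are illustrative; they are not part of OpenClaw.
async function connectWithBackoff(
  connect: () => Promise<void>,
  baseMs = 1_000,
  maxMs = 5 * 60_000,
): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect(); // e.g., the Matrix SDK's start/sync call
      return;          // connected; normal operation resumes
    } catch (err) {
      const delayMs = Math.min(baseMs * 2 ** attempt, maxMs);
      console.warn(
        `matrix: connect failed (attempt ${attempt + 1}), retrying in ${delayMs}ms`,
        err,
      );
      await sleep(delayMs); // other channels keep running meanwhile
    }
  }
}
```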

Actual Behavior

The Matrix SDK appears to throw an uncaught exception on connection failure that kills the entire Node.js process. The macOS LaunchAgent (or systemd) immediately restarts it; the new process attempts the Matrix connection again, crashes again, and the tight loop continues.
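
For illustration, the failure mode can be reproduced in a few lines (a sketch, not OpenClaw or SDK code; the hostname and error text are made up): since Node 15, an unhandled promise rejection terminates the process by default.

```typescript
// Stand-in for the SDK's connect/sync call while the homeserver is down.
async function connectToHomeserver(): Promise<void> {
  throw new Error("connect ECONNREFUSED matrix.example.com:8448");
}

// Fire-and-forget with no .catch(): Node reports ERR_UNHANDLED_REJECTION
// and exits non-zero; the supervisor (LaunchAgent/systemd) respawns it
// immediately, producing the observed ~2-second loop.
void connectToHomeserver();

// Everything else the gateway was doing dies with the process.
setInterval(() => console.log("serving other channels..."), 1_000);
```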

Evidence from logs (2026-04-07):

  • 04:16–08:01 BST: Gateway process restarted with a new PID every ~2 seconds for 3.5+ hours
  • PIDs increment by ~23 each time (e.g., 77007, 77030, 77053, 77096...)
  • Each cycle: starts → Matrix connect attempt → process dies → LaunchAgent restarts
  • channelMaxRestartsPerHour had no effect because it's the process crashing, not the health monitor restarting the channel
  • Other channels (webchat) were repeatedly disconnected with code=1012 reason=service restart

Separate from health monitor restarts

The Matrix provider also has an auto-restart mechanism (logged as attempt N/10) that does use backoff; that part works correctly. The crash loop is something else entirely: an unhandled exception that takes down the whole gateway process.

Environment

  • OpenClaw version: 2026.3.13
  • Node.js: v22.22.0
  • OS: macOS (arm64)
  • Matrix homeserver: self-hosted Synapse on NixOS
  • Matrix config: allowPrivateNetwork: true

Workaround

Setting gateway.channelMaxRestartsPerHour and gateway.channelStaleEventThresholdMinutes helps with the health-monitor-initiated restarts but does not prevent the crash loop.
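
For clarity, a hypothetical sketch of the settings involved (key names are taken from this report; the types and placement are assumptions about OpenClaw's config):

```typescript
// Hypothetical shape of the gateway health settings referenced above.
interface GatewayHealthConfig {
  /** Cap on health-monitor-initiated channel restarts per hour. */
  channelMaxRestartsPerHour?: number;
  /** Minutes without events before a channel is considered stale. */
  channelStaleEventThresholdMinutes?: number;
}
```

These only govern the health monitor's restarts of an individual channel; they never see the crash because the whole process dies first.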

Suggestion

The Matrix provider (or the SDK integration layer) needs a top-level try/catch or process-level unhandled rejection handler that prevents connection failures from crashing the gateway process. Connection errors should be caught and retried with backoff, keeping the rest of the gateway operational.
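
A sketch of what that could look like (handler wiring and names are illustrative, not OpenClaw's actual API): catch errors at the provider boundary first, with process-level handlers only as a last-resort backstop.

```typescript
// Last-resort safety net: keep the process alive on stray rejections.
process.on("unhandledRejection", (reason) => {
  console.error("unhandled rejection (provider error?):", reason);
});

process.on("uncaughtException", (err) => {
  // Node's docs discourage resuming after uncaughtException, so this is
  // a logging backstop; the real fix is the provider-level catch below.
  console.error("uncaught exception:", err);
});

// Preferred: the Matrix integration layer wraps its own lifecycle so
// connection failures never escape to the handlers above.
async function startMatrixChannel(start: () => Promise<void>): Promise<void> {
  try {
    await start();
  } catch (err) {
    console.error("matrix: start failed, scheduling backoff retry", err);
    // e.g., hand off to connectWithBackoff() from the earlier sketch
  }
}
```

With the provider-level catch in place, an unreachable homeserver degrades only the Matrix channel while webchat and the rest of the gateway stay up.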
