A Go-based watchdog daemon that monitors OpenClaw Gateway health and automatically restarts it when application-level failures are detected.
- Cross-Platform — Supports Linux, macOS, and Windows
- HTTP Health Monitoring — Polls the `GET /health` endpoint every 30s (configurable)
- Multi-Format Health Checks — Supports `{"ok":true}`, `{"ready":true}`, `{"mcpReady":true}`, and `{"status":"live"|"ready"|"ok"|"healthy"}`
- Application-Level Supervision — Detects application failures, timeouts, and degraded states that systemd cannot see
- Exponential Backoff with Jitter — Prevents restart storms on persistent failures (30s → 60s → 120s... max 5m)
- Single Instance — Uses platform-native file locking to prevent duplicate instances
- Graceful Shutdown — Handles SIGTERM/SIGINT with proper cleanup
- Dual Output Logging — JSON to file + human-readable text to stderr
- Prometheus Metrics — Optional `/metrics` endpoint for monitoring
- Flexible Configuration — TOML, environment variables, or CLI flags
- Multi-Service Guardian Mode — Orchestrates startup/shutdown of dependent services with topological ordering
- Monitor-Only Services — Services can be monitored without restart commands (Guardian mode)
- Child Process Cleanup — Prevents mcp-server process leaks during gateway restarts
- Deadlock Circuit Breaker — Prevents restart loops when failures persist
- Health Check Retries — Automatic retry with exponential backoff for transient failures (PR #4)
- Startup Grace Period — Configurable grace period after restart prevents false positives (PR #4)
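The multi-format health check can be pictured as one JSON decode into a struct that holds every accepted field. This is a minimal sketch, assuming the endpoint returns a single JSON object and that any one of the listed signals counts as healthy; `healthResponse` and `isHealthy` are hypothetical names, not the project's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthResponse is a hypothetical union of the accepted response shapes.
// Fields missing from the JSON simply stay at their zero values.
type healthResponse struct {
	OK       bool   `json:"ok"`
	Ready    bool   `json:"ready"`
	MCPReady bool   `json:"mcpReady"`
	Status   string `json:"status"`
}

// isHealthy reports whether a /health body matches any accepted healthy shape.
func isHealthy(body []byte) (bool, error) {
	var r healthResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return false, fmt.Errorf("decode health response: %w", err)
	}
	if r.OK || r.Ready || r.MCPReady {
		return true, nil
	}
	switch r.Status {
	case "live", "ready", "ok", "healthy":
		return true, nil
	}
	return false, nil
}

func main() {
	bodies := []string{
		`{"ok":true}`,
		`{"mcpReady":true}`,
		`{"status":"healthy"}`,
		`{"status":"degraded"}`,
		`not json`,
	}
	for _, body := range bodies {
		healthy, err := isHealthy([]byte(body))
		fmt.Printf("%-22s healthy=%v err=%v\n", body, healthy, err)
	}
}
```

A decode error (for example an HTML error page from a proxy) is reported separately from "unhealthy", so a caller could count it as an unreachable or failed check.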
| Platform | Service Manager | File Locking |
|---|---|---|
| Linux | systemd | flock(2) |
| macOS | launchd | flock(2) |
| Windows | Windows Service | LockFile API |
Linux:

```bash
# Clone and build
git clone https://github.com/hrygo/openclaw-watchdog.git
cd openclaw-watchdog
make build

# Install
sudo make install

# Install systemd service (optional)
make install-service

# Start
systemctl --user start openclaw-watchdog
```

macOS:

```bash
# Build
make build-darwin

# Install
make install-darwin

# Install launchd service
make install-launchd

# Start
launchctl load ~/Library/LaunchAgents/openclaw-watchdog.plist
```

Windows:

```powershell
# Build
go build -o openclaw-watchdog.exe ./cmd/watchdog

# Install (run as Administrator)
.\scripts\install-windows.bat
```

Example configuration (`~/.config/openclaw-watchdog/watchdog.toml`):
```toml
health_url = "http://localhost:18789/health"
check_interval = "30s"
timeout = "10s"
failure_threshold = 3
restart_timeout = "60s"
base_backoff = "30s"
max_backoff = "5m"
cooldown_checks = 5
metrics_addr = ":9090"
```
Multi-service orchestration (Guardian mode):

```toml
[[services]]
name = "gateway"
type = "http"
health_url = "http://localhost:18789/health"
restart_command = ["systemctl", "--user", "restart", "openclaw-gateway"]
check_interval = "30s"
failure_threshold = 3

[[services]]
name = "claude-mem-worker"
type = "http"
health_url = "http://localhost:37777/api/readiness"
restart_command = ["systemctl", "--user", "restart", "claude-mem-worker"]
depends_on = ["gateway"]

[deadlock]
max_restarts = 5
window = "10m"
half_open_after = "2m"
success_to_close = 3
```

More configuration options: see docs/configuration.md.
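`depends_on` implies a start order in which every service comes after the services it depends on (and shutdown runs in reverse). Such a topological ordering can be sketched with Kahn's algorithm; `startOrder` is a hypothetical helper, and it assumes that an unknown dependency or a cycle should be rejected as a configuration error.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// startOrder returns service names so that each service follows its
// dependencies (Kahn's algorithm). deps maps a service name to its depends_on.
// Reverse the result to get a shutdown order.
func startOrder(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string)
	for name := range deps {
		indegree[name] = 0
	}
	for name, ds := range deps {
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("%s depends on unknown service %q", name, d)
			}
			indegree[name]++
			dependents[d] = append(dependents[d], name)
		}
	}

	// Start from services with no dependencies, sorted for a stable result.
	var ready []string
	for name, n := range indegree {
		if n == 0 {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready)

	var order []string
	for len(ready) > 0 {
		name := ready[0]
		ready = ready[1:]
		order = append(order, name)
		next := dependents[name]
		sort.Strings(next)
		for _, svc := range next {
			indegree[svc]--
			if indegree[svc] == 0 {
				ready = append(ready, svc)
			}
		}
	}
	if len(order) != len(deps) {
		return nil, errors.New("dependency cycle among services")
	}
	return order, nil
}

func main() {
	order, err := startOrder(map[string][]string{
		"gateway":           nil,
		"claude-mem-worker": {"gateway"},
	})
	fmt.Println(order, err)
}
```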
All paths are auto-detected based on platform. No configuration required:
| Platform | Config Location | Data Location |
|---|---|---|
| Linux | `~/.config/openclaw-watchdog/` | `~/.local/share/openclaw-watchdog/` |
| macOS | `~/Library/Application Support/openclaw-watchdog/` | `~/Library/Application Support/openclaw-watchdog/` |
| Windows | `%APPDATA%\openclaw-watchdog\` | `%APPDATA%\openclaw-watchdog\` |
Enable the metrics server with `-metrics-addr :9090`:

```bash
curl http://localhost:9090/metrics
```

Available metrics:
| Metric | Type | Description |
|---|---|---|
| `openclaw_watchdog_total_checks` | counter | Total health checks |
| `openclaw_watchdog_failed_checks` | counter | Failed health checks |
| `openclaw_watchdog_timeout_checks` | counter | Timeout events (PR #4) |
| `openclaw_watchdog_unreachable_checks` | counter | Unreachable events (PR #4) |
| `openclaw_watchdog_total_restarts` | counter | Gateway restarts triggered |
| `openclaw_watchdog_current_backoff_seconds` | gauge | Current backoff duration (seconds) |
| `openclaw_watchdog_consecutive_fails` | gauge | Consecutive failure count |
| `openclaw_watchdog_last_check_latency_ms` | gauge | Last check latency (milliseconds) |
Linux (systemd):

```bash
# Start
systemctl --user start openclaw-watchdog

# Stop
systemctl --user stop openclaw-watchdog

# Restart
systemctl --user restart openclaw-watchdog

# View status
systemctl --user status openclaw-watchdog

# View logs
journalctl --user -u openclaw-watchdog -f

# Uninstall
make uninstall-service
```

macOS (launchd):

```bash
# Load service
launchctl load ~/Library/LaunchAgents/openclaw-watchdog.plist

# Unload service
launchctl unload ~/Library/LaunchAgents/openclaw-watchdog.plist

# View logs
tail -f ~/Library/Logs/openclaw-watchdog.log
```

For Users:
- docs/configuration.md - Complete configuration reference
- docs/troubleshooting.md - Common issues and solutions
For Developers:
- AGENTS.md - Development guide for AI agents
- docs/architecture.md - System architecture and design
- docs/health-checks.md - Health check implementation details
- docs/git-workflow.md - Contribution workflow (MANDATORY)
Project History:
- docs/OPTIMIZATION_SUMMARY.md - PR #4 health check optimizations
- docs/INSTALLATION_SUMMARY.md - Installation process
- docs/EXECUTION_SUMMARY.md - Complete execution summary
Full Documentation Index: docs/README.md
- Check watchdog status: `systemctl --user status openclaw-watchdog`
- Verify that the restart command works when run manually
- Check logs: `journalctl --user -u openclaw-watchdog -n 50`
Ensure Guardian mode is enabled with [[services]] configured. See docs/troubleshooting.md for details.
The deadlock circuit breaker blocks further restarts once they exceed the `[deadlock]` limits (`max_restarts` within `window`). Check the logs for "circuit breaker open" messages and investigate the root cause.
More troubleshooting: docs/troubleshooting.md
- Go 1.21+
- GNU Make (optional)
```bash
# Build for current platform
make build

# Cross-compile for all platforms
make build-all

# Run tests
make test

# Run tests with coverage
go test ./... -race -cover
```

IMPORTANT: All contributions MUST follow the Git Workflow. Do not push directly to the main repository.
- Fork the repository
- Create a feature branch
- Make changes and run tests
- Push to your fork (NOT upstream)
- Create a Pull Request
See: docs/git-workflow.md for complete workflow with real examples.
MIT