
Heartbeat polling silently stops after SIGUSR1 gateway restart #78187

@Suidge

Bug Description

Heartbeat polling silently stops after a SIGUSR1 gateway restart. The heartbeat (configured with every: "30m") fires once immediately after the restart and then never again, even though the gateway process itself keeps running normally and cron jobs continue to execute on schedule.

Reproduction

  1. Gateway receives SIGUSR1 restart (via openclaw gateway restart or programmatic SIGUSR1)
  2. Heartbeat fires once immediately after restart
  3. Heartbeat never fires again until the next full gateway process restart

Evidence

Gateway log (logs/gateway.log):

2026-05-06T00:17:00.728+08:00 [gateway] received SIGUSR1; restarting
2026-05-06T00:17:01.345+08:00 [gateway] restart mode: full process restart (supervisor restart)
2026-05-06T00:17:08.569+08:00 [heartbeat] started    ← last heartbeat entry
# (no heartbeat entries for the next ~9.5 hours)

Cron jobs continued running normally during the gap (proves gateway was alive):

| Job | Last Run | Status |
| --- | --- | --- |
| Daily Memory Distillation (04:00) | 05-06 04:00 | ok |
| Morning Git EOD Cleanup (06:00) | 05-06 06:00 | ok |
| Morning Briefing (07:00) | 05-06 07:00 | ok |
| OpenRouter Monitor (08:00) | 05-06 08:00 | ok |

openclaw status output during the gap showed:

Heartbeat: 30m (silvermoon), disabled (agent-codex), disabled (silvermoon-b)

This shows the heartbeat was configured but not actually firing.
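
The gap in the gateway log excerpt above can also be found mechanically. The following is a minimal sketch, not an OpenClaw tool: `heartbeat_gaps` is a hypothetical helper, the line format (`<ISO timestamp> [component] message`) is assumed from the three log lines quoted in this report, and `checked_at` is an assumed inspection time roughly 9.5 hours after the last heartbeat entry, since the report does not give the exact time.

```python
from datetime import datetime, timedelta

# Log lines copied from the excerpt in this report (annotation removed).
LOG = """\
2026-05-06T00:17:00.728+08:00 [gateway] received SIGUSR1; restarting
2026-05-06T00:17:01.345+08:00 [gateway] restart mode: full process restart (supervisor restart)
2026-05-06T00:17:08.569+08:00 [heartbeat] started
""".splitlines()


def heartbeat_gaps(lines, every, end=None, slack=timedelta(minutes=5)):
    """Return (start, end) pairs with no [heartbeat] entry for longer than every + slack."""
    stamps = []
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2 or parts[1] != "[heartbeat]":
            continue
        try:
            stamps.append(datetime.fromisoformat(parts[0]))
        except ValueError:
            continue  # skip lines whose first field is not a timestamp
    if end is not None:
        stamps.append(end)  # treat "now" as the end of the last interval
    limit = every + slack
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > limit]


# Assumed inspection time, about 9.5 h after the last heartbeat entry.
checked_at = datetime.fromisoformat("2026-05-06T09:47:00+08:00")
for start, stop in heartbeat_gaps(LOG, timedelta(minutes=30), end=checked_at):
    print(f"no heartbeat from {start.isoformat()} to {stop.isoformat()}")
```

Passing `end` matters here: without it, a log whose last heartbeat entry is hours old would report no gap at all, because there is no later entry to compare against.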

Impact

  • All heartbeat-driven checks (email, GitHub notifications, todo reminders) were silently skipped for ~9.5 hours
  • No error logs or warnings about the heartbeat scheduler being stuck
  • User missed a todo reminder scheduled for 07:30
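
Since the stall produced no warning, a staleness check of the following kind would have surfaced it. This is only a sketch under assumptions: `heartbeat_warning` and its `grace` multiplier are hypothetical and do not correspond to any existing OpenClaw setting.

```python
from datetime import datetime, timedelta
from typing import Optional


def heartbeat_warning(
    last_fired: Optional[datetime],
    now: datetime,
    every: timedelta,
    grace: float = 2.0,  # hypothetical tolerance: warn after 2x the interval
) -> Optional[str]:
    """Return a warning message if the heartbeat is overdue, else None."""
    if last_fired is None:
        return "heartbeat has never fired"
    age = now - last_fired
    if age > every * grace:
        return f"heartbeat overdue: last fired {age} ago, expected every {every}"
    return None


last = datetime(2026, 5, 6, 0, 17, 8)
print(heartbeat_warning(last, datetime(2026, 5, 6, 0, 40), timedelta(minutes=30)))
print(heartbeat_warning(last, datetime(2026, 5, 6, 2, 17, 8), timedelta(minutes=30)))
```

The first call is within the interval and yields `None`; the second is two hours after the last firing and yields a warning string.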

Environment

  • OpenClaw: 2026.5.4 (325df3e)
  • OS: macOS 26.4.1 (arm64)
  • Node: v25.9.0
  • Gateway: local LaunchAgent (active, pid 28130)
  • Heartbeat config: every: "30m", target: "feishu", directPolicy: "allow"
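
For scale, the `every: "30m"` setting and the roughly 9.5-hour gap translate into a number of skipped heartbeats. The sketch below is illustrative only: `parse_every` and `missed_firings` are hypothetical helpers, and the set of unit suffixes is an assumption, since the report shows only the value "30m".

```python
import re
from datetime import timedelta

# Assumed unit suffixes; only "m" is confirmed by the config in this report.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_every(value: str) -> timedelta:
    """Parse an interval such as "30m" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def missed_firings(gap: timedelta, every: timedelta) -> int:
    """Number of whole intervals that fit into the gap."""
    return gap // every


every = parse_every("30m")
print(missed_firings(timedelta(hours=9.5), every))  # 19
```

Under these assumptions, about 19 heartbeat runs were skipped during the gap.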

Expected Behavior

Heartbeat should resume its regular 30-minute polling cycle after a SIGUSR1 restart.
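
The expected behavior can be illustrated with a toy scheduler. This is a sketch under stated assumptions, not OpenClaw's implementation, and the root cause of the bug is not known from this report: `HeartbeatScheduler`, `on_restart`, and `tick` are hypothetical names, and time is passed in explicitly so the example is deterministic.

```python
from datetime import datetime, timedelta


class HeartbeatScheduler:
    """Toy interval scheduler driven by explicit tick() calls (hypothetical)."""

    def __init__(self, every: timedelta, now: datetime):
        self.every = every
        self.next_due = now  # fire on the first tick after start
        self.fired = []

    def on_restart(self, now: datetime) -> None:
        # Re-arm from the restart time so the regular cycle resumes.
        self.next_due = now

    def tick(self, now: datetime) -> bool:
        if now < self.next_due:
            return False
        self.fired.append(now)
        # Always schedule the following run. A scheduler that skipped this
        # step would fire once and then stay silent.
        self.next_due = now + self.every
        return True


start = datetime(2026, 5, 6, 0, 17, 8)
sched = HeartbeatScheduler(timedelta(minutes=30), start)
sched.on_restart(start)
for minute in range(0, 91):
    sched.tick(start + timedelta(minutes=minute))
print(len(sched.fired))  # 4 firings: at 0, 30, 60 and 90 minutes
```

The point of the sketch is the invariant: after every firing, and after every restart, there is always a next due time.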

Notes

  • This was the second SIGUSR1 restart that day. Multiple config changes were applied via the gateway tool between 18:00 and 20:45 on May 5, and each restart was followed by at least one heartbeat firing; the 00:17 restart on May 6 was the last one before the gap.
  • The gateway process itself was healthy (cron jobs, channels, sessions all functional).
  • Heartbeat resumed only after I manually triggered another restart via the gateway tool.

Labels: stale (marked as stale due to inactivity)
