Double SIGUSR1 restart on model switch causes webchat message loss #9097

@justinhartbiz

Bug Description

When switching models via the webchat UI (e.g. /model opus), the gateway receives two SIGUSR1 signals in rapid succession (~2 seconds apart), causing a double restart. During this window, the webchat WebSocket disconnects twice and in-flight assistant responses can be silently lost.

Steps to Reproduce

  1. Open webchat UI
  2. Switch model (e.g. from sonnet to opus via config.apply)
  3. Observe gateway logs

Expected Behavior

Single clean restart: config.apply → SIGUSR1 → restart → webchat reconnects → stable.

Actual Behavior

Double restart:

  • config.apply at T+0 triggers reload detection → first SIGUSR1
  • Gateway restarts, webchat disconnects (code=1012 reason=service restart)
  • Gateway comes back up at T+1s
  • Second SIGUSR1 at T+2s triggers another restart
  • Webchat disconnects again, reconnects again
  • Any assistant response in flight during this window is lost (appears as if the agent "crashed out" or the message never arrived)

Relevant Log Excerpt

2026-02-04T21:25:32.084Z [ws] config.apply 104ms
2026-02-04T21:25:32.578Z [reload] config change detected; evaluating reload
2026-02-04T21:25:32.620Z [gateway] signal SIGUSR1 received
2026-02-04T21:25:32.624Z [gateway] received SIGUSR1; restarting
2026-02-04T21:25:32.724Z [ws] webchat disconnected code=1012 reason=service restart
2026-02-04T21:25:33.015Z [gateway] agent model: anthropic/claude-opus-4-5
2026-02-04T21:25:33.015Z [gateway] listening on ws://127.0.0.1:18789
2026-02-04T21:25:34.023Z [ws] webchat connected
2026-02-04T21:25:34.153Z [gateway] signal SIGUSR1 received       ← SECOND signal
2026-02-04T21:25:34.154Z [gateway] received SIGUSR1; restarting  ← SECOND restart
2026-02-04T21:25:35.615Z [ws] webchat disconnected code=1012 reason=service restart
2026-02-04T21:25:35.640Z [gateway] agent model: anthropic/claude-opus-4-5
2026-02-04T21:25:36.449Z [ws] webchat connected                  ← finally stable
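To check whether a given gateway log shows the same pattern, a small helper can measure the gap between consecutive restart lines. This is only a sketch: it assumes the log format shown in the excerpt above (ISO timestamp first, then the `[gateway] received SIGUSR1; restarting` message), and `findRapidRestarts` is a hypothetical name, not part of OpenClaw.

```typescript
// Hypothetical helper: returns the gap in milliseconds for every restart
// that follows the previous restart by less than `windowMs`.
function findRapidRestarts(logText: string, windowMs: number): number[] {
  const marker = "[gateway] received SIGUSR1; restarting";
  const times: number[] = [];

  for (const line of logText.split("\n")) {
    if (!line.includes(marker)) continue;
    // The first whitespace-separated token is assumed to be an ISO timestamp.
    const stamp = line.trim().split(/\s+/)[0];
    const ms = Date.parse(stamp);
    if (!Number.isNaN(ms)) times.push(ms);
  }

  const gaps: number[] = [];
  for (let i = 1; i < times.length; i++) {
    const gap = times[i] - times[i - 1];
    if (gap < windowMs) gaps.push(gap);
  }
  return gaps;
}

// The two restart lines from the excerpt above.
const sampleLog = [
  "2026-02-04T21:25:32.624Z [gateway] received SIGUSR1; restarting",
  "2026-02-04T21:25:34.154Z [gateway] received SIGUSR1; restarting",
].join("\n");

const rapidGaps = findRapidRestarts(sampleLog, 5000);
console.log(rapidGaps);
```

For the excerpt above, the two restarts are 1530 ms apart, so any window larger than that flags them as a double restart.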

Additional Context

  • Also seeing the warning "Config was last written by a newer OpenClaw (2026.2.1); current version is 0.0.0". This may point to a version detection issue that contributes to the double-fire.
  • The config.apply RPC itself appears to both (a) send SIGUSR1 directly and (b) trigger the file-watcher reload path, which sends a second SIGUSR1. Likely needs deduplication or a debounce window.
  • macOS (Darwin arm64), Node v25.5.0, OpenClaw installed via npm

Suggested Fix

Debounce SIGUSR1 handling — if a restart is already in progress or was initiated within the last N seconds, ignore subsequent signals. Alternatively, ensure config.apply only triggers restart through one path (either the direct signal OR the file-watcher, not both).
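A minimal sketch of the debounce idea, assuming the restart happens inside the same Node process so that in-memory state survives between the two signals (if the gateway re-executes itself, the timestamp would need to be persisted somewhere instead). `createRestartGuard`, `shouldRestart`, and `markDone` are hypothetical names for illustration and do not come from the OpenClaw codebase.

```typescript
// Hypothetical restart guard: allows a restart only if none is in progress
// and none was started within the last `windowMs` milliseconds.
type RestartGuard = {
  // Returns true if the caller should go ahead and restart now.
  shouldRestart(nowMs: number): boolean;
  // Call once the restart has finished.
  markDone(): void;
};

function createRestartGuard(windowMs: number): RestartGuard {
  let inProgress = false;
  let lastStartedAt: number | null = null;

  return {
    shouldRestart(nowMs: number): boolean {
      if (inProgress) return false;
      if (lastStartedAt !== null && nowMs - lastStartedAt < windowMs) {
        return false;
      }
      inProgress = true;
      lastStartedAt = nowMs;
      return true;
    },
    markDone(): void {
      inProgress = false;
    },
  };
}

// Replaying the timings from the log excerpt: signals about 1.5 s apart.
const guard = createRestartGuard(5000);
const firstSignal = guard.shouldRestart(0); // first SIGUSR1
guard.markDone(); // gateway is back up
const secondSignal = guard.shouldRestart(1533); // second SIGUSR1
console.log(firstSignal, secondSignal);
```

In a signal handler this could be wired up as `process.on("SIGUSR1", () => { if (guard.shouldRestart(Date.now())) { /* restart, then guard.markDone() */ } })`. One trade-off: a plain debounce also drops a legitimate second config change that arrives inside the window, so routing config.apply through a single restart path may still be the cleaner fix.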

Labels: bug, stale
