Description
Summary
On hard reboot (or any ungraceful shutdown), session lock files written to ~/.clawdbot/agents/main/sessions/*.lock are not cleaned up. When the gateway restarts, it checks if the pid stored in the lock file is alive — but after a reboot, pids get reused by the OS. The reused pid (now belonging to a completely different process, e.g. clawmetry) is seen as "alive", so the gateway treats the stale lock as valid and refuses to open the session.
Reproduction
- Have an active session writing a lock file (e.g. pid 364)
- Hard reboot the machine
- On restart, pid 364 gets reused by a different process (e.g. clawmetry)
- Gateway starts, finds the lock file, checks if pid 364 is alive; it is, so it considers the lock held
- Any incoming message that triggers that session hits a 10s lock timeout and fails
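For illustration, a liveness-only check of the kind described in these steps might look like the sketch below. This is an assumption about the shape of the check, not the gateway's actual code; `isPidAlive` is a hypothetical name. It uses Node's `process.kill(pid, 0)`, which tests for existence without delivering a signal.

```typescript
// Hypothetical sketch of a liveness-only check (not the gateway's real code).
function isPidAlive(pid: number): boolean {
  try {
    // Signal 0 delivers nothing; it only checks that the pid exists
    // and that we are allowed to signal it.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user, so it is alive.
    // ESRCH (or anything else): treat as not running.
    return (err as { code?: string }).code === "EPERM";
  }
}
```

A check like this returns true for any process that currently holds the pid, which is why a lock written before the reboot still looks held once the pid has been reassigned.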
Root Cause
The lock check verifies pid liveness but not process identity. A correct implementation should also verify that the pid belongs to a clawdbot process (e.g. check /proc/<pid>/cmdline or write a unique token into the lock file and verify it on read).
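As a rough sketch of the `/proc/<pid>/cmdline` idea, assuming Linux and assuming the gateway's command line contains the string `clawdbot` somewhere in its arguments (the function name `cmdlineContains` is hypothetical):

```typescript
import { readFileSync } from "node:fs";

// Hypothetical helper: does the process holding `pid` have `needle`
// anywhere in its command line? Linux-only, since it reads /proc.
function cmdlineContains(pid: number, needle: string): boolean {
  try {
    // /proc/<pid>/cmdline holds the argv entries separated by NUL bytes.
    const argv = readFileSync(`/proc/${pid}/cmdline`, "utf8").split("\0");
    return argv.some((arg) => arg.includes(needle));
  } catch {
    // No such process, or /proc is unavailable: identity cannot be confirmed.
    return false;
  }
}
```

A lock would then count as held only if the pid is alive and `cmdlineContains(pid, "clawdbot")` is true. All arguments are searched rather than only the first, because under Node the first argument is usually the `node` binary and the script path comes later.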
Workaround
Clear all *.lock files before the gateway starts on boot:
rm -f ~/.clawdbot/agents/main/sessions/*.lock
We added this to our pm2 startup wrapper as a temporary fix, but it should be handled by the gateway itself on startup.
Suggested Fix
When reading a lock file, verify process identity, not just liveness:
- Check that /proc/<pid>/cmdline contains clawdbot
- Or write a unique session token into the lock file and verify it on read
- Or clear stale locks on gateway startup before accepting connections
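The token and startup-cleanup options could be combined roughly as follows. This is a minimal sketch under several assumptions: the lock file format is not shown in this report, so the JSON `LockRecord` shape is invented for illustration, and `writeLock`, `clearStaleLocks`, and `instanceToken` are hypothetical names. It also assumes a single gateway process owns every lock in the sessions directory.

```typescript
import { randomUUID } from "node:crypto";
import { mkdtempSync, readdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical lock payload; the real lock file format is an assumption here.
interface LockRecord {
  pid: number;
  token: string;
}

// One fresh token per gateway process, so it never survives a reboot.
const instanceToken: string = randomUUID();

function writeLock(lockPath: string, token: string): void {
  const record: LockRecord = { pid: process.pid, token };
  writeFileSync(lockPath, JSON.stringify(record));
}

// Remove every *.lock file whose token is not this instance's token.
// Returns the names of the removed files.
function clearStaleLocks(sessionsDir: string, token: string): string[] {
  const removed: string[] = [];
  for (const name of readdirSync(sessionsDir)) {
    if (!name.endsWith(".lock")) continue;
    const lockPath = join(sessionsDir, name);
    let owner: string | undefined;
    try {
      owner = (JSON.parse(readFileSync(lockPath, "utf8")) as Partial<LockRecord>).token;
    } catch {
      owner = undefined; // unreadable or old-format lock: treat as stale
    }
    if (owner !== token) {
      rmSync(lockPath, { force: true });
      removed.push(name);
    }
  }
  return removed;
}

// Usage: simulate a lock left behind by an instance that died in a reboot.
const dir = mkdtempSync(join(tmpdir(), "sessions-"));
writeLock(join(dir, "old.lock"), "token-from-before-reboot");
writeLock(join(dir, "mine.lock"), instanceToken);
const removed = clearStaleLocks(dir, instanceToken);
console.log(removed); // ["old.lock"]
```

Because the token is regenerated on every start, the comparison does not depend on pids at all, so pid reuse cannot produce a false "held" result. If several gateway processes could share the directory, the token check would need to be combined with a liveness or identity check rather than replace it.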
Environment
- OS: Linux (OrbStack VM, arm64)
- Node: v22.22.0
- clawdbot: latest