Description
Problem
When OpenClaw runs inside a Docker container with a persistent /data volume, redeploying the container (e.g., via Hostinger panel, docker restart, or image update) leaves behind stale .lock files from the previous container's process.
On startup, the new container's gateway finds these lock files and fails with:
```
⚠ Agent failed before reply: session file locked (timeout 10000ms):
pid=28 /data/.openclaw/agents/main/sessions/<session-id>.jsonl.lock
```
The PID in the lock file (e.g., `pid=28`) belonged to a process in the old container, which no longer exists, yet the gateway treats the lock as valid and times out.
Expected Behavior
The gateway should validate that the PID in the lock file is still alive before honoring it. If the process is dead (which it always will be after a container restart), the lock should be automatically cleaned up.
Current Workaround
Manually restarting the session from the OpenClaw UI, or adding `find /data/.openclaw/agents -name "*.lock" -delete` to the container entrypoint. Neither survives managed deployments (e.g., the Hostinger panel), where users cannot modify the Dockerfile.
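For setups where the Dockerfile *can* be edited, the entrypoint workaround could be sketched as a small wrapper script. The `OPENCLAW_DATA` variable and the wrapper itself are illustrative assumptions, not part of OpenClaw:

```shell
#!/bin/sh
# Hypothetical entrypoint wrapper: remove any session lock files left behind
# by the previous container before the gateway starts.
# OPENCLAW_DATA defaults to /data; the variable name is an assumption.
: "${OPENCLAW_DATA:=/data}"
find "$OPENCLAW_DATA/.openclaw/agents" -name '*.lock' -delete 2>/dev/null || true
exec "$@"  # hand off to the original container command
```

This keeps the cleanup inside the container lifecycle, but it still requires control over the image, which managed panels typically do not allow.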
Environment
- Deployment: Hostinger VPS (KVM), managed Docker panel
- OS: Ubuntu 24.04
- Persistent volume: /data
- Reproduction: 100% on every redeploy when a session was active
Suggested Fix
In the lock acquisition/check logic, verify that the PID is alive (`kill -0 <pid>`, or check `/proc/<pid>`) before treating the lock as valid. If the PID is dead, delete the stale lock and proceed.
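A minimal sketch of that check in shell terms; `check_lock`, the lock-file path, and the assumption that the file records `pid=<n>` are all illustrative, not OpenClaw's actual code:

```shell
#!/bin/sh
# Sketch: honor a session lock only if the PID recorded in it is still alive.
check_lock() {
  lock="$1"
  [ -f "$lock" ] || return 0                       # no lock file: nothing to do
  pid=$(grep -o '[0-9][0-9]*' "$lock" | head -n1)  # assumes the file records "pid=<n>"
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    return 1                                       # owner is alive: honor the lock
  fi
  rm -f "$lock"                                    # owner is dead: clean up the stale lock
  return 0
}
```

After a container restart the recorded PID can never be alive in the new container (at most it names an unrelated process), so in the reported scenario this check always clears the stale lock instead of timing out.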