Stale session lock files cause startup failure in containerized deployments #27252

@vinmaketeam

Description

Problem

When OpenClaw runs inside a Docker container with a persistent /data volume, redeploying the container (e.g., via Hostinger panel, docker restart, or image update) leaves behind stale .lock files from the previous container's process.

On startup, the new container's gateway finds these lock files and fails with:

⚠ Agent failed before reply: session file locked (timeout 10000ms):
pid=28 /data/.openclaw/agents/main/sessions/<session-id>.jsonl.lock

The PID in the lock file (e.g., pid=28) belonged to the old container's process, which no longer exists. But the gateway treats the lock as valid and times out.

Expected Behavior

The gateway should validate that the PID in the lock file is still alive before honoring it. If the process is dead (which it always will be after a container restart), the lock should be automatically cleaned up.

Current Workaround

Either manually restart the session from the OpenClaw UI, or add find /data/.openclaw/agents -name "*.lock" -delete to the container entrypoint. Neither option works in managed deployments (e.g., the Hostinger panel) where users cannot modify the Dockerfile.
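For self-managed images, the entrypoint workaround can be sketched as a small wrapper. This is a hypothetical helper, not OpenClaw code; the lock directory layout is assumed from the error message above, and OPENCLAW_DATA_DIR is an invented override for illustration:

```shell
#!/bin/sh
# Hypothetical entrypoint helper: remove stale .lock files left behind
# by the previous container's (now dead) process before the gateway starts.
clean_stale_locks() {
    lock_root="$1"
    # Only descend if the volume is actually mounted; delete lock files
    # but leave the session .jsonl files untouched.
    [ -d "$lock_root" ] && find "$lock_root" -name "*.lock" -delete
    return 0
}

# In a real entrypoint this would run once, before exec-ing the gateway:
clean_stale_locks "${OPENCLAW_DATA_DIR:-/data/.openclaw}/agents"
```

This is safe on a fresh container because no gateway process is running yet, so every lock on the volume is necessarily stale.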

Environment

  • Deployment: Hostinger VPS (KVM), managed Docker panel
  • OS: Ubuntu 24.04
  • Persistent volume: /data
  • Reproduction: 100% on every redeploy when a session was active

Suggested Fix

In the lock acquisition/check logic, verify the PID is alive (kill -0 <pid> or check /proc/<pid>) before treating the lock as valid. If the PID is dead, delete the stale lock and proceed.
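A minimal sketch of that check in shell, assuming (as the error message suggests) that the lock file records the owning process as pid=<n>; the function names and the lock format are assumptions, not OpenClaw's actual implementation:

```shell
#!/bin/sh
# Sketch of the suggested stale-lock validation: honor the lock only if
# the recorded PID still refers to a live process.
pid_alive() {
    # kill -0 sends no signal; it only checks whether the PID exists.
    kill -0 "$1" 2>/dev/null
}

check_lock() {
    lockfile="$1"
    # Extract the first "pid=<digits>" token from the lock file contents.
    pid=$(sed -n 's/.*pid=\([0-9][0-9]*\).*/\1/p' "$lockfile")
    if [ -n "$pid" ] && pid_alive "$pid"; then
        echo "held"            # owner is alive: respect the lock
    else
        rm -f "$lockfile"      # owner is dead or unparsable: stale, remove
        echo "stale-removed"
    fi
}

# Example: a lock recorded against this shell's own PID is honored.
demo=$(mktemp)
echo "pid=$$" > "$demo"
check_lock "$demo"             # prints "held"
rm -f "$demo"
```

One caveat for a production implementation: kill -0 fails with EPERM when the PID exists but belongs to another user, which this sketch would misread as dead; checking for /proc/<pid> avoids that, and comparing the process start time guards against PID reuse.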
