SessionWriteLockTimeoutError: long LLM turns block incoming system events (Wren/cron updates lost) #75949

@pagem7632-hcm

Summary

Long-running agent turns hold the session JSONL write lock for their entire duration. System events that arrive during an active turn (e.g. cron heartbeats, openclaw agent calls from peer agents) fail with SessionWriteLockTimeoutError after the 10s timeout. The delivery is then silently dropped: there is no retry and no queue.

Observed error

SessionWriteLockTimeoutError: session file locked (timeout 10000ms): pid=3447 /Users/page/.openclaw/agents/nova/sessions/f6bf05fa-f292-4f94-8fc1-d89d7f34d74b.jsonl.lock
lane task error: lane=main durationMs=14610 error="SessionWriteLockTimeoutError: ..."
lane task error: lane=session:agent:nova:main durationMs=14614 error="SessionWriteLockTimeoutError: ..."

Incident

  • Date: 2026-05-02, ~13:41–14:21 AEST (03:41–04:21 UTC)
  • Platform: macOS (arm64), OpenClaw 2026.4.26, Node v24.14.0
  • Session: f6bf05fa-f292-4f94-8fc1-d89d7f34d74b (Nova agent, Telegram group channel)
  • Lock holder: PID 3447 (openclaw-gateway) — processing a multi-tool response turn with ~10 exec calls and file reads
  • Blocked sender: Wren (Codex app-server thread) attempting to deliver SDLC gate completion updates via openclaw agent --agent nova

Impact: Wren's updates (GATE-2 complete, GATE-3 active, GATE-4 contract tests passing) were lost for ~40 minutes. The receiving agent had no indication updates were dropped and reported stale state to the operator.

Steps to reproduce

  1. Start a long agent turn involving multiple tool calls (exec, file reads, API calls) — anything that takes >10s total
  2. While the turn is in progress, send a system event or openclaw agent message to the same session
  3. Observe SessionWriteLockTimeoutError in logs; the incoming message is dropped
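
The timing condition behind these steps can be written as a small worked example. This is a simplified model, not OpenClaw code: it assumes a waiter fails exactly when the lock is still held once its timeout expires, and the names willTimeOut, turnEndMs and arrivalMs are hypothetical. Under that assumption, what matters is how much of the turn remains when the event arrives, not the total turn length.

```typescript
// Simplified model (an assumption, not OpenClaw internals): a writer that
// arrives while the lock is held waits up to timeoutMs for the release.
const LOCK_TIMEOUT_MS = 10000; // the 10s timeout reported in the error

function willTimeOut(
  turnEndMs: number, // when the active turn releases the lock
  arrivalMs: number, // when the system event tries to acquire it
  timeoutMs: number = LOCK_TIMEOUT_MS,
): boolean {
  // Time the writer would still have to wait for the lock.
  const remainingMs = Math.max(0, turnEndMs - arrivalMs);
  return remainingMs > timeoutMs;
}

// A 60s turn: an event arriving at 5s has 55s left to wait and is dropped,
// while one arriving at 55s only waits 5s and gets through.
console.log(willTimeOut(60000, 5000)); // true
console.log(willTimeOut(60000, 55000)); // false
```

If the model holds, the most reliable reproduction sends the event early in the turn.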

Expected behaviour

Incoming system events should be queued during an active write-lock and delivered when the turn completes, or the lock timeout should be configurable to a longer value for multi-agent workflows.
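
One possible shape for the queueing behaviour is sketched below. It is an in-process illustration only: SessionInbox, beginTurn, deliver and endTurn are hypothetical names, and nothing here reflects how OpenClaw actually structures session writes. The idea is that events arriving during a turn are buffered instead of contending for the lock, then appended in arrival order when the turn ends.

```typescript
// Hypothetical sketch; none of these names come from OpenClaw.
type SystemEvent = { source: string; body: string };

class SessionInbox {
  private turnActive = false;
  private pending: SystemEvent[] = [];

  // `append` stands in for whatever writes one event to the session file.
  constructor(private readonly append: (event: SystemEvent) => void) {}

  beginTurn(): void {
    this.turnActive = true;
  }

  // Buffer the event while a turn is active, otherwise write it immediately.
  deliver(event: SystemEvent): "written" | "queued" {
    if (this.turnActive) {
      this.pending.push(event);
      return "queued";
    }
    this.append(event);
    return "written";
  }

  // Release the turn and flush buffered events in arrival order.
  endTurn(): number {
    this.turnActive = false;
    const drained = this.pending.splice(0);
    for (const event of drained) {
      this.append(event);
    }
    return drained.length;
  }
}
```

Because the session lock in this report is a file lock, a sender in a separate process would need a durable queue on disk; the in-memory array above is only enough when sender and lock holder share a process.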

Workaround

None currently. Peer agents must poll or retry manually after the turn completes.
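
Until a fix lands, the manual retry could be automated on the sender side. The sketch below is an assumption-heavy illustration: sendWithRetry, isLockTimeout and the send callback are hypothetical, and it assumes the failure reaches the sender as an Error whose message contains SessionWriteLockTimeoutError, which the report does not confirm.

```typescript
// Hypothetical sender-side helper; `send` stands in for whatever delivers
// the message (for example a wrapper around the openclaw agent CLI).
type Send = () => Promise<void>;
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

function isLockTimeout(err: unknown): boolean {
  // Assumption: the failure surfaces with the error name in its message.
  return err instanceof Error && err.message.includes("SessionWriteLockTimeoutError");
}

async function sendWithRetry(
  send: Send,
  maxAttempts = 5,
  baseDelayMs = 2000,
  sleep: Sleep = realSleep, // injectable so the delays can be faked
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    try {
      await send();
      return attempt; // number of attempts it took
    } catch (err) {
      // Only lock timeouts are retried; other errors propagate unchanged.
      if (!isLockTimeout(err) || attempt >= maxAttempts) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 2s, 4s, 8s, ...
    }
  }
}
```

The attempt count and base delay would need tuning to the expected turn length, since each attempt may also spend the full 10s lock timeout before failing.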

Environment

  • OpenClaw: 2026.4.26 (be8c246)
  • Node: v24.14.0
  • OS: macOS Darwin 25.3.0 arm64
  • Channel: Telegram (group)
  • Model: anthropic/claude-sonnet-4-6
