Session write lock race condition causes intermittent 10s timeouts #15623

@1kuna

Summary

There is a race condition in src/agents/session-write-lock.ts during the final release() path. The in-process re-entrancy map entry is deleted before async cleanup (closing the file handle + removing the lock file).

Bug details

On the final release, the code currently does:

  1. HELD_LOCKS.delete(sessionKey)
  2. await handle.close()
  3. await fs.rm(lockPath)

During the async gap after step (1), a concurrent acquire in the same Node process finds no HELD_LOCKS entry even though the lock file still exists on disk, so it falls back to the filesystem retry loop and can spin until the 10s acquire timeout.

The window is widest when handle.close() is slow. It is worse when handle.close() rejects: today a close() rejection prevents fs.rm() from running at all, leaving a persistent lock file while the pid is still alive.
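The gap can be illustrated with a minimal in-memory sketch. This is not the code from src/agents/session-write-lock.ts: `lockFilesOnDisk`, `sleep`, `buggyRelease`, `observe`, and `demoGap` are hypothetical stand-ins, and the 20 ms delay simply plays the role of a slow handle.close().

```typescript
// Hypothetical stand-ins for illustration only; not the real module.
const HELD_LOCKS = new Map<string, { count: number }>();
const lockFilesOnDisk = new Set<string>(); // simulates lock files on disk

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Releases in the order described above: map entry first, async cleanup after.
async function buggyRelease(sessionKey: string): Promise<void> {
  HELD_LOCKS.delete(sessionKey); // step 1: in-process entry is gone
  await sleep(20); // step 2: stands in for a slow handle.close()
  lockFilesOnDisk.delete(sessionKey); // step 3: stands in for fs.rm(lockPath)
}

// What a concurrent acquire in the same process would observe at one instant.
function observe(sessionKey: string) {
  return {
    heldInProcess: HELD_LOCKS.has(sessionKey),
    fileOnDisk: lockFilesOnDisk.has(sessionKey),
  };
}

async function demoGap() {
  HELD_LOCKS.set("s1", { count: 1 });
  lockFilesOnDisk.add("s1");

  const releasing = buggyRelease("s1"); // runs synchronously up to the first await
  const during = observe("s1"); // sampled inside the async gap
  await releasing;
  const after = observe("s1");
  return { during, after };
}
```

Inside the gap, `during` reports no in-process entry while the simulated lock file is still present, which is the state that would push an acquire into the filesystem retry loop.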

How to reproduce

Run high-concurrency work that frequently touches the same session file from multiple tasks in the same process (e.g. multiple crons + subagents + heartbeat) so that releases and acquires interleave.

Our config that triggers this intermittently:

  • maxConcurrent: 4
  • subagents.maxConcurrent: 8
  • 6 cron jobs

Impact

Intermittent 10s lock acquisition timeouts (seen as FailoverError / agent failures) which cascade into failed runs / missed cron work.

Proposed fix

Add an in-memory releasing promise state to the held lock entry. On final release, set held.releasing before any awaits and have acquires that observe a releasing state await it (instead of spinning on the filesystem lock file). Also ensure fs.rm() runs even if handle.close() fails (close wrapped in catch / finally).
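A rough sketch of that shape, under stated assumptions: the `HeldLock` fields, `FakeHandle`, `removeLockFile`, and the in-memory `lockFilesOnDisk` set are hypothetical and only stand in for the real file handle and fs.rm(); the actual entry layout in session-write-lock.ts may differ. It also assumes that any in-process acquire of a held key is treated as re-entrant.

```typescript
// Hypothetical shapes for illustration; not the real module's types.
interface FakeHandle {
  close(): Promise<void>;
}
interface HeldLock {
  count: number;
  handle: FakeHandle;
  lockPath: string;
  releasing?: Promise<void>;
}

const HELD_LOCKS = new Map<string, HeldLock>();
const lockFilesOnDisk = new Set<string>(); // stands in for the filesystem

// Stand-in for fs.rm(lockPath).
async function removeLockFile(lockPath: string): Promise<void> {
  lockFilesOnDisk.delete(lockPath);
}

async function release(sessionKey: string): Promise<void> {
  const held = HELD_LOCKS.get(sessionKey);
  if (!held) return;
  held.count -= 1;
  if (held.count > 0) return; // a re-entrant holder is still active

  // Publish the releasing promise before yielding to any other task.
  held.releasing = (async () => {
    try {
      await held.handle.close();
    } catch {
      // A close() failure must not prevent lock file removal.
    }
    try {
      await removeLockFile(held.lockPath);
    } finally {
      // Drop the entry only after cleanup, and only if it is still ours.
      if (HELD_LOCKS.get(sessionKey) === held) HELD_LOCKS.delete(sessionKey);
    }
  })();
  await held.releasing;
}

async function acquire(
  sessionKey: string,
  lockPath: string,
  handle: FakeHandle,
): Promise<void> {
  for (;;) {
    const held = HELD_LOCKS.get(sessionKey);
    if (!held) break;
    if (held.releasing) {
      await held.releasing; // wait in memory instead of polling the lock file
      continue;
    }
    held.count += 1; // assumed re-entrant acquire
    return;
  }
  lockFilesOnDisk.add(lockPath); // stands in for creating the lock file
  HELD_LOCKS.set(sessionKey, { count: 1, handle, lockPath });
}

async function demoFix() {
  const failing: FakeHandle = {
    close: () => Promise.reject(new Error("close failed")),
  };
  const ok: FakeHandle = { close: async () => {} };

  await acquire("s1", "/tmp/s1.lock", failing);
  const releasing = release("s1");
  const fileDuringRelease = lockFilesOnDisk.has("/tmp/s1.lock");
  await acquire("s1", "/tmp/s1.lock", ok); // awaits held.releasing, then takes the lock
  await releasing;
  return {
    fileDuringRelease,
    reacquiredCount: HELD_LOCKS.get("s1")?.count,
    handleIsNew: HELD_LOCKS.get("s1")?.handle === ok,
  };
}
```

In this sketch the map entry stays visible, marked as releasing, until the simulated lock file is gone, so a same-process acquire never sees the "no entry but file on disk" state. The rejecting close() in `demoFix` exercises the path where removal must still run.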
