Session write lock race condition causes intermittent 10s timeouts #15623

## Summary
There is a race condition in `src/agents/session-write-lock.ts` on the final `release()` path: the in-process re-entrancy map entry is deleted before the async cleanup (closing the file handle and removing the lock file) has finished.

## Bug details
On the final release, the code currently does:
1. `HELD_LOCKS.delete(sessionKey)`
2. `await handle.close()`
3. `await fs.rm(lockPath)`

During the async gap after step (1), a concurrent acquire in the *same Node process* finds no `HELD_LOCKS` entry while the lock file still exists on disk, so it falls back to the filesystem retry loop and can spin until the 10s acquire timeout.

This is especially likely if `handle.close()` is slow or rejects. Today a `close()` rejection prevents `rm()` from running, which leaves the lock file on disk while the owning pid is still alive.
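To make the window concrete, here is a minimal in-memory model of the ordering above. It is only a sketch: `lockFilesOnDisk`, `tick`, `buggyRelease`, `observe`, and `demo` are hypothetical names, the `Set` stands in for the real filesystem, and only `HELD_LOCKS` comes from the actual module.

```typescript
// Hypothetical model of the current release ordering; not the real module code.
const HELD_LOCKS = new Map<string, { count: number }>();
const lockFilesOnDisk = new Set<string>(); // stands in for lock files on disk

const tick = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function buggyRelease(sessionKey: string, closeRejects = false): Promise<void> {
  HELD_LOCKS.delete(sessionKey); // step 1: in-memory entry is gone immediately
  await tick(20); // step 2: stands in for `await handle.close()`
  if (closeRejects) throw new Error("close failed"); // step 3 never runs
  lockFilesOnDisk.delete(sessionKey); // step 3: stands in for `await fs.rm(lockPath)`
}

// What a concurrent acquire in the same process can observe.
function observe(sessionKey: string) {
  return {
    heldInMemory: HELD_LOCKS.has(sessionKey),
    lockFileExists: lockFilesOnDisk.has(sessionKey),
  };
}

async function demo() {
  HELD_LOCKS.set("s1", { count: 1 });
  lockFilesOnDisk.add("s1");
  const pending = buggyRelease("s1");
  const during = observe("s1"); // taken inside the async gap after step 1
  await pending;
  const after = observe("s1");

  HELD_LOCKS.set("s2", { count: 1 });
  lockFilesOnDisk.add("s2");
  await buggyRelease("s2", true).catch(() => undefined);
  const afterFailedClose = observe("s2"); // lock file was never removed

  return { during, after, afterFailedClose };
}
```

In this model, `during` is the state a same-process acquire sees mid-release (no map entry, lock file present), and `afterFailedClose` shows the same state persisting once `close()` rejects.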

## How to reproduce
Run high-concurrency work that frequently touches the same session file from multiple tasks in the same process (e.g. multiple crons + subagents + heartbeat) so that releases and acquires interleave.

Our config that triggers this intermittently:
- `maxConcurrent: 4`
- `subagents.maxConcurrent: 8`
- 6 cron jobs

## Impact
Intermittent 10s lock acquisition timeouts (seen as `FailoverError` / agent failures) that cascade into failed runs / missed cron work.

## Proposed fix
Add an in-memory `releasing` promise state to the held lock entry. On final release, set `held.releasing` *before* any awaits and have acquires that observe a `releasing` state await it (instead of spinning on the filesystem lock file). Also ensure `fs.rm()` runs even if `handle.close()` fails (close wrapped in `catch` / `finally`).
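A rough sketch of what this could look like, not a drop-in patch. The `HeldLock` shape, `HandleLike`, `releaseFinal`, and `waitForInProcessRelease` are hypothetical names, and `rm` is injected so the sketch does not assume how the real code calls `fs.rm()`.

```typescript
// Hypothetical shapes; the real entry type in session-write-lock.ts may differ.
interface HandleLike {
  close(): Promise<void>;
}

interface HeldLock {
  count: number;
  handle: HandleLike;
  lockPath: string;
  releasing?: Promise<void>; // set while the final release is cleaning up
}

const HELD_LOCKS = new Map<string, HeldLock>();

async function releaseFinal(
  sessionKey: string,
  rm: (path: string) => Promise<void>, // stands in for the real lock-file removal
): Promise<void> {
  const held = HELD_LOCKS.get(sessionKey);
  if (!held) return;
  if (held.releasing) return held.releasing; // a release is already in flight

  // Assigned synchronously: the async body only yields at its first await,
  // so no other task can see the entry without `releasing` set.
  held.releasing = (async () => {
    try {
      // A close() failure is swallowed so it cannot skip the rm() below.
      await held.handle.close().catch(() => undefined);
      await rm(held.lockPath);
    } finally {
      // Drop the in-memory entry only after cleanup has been attempted.
      if (HELD_LOCKS.get(sessionKey) === held) HELD_LOCKS.delete(sessionKey);
    }
  })();
  return held.releasing;
}

// Acquire side: call this before falling back to the filesystem retry loop.
async function waitForInProcessRelease(sessionKey: string): Promise<void> {
  const held = HELD_LOCKS.get(sessionKey);
  if (held?.releasing) {
    await held.releasing.catch(() => undefined);
  }
}
```

The key design choice is that the map entry stays in place until cleanup has been attempted, so a same-process acquire always has something in memory to wait on instead of polling the lock file.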
