fix(sessions): reclaim orphan same-pid legacy lock files#32081
fix(sessions): reclaim orphan same-pid legacy lock files#32081steipete merged 2 commits intoopenclaw:mainfrom
Conversation
🔒 Aisle Security AnalysisWe found 1 potential security issue(s) in this PR:
1. 🟠 Race condition: orphan-self-pid reclaim can delete a freshly created lock and allow double acquisition
DescriptionThe new Because Impact scenarios:
Vulnerable behavior is introduced by forcing RecommendationDo not immediately reclaim/delete a contended lock solely because Safer options:
// inside the while loop, before attempting fs.open or right after EEXIST
if (allowReentrant) {
const heldNow = HELD_LOCKS.get(normalizedSessionFile);
if (heldNow) {
heldNow.count += 1;
return { release: async () => await releaseHeldLock(normalizedSessionFile, heldNow) };
}
}
This prevents deleting a freshly created lock that is merely missing Analyzed PR: #32081 at commit Last updated on: 2026-03-02T20:18:01Z |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1e919bf5d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const hasValidStarttime = isValidLockNumber(params.payload?.starttime); | ||
| if (hasValidStarttime) { | ||
| return false; | ||
| } | ||
| return !HELD_LOCKS.has(params.normalizedSessionFile); |
There was a problem hiding this comment.
Guard orphan-self reclaim against namespace PID collisions
This reclaim path can delete an actively-held lock when the session directory is shared across processes in different PID namespaces (for example, two containers where both writers run as PID 1) and the lock file is in legacy shape without a valid starttime. In that case pid === process.pid is true in both processes, HELD_LOCKS is local to the current process, and the second process will remove the first process’s live lock, allowing concurrent writers and possible session corruption; before this change it would block on the existing lock instead of force-reclaiming it.
Useful? React with 👍 / 👎.
Greptile SummaryAdded logic to reclaim orphan lock files when the lock file's PID matches the current process but lacks a valid Key changes:
Testing:
Confidence Score: 5/5
Last reviewed commit: a1e919b |
a1e919b to
960ee03
Compare
|
Landed via temp rebase onto main.
Thanks @bmendonca3! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 960ee03a80
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const hasValidStarttime = isValidLockNumber(params.payload?.starttime); | ||
| if (hasValidStarttime) { | ||
| return false; | ||
| } | ||
| return !HELD_LOCKS.has(params.normalizedSessionFile); |
There was a problem hiding this comment.
Avoid reclaiming same-PID locks without owner disambiguation
In the new orphan-self path, a contended lock is reclaimable whenever pid === process.pid, starttime is missing/invalid, and the current process has no in-memory held entry. Because HELD_LOCKS is process-local, this misidentifies live locks when multiple processes share the same lock directory but run in different PID namespaces with overlapping PIDs (for example, two containers both using PID 1): process B will delete process A's active legacy lock and proceed, allowing concurrent writers and possible session corruption. Before this change, that scenario blocked on the existing lock instead of force-reclaiming it.
Useful? React with 👍 / 👎.
Summary
Describe the problem and fix in 2–5 bullets:
*.jsonl.lockfiles (nostarttime) can block session writes fortimeoutMswhen container PID is reused (same PID appears alive).session file locked (timeout 10000ms)and all models on that session are effectively blocked until manual cleanup/restart.pid===process.pid+ missing/invalidstarttime+ no in-process held lock entry as stale (orphan-self-pid) and reclaims it.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
starttime) instead of timing out.Security Impact (required)
No)No)No)No)No)Yes, explain risk + mitigation:Repro + Verification
Environment
Steps
createdAt, but withoutstarttime(legacy shape).acquireSessionWriteLockfor the same session file withtimeoutMs=200and defaultstaleMs.Expected
Actual
session file locked (timeout 200ms): pid=<current> ...session.jsonl.lock.Evidence
Attach at least one:
Before patch repro output:
After patch repro output:
Passing tests/commands:
Human Verification (required)
What you personally verified (not just CI), and how:
starttimeis not reclaimed whenallowReentrant: false; active in-process lock with malformedstarttimeis not reclaimed.Compatibility / Migration
Yes)No)No)Failure Recovery (if this breaks)
sessions: reclaim orphan self-pid lock files.src/agents/session-write-lock.ts,src/agents/session-write-lock.test.ts.Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write
None.starttimeis missing/invalid, and no in-process held lock entry exists; regression tests cover active lock scenarios.