fix: detect PID recycling in session write lock staleness check#26443
Closed
HirokiKobayashi-R wants to merge 3 commits intoopenclaw:mainfrom
Closed
fix: detect PID recycling in session write lock staleness check#26443HirokiKobayashi-R wants to merge 3 commits intoopenclaw:mainfrom
HirokiKobayashi-R wants to merge 3 commits intoopenclaw:mainfrom
Conversation
The session lock uses isPidAlive() to determine if a lock holder is still running. In containers, PID recycling can cause a different process to inherit the same PID, making the lock appear valid when the original holder is dead. Record the process start time (field 22 of /proc/pid/stat) in the lock file and compare it during staleness checks. If the PID is alive but its start time differs from the recorded value, the lock is treated as stale and reclaimed immediately. Backward compatible: lock files without starttime are handled with the existing PID-alive + age-based logic. Non-Linux platforms skip the starttime check entirely (getProcessStartTime returns null).
18 tasks
Member
|
Opened follow-up draft PR from my branch with the same fixes and changelog credit update: Reason: I could not rebase/push directly to |
Member
|
Cleanup note: this work landed via #31320 (merged), including the changelog credit update ( |
Contributor
Author
|
Sorry for the extra work — I'll make sure maintainer push access is enabled on future PRs. Thanks for landing it and the credit! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The session write lock uses
isPidAlive()to determine if a lock holder is still running. In containers (Docker/Kubernetes), PID recycling can cause a different process to inherit the same PID number, making a stale lock appear valid. This leads tosession file locked (timeout 10000ms)errors that persist until the 30-minute age-based stale threshold is reached.This PR adds process start time (
/proc/pid/statfield 22) comparison to detect PID recycling:starttime(clock ticks since boot) in the lock file alongsidepidandcreatedAtstarttimewith the currentstarttimeof that PIDstarttimediffers from the recorded value, the lock is treated as stale (recycled-pid) and reclaimed immediatelyWhy this happens in containers
spawn(detached: true)) frees the parent PID for reuse(…) &in entrypoint scripts) can inherit recycled PIDskill(pid, 0)returns true for the new (unrelated) process, and the lock appears validDesign
(pid, starttime)comparison is the technique used by psutil, and recommended by kernel developers for/proc/pid/stat-based process identity verificationlastIndexOf(")")parsing: The only correct way to parse/proc/pid/stat— thecommfield can contain spaces and nested parentheses (CVE-2017-1000367, CVE-2021-25683)starttime(from older versions) are handled with the existing PID-alive + age-based logic — no false reclamationgetProcessStartTime()returnsnullon non-Linux platforms; the field is omitted from lock files and the recycling check is skipped entirelyChanges
src/shared/pid-alive.tsgetProcessStartTime()— reads/proc/pid/statfield 22src/agents/session-write-lock.tsstarttimein lock file; detectrecycled-pidininspectLockPayload()src/shared/pid-alive.test.tsgetProcessStartTime(Linux mock, non-Linux, invalid PIDs, nested-paren comm fields)src/agents/session-write-lock.test.tsTest plan
vitest run)starttimeis reclaimed immediatelystarttimefield is NOT treated as recycledgetProcessStartTimereturnsnull, no behavioral changeGreptile Summary
This PR adds PID recycling detection to the session write lock mechanism by recording and comparing process start times. The implementation follows the industry-standard
(pid, starttime)approach used by psutil and recommended by kernel developers.Key changes:
getProcessStartTime()function that reads/proc/pid/statfield 22 (starttime in clock ticks since boot)starttimealongsidepidandcreatedAtstarttimewith current processstarttimeto detect PID recyclingrecycled-pidstale reasonstarttimeuse existing PID-alive + age-based logicgetProcessStartTime()returnsnullon non-Linux platforms, skipping recycling checkSecurity considerations:
lastIndexOf(")")parsing to handle comm fields with spaces and nested parentheses, avoiding CVE-2017-1000367 and CVE-2021-25683 vulnerabilitiesTest coverage:
Confidence Score: 5/5
Last reviewed commit: 49a6bf1