Skip to content

fix: guard fcntl.flock(LOCK_UN) with try/except OSError in finally blocks#23873

Open
vanthinh6886 wants to merge 4 commits into
NousResearch:mainfrom
vanthinh6886:fix/guard-flock-unlock
Open

fix: guard fcntl.flock(LOCK_UN) with try/except OSError in finally blocks#23873
vanthinh6886 wants to merge 4 commits into
NousResearch:mainfrom
vanthinh6886:fix/guard-flock-unlock

Conversation

@vanthinh6886

Copy link
Copy Markdown
Contributor

Problem

Several finally blocks call fcntl.flock(fd, LOCK_UN) without catching OSError. If the unlock fails (e.g., fd already closed, NFS stale handle, signal interruption), the exception propagates and skips subsequent cleanup code — most critically fd.close(), which leaks the file descriptor and leaves the lock held indefinitely.

The msvcrt (Windows) path is already guarded with try/except OSError in every case, but the POSIX fcntl path is not. This asymmetry means the bug only manifests on Linux/macOS.

Fix

Wrap each fcntl.flock(fd, LOCK_UN) call in try/except OSError: pass, matching the existing guard on the msvcrt path.

Fixed Sites

File Line Severity
cron/scheduler.py 1809 Highlock_fd.close() skipped on failure → fd leak + permanent lock
tools/environments/file_sync.py 292 Highlock_fd.close() skipped on failure → fd leak + permanent lock
agent/google_oauth.py 229 MediumOSError propagates through except ImportError, outer finally still closes fd
agent/shell_hooks.py 627 Low — inside with context manager, fd closed by CM, but exception propagates

Before vs After

# BEFORE — OSError skips close(), lock held forever
finally:
    fcntl.flock(lock_fd, fcntl.LOCK_UN)   # raises OSError
    lock_fd.close()                        # NEVER REACHED

# AFTER — unlock failure is tolerated, close() always runs
finally:
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)
    except OSError:
        pass
    lock_fd.close()                        # always reached

Tests

All 4 modified files compile cleanly (python3 -m py_compile). The fix is purely defensive — no behavioral change when flock succeeds.

vanthinh6886 and others added 4 commits May 11, 2026 14:23
…ocks

Several finally blocks call fcntl.flock(fd, LOCK_UN) without catching
OSError. If the unlock fails (e.g., fd already closed, NFS stale handle,
signal interruption), the exception propagates and skips subsequent
cleanup code — most critically fd.close(), which leaks the file descriptor
and leaves the lock held indefinitely.

The msvcrt path is already guarded in every case; this brings the POSIX
fcntl path to parity.

Fixed sites:
- cron/scheduler.py:1809 — lock_fd.close() skipped on failure
- agent/google_oauth.py:229 — OSError propagated through except ImportError
- tools/environments/file_sync.py:292 — lock_fd.close() skipped on failure
- agent/shell_hooks.py:627 — exception propagated from generator context
@alt-glitch alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder comp/cron Cron scheduler and job management P3 Low — cosmetic, nice to have duplicate This issue or pull request already exists labels May 11, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #16274 (which covers 5 files including shell_hooks.py, scheduler.py, file_sync.py). Also overlaps with #21719, #20529, and #23819 — all applying the same fcntl.flock(LOCK_UN) guard. This PR adds google_oauth.py as a new file but the core fix is the same pattern already addressed in the earlier PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/cron Cron scheduler and job management duplicate This issue or pull request already exists P3 Low — cosmetic, nice to have type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants