Skip to content

fix: guard fcntl.flock(LOCK_UN) with try/except OSError in finally blocks#20529

Closed
vominh1919 wants to merge 1 commit into
NousResearch:mainfrom
vominh1919:fix/unguarded-flock-lock-un
Closed

fix: guard fcntl.flock(LOCK_UN) with try/except OSError in finally blocks#20529
vominh1919 wants to merge 1 commit into
NousResearch:mainfrom
vominh1919:fix/unguarded-flock-lock-un

Conversation

@vominh1919

Copy link
Copy Markdown
Contributor

Problem

fcntl.flock(fd, LOCK_UN) can raise OSError (e.g. EBADF on an already-closed fd) in finally blocks. When this happens:

  1. The exception replaces the original exception being propagated, making debugging much harder
  2. The subsequent fd.close() is skipped, leaving the file lock held forever

The msvcrt (Windows) unlock path was already guarded with try/except (OSError, IOError) in all locations. The fcntl (POSIX) path was not.

Fix

Wrap all unguarded fcntl.flock(fd, LOCK_UN) calls in finally blocks with try/except OSError: pass.

Files changed (5 locations):

File Function Impact
tools/memory_tool.py _file_lock() Memory file locking — lock held forever on error
tools/environments/file_sync.py _sync_back() Sync-back file locking — lock held forever on error
cron/scheduler.py _run_ticks() Cron job singleton lock — blocks all future cron runs
agent/shell_hooks.py _allowlist_lock() Shell allowlist locking — blocks all shell hook operations
hermes_cli/auth.py _auth_file_lock() Auth store locking — blocks all auth operations

Before vs After

Scenario Before After
flock(LOCK_UN) raises EBADF Exception replaces original, fd.close() skipped, lock held forever Exception swallowed, fd.close() runs, lock released
Normal unlock Works Works (no change)

Tests

No existing tests cover these code paths (the flock error scenario is hard to trigger in unit tests). The fix is defensive-only — it cannot break existing behavior since flock(LOCK_UN) only raises when the fd is already invalid, which means the lock is already released.

References

The codebase already has properly guarded examples:

  • gateway/status.py:308_release_file_lock() wraps flock(LOCK_UN) in try/except OSError
  • agent/google_oauth.py:229 — wraps in try/except ImportError

This PR applies the same pattern consistently across all remaining locations.

…ocks

fcntl.flock(fd, LOCK_UN) can raise OSError (e.g. EBADF on an already-closed
fd) in finally blocks. When this happens, the exception replaces the original
exception being propagated, making debugging much harder, and the subsequent
fd.close() is skipped, leaving the lock held forever.

Fixed 5 locations across the codebase:
- tools/memory_tool.py (memory file locking)
- tools/environments/file_sync.py (sync-back file locking)
- cron/scheduler.py (cron job singleton lock)
- agent/shell_hooks.py (shell allowlist locking)
- hermes_cli/auth.py (auth store locking)

The msvcrt (Windows) unlock path was already guarded in all locations.
This brings the fcntl (POSIX) path to parity.

The codebase already has properly guarded examples in gateway/status.py
(_release_file_lock) — this commit applies the same pattern consistently.
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/cron Cron scheduler and job management tool/memory Memory tool and memory providers area/auth Authentication, OAuth, credential pools duplicate This issue or pull request already exists labels May 6, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #16274 — same fix (guard fcntl.flock(LOCK_UN) in finally blocks), same 5 files, same author. #16274 is already open and covers this exact change.

@teknium1

Copy link
Copy Markdown
Contributor

This is an automated hermes-sweeper review. Current main already has the defensive POSIX flock(..., LOCK_UN) guards this PR adds.

Evidence:

  • tools/memory_tool.py:233 wraps fcntl.flock(fd, fcntl.LOCK_UN) in try/except (OSError, IOError).
  • tools/environments/file_sync.py:293 wraps fcntl.flock(lock_fd, fcntl.LOCK_UN) before lock_fd.close().
  • cron/scheduler.py:2294 wraps the cron lock unlock path.
  • agent/shell_hooks.py:635 wraps the shell allowlist unlock path.
  • hermes_cli/auth.py:1011 wraps the auth lock unlock path.
  • git blame shows these guards came from 62573f44cfee8895eec2cb18e7c45b9bff97081a (fix: guard yaml.safe_load, flock unlock, TOCTOU races, and atomic writes), which is contained in v2026.5.28 and later release tags.

The duplicate note in the PR discussion is also consistent with the current state: the exact five-file unlock guard change has landed on main.

@teknium1 teknium1 closed this Jun 11, 2026
@teknium1 teknium1 added the sweeper:implemented-on-main Sweeper: behavior already present on current main label Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/auth Authentication, OAuth, credential pools comp/cron Cron scheduler and job management duplicate This issue or pull request already exists P2 Medium — degraded but workaround exists sweeper:implemented-on-main Sweeper: behavior already present on current main tool/memory Memory tool and memory providers type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants