fix: guard yaml/flock/TOCTOU/atomic writes across small surfaces (#28018) #28593

Merged
teknium1 merged 1 commit into main from hermes/hermes-3ad7d98a
May 19, 2026

Conversation

@teknium1

Contributor

Salvage of #28018 by @vanthinh6886.

What: Several small defensive gaps across multiple modules that all share the same pattern (race condition between check + use, or no exception handler around a syscall that can fail under load/edge cases):

  • agent/copilot_acp_client.py: path.exists() followed by path.read_text() is a TOCTOU race; replace with try/except FileNotFoundError.
  • agent/shell_hooks.py, cron/scheduler.py, hermes_cli/auth.py: fcntl.flock(.., LOCK_UN) in finally blocks could raise OSError if the lock fd was already closed; wrap in try/except.
  • gateway/sticker_cache.py: cache save was CACHE_PATH.write_text(...) (non-atomic, partial-write corruption risk); switch to tempfile.mkstemp + os.fsync + os.replace.
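
For reviewers unfamiliar with the two file-handling patterns named above, here is a minimal sketch of each. The helper names `read_text_if_present` and `atomic_write_json` are hypothetical and are not taken from the diff; the actual changes in `agent/copilot_acp_client.py` and `gateway/sticker_cache.py` may be structured differently.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def read_text_if_present(path: Path) -> Optional[str]:
    """Read a file without a prior exists() check, so there is no check/use gap."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # The file was never there, or was deleted by another process.
        return None


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON to a temp file, fsync it, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see the old file or the new one
    except BaseException:
        # Best-effort cleanup of the temp file, then re-raise the original error.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A reader that opens the target during a save sees either the previous contents or the complete new contents, never a truncated file.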

Why: None of these are bugs hit in the happy path — they're hardening against partial reads/writes, lock fd reuse, and races on heavily-loaded systems.

Original PR: #28018

1. trajectory_compressor.py: yaml.safe_load() returns None on empty
   files, crashing with TypeError on `if 'tokenizer' in data`. Fix by
   adding `or {}` fallback. (HIGH — blocks startup with empty config)

2. 6 files with fcntl.flock(LOCK_UN) in finally blocks without
   try/except: cron/scheduler.py, hermes_cli/auth.py,
   agent/shell_hooks.py, tools/skill_usage.py,
   tools/environments/file_sync.py, tools/memory_tool.py. If unlock
   raises OSError, fd.close() is skipped and the lock is held forever.
   The msvcrt branches already had try/except; the fcntl branches did
   not. Fix by wrapping in try/except (OSError, IOError): pass.

3. agent/copilot_acp_client.py line 639: TOCTOU race — path.exists()
   followed by path.read_text() with no try/except. If file is deleted
   between the check and the read, FileNotFoundError propagates. Fix
   by using try/except FileNotFoundError.

4. gateway/sticker_cache.py: non-atomic write via Path.write_text()
   can leave truncated JSON on crash, causing JSONDecodeError on next
   load. Fix by writing to tempfile + fsync + os.replace (atomic).
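
The unlock guard in item 2 can be sketched roughly as follows. This is an illustration only: `locked_file` is a hypothetical helper, not code from the six files listed, and it is Unix-only because it uses `fcntl`. In Python 3, `IOError` is an alias of `OSError`, so catching `OSError` alone covers both.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked_file(path: str):
    """Hold an exclusive flock on `path` for the duration of the with-block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            yield fd
        finally:
            try:
                fcntl.flock(fd, fcntl.LOCK_UN)
            except OSError:
                # A failed unlock must not prevent the close below,
                # which releases the lock anyway.
                pass
    finally:
        os.close(fd)
```

Putting the close in the outer `finally` means the descriptor is released even when the unlock call fails, which is the failure mode item 2 describes.
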
@teknium1 merged commit 62573f4 into main May 19, 2026
@teknium1 deleted the hermes/hermes-3ad7d98a branch May 19, 2026 07:12
@alt-glitch added labels May 19, 2026: type/security (Security vulnerability or hardening), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), comp/gateway (Gateway runner, session dispatch, delivery)

3 participants