
fix(logging): recover gateway.log handler from external rotation #34349

Merged: teknium1 merged 1 commit into main from hermes/hermes-36d28e24 on May 29, 2026


@teknium1 (Contributor)

Summary

With this fix, gateway.log keeps recording after external rotation. Previously, anything that renamed or unlinked the file outside of doRollover() (logrotate, a manual mv, a transient rm) left _ManagedRotatingFileHandler's open fd pinned to the old inode: after a rename every subsequent write went to gateway.log.1 until the process restarted, and after an unlink writes went to a deleted inode. The visible symptom was "gateway.log frozen mid-write while agent.log keeps growing with gateway.* records." Field-reported by a Discord user whose gateway.log stopped at 2026-05-26 17:31:37 but whose agent.log had gateway.run / hermes_plugins.* entries through 2026-05-28.

Changes

  • hermes_logging.py: _ManagedRotatingFileHandler now snapshots (dev,ino) of baseFilename and re-checks on every emit(). On mismatch it closes the stale stream and reopens at the expected path (WatchedFileHandler.reopenIfNeeded() pattern, adapted for rotating handlers). doRollover() refreshes the snapshot so our own rollovers aren't misread as external ones.
  • tests/test_hermes_logging.py: five regression tests — external rename, external unlink, external truncate (must NOT reopen — inode unchanged), normal doRollover() still works, and end-to-end reproduction (rotate + re-call setup_logging).
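A minimal sketch of the inode check described in the first bullet, written against the stdlib `logging.handlers.RotatingFileHandler`. The class name `InodeAwareRotatingFileHandler` and its helpers `_snapshot()` / `_reopen_if_needed()` are hypothetical; this is not the actual `_ManagedRotatingFileHandler` code in hermes_logging.py. It assumes POSIX rename/unlink semantics.

```python
import logging
import logging.handlers
import os


class InodeAwareRotatingFileHandler(logging.handlers.RotatingFileHandler):
    """Hypothetical sketch (not the PR's code): reopen baseFilename when
    something outside doRollover() renames or unlinks it."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._snapshot()

    def _snapshot(self):
        # Record (st_dev, st_ino) of the inode we are actually writing to,
        # or mark it unknown while the stream is closed (e.g. delay=True).
        if self.stream is None:
            self._dev, self._ino = -1, -1
        else:
            st = os.fstat(self.stream.fileno())
            self._dev, self._ino = st.st_dev, st.st_ino

    def _reopen_if_needed(self):
        if self.stream is None:
            return  # nothing open yet; FileHandler opens it during emit
        try:
            st = os.stat(self.baseFilename)
            moved = (st.st_dev, st.st_ino) != (self._dev, self._ino)
        except FileNotFoundError:
            moved = True  # path renamed away or unlinked
        if moved:
            # Same close-then-reopen step as WatchedFileHandler.reopenIfNeeded().
            self.stream.flush()
            self.stream.close()
            self.stream = self._open()  # recreates baseFilename
            self._snapshot()

    def emit(self, record):
        self._reopen_if_needed()
        super().emit(record)  # may call our doRollover()
        if self._ino == -1:
            self._snapshot()  # stream was opened lazily inside emit

    def doRollover(self):
        super().doRollover()
        # Refresh so our own rename is not mistaken for an external one.
        self._snapshot()
```

The snapshot is taken with `os.fstat()` on the open descriptor rather than `os.stat()` on the path, so it always describes the inode being written. An in-place truncation (`: > gateway.log`) keeps the same inode, so it never triggers a reopen, which matches the "must NOT reopen" regression case.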

Root cause

The original CLI→gateway init-order bug (#8404) was fixed by #16229 (April 2026), which gets the gateway.log handler attached. This is the sibling fix for what happens after attach: on POSIX an open file descriptor refers to an inode, not a path, so it does not follow renames. Once anything rotates the file out from under us, the handler silently writes to the wrong inode until the process restarts. The fix detects inode drift cheaply on each emit (a single os.stat, sub-microsecond on a hot file thanks to the dentry cache) and reopens.
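The root cause can be reproduced with plain file I/O, no logging involved. This sketch (the helper name `write_after_rename` is hypothetical, and POSIX rename semantics are assumed) keeps writing through an already-open file object after its path is renamed: both lines end up in gateway.log.1 and gateway.log is never recreated.

```python
import os
import tempfile


def write_after_rename(directory):
    """Keep writing through an open file object after its path has been
    renamed away, the way `mv gateway.log gateway.log.1` would."""
    path = os.path.join(directory, "gateway.log")
    with open(path, "a") as stream:
        stream.write("before rotation\n")
        stream.flush()
        os.rename(path, path + ".1")      # external rotation: path -> .1
        stream.write("after rotation\n")  # fd still refers to the same inode
        stream.flush()
    with open(path + ".1") as rotated:
        rotated_contents = rotated.read()
    # Nothing recreates the original path; that is the handler's job.
    return rotated_contents, os.path.exists(path)


with tempfile.TemporaryDirectory() as tmp:
    contents, recreated = write_after_rename(tmp)
    print(repr(contents))  # 'before rotation\nafter rotation\n'
    print(recreated)       # False
```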

Validation

| Scenario | Before | After |
|---|---|---|
| `mv gateway.log gateway.log.1`, then write | line lands in `.1`; `gateway.log` never reappears | line lands in a fresh `gateway.log` |
| `rm gateway.log`, then write | line lost (written to the deleted inode) | file recreated, line lands |
| `: > gateway.log`, then write | line lands (inode unchanged) | line lands (no spurious reopen) |
| Handler-driven `doRollover()` | works | works (snapshot refreshed) |

Targeted tests: tests/test_hermes_logging.py — 55/55 pass (50 existing + 5 new).
Broader: tests/gateway/ — 5972/5972 pass.
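For comparison, the stdlib `logging.handlers.WatchedFileHandler`, whose `reopenIfNeeded()` pattern the PR adapts, already behaves like the After column for the three external cases; it simply never rotates on its own, which is why a rotating handler needs an adapted version. The sketch below exercises the table's scenarios against it. `run_matrix` is a hypothetical helper, not the repository's test suite, and POSIX is assumed (Windows refuses to rename or delete open files).

```python
import logging
import logging.handlers
import os
import tempfile


def run_matrix():
    """Return {scenario: (gateway.log contents, gateway.log.1 contents or None)}
    after one write, an external change, and a second write."""
    results = {}
    for scenario in ("rename", "unlink", "truncate"):
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "gateway.log")
            handler = logging.handlers.WatchedFileHandler(path)
            logger = logging.getLogger(f"matrix.{scenario}")
            logger.propagate = False
            logger.setLevel(logging.INFO)
            logger.addHandler(handler)

            logger.info("first")
            if scenario == "rename":
                os.rename(path, path + ".1")  # mv gateway.log gateway.log.1
            elif scenario == "unlink":
                os.remove(path)               # rm gateway.log
            else:
                open(path, "w").close()       # : > gateway.log (inode kept)
            logger.info("second")             # emit() runs reopenIfNeeded()

            logger.removeHandler(handler)
            handler.close()
            with open(path) as f:
                current = f.read()
            backup = None
            if os.path.exists(path + ".1"):
                with open(path + ".1") as f:
                    backup = f.read()
            results[scenario] = (current, backup)
    return results


for name, (current, backup) in run_matrix().items():
    print(name, repr(current), repr(backup))
```

Every scenario ends with `second` in gateway.log; only the rename case leaves `first` in gateway.log.1. The truncate case relies on the handler's default append mode (`"a"`, i.e. `O_APPEND`), so the write lands at the new end of the emptied file instead of leaving a hole.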

[Infographic: gateway.log fix]

External rotation (logrotate, manual `mv gateway.log gateway.log.1`,
another process rotating the file) leaves `_ManagedRotatingFileHandler`'s
open fd pinned to the renamed inode. All subsequent writes go to the
rotated backup instead of the file every operator expects to read,
producing the symptom 'gateway.log frozen mid-write while agent.log
keeps growing with gateway.* records'.

PR #16229 fixed the original CLI->gateway init-order bug (#8404) so the
handler attaches in the first place. This is the sibling fix for what
happens after attach, when something external rotates underneath us.

Adds a WatchedFileHandler-style inode check on emit(): if baseFilename
no longer matches the open stream's (dev,ino), close the stale fd and
reopen at the expected path. doRollover() refreshes the snapshot so our
own rollover isn't misidentified as external.

Five regression tests cover the matrix: external rename, external
unlink, external truncate (must NOT trigger reopen — inode unchanged),
normal doRollover() (must still work), and the end-to-end
Allen-reproduction (rotate + re-call setup_logging).

55/55 tests in tests/test_hermes_logging.py pass; 5972/5972 in
tests/gateway/ pass.
@alt-glitch added the labels type/bug (Something isn't working), P3 (Low: cosmetic, nice to have), and comp/gateway (Gateway runner, session dispatch, delivery) on May 29, 2026
@alt-glitch (Collaborator)

Competing with #27681 — both add inode-aware reopen to _ManagedRotatingFileHandler for external rotation recovery. This PR targets gateway.log specifically; #27681 targets agent.log multiprocess rotation.

@teknium1 teknium1 merged commit 75d2c08 into main May 29, 2026
22 of 25 checks passed
@teknium1 teknium1 deleted the hermes/hermes-36d28e24 branch May 29, 2026 05:26
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026

2 participants