fix(logging): recover gateway.log handler from external rotation #34349
Merged
Conversation
External rotation (logrotate, manual `mv gateway.log gateway.log.1`, another process rotating the file) leaves `_ManagedRotatingFileHandler`'s open fd pinned to the renamed inode. All subsequent writes go to the rotated backup instead of the file every operator expects to read, producing the symptom 'gateway.log frozen mid-write while agent.log keeps growing with gateway.* records'. PR #16229 fixed the original CLI->gateway init-order bug (#8404) so the handler attaches in the first place. This is the sibling fix for what happens after attach, when something external rotates underneath us. Adds a WatchedFileHandler-style inode check on emit(): if baseFilename no longer matches the open stream's (dev,ino), close the stale fd and reopen at the expected path. doRollover() refreshes the snapshot so our own rollover isn't misidentified as external. Five regression tests cover the matrix: external rename, external unlink, external truncate (must NOT trigger reopen — inode unchanged), normal doRollover() (must still work), and the end-to-end Allen-reproduction (rotate + re-call setup_logging). 55/55 tests in tests/test_hermes_logging.py pass; 5972/5972 in tests/gateway/ pass.
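The inode check described above can be sketched roughly as follows. This is a minimal illustration, not the code from `hermes_logging.py`: the class name `InodeWatchingRotatingFileHandler` and its helper methods are hypothetical stand-ins for `_ManagedRotatingFileHandler`, and the sketch assumes POSIX rename semantics.

```python
import logging
import logging.handlers
import os


class InodeWatchingRotatingFileHandler(logging.handlers.RotatingFileHandler):
    """Hypothetical stand-in for _ManagedRotatingFileHandler (assumption)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._snapshot = None
        self._refresh_snapshot()

    def _refresh_snapshot(self):
        # Record (dev, ino) of the stream that is currently open, if any.
        if self.stream is not None:
            st = os.fstat(self.stream.fileno())
            self._snapshot = (st.st_dev, st.st_ino)
        else:
            self._snapshot = None

    def _reopen_if_rotated(self):
        if self.stream is None:
            return  # nothing open yet (delay=True); FileHandler.emit opens it
        try:
            st = os.stat(self.baseFilename)
            current = (st.st_dev, st.st_ino)
        except FileNotFoundError:
            current = None  # path was unlinked or renamed away
        if current != self._snapshot:
            # The path no longer names the inode we hold: drop the stale fd
            # and reopen at the expected path.
            self.stream.flush()
            self.stream.close()
            self.stream = self._open()
            self._refresh_snapshot()

    def emit(self, record):
        # Runs under the handler lock taken by Handler.handle().
        self._reopen_if_rotated()
        super().emit(record)
        if self._snapshot is None:
            self._refresh_snapshot()  # stream was opened lazily by emit

    def doRollover(self):
        super().doRollover()
        # Our own rollover changes the inode too; refresh so the next
        # emit() does not mistake it for external rotation.
        self._refresh_snapshot()
```

With a handler like this attached, renaming the log file from outside the process should cause the next record to land in a freshly created file at the original path, while a truncate leaves the snapshot untouched because the inode does not change.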
Collaborator
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request on May 31, 2026
Summary
`gateway.log` keeps recording after external rotation. Previously, anything that renamed or unlinked the file outside of `doRollover()` (logrotate, manual `mv`, transient `rm`) left `_ManagedRotatingFileHandler`'s open fd pinned to the rotated inode: every subsequent write went to `gateway.log.1` forever, producing the visible symptom "`gateway.log` frozen mid-write while `agent.log` keeps growing with `gateway.*` records." Field-reported by a Discord user whose `gateway.log` stopped at `2026-05-26 17:31:37` but whose `agent.log` had `gateway.run` / `hermes_plugins.*` entries through `2026-05-28`.
Changes
`hermes_logging.py`: `_ManagedRotatingFileHandler` now snapshots `(dev, ino)` of `baseFilename` and re-checks on every `emit()`. On mismatch it closes the stale stream and reopens at the expected path (`WatchedFileHandler.reopenIfNeeded()` pattern, adapted for rotating handlers). `doRollover()` refreshes the snapshot so our own rollovers aren't misread as external ones.
`tests/test_hermes_logging.py`: five regression tests covering external rename, external unlink, external truncate (must NOT reopen; inode unchanged), normal `doRollover()` still works, and end-to-end reproduction (rotate + re-call `setup_logging`).
Root cause
The original CLI→gateway init-order bug (#8404) was fixed by #16229 (April 2026), which gets the `gateway.log` handler attached. This is the sibling fix for what happens after attach: an open file descriptor doesn't follow path renames on POSIX, so once anything rotates the file out from under us, the handler silently writes to the wrong inode until the process restarts. The fix detects inode drift cheaply on each emit (`os.stat` is sub-microsecond on a hot file thanks to dentry cache) and reopens.
Validation
Scenarios checked:
`mv gateway.log gateway.log.1` then write (… `.1`, `gateway.log` never reappears … `gateway.log`)
`rm gateway.log` then write
`: > gateway.log` then write
`doRollover()`
Targeted tests: `tests/test_hermes_logging.py`, 55/55 pass (50 existing + 5 new).
Broader: `tests/gateway/`, 5972/5972 pass.
Related
… `RotatingFileHandler` rotation races) but a strictly different code path; this fix is single-process inode tracking.
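The root cause named in this PR (an open file descriptor stays bound to its inode, not to the path) can be reproduced in isolation. The sketch below is illustrative only: the function names are hypothetical, the file names merely mirror the ones in the report, and it assumes a POSIX filesystem.

```python
import os
import tempfile


def fd_stays_on_renamed_inode():
    """Write through an fd after its path has been renamed away."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "gateway.log")
        rotated = path + ".1"

        f = open(path, "a")
        f.write("before rotation\n")
        f.flush()

        os.rename(path, rotated)  # what `mv gateway.log gateway.log.1` does

        f.write("after rotation\n")  # still goes to the renamed inode
        f.flush()

        # Compare the identity of the open fd with the rotated path.
        same_as_rotated = os.path.samestat(
            os.fstat(f.fileno()), os.stat(rotated)
        )
        path_exists = os.path.exists(path)  # nothing recreated gateway.log
        f.close()

        with open(rotated) as r:
            rotated_lines = r.read().splitlines()
        return same_as_rotated, path_exists, rotated_lines


def truncate_keeps_inode():
    """Truncating in place (like `: > gateway.log`) keeps the same inode."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "gateway.log")
        with open(path, "w") as f:
            f.write("x\n")
        before = os.stat(path).st_ino
        open(path, "w").close()  # truncate without replacing the file
        return before == os.stat(path).st_ino
```

The first function shows why both records end up in the rotated file while the original path stays missing; the second shows why a truncate is not expected to trigger a reopen under an inode comparison.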