[Bug]: Multiprocess logging can keep writing to rotated agent.log.N files #27649

@lsaether

Bug Description

Low-priority observability/logging issue: when multiple Hermes processes are running, ~/.hermes/logs/agent.log can rotate while existing processes keep writing through their already-open file descriptors, which now refer to the renamed file (for example agent.log.1). Newer processes may write to the new agent.log, while older TUI/gateway/slash-worker processes continue writing to the rotated file.

The practical symptom is that the live log stream can be split across agent.log and rotated backup files. Tools that follow only agent.log may miss current Hermes activity until they also inspect agent.log*.
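For consumers that currently follow only agent.log, a minimal sketch of the "also inspect agent.log*" step might look like the following. The function name logs_by_recency and its parameters are hypothetical and not part of Hermes; it only lists the base log and its backups ordered by modification time so that a recently written backup stands out.

```python
import glob
import os


def logs_by_recency(log_dir, base_name="agent.log"):
    """Return (path, mtime) pairs for base_name and its backups, newest first."""
    pattern = os.path.join(glob.escape(log_dir), base_name + "*")
    entries = []
    for path in glob.glob(pattern):
        try:
            entries.append((path, os.stat(path).st_mtime))
        except FileNotFoundError:
            # The file was rotated away between glob() and stat(); skip it.
            continue
    return sorted(entries, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, mtime in logs_by_recency(os.path.expanduser("~/.hermes/logs")):
        print(f"{mtime:.0f}  {path}")
```

If the first entry is a backup such as agent.log.1 rather than agent.log, at least one process is probably still writing to a rotated file.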

I'm flagging this mostly because any source-level fix likely implies a small but real project decision about multi-process log rotation and/or adding a dependency. It does not seem urgent or user-facing in the normal chat path.

Steps to Reproduce

  1. Run several Hermes processes that share the same Hermes home/log directory, e.g. gateway plus multiple hermes --tui sessions and their TUI gateway/slash-worker subprocesses.
  2. Keep enough activity going for ~/.hermes/logs/agent.log to cross the configured rotation threshold.
  3. Inspect ~/.hermes/logs/agent.log* mtimes/sizes and process file descriptors for agent.log / agent.log.N.
  4. Observe that more than one log file can receive live writes after rotation.
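The steps above can be approximated without running Hermes at all. The sketch below is an assumption-laden stand-in: two plain stdlib RotatingFileHandler instances in one process play the roles of an "old" and a "new" Hermes process, and doRollover() is called explicitly instead of waiting for the size threshold, so the outcome is deterministic. The function and logger names are hypothetical. It relies on POSIX rename semantics; on Windows, renaming a file that is still open typically fails.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def simulate_split(log_dir):
    """Show two independent handlers ending up on different files after rotation."""
    path = os.path.join(log_dir, "agent.log")

    def make_writer(name):
        # Large maxBytes: rollover only happens when triggered explicitly below.
        handler = RotatingFileHandler(
            path, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
        )
        handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
        logger = logging.getLogger(name)
        logger.setLevel(logging.INFO)
        logger.propagate = False
        logger.addHandler(handler)
        return logger, handler

    old_logger, old_handler = make_writer("old_process")
    new_logger, new_handler = make_writer("new_process")
    try:
        old_logger.info("before rotation")
        new_handler.doRollover()            # "new" writer renames agent.log to agent.log.1
        new_logger.info("after rotation")   # goes to the fresh agent.log
        old_logger.info("after rotation")   # still goes to the renamed file
    finally:
        for logger, handler in ((old_logger, old_handler), (new_logger, new_handler)):
            logger.removeHandler(handler)
            handler.close()

    with open(path, encoding="utf-8") as fh:
        base_text = fh.read()
    with open(path + ".1", encoding="utf-8") as fh:
        rotated_text = fh.read()
    return base_text, rotated_text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base_text, rotated_text = simulate_split(tmp)
        print("agent.log:\n" + base_text)
        print("agent.log.1:\n" + rotated_text)
```

In this simulation the "old" writer's post-rotation line lands in agent.log.1, which matches the split described in step 4.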

Expected Behavior

After rotation, live Hermes logs should remain in a predictable current stream, or the logging design should make it clear that consumers need to aggregate multiple active log files.

Actual Behavior

Live writes can be split between the base log and a rotated backup file. In one local run, the newest mtime was on agent.log.1 while agent.log also had active writers.

Sanitized observation from the affected machine:

agent.log    size≈5 KB      mtime older
agent.log.1  size≈4.3 MB    mtime newer
agent.log.2  older backup
agent.log.3  older backup

process fds included:
- gateway process -> agent.log
- several hermes --tui / tui_gateway.entry / slash_worker processes -> agent.log.1
- newer slash_worker processes -> agent.log
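A file-descriptor listing like the one above can be collected on Linux by reading the /proc/<pid>/fd symlinks. The following is a rough sketch, not the exact method used for the observation: find_log_writers is a hypothetical helper, it only sees processes the current user is allowed to inspect, and it assumes a Linux-style /proc.

```python
import os


def find_log_writers(name_prefix="agent.log", proc_root="/proc"):
    """Map pid -> sorted open file paths whose basename starts with name_prefix."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        # No /proc here (non-Linux system) or it is unreadable.
        return {}

    writers = {}
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while scanning
            if os.path.basename(target).startswith(name_prefix):
                writers.setdefault(int(entry), set()).add(target)
    return {pid: sorted(paths) for pid, paths in writers.items()}


if __name__ == "__main__":
    for pid, paths in sorted(find_log_writers().items()):
        print(pid, paths)
```

Because the symlink target follows renames, a process that opened agent.log before rotation shows up here as holding agent.log.1.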

Affected Component

  • CLI / TUI process logging
  • Gateway logging
  • Shared logging setup in hermes_logging.py

Messaging Platform (if gateway-related)

N/A / not tied to a specific messaging platform.

Debug Report

Not attached for now. This report is based on a sanitized local observation plus source inspection; a full debug share would include unrelated local config/session/log context. Happy to provide more targeted debug output if it would help.

Environment

  • OS: Arch Linux, kernel 7.0.5-arch1-1
  • Hermes: Hermes Agent v0.14.0 (2026.5.16)
  • Python used by Hermes: 3.11.15
  • Local logging config: logging.max_size_mb: 5, logging.backup_count: 3

I did not run an in-place hermes update because this checkout has local changes. I did fetch origin/main and confirmed hermes_logging.py on current origin/main still uses logging.handlers.RotatingFileHandler / _ManagedRotatingFileHandler for these files.

Additional Logs / Traceback (optional)

No traceback; this appears to be a logging/rotation behavior issue rather than a crash.

Root Cause Analysis (optional)

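hermes_logging.py uses stdlib RotatingFileHandler through _ManagedRotatingFileHandler. That handler coordinates rotation only within a single process: each process holds its own open stream and decides on rollover independently. With multiple processes sharing the same log file, a rename performed by one process does not affect the file descriptors held by the others, so a process that opened the file before rotation keeps writing to the old inode after the file is renamed to agent.log.1.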

Relevant source shape on current origin/main:

from logging.handlers import RotatingFileHandler

class _ManagedRotatingFileHandler(RotatingFileHandler):
    ...

handler = _ManagedRotatingFileHandler(
    str(path), maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
)
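To make the "old inode" point concrete, here is a small diagnostic sketch that compares the file an open stream refers to with whatever currently sits at the log path. stream_is_current is a hypothetical helper, not something in hermes_logging.py, and the comparison of device and inode numbers assumes a POSIX-style filesystem.

```python
import os
import tempfile


def stream_is_current(stream, path):
    """Return True if the open stream still refers to the file now at path."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return False
    opened = os.fstat(stream.fileno())
    return (opened.st_dev, opened.st_ino) == (on_disk.st_dev, on_disk.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "agent.log")
        with open(path, "a", encoding="utf-8") as stream:
            print(stream_is_current(stream, path))   # same inode before rotation
            os.rename(path, path + ".1")              # another process rotates
            open(path, "a").close()                   # a fresh agent.log appears
            print(stream_is_current(stream, path))   # stream now points at agent.log.1
```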

Proposed Fix (optional)

No strong prescription from me. This may be a project-level choice between keeping the stdlib-only behavior and documenting that agent.log* can be active, or adopting a multi-process-safe rotation approach. The reason I am opening the issue is that the latter may involve a dependency decision rather than a purely local one-line change.
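For reference while weighing the options, one stdlib-only direction is logging.handlers.WatchedFileHandler, which reopens the log path when the file underneath it has been renamed or replaced. The sketch below is not a proposed patch: it assumes rotation would be done by a single owner (an external tool such as logrotate, or one designated process), since WatchedFileHandler does not rotate on its own, and os.rename here merely stands in for that rotator. The function and logger names are hypothetical, and the approach is POSIX-oriented.

```python
import logging
import os
import tempfile
from logging.handlers import WatchedFileHandler


def follow_rename(log_dir):
    """Show a WatchedFileHandler returning to agent.log after an external rename."""
    path = os.path.join(log_dir, "agent.log")
    handler = WatchedFileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger = logging.getLogger("watched_writer_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.info("before rotation")
        os.rename(path, path + ".1")    # stands in for an external rotator
        logger.info("after rotation")   # handler notices the change and reopens
    finally:
        logger.removeHandler(handler)
        handler.close()

    with open(path, encoding="utf-8") as fh:
        base_text = fh.read()
    with open(path + ".1", encoding="utf-8") as fh:
        rotated_text = fh.read()
    return base_text, rotated_text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base_text, rotated_text = follow_rename(tmp)
        print("agent.log:\n" + base_text)
        print("agent.log.1:\n" + rotated_text)
```

The check happens per emitted record, so a small window remains in which a record written concurrently with the rename can still land in the rotated file. Whether that trade-off is acceptable, compared with a lock-based or queue-based approach, is the project decision this issue is asking about.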

Are you willing to submit a PR for this?

Not immediately; happy to test or share more sanitized observations if useful.

Metadata

Assignees

No one assigned

    Labels

    P3 (Low: cosmetic, nice to have)
    comp/agent (Core agent loop, run_agent.py, prompt builder)
    type/bug (Something isn't working)
