Problem
When hermes-agent runs as a systemd service (StandardOutput=journal) and the journal pipe becomes unavailable (idle timeout, buffer exhaustion, socket reset), any print() call inside run_conversation() raises OSError: [Errno 5] Input/output error. This is a realistic production condition for any headless daemon deployment (systemd, Docker, nohup).
Two calls in run_agent.py sit in the critical failure path for cron jobs running with quiet_mode=True:
Line ~4062 — inside quiet_mode branch
    if self.quiet_mode:
        clean = self._strip_think_blocks(turn_content).strip()
        if clean:
            print(f" ┊ 💬 {clean}")  # raises OSError when stdout pipe is broken
This fires during any tool-calling turn when the model produces intermediate commentary. The OSError becomes the exception e caught by the outer except Exception handler.
Line ~4228 — in except Exception error handler
    except Exception as e:
        error_msg = f"Error during OpenAI-compatible API call #{api_call_count}: {str(e)}"
        print(f"❌ {error_msg}")  # also raises OSError, which now propagates out of run_conversation()
When the OSError from line ~4062 arrives here as e, this second print() also raises OSError. This propagates out of run_conversation() entirely, causing the cron scheduler to mark the job as status: "error" — the agent's completed work is never delivered.
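The double failure described above can be reproduced in isolation. The following is a minimal sketch, not the real code: `BrokenStdout`, `run_conversation_sketch()` and `reproduce()` are hypothetical names, and the broken journal pipe is simulated by a stream whose `write()` always fails with EIO.

```python
import contextlib
import errno
import io


class BrokenStdout(io.TextIOBase):
    """Hypothetical stand-in for a stdout whose underlying pipe has gone away."""

    def writable(self):
        return True

    def write(self, text):
        # Mimic the reported failure: every write fails with EIO.
        raise OSError(errno.EIO, "Input/output error")


def run_conversation_sketch():
    """Simplified stand-in for run_conversation(), not the real method."""
    try:
        # Plays the role of the quiet_mode print at ~4062: first OSError.
        print(" ┊ 💬 intermediate commentary")
    except Exception as e:
        error_msg = f"Error during API call: {e}"
        # Plays the role of the print at ~4228: second OSError, uncaught.
        print(f"❌ {error_msg}")
    return "completed work"


def reproduce():
    """Return the exception that escapes the sketch, or None if it succeeds."""
    with contextlib.redirect_stdout(BrokenStdout()):
        try:
            run_conversation_sketch()
        except OSError as exc:
            return exc
    return None
```

Calling `reproduce()` returns the second `OSError`, with the first one attached as its `__context__`. That matches the "During handling of the above exception" chain in the traceback, and the `"completed work"` return value is never reached.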
Additional unguarded print() calls in the same hot loop
The same pattern exists at several points not gated by quiet_mode, reachable during any cron job run:
| Approx. line | Triggered by |
| --- | --- |
| ~4064 | Model context length discovery (first run per model) |
| ~4108 | Interrupt received during API call |
| ~4153–4161 | Any API retry (rate limit, timeout, network error); most likely in production |
| ~4166 | Interrupt detected during retry error handling |
| ~4458 | All API retries exhausted |
Lines ~4153–4161 are the highest-risk: they fire on every transient API error (rate limits, network timeouts), which are common in production.
Observed failure
Confirmed on a deployment running as a systemd user service (StandardOutput=journal). Cron jobs scheduled at 06:00 and 13:00 UTC (when the system is idle and the journal pipe is stale) fail consistently with this traceback in the output file:
      File "run_agent.py", line 4062, in run_conversation
        print(f" ┊ 💬 {clean}")
    OSError: [Errno 5] Input/output error

    During handling of the above exception, another exception occurred:

      File "run_agent.py", line 4228, in run_conversation
        print(f"❌ {error_msg}")
    OSError: [Errno 5] Input/output error
The same jobs run successfully at 22:00 UTC when the system has active user sessions and the journal pipe is healthy — confirming this is an environmental stdout availability issue, not a logic bug.
Fix
Wrap each affected print() in try/except OSError, falling back to logger.error() for calls inside error handlers (where losing the message would hide the root cause):
    # Cosmetic lines (quiet_mode display, status messages): silent drop is fine
    try:
        print(f" ┊ 💬 {clean}")
    except OSError:
        pass

    # Error handler lines: must not lose the message
    try:
        print(f"❌ {error_msg}")
    except OSError:
        logger.error(error_msg)

    # API retry error block (~4153–4161): same pattern
    try:
        print(f"{self.log_prefix}⚠️ API call failed ...")
    except OSError:
        logger.warning(...)
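Since the same try/except would be repeated at every call site, one option is to centralize it. The helper below is a sketch under assumptions: `safe_print` and its `fallback_level` parameter are hypothetical names that do not exist in run_agent.py, and the module-level `logger` is assumed to be a standard `logging` logger.

```python
import logging

logger = logging.getLogger(__name__)


def safe_print(message, *, fallback_level=None, **print_kwargs):
    """Print message; if stdout is unusable, optionally log it instead.

    Hypothetical helper, not part of run_agent.py.
    Returns True if print() succeeded, False if it raised OSError.
    """
    try:
        print(message, **print_kwargs)
        return True
    except OSError:
        # Cosmetic output (fallback_level=None) is dropped silently;
        # error-path output is routed to the logger so it is not lost.
        if fallback_level is not None:
            logger.log(fallback_level, message)
        return False
```

A cosmetic call site would then read `safe_print(f" ┊ 💬 {clean}")`, and an error-handler call site `safe_print(f"❌ {error_msg}", fallback_level=logging.ERROR)`. This only helps if the logger writes somewhere other than the broken stream, for example a file handler. As far as I know, the standard `logging` handlers swallow their own I/O errors rather than raising, so the fallback should not reintroduce the crash.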
Related issues
- fix: replace debug print() with logger.error() in file_tools (same root cause, different file)
- fix: log exceptions instead of silently swallowing in cron scheduler (same theme: silent failure in production daemon paths)
This is more severe than those issues because it actively crashes the job rather than just losing a log line.