Summary
When the cron prompt-injection scanner blocks a job (raises the "exfil_curl_url" / "Cron prompts must not contain injection or exfiltration payloads" path in tools/cronjob_tools.py), the resulting exception kills the gateway's cron ticker thread silently. The gateway process stays up, systemd shows the unit as active (running), but no further cron job ever fires until the gateway is restarted. Heartbeat file ~/.hermes/cron/.tick.lock freezes at the moment of the scanner block.
This turns a single bad-job event into a complete cron outage for every job in jobs.json. The failure is invisible — no errors are written to gateway.log after the scanner WARNING, the systemd unit looks healthy, hermes gateway list reports the gateway as running.
Repro
- Create or edit a cron job whose assembled prompt contains a payload that triggers
_CRON_EXFIL_COMMAND_PATTERNS — easiest is a delivery-from-prompt block of the shape:
curl -sS -X POST "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/sendMessage" -d "chat_id=..." --data-urlencode "text=$REPORT"
(token variable substituted into the URL — exfil_curl_url).
- Trigger that job:
hermes cron run <job_id>.
- On the next tick the scheduler logs:
WARNING cron.scheduler: Job '...' (ID: ...): blocked by prompt-injection scanner — Blocked: prompt matches threat pattern 'exfil_curl_url'.
- The job is correctly marked
last_status: error, next_run_at advances. So far so good.
- From this point on, no further cron job fires.
.tick.lock stops updating. Other jobs whose next_run_at passes are simply skipped silently — no error, no warning, no last_run_at advance.
systemctl is-active hermes-gateway.service → active. Gateway process is still alive (Telegram polling, kanban dispatcher, etc. continue working). Only the cron ticker thread is dead.
- Restarting the gateway (
sudo systemctl restart hermes-gateway.service) restores cron ticking immediately.
Observed
- Gateway:
hermes-gateway.service running default profile, version installed today via hermes -p default gateway install --system.
- One healthy tick at 14:35:08 fired two cron jobs successfully.
- Scanner block triggered at 14:41:11 on a corpusiq-agent-profile job invoked from default-profile scheduler.
.tick.lock Modify timestamp froze at 14:41:11 and stayed there until manual systemctl restart at 14:49.
- Between 14:41 and 14:49, four other cron jobs had
next_run_at slots come and go with no fire.
Expected
The scanner block should mark the offending job as errored (already happens) and the ticker should continue ticking. One job's bad prompt should not silently take out the entire scheduler.
Probable cause (without reading the patch path)
The exception raised by the scanner appears to bubble out of the tick loop instead of being caught per-job. Wrapping the per-job processing block in a try/except around the scanner call (and any other prompt-assembly step) so that the ticker logs the failure and moves on would fix it.
A secondary improvement: when the cron ticker thread dies, the gateway should either (a) restart it, or (b) exit non-zero so systemd's Restart=always brings the gateway back. Right now the gateway process survives in a half-functional state, which makes the outage invisible to all health checks.
Environment
- Hermes: install at
~/.hermes/hermes-agent/venv, current main as of 2026-05-31.
- OS: Ubuntu under WSL2-style userspace, systemd-managed gateway via
hermes -p <profile> gateway install --system.
- Two gateway units active (default + named profile), neither raced; the failure was in the default-profile gateway only.
Happy to provide the exact systemd journal excerpt and the corresponding jobs.json snapshot (sanitized) if useful.
Summary
When the cron prompt-injection scanner blocks a job (raises the "exfil_curl_url" / "Cron prompts must not contain injection or exfiltration payloads" path in
tools/cronjob_tools.py), the resulting exception kills the gateway's cron ticker thread silently. The gateway process stays up, systemd shows the unit asactive (running), but no further cron job ever fires until the gateway is restarted. Heartbeat file~/.hermes/cron/.tick.lockfreezes at the moment of the scanner block.This turns a single bad-job event into a complete cron outage for every job in
jobs.json. The failure is invisible — no errors are written togateway.logafter the scanner WARNING, the systemd unit looks healthy,hermes gateway listreports the gateway as running.Repro
_CRON_EXFIL_COMMAND_PATTERNS— easiest is a delivery-from-prompt block of the shape:exfil_curl_url).hermes cron run <job_id>.last_status: error,next_run_atadvances. So far so good..tick.lockstops updating. Other jobs whosenext_run_atpasses are simply skipped silently — no error, no warning, nolast_run_atadvance.systemctl is-active hermes-gateway.service→active. Gateway process is still alive (Telegram polling, kanban dispatcher, etc. continue working). Only the cron ticker thread is dead.sudo systemctl restart hermes-gateway.service) restores cron ticking immediately.Observed
hermes-gateway.servicerunning default profile, version installed today viahermes -p default gateway install --system..tick.lockModify timestamp froze at 14:41:11 and stayed there until manualsystemctl restartat 14:49.next_run_atslots come and go with no fire.Expected
The scanner block should mark the offending job as errored (already happens) and the ticker should continue ticking. One job's bad prompt should not silently take out the entire scheduler.
Probable cause (without reading the patch path)
The exception raised by the scanner appears to bubble out of the tick loop instead of being caught per-job. Wrapping the per-job processing block in a try/except around the scanner call (and any other prompt-assembly step) so that the ticker logs the failure and moves on would fix it.
A secondary improvement: when the cron ticker thread dies, the gateway should either (a) restart it, or (b) exit non-zero so systemd's
Restart=alwaysbrings the gateway back. Right now the gateway process survives in a half-functional state, which makes the outage invisible to all health checks.Environment
~/.hermes/hermes-agent/venv, current main as of 2026-05-31.hermes -p <profile> gateway install --system.Happy to provide the exact systemd journal excerpt and the corresponding
jobs.jsonsnapshot (sanitized) if useful.