Cron: delivery failure should not block or fail the main task #8846

@haru3613

Problem

When a cron job completes successfully but Telegram delivery fails (e.g., 502 Bad Gateway, schema validation error), the job's overall status is reported as error — even though the actual task (LLM execution, tool calls, output generation) succeeded.

This conflates two independent concerns:

  1. Task execution — did the agent complete the work?
  2. Delivery notification — did the result reach the messaging platform?

A transient Telegram outage shouldn't make all cron jobs appear broken.

Observed behavior

  1. Cron job fires, agent runs, produces correct output saved to $HERMES_HOME/cron/output/<job-id>/<timestamp>.md
  2. Agent attempts to deliver result via Telegram
  3. Telegram returns 502 or schema validation error
  4. Job status flips to error despite successful execution
  5. hermes cron list shows Last run: ... error, which is misleading because the task itself succeeded

Impact

  • Users see "all cron jobs errored" when only Telegram was flaky
  • Retry logic re-runs the entire task (expensive LLM calls) instead of just retrying delivery
  • Monitoring/alerting triggers false positives

Expected Behavior

  1. Task status and delivery status should be tracked independently:
    • task_status: ok | error
    • delivery_status: ok | failed | pending
  2. Delivery failure should trigger a retry of the delivery only, not of the full task
  3. hermes cron list should show both statuses separately
  4. Cron output file should be written regardless of delivery outcome (this already works)
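
A minimal sketch of what tracking the two statuses independently could look like. The class and function names (JobRunState, format_list_row) and the row layout are hypothetical illustrations, not the scheduler's actual schema or the real output of hermes cron list; only the status values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class TaskStatus(str, Enum):
    OK = "ok"
    ERROR = "error"


class DeliveryStatus(str, Enum):
    OK = "ok"
    FAILED = "failed"
    PENDING = "pending"


@dataclass
class JobRunState:
    """Hypothetical per-run record; not the scheduler's actual schema."""

    job_id: str
    task_status: TaskStatus
    delivery_status: DeliveryStatus = DeliveryStatus.PENDING

    def needs_task_rerun(self) -> bool:
        # Only a failed task justifies re-running the expensive agent session.
        return self.task_status is TaskStatus.ERROR

    def needs_delivery_retry(self) -> bool:
        # Delivery is retried on its own, and only when there is output to send.
        return (
            self.task_status is TaskStatus.OK
            and self.delivery_status is DeliveryStatus.FAILED
        )


def format_list_row(state: JobRunState) -> str:
    """Illustrative one-line rendering with both statuses shown separately."""
    return (
        f"{state.job_id}  task: {state.task_status.value}  "
        f"delivery: {state.delivery_status.value}"
    )
```

With this shape, a run whose task succeeded but whose Telegram delivery failed reports needs_task_rerun() as False and needs_delivery_retry() as True, so retry logic and monitoring can tell the two cases apart.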

Suggested Architecture

cron tick
  → spawn agent session (task execution)
    → task completes → write output file → task_status = ok
  → delivery pipeline (async, non-blocking)
    → attempt Telegram delivery
    → on failure: retry 3x with backoff
    → delivery_status = ok | failed
  → final job state = { task_status, delivery_status }
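
The flow above could be sketched roughly as follows. This is an illustration under stated assumptions, not Hermes code: run_task and send are hypothetical callables standing in for the agent session and the Telegram send call, "retry 3x" is read as up to three retries after the first attempt, and the backoff is assumed to be exponential. Delivery is awaited inline to keep the example short; a real scheduler would more likely hand it to a background task so the cron tick is not blocked.

```python
import asyncio
from typing import Awaitable, Callable


async def deliver_with_retry(
    send: Callable[[str], Awaitable[None]],
    text: str,
    retries: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Call send once, then retry up to `retries` times; return "ok" or "failed"."""
    for attempt in range(retries + 1):
        try:
            await send(text)
            return "ok"
        except Exception:
            if attempt < retries:
                # Assumed exponential backoff: base_delay, 2x, 4x, ...
                await asyncio.sleep(base_delay * 2**attempt)
    return "failed"


async def run_job(
    run_task: Callable[[], Awaitable[str]],
    send: Callable[[str], Awaitable[None]],
    base_delay: float = 1.0,
) -> dict:
    """Run the task, then attempt delivery; each status is recorded separately."""
    state = {"task_status": "error", "delivery_status": "pending"}
    try:
        output = await run_task()
    except Exception:
        # Nothing to deliver; leaving delivery_status as "pending" is an assumption.
        return state
    # The output file would be written here, before any delivery attempt.
    state["task_status"] = "ok"
    # A delivery failure only affects delivery_status, never task_status.
    state["delivery_status"] = await deliver_with_retry(
        send, output, base_delay=base_delay
    )
    return state
```

If send keeps raising (for example on repeated 502 responses), run_job still returns task_status "ok" with delivery_status "failed", and the agent session is never re-run.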

Current Workaround

SOUL.md instructs the agent to treat delivery failure as non-blocking and to report task/delivery status separately. This is a behavioral workaround; the underlying scheduler still records error.

Environment

  • Hermes Agent v0.8.0 (nousresearch/hermes-agent:latest, arm64)
  • Telegram platform in polling mode
  • 11 active cron jobs

Metadata

Labels

  • P2: Medium, degraded but workaround exists
  • comp/cron: Cron scheduler and job management
  • sweeper:implemented-on-main: behavior already present on current main
  • type/feature: New feature or request
