Cron: API failure incorrectly reported as last_status=ok, no error notification delivered

## Summary

When a cron job's LLM API call fails (e.g. timeout, retries exhausted), the job's `last_status` is incorrectly set to `"ok"` and no error notification is delivered to the user. The job appears to have succeeded even though the agent produced no useful output.

## Root Cause

Two issues in `cron/scheduler.py`:

### 1. `run_job()` ignores the agent's `failed` flag (primary bug)

`agent.run_conversation()` returns a result dict with `"failed": True, "completed": False, "error": "..."` when the LLM API call fails internally (e.g. all retries exhausted). However, `run_job()` only reads `result.get("final_response")` and ignores the `failed` / `completed` / `error` fields:

**`cron/scheduler.py` lines ~1099-1129:**
```python
final_response = result.get("final_response", "") or ""
# ... only uses final_response, never checks result.get("failed")
logger.info("Job '%s' completed successfully", job_name)
return True, output, final_response, None  # Always returns success=True
```

The agent's `run_agent.py` line ~11507 generates:
```python
_final_response = f"API call failed after {max_retries} retries: {_final_summary}"
return {
    "final_response": _final_response,
    "failed": True,        # <-- ignored by run_job
    "completed": False,    # <-- ignored by run_job
    "error": _final_summary,
}
```

Since `final_response` is non-empty (contains the error text), `_process_job`'s empty-response check at line ~1277 also doesn't trigger. Result: `last_status="ok"` with no error notification.

### 2. `_process_job()`: empty-response check runs AFTER delivery logic

The soft-failure detection for empty responses (line ~1277) happens after the delivery attempt (line ~1269). When `success=True` and `final_response=""`, delivery is skipped because `should_deliver = bool("") == False`, then success is corrected to False — but by then the delivery window has passed:

```python
# Delivery happens here (line ~1269) — already decided based on original success
if should_deliver:
    delivery_error = _deliver_result(...)

# Empty response check happens AFTER delivery (line ~1277)
if success and not final_response:
    success = False
    error = "Agent completed but produced empty response..."
```

## Reproduction

1. Configure a cron job with a custom provider that is slow/unreachable (e.g. a self-hosted endpoint)
2. Let the API call time out after all retries
3. Observe `last_status: "ok"`, `last_error: null`, `last_delivery_error: null`
4. No notification is delivered to the user
5. The output file contains the error text but is treated as a successful run

## Observed in Production

```
Job: "搞钱路子调研 - 早间 9:00" (ID: 5e91f26431f2)
Provider: custom (hrs.kstu.vip:10070)
Model: qwen3.6-27b
Last Run: 2026-04-30T01:05:55
Session: only 1 message (user prompt), no assistant reply
Output: "API call failed after 3 retries: Request timed out."
last_status: "ok"   ← wrong
last_error: null     ← should contain the timeout error
last_delivery_error: null
```

## Suggested Fix

**In `run_job()`** — check the agent's `failed` flag before returning success:

```python
_agent_failed = result.get("failed", False)
_agent_error = result.get("error") or ""
_agent_completed = result.get("completed", True)

# ... build output doc ...

if _agent_failed or not _agent_completed:
    error = _agent_error or final_response or "Agent failed to produce a response"
    return False, output, "", error

return True, output, final_response, None
```

**In `_process_job()`** — move failure detection before delivery so error notifications are sent:

```python
# 1. Detect failures FIRST
if success and not final_response:
    success = False
    error = "Agent completed but produced empty response ..."

# 2. Then deliver (failed jobs get error notification)
deliver_content = final_response if success else f"⚠️ Cron job failed:\n{error}"
```

Optionally, also attempt delivery in the outer `except Exception` block of `_process_job` so unexpected crashes also notify the user.

## Impact

- Users are not notified when cron jobs fail silently
- Failed jobs appear as successful in `hermes cron list`, making debugging difficult
- No way to detect the failure without manually checking output files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cron: API failure incorrectly reported as last_status=ok, no error notification delivered #17855

Summary

Root Cause

1. `run_job()` ignores the agent's `failed` flag (primary bug)

2. `_process_job()`: empty-response check runs AFTER delivery logic

Reproduction

Observed in Production

Suggested Fix

Impact

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Cron: API failure incorrectly reported as last_status=ok, no error notification delivered #17855

Description

Summary

Root Cause

1. run_job() ignores the agent's failed flag (primary bug)

2. _process_job(): empty-response check runs AFTER delivery logic

Reproduction

Observed in Production

Suggested Fix

Impact

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. `run_job()` ignores the agent's `failed` flag (primary bug)

2. `_process_job()`: empty-response check runs AFTER delivery logic