Skip to content

[Bug]: Scheduled cron runs timeout at ~60s in 2026.4.1 — manual triggers work fine (regression from 2026.3.31) #59579

@trevenag

Description

@trevenag

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

After upgrading from 2026.3.31 to 2026.4.1, all scheduled isolated agentTurn cron jobs timeout at ~60s with profileFailureReason: "timeout" at the Anthropic API call level; manually triggering the same jobs via cron run succeeds every time.

Steps to reproduce

  1. Have any agentTurn cron job with sessionTarget: "isolated" (tested with both claude-sonnet-4-20250514 and claude-opus-4-6)
  2. Wait for scheduled execution
  3. Job fails at ~60-64s with "Request timed out before a response was generated"
  4. Manually trigger the exact same job via cron run → succeeds every time (30-78s)

Expected behavior

Scheduled cron runs should complete successfully with the same timeout behaviour as manual triggers.

Actual behavior

All scheduled runs abort at ~60s. All manual triggers succeed. Tested across both models, multiple prompts (complex and minimal), various timeoutSeconds values (300, 600), and multiple gateway restarts. The timeout occurs at the Anthropic API request level (profileFailureReason: "timeout"), not the cron job timeout level.

Time Trigger Model Result Duration
11:27 AM Manual Sonnet ✅ OK 78s
12:01 PM Scheduled Sonnet ❌ Timeout 64s
1:01 PM Scheduled Sonnet ❌ Timeout 64s
3:11 PM Manual Sonnet ✅ OK 39s
4:01 PM Scheduled Sonnet ❌ Timeout 64s
7:51 PM Manual Opus ✅ OK 30s
8:00 PM Scheduled Opus ❌ Timeout 64s

OpenClaw version

2026.4.1 (da64a97) — regression from 2026.3.31

Operating system

Ubuntu 24.04 (Linux 6.8.0-106-generic x64)

Install method

npm global

Model

anthropic/claude-sonnet-4-20250514 and anthropic/claude-opus-4-6 (both tested, both fail on schedule)

Provider / routing chain

openclaw -> anthropic (direct, no proxy)

Additional provider/model setup details

Default model: anthropic/claude-opus-4-6
Cron jobs use anthropic/claude-sonnet-4-20250514 (also tested with claude-opus-4-6)
Auth: Anthropic API key via auth-profiles.json
No proxy, no router, no Cloudflare — direct to Anthropic API
Node 22.22.2, systemd-managed gateway service
Worked perfectly on 2026.3.31 with identical config and API key

Logs, screenshots, and evidence

Log entry from scheduled run (8:00 PM, Opus):

{
  "event": "embedded_run_failover_decision",
  "tags": ["error_handling", "failover", "assistant", "surface_error"],
  "stage": "assistant",
  "decision": "surface_error",
  "failoverReason": "timeout",
  "profileFailureReason": "timeout",
  "provider": "anthropic",
  "model": "claude-opus-4-6",
  "profileId": "sha256:154a23a3efe6",
  "fallbackConfigured": false,
  "timedOut": true,
  "aborted": true
}


Evidence table — same job, same day, same config:

| Time | Trigger | Model | Result | Duration |
|------|---------|-------|--------|----------|
| 7:04 AM | Scheduled | Sonnet | ✅ OK | 43s |
| 8:01 AM | Scheduled | Sonnet | ❌ Timeout | 64s |
| 9:01 AM | Scheduled | Sonnet | ❌ Timeout | 61s |
| 10:01 AM | Scheduled | Sonnet | ❌ Timeout | 64s |
| 11:27 AM | **Manual** | Sonnet | ✅ OK | 78s |
| 12:01 PM | Scheduled | Sonnet | ❌ Timeout | 64s |
| 1:01 PM | Scheduled | Sonnet | ❌ Timeout | 64s |
| 3:11 PM | **Manual** | Sonnet | ✅ OK | 39s |
| 4:01 PM | Scheduled | Sonnet | ❌ Timeout | 64s |
| 5:01 PM | Scheduled | Sonnet | ❌ Timeout | 93s |
| 6:01 PM | Scheduled | Sonnet | ❌ Timeout | 61s |
| 7:51 PM | **Manual** | Opus | ✅ OK | 30s |
| 8:00 PM | **Scheduled** | Opus | ❌ Timeout | 64s |

All runs use sessionTarget: "isolated", same Anthropic API key, same gateway.
On 2026.3.31 (previous version), identical crons ran successfully for weeks.

Impact and severity

Affected: All users running isolated agentTurn cron jobs on 2026.4.1
Severity: High (blocks all scheduled automation — security briefings, hourly check-ins, recurring tasks)
Frequency: 100% of scheduled runs fail; 100% of manual triggers succeed (13/13 observed today)
Consequence: All scheduled cron automation is non-functional after upgrading to 2026.4.1. Workaround is manual triggering only, which defeats the purpose of cron scheduling. Downgrade to 2026.3.31 is the only reliable workaround.

Additional information

The ~60s timeout appears to be an API HTTP request timeout applied only in the scheduled execution path, not the manual cron run path. The payload.timeoutSeconds (tested 300 and 600) and agents.defaults.timeoutSeconds (900) have no effect on this inner timeout. The 7:04 AM run succeeded — possibly because it ran before the gateway was fully loaded with sessions, suggesting the timeout may be related to session/context loading overhead in the scheduled path.

Possibly related to #57581 (LiveSessionModelSwitch blocks isolated cron sessions
with explicit model override) which was closed in this release cycle. Our cron jobs
use explicit model overrides (model: "anthropic/claude-sonnet-4-20250514") in the
payload. The fix for #57581 may have introduced or failed to fully resolve the
timeout regression for scheduled isolated sessions.

Also see #59265 (agents working in secret on 2026.4.1) for potentially related
execution path issues.

Workaround: Set LLM_REQUEST_TIMEOUT=120 as an environment variable in the
systemd service file and restart. This overrides the ~60s default and restores
scheduled cron functionality.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingclose:duplicateClosed as duplicatededupe:childDuplicate issue/PR child in dedupe clusterregressionBehavior that previously worked and now fails

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions