fix(cron): retry recurring jobs on transient errors with backoff #46423
onesmash wants to merge 31 commits into openclaw:main from
Conversation
Force-pushed from b2dbc92 to daed538
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b2dbc92e3a
```typescript
// backoff regardless of schedule interval. Without this, long-period jobs
// (hourly, daily) would wait until the next natural schedule time even
// when the error is self-healing (e.g. a brief network blip).
job.state.nextRunAtMs = backoffNext;
```
Respect preserveSchedule on forced transient failures
When applyJobResult is invoked from a manual force run (src/cron/service/ops.ts:479-490 passes preserveSchedule: true), transient errors now always write nextRunAtMs = backoffNext and ignore the preserved natural schedule. That means a forced run of a long-interval recurring every job can now trigger an unexpected automatic retry within ~30s after a network blip, whereas the prior behavior kept the original cadence/anchor for force runs.
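If forced runs should keep their cadence, one way to express the intended behavior is sketched below; `nextRunAfterTransientError` is a hypothetical helper, while `preserveSchedule`, `normalNext`, and `backoffNext` are names already used in this PR:

```typescript
// Sketch only: choose the next run time after a transient error.
// On a forced run with preserveSchedule, keep the natural schedule but never
// schedule earlier than the backoff floor; otherwise schedule the backoff retry.
function nextRunAfterTransientError(
  normalNext: number,
  backoffNext: number,
  preserveSchedule: boolean,
): number {
  return preserveSchedule ? Math.max(normalNext, backoffNext) : backoffNext;
}
```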
Force-pushed from daed538 to d3951f5
Greptile Summary

This PR correctly extends the transient-error retry logic (previously only for one-shot `at` jobs) to recurring `cron`/`every` jobs. Key issues found:
Confidence Score: 3/5
```diff
@@ -414,7 +414,8 @@ export function applyJobResult(
      }
    }
  } else if (result.status === "error" && job.enabled) {
    // Apply exponential backoff for errored jobs to prevent retry storms.
    const retryConfig = resolveRetryConfig(state.deps.cronConfig);
    const transient = isTransientCronError(result.error, retryConfig.retryOn);
    const backoff = errorBackoffMs(job.state.consecutiveErrors ?? 1);
```
retryConfig.backoffMs fetched but silently ignored
`resolveRetryConfig` is called to get `retryConfig`, and `retryConfig.retryOn` is correctly forwarded to `isTransientCronError`. However, `retryConfig.backoffMs` is never passed to `errorBackoffMs`, so any per-deployment backoff schedule configured via `cronConfig.retry.backoffMs` is silently ignored for recurring jobs. The one-shot path at line 385 correctly forwards `retryConfig.backoffMs`:

```typescript
const backoff = errorBackoffMs(consecutive, retryConfig.backoffMs);
```

The recurring path should do the same:

```suggestion
const backoff = errorBackoffMs(job.state.consecutiveErrors ?? 1, retryConfig.backoffMs);
```
Without this, cronConfig.retry.backoffMs only takes effect for at jobs, not cron/every jobs, which is inconsistent and likely unintended.
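For reference, the backoff helper behaves like a ladder lookup throughout this thread (1st error 30s, 2nd 60s, 3rd 5m on the default schedule). A minimal sketch of that assumed shape, not the project's actual implementation:

```typescript
// Default ladder as described in this PR: 30s, 60s, 5m, 15m, 60m.
const DEFAULT_BACKOFF_SCHEDULE_MS = [30_000, 60_000, 300_000, 900_000, 3_600_000];

// Assumed shape: the Nth consecutive error selects the Nth schedule entry,
// clamped to the last entry once the schedule is exhausted.
function errorBackoffMs(
  consecutiveErrors: number,
  schedule: readonly number[] = DEFAULT_BACKOFF_SCHEDULE_MS,
): number {
  const idx = Math.min(Math.max(consecutiveErrors, 1), schedule.length) - 1;
  return schedule[idx];
}

errorBackoffMs(1);                    // 30_000
errorBackoffMs(3, [60_000, 300_000]); // 300_000 (clamped to the last entry)
```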
Addressed in `7a98b18f5`: recurring error path now takes backoff schedule from `resolveRetryConfig(state.deps.cronConfig).backoffMs` when `cron.retry.backoffMs` is set (same reference as one-shot). Full default ladder still used when unset. Also merged latest `openclaw/main` into this branch to refresh CI.
| it("uses increasing backoff on consecutive transient errors for a daily job", () => { | ||
| const state = makeState(); | ||
|
|
||
| const runError = (consecutiveErrors: number) => { | ||
| const job = makeCronJob("0 10 * * *"); | ||
| job.state.consecutiveErrors = consecutiveErrors; | ||
| applyJobResult(state, job, { | ||
| status: "error", | ||
| error: "econnreset", | ||
| startedAt: NOW_MS, | ||
| endedAt: ENDED_AT, | ||
| }); | ||
| return job.state.nextRunAtMs!; | ||
| }; | ||
|
|
||
| const first = runError(0); // 1st error → 30s backoff | ||
| const second = runError(1); // 2nd error → 60s backoff | ||
| const third = runError(2); // 3rd error → 5min backoff | ||
|
|
||
| expect(second).toBeGreaterThan(first); | ||
| expect(third).toBeGreaterThan(second); | ||
| // All well within 5 minutes, not waiting 24 hours | ||
| expect(third).toBeLessThanOrEqual(ENDED_AT + 5 * 60_000); | ||
| }); |
Backoff escalation test creates fresh job instances instead of reusing one
Each runError(n) call creates a fresh CronJob and manually pre-sets consecutiveErrors before calling applyJobResult. This means the test does not exercise the actual counter accumulation path across repeated failures on the same job — it only verifies that errorBackoffMs returns a larger value for larger inputs.
A more realistic regression test would apply errors to the same job instance in sequence:
it("uses increasing backoff on consecutive transient errors for a daily job", () => {
const state = makeState();
const job = makeCronJob("0 10 * * *");
const recordNext = () => {
applyJobResult(state, job, {
status: "error",
error: "econnreset",
startedAt: NOW_MS,
endedAt: ENDED_AT,
});
return job.state.nextRunAtMs!;
};
const first = recordNext(); // consecutiveErrors → 1, 30s backoff
const second = recordNext(); // consecutiveErrors → 2, 60s backoff
const third = recordNext(); // consecutiveErrors → 3, 5min backoff
expect(second).toBeGreaterThan(first);
expect(third).toBeGreaterThan(second);
expect(third).toBeLessThanOrEqual(ENDED_AT + 5 * 60_000);
});This also implicitly validates that consecutiveErrors increments correctly rather than relying on manually supplied values.
💡 Codex Review
Reviewed commit: 1305d8fd3a
```diff
- const backoff = errorBackoffMs(job.state.consecutiveErrors ?? 1);
  const retryConfig = resolveRetryConfig(state.deps.cronConfig);
  const transient = isTransientCronError(result.error, retryConfig.retryOn);
+ const backoff = errorBackoffMs(job.state.consecutiveErrors ?? 1, retryConfig.backoffMs);
```
Keep recurring retries on full backoff ladder
Passing retryConfig.backoffMs into the recurring error path changes the default ladder for recurring jobs from [30s, 60s, 5m, 15m, 60m] to only the first three entries, because resolveRetryConfig falls back to DEFAULT_BACKOFF_SCHEDULE_MS.slice(0, 3). After the 3rd consecutive failure, recurring jobs now retry every 5 minutes instead of backing off to 15/60 minutes, which increases retry pressure for short-interval jobs and weakens the existing retry-storm protection.
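To make the regression concrete (ladder values are taken from the comment above; the names match those used elsewhere in this thread):

```typescript
// Full recurring ladder vs. the one-shot fallback produced by resolveRetryConfig.
const DEFAULT_BACKOFF_SCHEDULE_MS = [30_000, 60_000, 300_000, 900_000, 3_600_000]; // 30s, 60s, 5m, 15m, 60m
const oneShotFallback = DEFAULT_BACKOFF_SCHEDULE_MS.slice(0, 3);                   // 30s, 60s, 5m

// After the 3rd consecutive failure, a recurring job on the fallback schedule keeps
// retrying every 5 minutes instead of backing off to 15 and then 60 minutes.
```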
💡 Codex Review
Reviewed commit: 3fdfd5a35e
```typescript
const backoffSchedule =
  Array.isArray(state.deps.cronConfig?.retry?.backoffMs) &&
  state.deps.cronConfig.retry.backoffMs.length > 0
    ? state.deps.cronConfig.retry.backoffMs
    : DEFAULT_BACKOFF_SCHEDULE_MS;
```
Honor custom retry backoff during startup catch-up
Using cronConfig.retry.backoffMs here changes recurring jobs’ persisted nextRunAtMs, but the restart gate in isErrorBackoffPending still computes backoff with the default ladder (errorBackoffMs(consecutiveErrors)), so the two paths diverge. In a configured environment with a longer custom backoff (for example 2h), a gateway restart after 30s+ can treat backoff as expired and planStartupCatchup (with allowCronMissedRunByLastRun) may run the cron job before the configured cooldown and before its stored nextRunAtMs, which breaks operator-configured retry pacing after restarts.
Merged latest
💡 Codex Review
Reviewed commit: d72ffc41e8
```typescript
const retryConfig = resolveRetryConfig(state.deps.cronConfig);
const transient = isTransientCronError(result.error, retryConfig.retryOn);
```
Decouple recurring transient checks from one-shot retryOn config
The recurring error path now calls isTransientCronError(..., retryConfig.retryOn), so any custom cron.retry.retryOn setting will also gate recurring retries. This is a regression for deployments that tuned cron.retry for one-shot jobs only (as documented in src/config/schema.help.ts:1199), because a config like retryOn: ["rate_limit"] will make recurring network/timeout failures skip the new backoff retry behavior and wait until the next natural schedule again. Please keep recurring transient classification independent of one-shot retry config (or update the config contract everywhere).
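For illustration, a config of the kind this comment worries about might look like the following (the object shape is assumed; only the `cron.retry.retryOn` / `backoffMs` keys are taken from this thread):

```typescript
// Hypothetical operator config tuned for one-shot (`at`) retries only.
const cronConfig = {
  retry: {
    retryOn: ["rate_limit"],      // narrow one-shot transient classes
    backoffMs: [60_000, 300_000], // custom one-shot backoff schedule
  },
};

// If the recurring path reuses retryConfig.retryOn, a recurring job failing with
// "econnreset" is no longer classified as transient and waits for its next natural run.
```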
💡 Codex Review
Reviewed commit: 7a98b18f54
```typescript
const recurringBackoffMs =
  Array.isArray(state.deps.cronConfig?.retry?.backoffMs) &&
  state.deps.cronConfig.retry.backoffMs.length > 0
    ? retryConfig.backoffMs
    : DEFAULT_BACKOFF_SCHEDULE_MS;
```
Align restart backoff gating with recurring backoff config
Using cron.retry.backoffMs for recurring errors here changes persisted nextRunAtMs, but startup replay gating still uses isErrorBackoffPending with the default ladder only. In a config like cron.retry.backoffMs: [7200000], a cron job that failed transiently can be replayed on restart (via missed-slot logic) well before its configured 2-hour cooldown and before its stored nextRunAtMs, because the gate considers backoff elapsed after the default 30s. This breaks operator-configured retry pacing specifically after restarts.
Addressed in 52a71a93a: startup missed-slot gating now reuses the same recurring backoff schedule as applyJobResult, so custom cron.retry.backoffMs prevents premature replay after restart. Added a restart-catchup regression test for the custom-backoff case, and targeted cron tests passed.
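A rough sketch of the alignment described here; `isErrorBackoffPending` is named in the review above, but the parameterization and the `lastRunEndedAtMs` field are assumptions for illustration:

```typescript
// Sketch: the restart gate computes its backoff window from the same schedule the
// recurring error path used when it persisted nextRunAtMs, not the default ladder.
function isErrorBackoffPending(
  state: { consecutiveErrors?: number; lastRunEndedAtMs?: number },
  nowMs: number,
  backoffScheduleMs: readonly number[],
): boolean {
  const consecutive = state.consecutiveErrors ?? 0;
  if (consecutive === 0 || backoffScheduleMs.length === 0) return false;
  const backoff = backoffScheduleMs[Math.min(consecutive, backoffScheduleMs.length) - 1];
  return nowMs < (state.lastRunEndedAtMs ?? 0) + backoff;
}

// With cron.retry.backoffMs: [7_200_000], a restart 30s after a transient failure
// still reports the backoff as pending, so catch-up does not replay the job early.
```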
Scheduled maintenance check (2026-03-19)

Code review items: Greptile
CI: the failures in the latest run are mainly
Suggest re-running the failed jobs on the upstream side, or waiting until main is stable and checking whether they are still red. No pending changes remain on the PR logic side.
Scheduled maintenance (2026-03-19)

HEAD:
CI (run
Codex P2 (
Merge: the PR is
Scheduled maintenance (2026-03-27)

HEAD:
CI (run
To-do: the PR currently has no pending code changes,
Force-pushed from 52a71a9 to 30aa07f
💡 Codex Review
Reviewed commit: 30aa07f73c
```typescript
const backoff = errorBackoffMs(job.state.consecutiveErrors ?? 1);
// Recurring jobs: use default transient patterns only. cron.retry.retryOn is documented
// for one-shot retries; narrowing it must not disable recurring network/timeout retries.
const transient = isTransientCronError(result.error);
```
Classify built-in timeout errors as transient
The new recurring retry branch now depends on isTransientCronError(result.error), but the scheduler’s own timeout failures are emitted as "cron: job execution timed out" (from timeoutErrorMessage()), which does not match the current timeout transient pattern (timeout|etimedout). In practice, timed-out recurring jobs (for example a daily job that hits the execution timeout) are treated as permanent errors and wait for the next natural schedule instead of taking the intended backoff retry path, so the timeout case in this fix is not actually covered.
Addressed in 18f0891690: updated `TRANSIENT_PATTERNS.timeout` to `/(timeout|timed\s+out|etimedout)/i` so "cron: job execution timed out" from `timeoutErrorMessage()` is correctly classified as transient. Added a regression test for the exact internal message.
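The widened pattern can be sanity-checked in isolation (the regex is quoted from the comment above; the constant name below is illustrative, not the project's actual module):

```typescript
// Timeout pattern now also matches the scheduler's own "timed out" wording.
const timeoutTransientPattern = /(timeout|timed\s+out|etimedout)/i;

console.log(timeoutTransientPattern.test("cron: job execution timed out")); // true
console.log(timeoutTransientPattern.test("ETIMEDOUT"));                     // true
console.log(timeoutTransientPattern.test("permission denied"));             // false
```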
- When `applyJobResult` is invoked with `preserveSchedule` (e.g. a manual force run), transient errors now keep the preserved natural schedule with a backoff floor instead of always scheduling an immediate backoff retry. Made-with: Cursor
- …backoffMs: respect `cronConfig.retry.backoffMs` when set; otherwise use `DEFAULT_BACKOFF_SCHEDULE_MS` (full ladder) for the recurring error path to preserve retry-storm protection (15m/60m), instead of resolving via `resolveRetryConfig`, which defaults to `slice(0, 3)`. Made-with: Cursor
- Align the recurring path with one-shot: pass `retryConfig.backoffMs` to `errorBackoffMs` so `cronConfig.retry.backoffMs` is respected for `cron`/`every` jobs (addresses review feedback). Made-with: Cursor
Thanks for the PR. This branch appears to include a large replay of unrelated main/release commits, so it is not reviewable as a focused Telegram change. Please reopen as a narrow PR with only the intended fix. |
Summary

Recurring (`cron`/`every`) jobs now detect transient errors (network, timeout, rate-limit, 5xx) and retry with exponential backoff, instead of waiting until the next natural schedule time.

Root Cause

One-shot (`at`) jobs already had `isTransientCronError` detection (added in #24355), but the recurring job path at `src/cron/service/timer.ts:416` was missing it. For a daily job with a network error, `Math.max(normalNext=+24h, backoffNext=+30s)` always picked the 24-hour schedule, making backoff completely ineffective.

Behavior Change
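A small illustration with the numbers from the root-cause note (variable names follow that note; the real code paths differ):

```typescript
const nowMs = Date.now();
const normalNext = nowMs + 24 * 60 * 60_000; // next natural run of a daily job
const backoffNext = nowMs + 30_000;          // 30s transient-error backoff

// Before: the recurring path always kept the natural schedule, so backoff never applied.
const oldNextRunAtMs = Math.max(normalNext, backoffNext); // ~+24h

// After: a transient error schedules the backoff retry instead of waiting a full day.
const newNextRunAtMs = backoffNext; // ~+30s
```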
Test Plan

- `src/cron/service.issue-46341-recurring-transient-retry.test.ts`: 5 new tests covering transient retry, permanent-error schedule, and backoff escalation
- `pnpm test` passes

Closes #46341