# Cron model preflight skips entire run when local primary is unreachable, ignoring configured cloud fallbacks [AI]

## Summary

When a cron job uses an agent whose `model.primary` is a local provider (e.g. `ollama/gemma4:26b-nvfp4`) and `model.fallbacks` lists a cloud provider (e.g. `openrouter/nvidia/nemotron-3-super-120b-a12b:free`), and the local provider endpoint is temporarily unreachable, the cron run is **silently skipped** with `status: skipped` instead of falling back to the cloud provider that is healthy.

The skip happens at preflight time inside the cron isolated-agent runner, **before any model invocation**. The fallback chain configured on the agent is never consulted.

This is distinct from the post-invocation fallback failures discussed in #44353 (provider-level errors) and #74985 (embedded agent timeout): here, the failure happens earlier — the preflight short-circuits the run.

Net effect for operators relying on local→cloud failover: when Ollama hiccups (busy, paused for an upgrade, or a momentary network blip on `127.0.0.1`), the affected cron run disappears with no retry until the next scheduled tick. With 6 scheduled runs per day, a transient 5-minute Ollama outage can silently cost one entire run, and the operator only pays for it the next morning, when a watchdog finally fires.

## Real behavior proof

### A. Source-level proof

The preflight is implemented in `src/cron/isolated-agent/model-preflight.runtime.ts` (compiled at `dist/model-preflight.runtime-D3BkBmU5.js`):

```js
// preflightCronModelProvider — params.provider/model = the *resolved primary*.
async function preflightCronModelProvider(params) {
    const providerConfig = resolveProviderConfig(params.cfg, params.provider);
    if (!providerConfig) return { status: "available" };
    const baseUrl = normalizeBaseUrl(providerConfig.baseUrl);
    const api = normalizeProbeApi(providerConfig);
    if (!baseUrl || !api || !isLocalProviderBaseUrl(baseUrl)) return { status: "available" };
    // ...probes baseUrl with 2.5s timeout, returns "unavailable" on failure...
}
```

The function only ever consults `cfg.models.providers[params.provider].baseUrl`. It reads neither `cfg.agents.list[*].model.fallbacks` nor `cfg.agents.defaults.model.fallbacks`.

The caller in `src/cron/isolated-agent/run.ts` (compiled at `dist/isolated-agent-DPJcOmiU.js:485-502`) consumes only the `status` and short-circuits:

```js
const preflight = await (await loadCronModelPreflightRuntime()).preflightCronModelProvider({
    cfg: cfgWithAgentDefaults,
    provider,    // resolved primary only
    model,
});
if (preflight.status === "unavailable") {
    logWarn(`[cron:${input.job.id}] ${preflight.reason}`);
    return {
        ok: false,
        result: withRunSession({
            status: "skipped",
            error: preflight.reason,
            diagnostics: createCronRunDiagnosticsFromError("model-preflight", preflight.reason, { severity: "warn" }),
            provider,
            model,
        })
    };
}
```

The `return` happens before any fallback resolver runs. The skip is final for this scheduled tick.

### B. Standalone repro (no infra needed)

Save as `repro.mjs` and run with `node repro.mjs`:

```js
import { preflightCronModelProvider } from "/opt/homebrew/lib/node_modules/openclaw/dist/model-preflight.runtime.js";

// Nothing listens on this port, so the preflight probe fails → status: "unavailable"
const cfg = {
    models: {
        providers: {
            ollama:     { api: "ollama", baseUrl: "http://127.0.0.1:11999" },
            openrouter: { api: "openai-completions", baseUrl: "https://openrouter.ai/api/v1" },
        },
    },
    agents: {
        list: [{
            id: "bourse",
            model: {
                primary: "ollama/gemma4:26b-nvfp4",
                fallbacks: ["openrouter/nvidia/nemotron-3-super-120b-a12b:free"],
            },
        }],
    },
};

const r = await preflightCronModelProvider({
    cfg, provider: "ollama", model: "gemma4:26b-nvfp4",
});
console.log(r);
```

Output (verbatim):
```
{
  status: 'unavailable',
  provider: 'ollama',
  model: 'gemma4:26b-nvfp4',
  baseUrl: 'http://127.0.0.1:11999',
  retryAfterMs: 300000,
  reason: 'Agent cron job uses ollama/gemma4:26b-nvfp4 but the local provider endpoint is not reachable at http://127.0.0.1:11999. Skipping this cron run; OpenClaw will retry the provider preflight on a later scheduled run. Last error: TypeError: fetch failed'
}
```

`cfg.agents.list[0].model.fallbacks` is fully populated and points to a healthy cloud provider, but the preflight never reads it.
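To illustrate that the fallback chain can be recovered from the same `cfg` object, here is a minimal sketch. `splitModelRef` and `resolveModelChain` are hypothetical helpers, not OpenClaw APIs. Splitting at the first slash and preferring agent-level settings over `agents.defaults` are assumptions; the real resolver may apply different rules.

```js
// Hypothetical helper: split "provider/model" at the first slash only, so
// "openrouter/nvidia/nemotron-3-super-120b-a12b:free" keeps its nested model id.
function splitModelRef(ref) {
    const slash = ref.indexOf("/");
    if (slash <= 0 || slash === ref.length - 1) return null; // not a "provider/model" ref
    return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// Hypothetical helper: primary first, then fallbacks in configured order.
// Assumption: agent-level values win over cfg.agents.defaults.
function resolveModelChain(cfg, agentId) {
    const agent = cfg.agents?.list?.find((a) => a.id === agentId);
    const defaults = cfg.agents?.defaults?.model;
    const primary = agent?.model?.primary ?? defaults?.primary;
    const fallbacks = agent?.model?.fallbacks ?? defaults?.fallbacks ?? [];
    return [primary, ...fallbacks]
        .filter((ref) => typeof ref === "string")
        .map(splitModelRef)
        .filter(Boolean); // drop refs that did not parse
}

// Same agent config as in the repro above.
const exampleCfg = {
    agents: {
        list: [{
            id: "bourse",
            model: {
                primary: "ollama/gemma4:26b-nvfp4",
                fallbacks: ["openrouter/nvidia/nemotron-3-super-120b-a12b:free"],
            },
        }],
    },
};

console.log(resolveModelChain(exampleCfg, "bourse"));
```

With the repro config, the chain has two candidates: the local `ollama` primary followed by the `openrouter` fallback. The information needed for failover is therefore already present when the preflight runs.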

### C. Production trace (real cron run, redacted IDs)

Cron `marche-preopen-eu` (agent `bourse`), agent config at the time:

```json
{
  "id": "bourse",
  "model": {
    "primary":  "ollama/gemma4:26b-nvfp4",
    "fallbacks": ["openrouter/nvidia/nemotron-3-super-120b-a12b:free"]
  }
}
```

Run history entry (Ollama was briefly busy at 07:45 because a concurrent cron job was consuming RAM):

```json
{
  "ts": 1778132782311,
  "action": "finished",
  "status": "skipped",
  "error": "Agent cron job uses ollama/gemma4:26b-nvfp4 but the local provider endpoint is not reachable at http://127.0.0.1:11434. Skipping this cron run; OpenClaw will retry the provider preflight on a later scheduled run. Last error: TimeoutError: request timed out",
  "diagnostics": {
    "summary": "Agent cron job uses ollama/gemma4:26b-nvfp4 but the local provider endpoint is not reachable at http://127.0.0.1:11434. Skipping this cron run; OpenClaw will retry the provider preflight on a later scheduled run. Last error: TimeoutError: request timed out",
    "entries": [{
      "source": "model-preflight",
      "severity": "warn",
      "message": "Agent cron job uses ollama/gemma4:26b-nvfp4 but the local provider endpoint is not reachable at http://127.0.0.1:11434. Skipping this cron run; OpenClaw will retry the provider preflight on a later scheduled run. Last error: TimeoutError: request timed out"
    }]
  },
  "model": "gemma4:26b-nvfp4",
  "provider": "ollama"
}
```

Three minutes later, Ollama responded normally; OpenRouter Nemotron was healthy throughout. The configured fallback would have run the cron successfully.
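Because these skips are easy to miss, operators may want to scan run history for them. The following is a minimal sketch that assumes the entries have already been parsed into objects shaped like the trace above. `findPreflightSkips` is a hypothetical helper, and how the entries are exported from OpenClaw is left open.

```js
// Hypothetical helper: return run-history entries that were skipped
// because of a "model-preflight" diagnostic.
function findPreflightSkips(entries) {
    return entries.filter((entry) =>
        entry.status === "skipped" &&
        (entry.diagnostics?.entries ?? []).some((d) => d.source === "model-preflight"));
}

// Sample data mirroring the fields of the production trace (messages shortened).
const history = [
    {
        ts: 1778132782311,
        action: "finished",
        status: "skipped",
        diagnostics: { entries: [{ source: "model-preflight", severity: "warn", message: "..." }] },
        model: "gemma4:26b-nvfp4",
        provider: "ollama",
    },
    { ts: 1778132782312, action: "finished", status: "ok", model: "gemma4:26b-nvfp4", provider: "ollama" },
];

for (const skip of findPreflightSkips(history)) {
    console.log(`${new Date(skip.ts).toISOString()} skipped by preflight: ${skip.provider}/${skip.model}`);
}
```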

## Verification

### Reproducing the bug (no Ollama interference required)

1. Confirm OpenClaw version: `openclaw --version` (tested on 2026.5.4).
2. Save the standalone repro above as `repro.mjs`.
3. Run: `node repro.mjs`.
4. Observe `status: "unavailable"`; the `agents.list[*].model.fallbacks` entry in `cfg` is never consulted.

### End-to-end live verification (optional, requires controlled outage)

1. Configure an agent with `model.primary: "ollama/<model>"` and `model.fallbacks: ["<healthy cloud provider>/<model>"]`.
2. Schedule a one-shot cron: `openclaw cron add --agent <agent> --at 1m --message "..." --tools exec`.
3. Briefly stop Ollama (`launchctl kill TERM gui/$UID/com.ollama` on macOS, or `systemctl stop ollama` on Linux) ≈ 30s before the cron fires.
4. Restart Ollama after the cron has fired.
5. Inspect run with `openclaw cron runs --id <id>`.

- Observed (current behavior): `status: "skipped"`, diagnostic source `model-preflight`, no fallback attempted.
- Desired: `status: "ok"` with the fallback model used, or at minimum `status: "skipped"` only after the fallback chain has been exhausted.

## Suggested fix (sketch — feedback welcome)

Two non-exclusive options:

1. **Defer preflight until after fallback resolution.** Extend `preflightCronModelProvider` to receive the full fallback chain and walk it in order, returning `available` as soon as one candidate's local probe succeeds (or as soon as a cloud candidate is hit, since cloud preflight is currently a no-op). This keeps the existing semantics of "we only probe local providers".

2. **On `unavailable`, attempt fallback before returning `skipped`.** In `cron/isolated-agent/run.ts`, when preflight is `unavailable`, look up the agent's `model.fallbacks` and rotate to the next candidate (re-running preflight for it if local). Only emit `skipped` when no candidate passes preflight.

Option 1 is preferred — it keeps the failure path centralized in one runtime and avoids racing with the in-flight fallback resolver used during agent invocation.
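Option 1 could look roughly like the sketch below. It is not the actual implementation: `preflightModelChain` and its parameter names are hypothetical, and `preflightOne` stands in for the existing single-provider `preflightCronModelProvider`, injected here so the example runs without OpenClaw installed.

```js
// Hypothetical wrapper: walk [primary, ...fallbacks] in order and return the
// first candidate whose preflight passes. Cloud candidates pass immediately
// because the existing preflight reports "available" for non-local providers.
async function preflightModelChain({ cfg, candidates, preflightOne }) {
    const failures = [];
    for (const { provider, model } of candidates) {
        const result = await preflightOne({ cfg, provider, model });
        if (result.status === "available") {
            return { status: "available", provider, model, failures };
        }
        failures.push(result);
    }
    // Every candidate failed (or none was configured): only now report "unavailable".
    return {
        status: "unavailable",
        reason: failures.map((f) => f.reason).join(" | ") || "No model candidates configured",
        failures,
    };
}

// Stub preflight for the demo: the local provider is down, everything else is up.
const stubPreflight = async ({ provider, model }) =>
    provider === "ollama"
        ? { status: "unavailable", provider, model, reason: `${provider}/${model} is not reachable` }
        : { status: "available" };

preflightModelChain({
    cfg: {},
    candidates: [
        { provider: "ollama", model: "gemma4:26b-nvfp4" },
        { provider: "openrouter", model: "nvidia/nemotron-3-super-120b-a12b:free" },
    ],
    preflightOne: stubPreflight,
}).then((result) => console.log(result.status, result.provider, result.model));
```

Under this design the caller in `run.ts` would use the returned `provider` and `model` for the run rather than the resolved primary, and would emit `skipped` only when the whole chain is `unavailable`.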

## Related

- #44353 — Fallback models not triggered on provider-level errors. *Different code path*: that issue is about runtime fallback after invocation; this issue is about preflight skip *before* any invocation. Fixing this issue would also help #44353-style cases when the failure is detectable as "provider unreachable" rather than "provider returned bad response".
- #74985 — Embedded agent Kimi timeout with no fallback. *Different code path*: that one is in `pi-embedded-runner`; this one is in cron isolated-agent.
- #63229 — Gateway falsely marks healthy local vLLM endpoints as timed out. *Related but distinct*: that one concerns false positives from the timeout heuristic; this one concerns legitimate `unavailable` results that should not skip the run when fallbacks exist.

## Environment

- OpenClaw 2026.5.4 (commit `325df3e`)
- macOS Darwin arm64 (Apple Silicon M4 Pro)
- Node.js 25.x (npm global install)
- Affected runtime: `dist/model-preflight.runtime-D3BkBmU5.js` + `dist/isolated-agent-DPJcOmiU.js`
- Tested with: `provider: ollama` baseUrl `http://127.0.0.1:11999` (unused port, guarantees `fetch failed`)

