Cron model preflight skips entire run when local primary is unreachable, ignoring configured cloud fallbacks [AI]
Summary
When a cron job uses an agent whose model.primary is a local provider (e.g. ollama/gemma4:26b-nvfp4) and whose model.fallbacks lists a cloud provider (e.g. openrouter/nvidia/nemotron-3-super-120b-a12b:free), a temporarily unreachable local endpoint causes the cron run to be silently skipped with status: skipped instead of falling back to the healthy cloud provider.
The skip happens at preflight time inside the cron isolated-agent runner, before any model invocation. The fallback chain configured on the agent is never consulted.
This is distinct from the post-invocation fallback failures discussed in #44353 (provider-level errors) and #74985 (embedded agent timeout): here, the failure happens earlier — the preflight short-circuits the run.
Net effect for operators relying on local→cloud failover: when Ollama hiccups (busy, paused for upgrade, momentary network blip on 127.0.0.1), entire scheduled cron runs disappear with no retry until the next scheduled tick. With 6 scheduled runs per day, a transient 5-minute Ollama outage can silently cost one entire run, and the gap may only be noticed the next morning when a watchdog finally fires.
Real behavior proof
A. Source-level proof
The preflight is implemented in src/cron/isolated-agent/model-preflight.runtime.ts (compiled at dist/model-preflight.runtime-D3BkBmU5.js):
// preflightCronModelProvider — params.provider/model = the *resolved primary*.
async function preflightCronModelProvider(params) {
  const providerConfig = resolveProviderConfig(params.cfg, params.provider);
  if (!providerConfig) return { status: "available" };
  const baseUrl = normalizeBaseUrl(providerConfig.baseUrl);
  const api = normalizeProbeApi(providerConfig);
  if (!baseUrl || !api || !isLocalProviderBaseUrl(baseUrl)) return { status: "available" };
  // ...probes baseUrl with 2.5s timeout, returns "unavailable" on failure...
}
The function only ever consults cfg.models.providers[params.provider].baseUrl. It never reads cfg.agents.list[*].model.fallbacks nor cfg.agents.defaults.model.fallbacks.
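To make the elided probe step concrete, here is a minimal sketch of what a local-endpoint probe with a 2.5 s timeout could look like. It is not OpenClaw's actual code: `probeLocalEndpoint`, its options, and its return shape are hypothetical names chosen for illustration. Only `fetch` and `AbortSignal.timeout` are real Node.js APIs.

```javascript
// Hypothetical sketch, NOT the real OpenClaw implementation.
// `probeLocalEndpoint`, `fetchImpl`, and `lastError` are illustrative assumptions.
async function probeLocalEndpoint(baseUrl, { timeoutMs = 2500, fetchImpl = fetch } = {}) {
  try {
    // AbortSignal.timeout aborts the request with a TimeoutError after timeoutMs.
    await fetchImpl(baseUrl, { signal: AbortSignal.timeout(timeoutMs) });
    // Any HTTP response, even an error status, shows that something is listening.
    return { status: "available" };
  } catch (err) {
    return { status: "unavailable", baseUrl, lastError: `${err.name}: ${err.message}` };
  }
}

// Demo with a stub fetch that simulates a refused connection (no network needed).
const refuseConnection = async () => {
  throw new TypeError("fetch failed");
};
probeLocalEndpoint("http://127.0.0.1:11999", { fetchImpl: refuseConnection })
  .then((r) => console.log(r.status, r.lastError)); // prints: unavailable TypeError: fetch failed
```

Note that a probe of this kind knows nothing about the agent: it sees one base URL and reports on it, which is why the fallback list has to be considered by whoever calls it.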
The caller in src/cron/isolated-agent/run.ts (compiled at dist/isolated-agent-DPJcOmiU.js:485-502) consumes only the status and short-circuits:
const preflight = await (await loadCronModelPreflightRuntime()).preflightCronModelProvider({
  cfg: cfgWithAgentDefaults,
  provider, // resolved primary only
  model,
});
if (preflight.status === "unavailable") {
  logWarn(`[cron:${input.job.id}] ${preflight.reason}`);
  return {
    ok: false,
    result: withRunSession({
      status: "skipped",
      error: preflight.reason,
      diagnostics: createCronRunDiagnosticsFromError("model-preflight", preflight.reason, { severity: "warn" }),
      provider,
      model,
    })
  };
}
The return happens before any fallback resolver runs. The skip is final for this scheduled tick.
B. Standalone repro (no infra needed)
Save as repro.mjs and run with node repro.mjs:
import { preflightCronModelProvider } from "/opt/homebrew/lib/node_modules/openclaw/dist/model-preflight.runtime.js";

// Unused port → the preflight probe fails → status: "unavailable"
const cfg = {
  models: {
    providers: {
      ollama: { api: "ollama", baseUrl: "http://127.0.0.1:11999" },
      openrouter: { api: "openai-completions", baseUrl: "https://openrouter.ai/api/v1" },
    },
  },
  agents: {
    list: [{
      id: "bourse",
      model: {
        primary: "ollama/gemma4:26b-nvfp4",
        fallbacks: ["openrouter/nvidia/nemotron-3-super-120b-a12b:free"],
      },
    }],
  },
};

// Only the resolved primary is passed in; the fallbacks in cfg are never read.
const r = await preflightCronModelProvider({
  cfg, provider: "ollama", model: "gemma4:26b-nvfp4",
});
console.log(r);
Output (verbatim):
{
  status: 'unavailable',
  provider: 'ollama',
  model: 'gemma4:26b-nvfp4',
  baseUrl: 'http://127.0.0.1:11999',
  retryAfterMs: 300000,
  reason: 'Agent cron job uses ollama/gemma4:26b-nvfp4 but the local provider endpoint is not reachable at http://127.0.0.1:11999. Skipping this cron run; OpenClaw will retry the provider preflight on a later scheduled run. Last error: TypeError: fetch failed'
}
The cfg.agents.list[0].model.fallbacks is fully populated and points to a healthy cloud provider, but the preflight result does not look at it.
C. Production trace (real cron run, redacted IDs)
Cron marche-preopen-eu (agent bourse), agent config at the time:
{
  "id": "bourse",
  "model": {
    "primary": "ollama/gemma4:26b-nvfp4",
    "fallbacks": ["openrouter/nvidia/nemotron-3-super-120b-a12b:free"]
  }
}
Run history entry (Ollama briefly busy at 07:45 due to concurrent cron consuming RAM):
{
  "ts": 1778132782311,
  "action": "finished",
  "status": "skipped",
  "error": "Agent cron job uses ollama/gemma4:26b-nvfp4 but the local provider endpoint is not reachable at http://127.0.0.1:11434. Skipping this cron run; OpenClaw will retry the provider preflight on a later scheduled run. Last error: TimeoutError: request timed out",
  "diagnostics": {
    "summary": "Agent cron job uses ollama/gemma4:26b-nvfp4 but the local provider endpoint is not reachable at http://127.0.0.1:11434. Skipping this cron run; OpenClaw will retry the provider preflight on a later scheduled run. Last error: TimeoutError: request timed out",
    "entries": [{
      "source": "model-preflight",
      "severity": "warn",
      "message": "Agent cron job uses ollama/gemma4:26b-nvfp4 but the local provider endpoint is not reachable at http://127.0.0.1:11434. Skipping this cron run; OpenClaw will retry the provider preflight on a later scheduled run. Last error: TimeoutError: request timed out"
    }]
  },
  "model": "gemma4:26b-nvfp4",
  "provider": "ollama"
}
Three minutes later, Ollama responded normally; OpenRouter Nemotron was healthy throughout. The configured fallback would have run the cron successfully.
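Until this is fixed, operators may want to spot these skips themselves. The following is a small sketch of a check over run-history entries, based only on the entry shape shown above. `isPreflightSkip` is a hypothetical helper, not an OpenClaw API, and the sample object is a trimmed copy of the entry from the trace.

```javascript
// Illustrative helper (not part of OpenClaw): flags run-history entries that were
// skipped by the model preflight, assuming the entry shape shown in the trace above.
function isPreflightSkip(entry) {
  const diagnosticEntries = entry?.diagnostics?.entries ?? [];
  return (
    entry?.status === "skipped" &&
    diagnosticEntries.some((d) => d.source === "model-preflight")
  );
}

// Trimmed copy of the production entry (long message strings omitted).
const sampleEntry = {
  ts: 1778132782311,
  action: "finished",
  status: "skipped",
  diagnostics: { entries: [{ source: "model-preflight", severity: "warn" }] },
  model: "gemma4:26b-nvfp4",
  provider: "ollama",
};

if (isPreflightSkip(sampleEntry)) {
  // ts is in epoch milliseconds, so it can be passed to Date directly.
  const when = new Date(sampleEntry.ts).toISOString();
  console.log(`preflight skip at ${when} for ${sampleEntry.provider}/${sampleEntry.model}`);
}
```

A watchdog built on a check like this could alert on the skipped tick itself rather than on the missing output hours later.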
Verification
Reproducing the bug (no Ollama interference required)
- Confirm OpenClaw version: openclaw --version (tested on 2026.5.4).
- Save the standalone repro above as repro.mjs.
- Run: node repro.mjs.
- Observe status: "unavailable", with no consultation of agents.list[*].model.fallbacks from the cfg.
End-to-end live verification (optional, requires controlled outage)
- Configure an agent with model.primary: "ollama/<model>" and model.fallbacks: ["<healthy cloud provider>/<model>"].
- Schedule a one-shot cron: openclaw cron add --agent <agent> --at 1m --message "..." --tools exec.
- Briefly stop Ollama (launchctl kill TERM gui/$UID/com.ollama on macOS, or systemctl stop ollama on Linux) about 30 s before the cron fires.
- Restart Ollama after the cron has fired.
- Inspect the run with openclaw cron runs --id <id>.
Observed (current behavior): status: "skipped", diagnostic source model-preflight, no fallback attempted.
Desired: status: "ok" with the fallback model used; or at minimum status: "skipped" only after the fallback chain has been exhausted.
Suggested fix (sketch — feedback welcome)
Two non-exclusive options:
1. Defer preflight until after fallback resolution. Extend preflightCronModelProvider to receive the full fallback chain and walk it in order, returning available as soon as one candidate's local probe succeeds (or as soon as a cloud candidate is hit, since cloud preflight is currently a no-op). This keeps the existing semantic of "we only probe local providers".
2. On unavailable, attempt fallback before returning skipped. In cron/isolated-agent/run.ts, when preflight is unavailable, look up the agent's model.fallbacks and rotate to the next candidate (re-running preflight for it if local). Only emit skipped when no candidate passes preflight.
Option 1 is preferred — it keeps the failure path centralized in one runtime and avoids racing with the in-flight fallback resolver used during agent invocation.
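As a starting point for discussion, Option 1 could look roughly like the sketch below. Everything here is hypothetical: `preflightModelChain`, `splitModelRef`, and the injected `preflightOne` callback are illustrative names, not existing OpenClaw functions, and splitting a model ref on its first slash is an assumption inferred from how ollama/gemma4:26b-nvfp4 maps to provider ollama and model gemma4:26b-nvfp4 in the repro.

```javascript
// Hypothetical sketch of Option 1; none of these names exist in OpenClaw today.

// Assumption: a model ref is "<provider>/<model>", split on the FIRST slash, e.g.
// "openrouter/nvidia/x:free" -> { provider: "openrouter", model: "nvidia/x:free" }.
function splitModelRef(ref) {
  const slash = ref.indexOf("/");
  if (slash < 0) throw new Error(`model ref without provider: ${ref}`);
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// Walks primary + fallbacks in order and returns the first candidate whose
// preflight reports "available". `preflightOne` stands in for the existing
// single-provider preflight, which already returns "available" for cloud providers.
async function preflightModelChain({ cfg, primary, fallbacks = [], preflightOne }) {
  const failures = [];
  for (const ref of [primary, ...fallbacks]) {
    const { provider, model } = splitModelRef(ref);
    const result = await preflightOne({ cfg, provider, model });
    if (result.status === "available") {
      return { status: "available", provider, model, failures };
    }
    failures.push({ provider, model, reason: result.reason });
  }
  // "unavailable" is reported only after every candidate has failed preflight.
  return {
    status: "unavailable",
    reason: failures.map((f) => f.reason).join("; "),
    failures,
  };
}

// Demo with a stubbed preflight in which the local provider is down.
const stubPreflight = async ({ provider }) =>
  provider === "ollama"
    ? { status: "unavailable", reason: "ollama endpoint not reachable" }
    : { status: "available" };

preflightModelChain({
  cfg: {},
  primary: "ollama/gemma4:26b-nvfp4",
  fallbacks: ["openrouter/nvidia/nemotron-3-super-120b-a12b:free"],
  preflightOne: stubPreflight,
}).then((r) => console.log(r.status, `${r.provider}/${r.model}`));
// prints: available openrouter/nvidia/nemotron-3-super-120b-a12b:free
```

With a result of this shape, the caller in run.ts could start the run on the returned provider and model, and keep the current skipped path for the unavailable case. The failures list would also let the diagnostics record which candidates were tried.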
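Related
- […] pi-embedded-runner; this one is in cron isolated-agent.
- […] unavailable results that should not skip the run when fallbacks exist.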
Environment
- OpenClaw 2026.5.4 (commit 325df3e)
- macOS Darwin arm64 (Apple Silicon M4 Pro)
- Node.js 25.x (npm global install)
- Affected runtime: dist/model-preflight.runtime-D3BkBmU5.js + dist/isolated-agent-DPJcOmiU.js
- Tested with: provider ollama, baseUrl http://127.0.0.1:11999 (unused port, which guarantees fetch failed)