Bug type
Regression (worked before, now fails)
Summary
Cron jobs return enqueued: true from both scheduled triggers and openclaw cron run but never actually execute — no session start, no LLM call, no tool activity appears in logs.
Steps to reproduce
- Have one or more cron jobs configured in jobs.json with enabled: true
- Run: openclaw cron run
- Observe response: {"ok": true, "enqueued": true, "runId": "manual:..."}
- Wait 15-30 seconds
- Run: openclaw logs --max-bytes 50000 | tail -40
- Observe: only heartbeat and timer-armed lines — no session start, no LLM call, no execution
- Run: openclaw gateway restart
- Observe log: "cron: clearing stale running marker on startup" for the same jobId
- Repeat steps 2-8 — cycle repeats indefinitely
Expected behavior
After openclaw cron run returns enqueued: true, the cron lane should pick up the job within seconds, start an isolated session, send the payload to the configured agent and model, and log execution activity including session start, LLM request, and completion.
Actual behavior
The job is enqueued and runningAtMs is immediately written to jobs.json. The cron lane never dispatches it. No session is created, no LLM call is made. On the next gateway restart, OpenClaw logs "cron: clearing stale running marker on startup" for the job — then the same cycle repeats on the next enqueue. Deleting ~/.openclaw/cron/runs/ and manually removing runningAtMs from jobs.json does not resolve the issue — the marker is re-written on the next enqueue and the job still never executes.
OpenClaw version
2026.3.8 (3caab92)
Operating system
Ubuntu 24.04 LTS
Install method
npm install -g openclaw@latest — running as systemd service via openclaw-gateway.service
Model
vllm/Qwen/Qwen3.5-9B (self-hosted via vLLM on RunPod, OpenAI-compatible endpoint)
Provider / routing chain
vllm provider → https://-8000.proxy.runpod.net/v1 → vLLM serving Qwen/Qwen3.5-9B
Config file / key location
~/.openclaw/openclaw.json → models.providers.vllm, plugins.entries, agents.smith-vciso
Additional provider/model setup details
Provider defined with "api": "openai-completions", "contextWindow": 131072, "maxTokens": 16000. Agent smith-vciso uses "model": "vllm/Qwen/Qwen3.5-9B". The same model works correctly for interactive chat sessions — only cron execution is broken. The issue reproduces with any cron job regardless of which agent or model is configured.
Logs, screenshots, and evidence
07:43:36 warn cron clearing stale running marker on startup
jobId: 57731dfe-e8d7-4ba7-92ef-eb6e408919be
runningAtMs: 1773215045982
07:44:31 info { "ok": true, "enqueued": true,
"runId": "manual:57731dfe-...:1773215071210:1" }
[next 5 minutes: only cron timer-armed + web heartbeat lines]
[no session start, no LLM call, no execution of any kind]
Impact and severity
Affected: All cron jobs across all agents — 100% failure rate
Severity: Blocks workflow entirely — the entire scheduled automation layer is non-functional
Frequency: Always reproduces — every enqueue, every scheduled trigger
Consequence: All scheduled HR agent workflows (morning prompts, evening reminders, daily reports, escalations) fail silently. No WhatsApp messages sent to team members. No reports generated. Production automation completely down.
Additional information
Interactive chat sessions via WhatsApp and Telegram work correctly — only the cron execution lane is affected. The bug persists across gateway restarts, runs/ directory deletion, and manual runningAtMs removal from jobs.json. Previously working cron jobs stopped executing after a period of LLM timeouts (RunPod pod was stopped while cron jobs were scheduled to run), suggesting the lane may have entered a broken state from unresolved LLM timeout errors and never recovered.
Bug type
Regression (worked before, now fails)
Summary
Cron jobs return enqueued: true from both scheduled triggers and openclaw cron run but never actually execute — no session start, no LLM call, no tool activity appears in logs.
Steps to reproduce
Expected behavior
After openclaw cron run returns enqueued: true, the cron lane should pick up the job within seconds, start an isolated session, send the payload to the configured agent and model, and log execution activity including session start, LLM request, and completion.
Actual behavior
The job is enqueued and runningAtMs is immediately written to jobs.json. The cron lane never dispatches it. No session is created, no LLM call is made. On the next gateway restart, OpenClaw logs "cron: clearing stale running marker on startup" for the job — then the same cycle repeats on the next enqueue. Deleting ~/.openclaw/cron/runs/ and manually removing runningAtMs from jobs.json does not resolve the issue — the marker is re-written on the next enqueue and the job still never executes.
OpenClaw version
2026.3.8 (3caab92)
Operating system
Ubuntu 24.04 LTS
Install method
npm install -g openclaw@latest — running as systemd service via openclaw-gateway.service
Model
vllm/Qwen/Qwen3.5-9B (self-hosted via vLLM on RunPod, OpenAI-compatible endpoint)
Provider / routing chain
vllm provider → https://-8000.proxy.runpod.net/v1 → vLLM serving Qwen/Qwen3.5-9B
Config file / key location
~/.openclaw/openclaw.json → models.providers.vllm, plugins.entries, agents.smith-vciso
Additional provider/model setup details
Provider defined with "api": "openai-completions", "contextWindow": 131072, "maxTokens": 16000. Agent smith-vciso uses "model": "vllm/Qwen/Qwen3.5-9B". The same model works correctly for interactive chat sessions — only cron execution is broken. The issue reproduces with any cron job regardless of which agent or model is configured.
Logs, screenshots, and evidence
07:43:36 warn cron clearing stale running marker on startup jobId: 57731dfe-e8d7-4ba7-92ef-eb6e408919be runningAtMs: 1773215045982 07:44:31 info { "ok": true, "enqueued": true, "runId": "manual:57731dfe-...:1773215071210:1" } [next 5 minutes: only cron timer-armed + web heartbeat lines] [no session start, no LLM call, no execution of any kind]Impact and severity
Affected: All cron jobs across all agents — 100% failure rate
Severity: Blocks workflow entirely — the entire scheduled automation layer is non-functional
Frequency: Always reproduces — every enqueue, every scheduled trigger
Consequence: All scheduled HR agent workflows (morning prompts, evening reminders, daily reports, escalations) fail silently. No WhatsApp messages sent to team members. No reports generated. Production automation completely down.
Additional information
Interactive chat sessions via WhatsApp and Telegram work correctly — only the cron execution lane is affected. The bug persists across gateway restarts, runs/ directory deletion, and manual runningAtMs removal from jobs.json. Previously working cron jobs stopped executing after a period of LLM timeouts (RunPod pod was stopped while cron jobs were scheduled to run), suggesting the lane may have entered a broken state from unresolved LLM timeout errors and never recovered.