Symptom
Cron task records can be marked status = "lost" with error = "backing session missing" even though the cron run itself completed. Such a row does not necessarily indicate a real user-facing delivery failure, yet these false positives pollute task audit / status output and can make successful cron runs look orphaned.
Root cause identified
In task-registry maintenance, hasBackingSession(task) treats a runtime: "cron" task as backed only if its cron jobId is present in the in-memory activeJobIds set. After a gateway restart or similar process-lifecycle reset, that set starts empty, so older active cron task rows can fail the backing-session check once the reconcile grace window expires and get marked lost, even when the underlying cron run has already finished or its real delivery path is separate from the task ledger.
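A minimal sketch of the check as described above, to make the failure mode concrete. Only hasBackingSession, activeJobIds, jobId and runtime: "cron" come from this report; the TaskRecord shape and the function body are assumptions for illustration, not the actual task-registry code.

```typescript
// Hypothetical model of the cron backing-session check; the real
// task-registry types and implementation may differ.
interface TaskRecord {
  id: string;
  runtime: string; // e.g. "cron"; other runtimes are elided in this sketch
  jobId?: string;
}

// In-memory only: every new gateway process starts with an empty set.
const activeJobIds = new Set<string>();

function hasBackingSession(task: TaskRecord): boolean {
  if (task.runtime === "cron") {
    // Sole signal for cron tasks: is the job in the transient set?
    return task.jobId !== undefined && activeJobIds.has(task.jobId);
  }
  return true; // non-cron runtimes are out of scope for this sketch
}
```

Because activeJobIds is the only input for cron tasks, an empty set after a restart makes every older cron row look unbacked, whether or not its run is still alive.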
Minimal repro
- Start a cron job that creates a runtime: "cron" task record.
- Before task-registry maintenance reconciles that row, restart the gateway or otherwise clear in-memory cron active-job state.
- Wait until the task passes the reconcile grace interval.
- Observe the task row being marked lost with error = "backing session missing", despite no proof of an actual delivery failure.
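The repro sequence can be simulated with a toy model: the ledger array stands in for persisted task rows, which survive the restart, while activeJobIds lives on a per-process object and does not. GatewayProcess, reconcile, RECONCILE_GRACE_MS and its value are hypothetical; the real grace interval is not stated in this report.

```typescript
// Hypothetical simulation of the repro steps; names and the grace value
// are illustrative, not taken from the real task-registry code.
type Status = "active" | "done" | "lost";

interface CronTask {
  id: string;
  jobId: string;
  status: Status;
  createdAtMs: number;
  error?: string;
}

const RECONCILE_GRACE_MS = 5 * 60_000; // assumed value for illustration

class GatewayProcess {
  // Transient state: starts empty in every new process.
  readonly activeJobIds = new Set<string>();

  hasBackingSession(task: CronTask): boolean {
    return this.activeJobIds.has(task.jobId);
  }

  reconcile(tasks: CronTask[], nowMs: number): void {
    for (const task of tasks) {
      const pastGrace = nowMs - task.createdAtMs > RECONCILE_GRACE_MS;
      if (task.status === "active" && pastGrace && !this.hasBackingSession(task)) {
        task.status = "lost";
        task.error = "backing session missing";
      }
    }
  }
}

// Step 1: a cron job starts and creates an active task row (persisted).
const ledger: CronTask[] = [
  { id: "task-1", jobId: "daily-review", status: "active", createdAtMs: 0 },
];
let gateway = new GatewayProcess();
gateway.activeJobIds.add("daily-review");

// Step 2: the gateway restarts before maintenance reconciles the row.
gateway = new GatewayProcess(); // activeJobIds is now empty

// Steps 3-4: once the grace window has passed, reconcile marks the row lost.
gateway.reconcile(ledger, RECONCILE_GRACE_MS + 1);
console.log(ledger[0].status, ledger[0].error); // lost backing session missing
```

Nothing in this sequence consults whether the run actually finished, which is why a completed run can still end up with a lost row.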
Concrete real-world example
On 2026-04-17, the Daily Google Review Monitor task row was marked lost at 07:40:44 local with error = "backing session missing"; gateway startup activity was visible again from 07:40:57 to 07:41:02 local; and the cron run log later recorded the run as finished with status: ok at 07:41:46 local. That sequence is consistent with a restart racing the transient activeJobIds backing-session check, not with a genuinely vanished cron run.
Suggested fix
Do not use transient activeJobIds membership as the sole backing-session test for runtime: "cron" tasks during lost-task reconciliation. Either:
- skip lost-marking for cron tasks unless there is durable evidence the run truly vanished, or
- reconcile cron tasks against durable cron run state instead of in-memory active-job state.
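One way the second option could look, as a sketch rather than a proposed patch. It assumes a durable run log that writes a record when a run starts and that each task row can be tied to its specific run; CronRunStore, getRun, the runId field, and the "done"/"failed" statuses are hypothetical names, not existing task-registry APIs.

```typescript
// Sketch only: CronRunStore, getRun, runId and the "done"/"failed"
// statuses are hypothetical; the real registry may model these differently.
type RunStatus = "running" | "ok" | "error";

interface CronRunRecord {
  runId: string;
  status: RunStatus;
}

// Durable storage (e.g. the cron run log) that survives gateway restarts.
interface CronRunStore {
  getRun(runId: string): CronRunRecord | undefined;
}

interface CronTask {
  id: string;
  jobId: string;
  runId: string; // hypothetical link from the task row to its specific run
  status: "active" | "done" | "failed" | "lost";
  error?: string;
}

function reconcileCronTask(
  task: CronTask,
  activeJobIds: ReadonlySet<string>,
  runs: CronRunStore,
): void {
  if (task.status !== "active") return;

  // In-memory membership is a positive signal only; its absence proves nothing.
  if (activeJobIds.has(task.jobId)) return;

  const run = runs.getRun(task.runId);
  if (run === undefined) {
    // No durable trace of the run: the only case treated as truly vanished.
    task.status = "lost";
    task.error = "backing session missing";
  } else if (run.status === "ok") {
    task.status = "done";
  } else if (run.status === "error") {
    task.status = "failed";
  }
  // run.status === "running": leave the row active for a later pass.
}

// Usage: an empty activeJobIds (as after a restart) no longer marks a
// completed run lost.
const runLog = new Map<string, CronRunRecord>([
  ["run-1", { runId: "run-1", status: "ok" }],
]);
const store: CronRunStore = { getRun: (runId) => runLog.get(runId) };
const task: CronTask = { id: "task-1", jobId: "daily-review", runId: "run-1", status: "active" };

reconcileCronTask(task, new Set<string>(), store);
console.log(task.status); // done
```

The design choice here is that the in-memory set can only confirm a task is backed, never refute it; lost-marking requires the durable log to lack any record of the run.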
Notes
This appears to be primarily a task-ledger bookkeeping bug. It may overlap with some real delivery incidents, but the false-positive lost rows themselves should not be treated as proof of delivery failure.