
False-positive lost cron task records after gateway restart due to transient activeJobIds backing-session check #68191

@forevermore3333

Description


Symptom

Cron task records can be marked status = "lost" with error = "backing session missing" even though the cron run itself completed. Such a row does not necessarily indicate a real user-facing delivery failure, but these false positives pollute task audit and status output and can make successful cron runs look orphaned.

Root cause identified

In task-registry maintenance, hasBackingSession(task) treats a runtime: "cron" task as backed only if its cron jobId is present in the in-memory activeJobIds set. After a gateway restart or a similar process-lifecycle reset, that set is empty. Older active cron task rows can therefore fail the backing-session check once they pass the reconcile grace window and get marked lost, even when the underlying cron run has already finished or its real delivery path is separate from the task ledger.
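The check described above can be modeled in a few lines to show why a restart alone is enough to produce a lost row. This is a hypothetical TypeScript sketch, not the gateway's code: only hasBackingSession, activeJobIds, runtime "cron", status "lost", and the error text come from this report; TaskRecord, GRACE_MS, reconcileLostTasks, the "running" status, and the job id are assumptions.

```typescript
// Hypothetical model of the current cron backing-session check.
// From the issue: hasBackingSession, activeJobIds, runtime "cron", status "lost",
// error "backing session missing". Everything else is an assumption.

type TaskStatus = "running" | "lost"; // "running" stands in for an active row (assumed)

interface TaskRecord {
  id: string;
  runtime: "cron" | "other"; // "other" is a placeholder for non-cron runtimes
  jobId: string;
  status: TaskStatus;
  createdAt: number; // epoch milliseconds
  error?: string;
}

// In-memory only: it starts empty every time the gateway process starts.
let activeJobIds = new Set<string>();

// Assumed value; the real reconcile grace window is not stated in the issue.
const GRACE_MS = 5 * 60 * 1000;

function hasBackingSession(task: TaskRecord): boolean {
  if (task.runtime === "cron") {
    // The problem: membership in a transient set is the only evidence consulted.
    return activeJobIds.has(task.jobId);
  }
  return true; // non-cron runtimes are out of scope for this sketch
}

// Hypothetical maintenance pass: marks old, unbacked active rows as lost.
function reconcileLostTasks(tasks: TaskRecord[], now: number): void {
  for (const row of tasks) {
    if (row.status !== "running") continue;
    if (now - row.createdAt < GRACE_MS) continue; // still inside the grace window
    if (!hasBackingSession(row)) {
      row.status = "lost";
      row.error = "backing session missing";
    }
  }
}

// A cron task row exists while its job id is in the in-memory set.
activeJobIds.add("example-job");
const task: TaskRecord = {
  id: "t1",
  runtime: "cron",
  jobId: "example-job",
  status: "running",
  createdAt: 0,
};

reconcileLostTasks([task], GRACE_MS + 1);
console.log(task.status); // prints: running

// Gateway restart: the set is rebuilt empty, whatever the run's real state is.
activeJobIds = new Set<string>();
reconcileLostTasks([task], GRACE_MS + 2);
console.log(task.status, task.error); // prints: lost backing session missing
```

The second pass differs from the first only in the emptied set, which is the same condition the minimal repro sets up by restarting the gateway.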

Minimal repro

  1. Start a cron job that creates a runtime: "cron" task record.
  2. Before task-registry maintenance reconciles that row, restart the gateway or otherwise clear in-memory cron active-job state.
  3. Wait until the task is older than the reconcile grace window.
  4. Observe the task row being marked lost with error = "backing session missing", despite no proof of an actual delivery failure.

Concrete real-world example

On 2026-04-17, the Daily Google Review Monitor task row was marked lost at 07:40:44 local time with error "backing session missing". Gateway startup activity appeared again from 07:40:57 to 07:41:02 local, and the cron run log later recorded the run as finished with status: ok at 07:41:46 local. That sequence is consistent with a restart racing the transient activeJobIds backing-session check, not with a cron run that genuinely vanished.

Suggested fix

Do not use transient activeJobIds membership as the sole backing-session test for runtime: "cron" tasks during lost-task reconciliation. Either:

  • skip lost-marking for cron tasks unless there is durable evidence the run truly vanished, or
  • reconcile cron tasks against durable cron run state instead of in-memory active-job state.
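One way the two options above could combine is sketched below in hypothetical TypeScript: activeJobIds membership is kept as positive evidence only, and when a job is absent from the set, the row is checked against durable run state and marked lost only on an explicit durable signal. CronRunStore, lookupRun, runId, the RunOutcome values, and the "succeeded"/"failed" statuses are assumptions, not the gateway's real API.

```typescript
// Hypothetical sketch of the two suggested fixes. CronRunStore, lookupRun, runId and
// the RunOutcome values are assumptions standing in for whatever durable cron run
// state (for example, the cron run log) the gateway actually keeps.

type TaskStatus = "running" | "succeeded" | "failed" | "lost";

interface TaskRecord {
  id: string;
  runtime: "cron";
  jobId: string;
  runId: string; // assumed link from the task row to a durable run record
  status: TaskStatus;
  error?: string;
}

// Assumed durable outcomes; "interrupted" is the only one treated as proof of loss.
type RunOutcome = "in-progress" | "ok" | "error" | "interrupted";

interface CronRunStore {
  // Returns undefined when the durable store has no record of the run.
  lookupRun(runId: string): RunOutcome | undefined;
}

function reconcileCronTask(
  task: TaskRecord,
  store: CronRunStore,
  activeJobIds: ReadonlySet<string>,
): void {
  if (task.status !== "running") return;
  // Presence in the in-memory set is still positive evidence; absence is not proof.
  if (activeJobIds.has(task.jobId)) return;

  // Reconcile against durable run state instead of in-memory state.
  switch (store.lookupRun(task.runId)) {
    case "ok":
      task.status = "succeeded";
      return;
    case "error":
      task.status = "failed";
      return;
    case "interrupted":
      task.status = "lost";
      task.error = "backing session missing";
      return;
    default:
      // "in-progress" or no record: no durable evidence the run vanished, so the
      // row is left for a later pass instead of being marked lost.
      return;
  }
}

// Usage: right after a restart the in-memory set is empty, but the run log says ok.
const runLog = new Map<string, RunOutcome>([["run-1", "ok"]]);
const store: CronRunStore = { lookupRun: (runId) => runLog.get(runId) };
const task: TaskRecord = {
  id: "t1",
  runtime: "cron",
  jobId: "example-job",
  runId: "run-1",
  status: "running",
};
reconcileCronTask(task, store, new Set<string>());
console.log(task.status); // prints: succeeded
```

Under these assumptions, the 2026-04-17 sequence would have left the row running at 07:40:44 (the run was still in progress) and let a later pass move it to succeeded once the run log recorded status: ok.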

Notes

This appears to be primarily a task-ledger bookkeeping bug. It may overlap with some real delivery incidents, but the false-positive lost rows themselves should not be treated as proof of delivery failure.
