-
-
Notifications
You must be signed in to change notification settings - Fork 79.1k
Cron sessions deliver hallucinated output instead of failing cleanly when tool calls fail #49876
Copy link
Copy link
Open
Labels
P1High-priority user-facing bug, regression, or broken workflow.High-priority user-facing bug, regression, or broken workflow.clawsweeper:needs-maintainer-reviewClawSweeper marked this issue as needing maintainer review before automation.ClawSweeper marked this issue as needing maintainer review before automation.clawsweeper:needs-product-decisionClawSweeper marked this issue as needing a product or behavior decision.ClawSweeper marked this issue as needing a product or behavior decision.clawsweeper:needs-security-reviewClawSweeper marked this issue as needing security-sensitive review.ClawSweeper marked this issue as needing security-sensitive review.clawsweeper:no-new-fix-prClawSweeper does not recommend queueing a new automated fix PR for this issue.ClawSweeper does not recommend queueing a new automated fix PR for this issue.clawsweeper:source-reproClawSweeper found a high-confidence source-level issue reproduction.ClawSweeper found a high-confidence source-level issue reproduction.impact:securitySecurity boundary, credential, authz, sandbox, or sensitive-data risk.Security boundary, credential, authz, sandbox, or sensitive-data risk.impact:session-stateSession, memory, transcript, context, or agent state can drift or corrupt.Session, memory, transcript, context, or agent state can drift or corrupt.issue-rating: 🦞 diamond lobsterVery strong issue quality with high-confidence source-level or clear reproduction.Very strong issue quality with high-confidence source-level or clear reproduction.staleMarked as stale due to inactivityMarked as stale due to inactivity
Metadata
Metadata
Assignees
Labels
P1High-priority user-facing bug, regression, or broken workflow.High-priority user-facing bug, regression, or broken workflow.clawsweeper:needs-maintainer-reviewClawSweeper marked this issue as needing maintainer review before automation.ClawSweeper marked this issue as needing maintainer review before automation.clawsweeper:needs-product-decisionClawSweeper marked this issue as needing a product or behavior decision.ClawSweeper marked this issue as needing a product or behavior decision.clawsweeper:needs-security-reviewClawSweeper marked this issue as needing security-sensitive review.ClawSweeper marked this issue as needing security-sensitive review.clawsweeper:no-new-fix-prClawSweeper does not recommend queueing a new automated fix PR for this issue.ClawSweeper does not recommend queueing a new automated fix PR for this issue.clawsweeper:source-reproClawSweeper found a high-confidence source-level issue reproduction.ClawSweeper found a high-confidence source-level issue reproduction.impact:securitySecurity boundary, credential, authz, sandbox, or sensitive-data risk.Security boundary, credential, authz, sandbox, or sensitive-data risk.impact:session-stateSession, memory, transcript, context, or agent state can drift or corrupt.Session, memory, transcript, context, or agent state can drift or corrupt.issue-rating: 🦞 diamond lobsterVery strong issue quality with high-confidence source-level or clear reproduction.Very strong issue quality with high-confidence source-level or clear reproduction.staleMarked as stale due to inactivityMarked as stale due to inactivity
Type
Fields
Give feedbackNo fields configured for issues without a type.
Summary
When an isolated cron session encounters a tool failure, missing data, or an incomplete task, the LLM fabricates plausible-looking output and delivers it to the user — rather than failing cleanly or staying silent as instructed.
Impact
This is a trust and safety issue. Users receive fake operational data they may act on.
Confirmed Incidents (March 17-18, 2026)
Incident 1 — Fake calendar meetings (March 17)
meeting-intel-brief(model:gemini-2.0-flash, isolated session)Incident 2 — Fake Supabase migration request (March 18)
dashboard-ado-daily-refresh(model:google/gemini-3-flash-preview, isolated session)Incident 3 — Placeholder text delivered as output (March 18)
track-a-timing-trigger(model:google/gemini-3-flash-preview, isolated session)Models Affected
gemini-2.0-flashgoogle/gemini-3-flash-previewExpected Behavior
When a cron session cannot complete its task:
Requested Fix
Workaround
Migrated all crons off Gemini models to Sonnet. Platform-level fix still needed.