[Bug]: enqueueSystemEvent not deduplicated by runId/contextKey — agents cascade duplicate exec approval prompts under new IDs, locking ecosystem #69478
Open
Labels
- `P1`: High-priority user-facing bug, regression, or broken workflow.
- `bug`: Something isn't working
- `clawsweeper:fix-shape-clear`: ClawSweeper found a clear likely implementation shape for this issue.
- `clawsweeper:needs-maintainer-review`: ClawSweeper marked this issue as needing maintainer review before automation.
- `clawsweeper:needs-product-decision`: ClawSweeper marked this issue as needing a product or behavior decision.
- `clawsweeper:needs-security-review`: ClawSweeper marked this issue as needing security-sensitive review.
- `clawsweeper:no-new-fix-pr`: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- `clawsweeper:source-repro`: ClawSweeper found a high-confidence source-level issue reproduction.
- `impact:message-loss`: Channel message delivery can be lost, duplicated, or misrouted.
- `impact:security`: Security boundary, credential, authz, sandbox, or sensitive-data risk.
- `impact:session-state`: Session, memory, transcript, context, or agent state can drift or corrupt.
- `issue-rating: 🦞 diamond lobster`: Very strong issue quality with high-confidence source-level or clear reproduction.
- `regression`: Behavior that previously worked and now fails
Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
Under load, `enqueueSystemEvent` does not deduplicate queued exec approval requests by `runId` or `contextKey`. When a heartbeat run times out and the gateway fails over, the replacement attempt re-queues the same exec call with a fresh approval ID. Each retry surfaces a new Telegram approval prompt for the identical command, cascading until the operator kills the gateway. Left alone, it saturates the approval channel fast enough to risk system-level memory pressure.
Reproduced repeatably on a multi-agent install. Filing now so it can be fixed before users with `directPolicy: "allow"` + high-frequency heartbeats discover it the hard way.
Steps to reproduce
What the exec call is
Routine health-check probe issued from Maelcum's heartbeat:
Hits `on-miss` under the current allowlist, so an approval prompt is expected on first encounter. The bug is that it fires again, and again, and again, each time under a new approval ID, for the same run intent.
Not a duplicate of
I looked for upstream issues that might cover this and found three that are adjacent but distinct:
- `deliver: false` still injects system event into main session #36325: `deliver:false` hooks still inject via `enqueueSystemEvent` (delivery flag bypass, not retry dedup)
None of these address the approval-event retry path or the `(runId, contextKey)` dedup gap.
Workaround in place
- `every: "999h"` (circuit breaker)
Related bug (filing separately)
Telegram `/approve allow-always` writes a `source` field into the approvals allowlist entry that `openclaw approvals set --file` then rejects as unexpected on push. Will cross-reference the issue once filed.
Expected behavior
Either:
- `enqueueSystemEvent` deduplicates queued exec approval events by `(agentId, contextKey)` or `(runId, contextKey)`, coalescing retries into the already-pending prompt; or
Today, neither happens.
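For illustration, the coalescing behavior described above could look roughly like the sketch below. It is not taken from the OpenClaw source: `ApprovalQueue`, `ApprovalEvent`, `PendingApproval`, and the `approval-N` ID scheme are hypothetical names, and the command string is a placeholder. The sketch keys on `(agentId, contextKey)` rather than `(runId, contextKey)`, on the assumption that a failover retry may arrive under a different `runId`.

```typescript
// Hypothetical shapes for illustration only; none of these names are OpenClaw APIs.
interface ApprovalEvent {
  agentId: string;
  runId: string;
  contextKey: string; // assumed to identify the exec intent
  command: string;
}

interface PendingApproval {
  approvalId: string;
  event: ApprovalEvent;
  coalescedRunIds: string[]; // every run that asked for this same approval
}

class ApprovalQueue {
  private pending = new Map<string, PendingApproval>();
  private nextId = 1;

  // Compound key. JSON.stringify of a tuple avoids collisions that plain
  // string concatenation with a separator could produce.
  private keyOf(agentId: string, contextKey: string): string {
    return JSON.stringify([agentId, contextKey]);
  }

  // Returns the existing approval if one is already pending for the same
  // (agentId, contextKey); only a genuinely new intent gets a new ID.
  enqueue(e: ApprovalEvent): { approvalId: string; isNew: boolean } {
    const key = this.keyOf(e.agentId, e.contextKey);
    const existing = this.pending.get(key);
    if (existing) {
      existing.coalescedRunIds.push(e.runId);
      return { approvalId: existing.approvalId, isNew: false };
    }
    const approvalId = `approval-${this.nextId++}`;
    this.pending.set(key, { approvalId, event: e, coalescedRunIds: [e.runId] });
    return { approvalId, isNew: true };
  }

  // Called once the operator answers; clears the slot so a later request
  // for the same intent prompts again.
  resolve(agentId: string, contextKey: string): PendingApproval | undefined {
    const key = this.keyOf(agentId, contextKey);
    const entry = this.pending.get(key);
    this.pending.delete(key);
    return entry;
  }

  get size(): number {
    return this.pending.size;
  }
}

// Usage: a timed-out run and its failover retry ask for the same exec call.
const queue = new ApprovalQueue();
const base = { agentId: "maelcum", contextKey: "health-check", command: "<probe command>" };
const first = queue.enqueue({ ...base, runId: "run-1" });
const retry = queue.enqueue({ ...base, runId: "run-2" });
console.log(first, retry, queue.size);
```

In this sketch only the first call would surface a prompt (`isNew: true`); the retry is folded into the pending entry and gets the same approval ID back. Whether a real fix should key on `agentId` or `runId`, and whether pending entries should expire, is a design decision this sketch does not settle.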
bug-30-log-excerpt-clean.txt
Actual behavior
Observed behavior
Approval prompts arrive in Telegram continually, seconds apart, each for the identical command but under a different ID:
- befadc79-10bd-4e78-b1a4-9e2f546fd3c5
- 871d7305-c1cc-412c-9393-d538e99e4ae1
Screenshot attached below.
Gateway log (`/tmp/openclaw/openclaw-2026-04-18.log`) shows the cascade signature (excerpt attached):
- `stuck session: sessionId=maelcum sessionId=<uuid> sessionKey=agent:maelcum:telegram:direct:<user_id>`, with age ticking up by ~30s per line, crossing 462s before intervention
- `embedded_run_failover_decision failoverReason=timeout`, cycling through the provider chain: `vllm-fast` → `vllm-brain` → `openrouter/z-ai/glm-5`
- runIds while the prior attempt is still pending approval
Each failover attempt re-enters `enqueueSystemEvent` carrying the same exec call, but the event queue has no compound key covering the `(runId, contextKey)` pair, so the prior queued approval does not cancel or collapse, and a new one is enqueued instead.
OpenClaw version
2026.4.14 (323493f)
Operating system
macOS 26.4.1
Install method
npm global, latest stable as of filing
Model
mlx-community/Qwen3.5-9B-OptiQ-4bit (local, via rapid-mlx 0.3.12)
Provider / routing chain
openclaw -> vllm-fast (localhost:8001, rapid-mlx 0.3.12) -> Qwen3.5-9B-OptiQ-4bit
Additional provider/model setup details
Environment
- `every: "3h"`, `directPolicy: "allow"`, `target: "telegram"`, `lightContext: true`
- `defaults.security: "allowlist"`, `ask: "on-miss"`, `askFallback: "deny"`; `maelcum` uses host defaults
Logs, screenshots, and evidence
Impact and severity
Impact
- (`openclaw gateway restart`) is the only reliable stop
- `every: "999h"` as a circuit breaker while the bug is unresolved, effectively disabling the ecosystem's scheduled work layer
Additional information
No response