[Bug]: enqueueSystemEvent not deduplicated by runId/contextKey — agents cascade duplicate exec approval prompts under new IDs, locking ecosystem #69478

### Bug type

Regression (worked before, now fails)

### Beta release blocker

No

### Summary

Under load, `enqueueSystemEvent` does not deduplicate queued exec approval requests by `runId` or `contextKey`. When a heartbeat run times out and the gateway fails over, the replacement attempt re-queues the **same exec call** with a fresh approval ID. Each retry surfaces a new Telegram approval prompt for the identical command, cascading until the operator kills the gateway. Left alone, it saturates the approval channel fast enough to risk system-level memory pressure.

Reproduced reliably on a multi-agent install. Filing now so it can be fixed before users running `directPolicy: "allow"` with high-frequency heartbeats discover it the hard way.


### Steps to reproduce

## What the exec call is

Routine health-check probe issued from Maelcum's heartbeat:

```shell
ps aux | grep -E "contextstored|vllm|openclaw" | grep -v grep | awk '{print $11}' | sort -n | tail -5
```

The command hits `on-miss` under the current allowlist, so an approval prompt is expected on first encounter. The bug is that the prompt fires **again and again**, each time under a new approval ID, for the same run intent.


## Not a duplicate of

I looked for upstream issues that might cover this and found three that are adjacent but distinct:

- **#66487** — heartbeat prompt drops completion body (peek-not-consume on a *different* queue event path, not the approval queue)
- **#14191** — heartbeat routes to wrong session queue (routing bug, not a dedup bug)
- **#36325** — `deliver:false` hooks still inject via `enqueueSystemEvent` (delivery flag bypass, not retry dedup)

None of these address the approval-event retry path or the `(runId, contextKey)` dedup gap.

## Workaround in place

- All 11 agent heartbeats set to `every: "999h"` (circuit breaker)
- No agent work resumes on a normal schedule until this is fixed or a dedup workaround exists at the exec-approvals layer


## Related bug (filing separately)

Telegram `/approve allow-always` writes a `source` field into the approvals allowlist entry that `openclaw approvals set --file` then rejects as unexpected on push. Will cross-reference the issue once filed.


### Expected behavior

Either:

1. `enqueueSystemEvent` deduplicates queued exec approval events by `(agentId, contextKey)` or `(runId, contextKey)`, coalescing retries into the already-pending prompt; or
2. When a run fails over, any exec approval events it queued are cancelled before the replacement run is allowed to enqueue new ones.

Today, neither happens.
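
To make the two options above concrete, here is a minimal sketch of a pending-approval store that supports both. Everything in it is hypothetical: `ApprovalEvent`, `DedupingApprovalQueue`, `enqueue`, `cancelRun`, and the field names are illustrative and not taken from the OpenClaw source. It keys on `(agentId, contextKey)` rather than `(runId, contextKey)` because the logs show fresh `runId`s on each failover, and it assumes `contextKey` stays stable across retries of the same exec call.

```typescript
// Hypothetical shape of a queued exec approval; field names are assumptions.
interface ApprovalEvent {
  approvalId: string;
  agentId: string;
  runId: string;
  contextKey: string;
  command: string;
}

class DedupingApprovalQueue {
  // Pending approvals indexed by a compound (agentId, contextKey) key.
  private readonly pending = new Map<string, ApprovalEvent>();

  private keyOf(event: ApprovalEvent): string {
    // JSON-encoding the pair avoids collisions from naive concatenation.
    return JSON.stringify([event.agentId, event.contextKey]);
  }

  // Option 1: a retry for an already-pending intent reuses the existing prompt.
  // `isNew` tells the caller whether a prompt still has to be sent.
  enqueue(event: ApprovalEvent): { event: ApprovalEvent; isNew: boolean } {
    const key = this.keyOf(event);
    const existing = this.pending.get(key);
    if (existing !== undefined) {
      return { event: existing, isNew: false };
    }
    this.pending.set(key, event);
    return { event, isNew: true };
  }

  // Option 2: cancel everything a failed-over run queued.
  // Returns the approval IDs that were dropped.
  cancelRun(runId: string): string[] {
    const cancelled: string[] = [];
    this.pending.forEach((event, key) => {
      if (event.runId === runId) {
        this.pending.delete(key);
        cancelled.push(event.approvalId);
      }
    });
    return cancelled;
  }

  pendingCount(): number {
    return this.pending.size;
  }
}

// Two attempts at the same probe, as a failover would produce them.
const queue = new DedupingApprovalQueue();
const first = queue.enqueue({
  approvalId: "approval-1", agentId: "maelcum", runId: "run-1",
  contextKey: "exec:health-check", command: "health-check probe",
});
const retry = queue.enqueue({
  approvalId: "approval-2", agentId: "maelcum", runId: "run-2",
  contextKey: "exec:health-check", command: "health-check probe",
});
console.log(first.isNew, retry.isNew, retry.event.approvalId); // true false approval-1
```

In this sketch the two options interact: if the run that owns the pending prompt is cancelled, a coalesced retry loses its prompt as well. A real fix would likely pick one option, or re-home the pending entry to the replacement run.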

[bug-30-log-excerpt-clean.txt](https://github.com/user-attachments/files/26913097/bug-30-log-excerpt-clean.txt)

<img width="581" height="889" alt="Image" src="https://github.com/user-attachments/assets/4a74dc6c-8c52-4e82-80dd-9b65032a46d7" />

### Actual behavior

## Observed behavior

A continuous stream of approval prompts is delivered to Telegram seconds apart, each for the identical command but under a different ID:

- `befadc79-10bd-4e78-b1a4-9e2f546fd3c5`
- `871d7305-c1cc-412c-9393-d538e99e4ae1`
- etc.

Screenshot attached above, under Expected behavior.

Gateway log (`/tmp/openclaw/openclaw-2026-04-18.log`) shows the cascade signature (excerpt attached):

- `stuck session: sessionId=maelcum sessionId=<uuid> sessionKey=agent:maelcum:telegram:direct:<user_id>` — age ticking up by ~30s per line, crossing 462s before intervention
- `embedded_run_failover_decision failoverReason=timeout` — cycling through the provider chain: `vllm-fast` → `vllm-brain` → `openrouter/z-ai/glm-5`
- Heartbeat re-firing and regenerating the run under fresh `runId`s while the prior attempt is still pending approval

Each failover attempt re-enters `enqueueSystemEvent` carrying the same exec call, but the event queue has no compound key covering the `(runId, contextKey)` pair — so the prior queued approval does not cancel or collapse, and a new one is enqueued instead.
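
The mechanism described above can be illustrated with a toy model. This is not OpenClaw's code: `QueuedApproval`, `simulateCascade`, and `distinctContextKeys` are hypothetical names, and the model only assumes that each failover attempt arrives with a fresh approval ID and `runId` while carrying the same `contextKey`.

```typescript
// Toy model only: hypothetical names, not OpenClaw's actual implementation.
interface QueuedApproval {
  approvalId: string;
  runId: string;
  contextKey: string;
}

// A store indexed solely by approval ID cannot tell a retry from a new request,
// so every failover attempt adds another pending entry.
function simulateCascade(attempts: number): Map<string, QueuedApproval> {
  const byApprovalId = new Map<string, QueuedApproval>();
  for (let attempt = 0; attempt < attempts; attempt++) {
    const entry: QueuedApproval = {
      approvalId: `approval-${attempt}`, // fresh ID on every attempt
      runId: `run-${attempt}`,           // fresh runId on every failover
      contextKey: "exec:health-check",   // same intent every time
    };
    byApprovalId.set(entry.approvalId, entry);
  }
  return byApprovalId;
}

// Counts how many distinct intents are actually waiting for approval.
function distinctContextKeys(queue: Map<string, QueuedApproval>): number {
  const seen = new Set<string>();
  queue.forEach((entry) => seen.add(entry.contextKey));
  return seen.size;
}

const cascade = simulateCascade(5);
console.log(cascade.size, distinctContextKeys(cascade)); // 5 1
```

Five pending prompts for one distinct intent is the signature seen in Telegram: the number of prompts tracks the number of retries, not the number of commands awaiting approval.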



### OpenClaw version

2026.4.14 (323493f)

### Operating system

macOS 26.4.1 

### Install method

npm global, latest stable as of filing

### Model

mlx-community/Qwen3.5-9B-OptiQ-4bit (local, via rapid-mlx 0.3.12)

### Provider / routing chain

openclaw -> vllm-fast (localhost:8001, rapid-mlx 0.3.12) -> Qwen3.5-9B-OptiQ-4bit

### Additional provider/model setup details

## Environment

- **Host:** macOS, Mac Mini M4 Pro, 48 GB unified memory
- **Gateway:** launchd-supervised, loopback bind, port 18789
- **Heartbeat:** `every: "3h"`, `directPolicy: "allow"`, `target: "telegram"`, `lightContext: true`
- **Exec approval policy:** `defaults.security: "allowlist"`, `ask: "on-miss"`, `askFallback: "deny"`; `maelcum` uses host defaults

### Logs, screenshots, and evidence

## Attached evidence

1. Screenshot of two consecutive approval prompts with different IDs for the same command
2. `bug-30-log-excerpt.txt`: 60 lines of the cascade from the gateway log

### Impact and severity

## Impact

- Saturates the approval channel — every cascade cycle produces a new Telegram prompt
- Fast enough to outrun manual intervention; forcing a gateway restart (`openclaw gateway restart`) is the only reliable stop
- On installs with many agents sharing a channel, one stuck agent can drown all approval prompts for every other agent
- Forced me to set **all 11 heartbeats to `every: "999h"`** as a circuit breaker while the bug is unresolved — effectively disabling the ecosystem's scheduled work layer


### Additional information

_No response_
