[Bug]: async task completion reports can be lost because system event/wake is not reliably session-targeted #52305

## Bug type

Bug / async completion routing

## Summary

Async task completion reporting is unreliable when external task runners (for example Codex via `exec`) try to notify OpenClaw with:

```bash
openclaw system event --text "...done..." --mode now
```

In practice this can fail in two ways:

1. the CLI call itself fails with a local gateway websocket error, for example:
   - error: `gateway closed (1006 abnormal closure)`
   - gateway target: `ws://127.0.0.1:18789`
2. even when the call succeeds, `system event` / `wake` is not session-targeted, so the completion report is not reliably routed back to the originating user conversation

This makes long-running background tasks feel "done but never reported" unless the user asks again.

## Environment

- OpenClaw: `2026.3.13 (61d171a)`
- OS: macOS
- Gateway mode: local
- Gateway bind: loopback
- Messaging surface: Telegram direct chat
- Typical runner: `exec` background task launching Codex / external orchestrator

## What I observed

I reproduced this with a simple background coding task:

1. Start a long-ish Codex task via background `exec`
2. Ask Codex to run this on completion:

```bash
openclaw system event --text "Done: Built a concise responsive login page in the temp project directory" --mode now
```

3. The coding task finishes successfully
4. The completion notification does **not** arrive in the originating Telegram conversation

In one captured run, Codex did execute the command, but it failed with:

```text
gateway closed (1006 abnormal closure)
Gateway target: ws://127.0.0.1:18789
```

At the same time, the task's output files were present, confirming that the work itself had completed.

## Root cause analysis

After tracing the current implementation, this looks like a product/architecture gap rather than only a single transport glitch:

### A. `openclaw system event` is implemented as a `wake`
The CLI path does not act like a reliable completion callback. It effectively does:

- enqueue system event text
- request heartbeat

### B. `wake` is not session-targeted
The current `wake` path does not carry `sessionKey` in the relevant CLI flow.

That means the event is not reliably bound to the originating conversation that launched the async task.

### C. heartbeat defaults to the agent's main session when no forced session is provided
So even if the wake/system event path works, it does not guarantee delivery back to the original user thread / DM that triggered the task.
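
The fallback described in B and C can be modeled in a few lines. This is only an illustration of the behavior I observed, not OpenClaw's actual routing code: the function name `resolve_delivery_session` and the literal `main` are placeholders made up for this example.

```bash
#!/usr/bin/env bash
# Illustrative model only: NOT OpenClaw's real routing code.
# resolve_delivery_session is a hypothetical name; "main" stands in for
# the agent's main session.
resolve_delivery_session() {
  local forced_session_key="${1:-}"
  if [ -n "$forced_session_key" ]; then
    # A forced session was supplied: deliver there.
    printf '%s\n' "$forced_session_key"
  else
    # No session supplied (what the CLI wake path does today): use main.
    printf '%s\n' "main"
  fi
}
```

Because the CLI flow never supplies a session key, this model always takes the second branch, which is why the report can end up somewhere other than the Telegram DM that launched the task.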

### D. local websocket fragility makes it worse
From external task runners, the local gateway websocket path can also fail with `1006 abnormal closure`, so the fallback notification bridge is itself not reliable.
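
As a stopgap on the runner side, the notification call can be wrapped in a retry with a local fallback record. The sketch below is a workaround under my own assumptions, not a fix: `notify_with_retry`, `MAX_ATTEMPTS`, `RETRY_DELAY_SECONDS`, and `FALLBACK_FILE` are names invented for this example, and retrying does nothing for the session-targeting problem in B and C.

```bash
#!/usr/bin/env bash
# Stopgap sketch: retry a notification command and leave a marker file if
# it never succeeds. All names here are hypothetical, chosen for this
# example. This does NOT address session routing.
notify_with_retry() {
  local max_attempts="${MAX_ATTEMPTS:-3}"
  local delay="${RETRY_DELAY_SECONDS:-2}"
  local fallback_file="${FALLBACK_FILE:-./completion-notify-failed.txt}"
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the command passed as arguments; stop on the first success.
    if "$@"; then
      return 0
    fi
    echo "notify attempt ${attempt}/${max_attempts} failed" >&2
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done

  # Every attempt failed: record it so the result is not silently lost.
  printf 'notification failed after %s attempts: %s\n' \
    "$max_attempts" "$*" >> "$fallback_file"
  return 1
}
```

A runner would call it as `notify_with_retry openclaw system event --text "...done..." --mode now`. This assumes the CLI exits non-zero when the gateway closes with 1006, which I have not verified.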

## Why this matters

This creates a bad UX for background tasks:

- task actually completes
- OpenClaw may know something happened
- user still gets no completion report
- user has to manually ask "is it done?"

This is especially noticeable for:

- Codex / ACP tasks launched from chat
- background `exec` jobs
- external orchestrators like ClawTeam

## Expected behavior

At least one of these should be true:

1. `openclaw system event` / `wake` supports an explicit `sessionKey` and reliably wakes the originating session
2. async exec completion events preserve originating session context automatically
3. there is a first-class completion notification path for background tasks that can deliver to the originating channel/session without depending on main-session heartbeat inference
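
To make option 1 concrete, here is roughly how a runner could pass the originating session through if the flag proposed in #35231 existed. This is hypothetical: `--session-key` is the proposed flag, not something I have confirmed in the current CLI, and `ORIGIN_SESSION_KEY` and `build_completion_cmd` are names invented for this sketch. The function only prints the command it would run.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL: assumes the --session-key flag proposed in #35231 exists.
# ORIGIN_SESSION_KEY is an invented variable that the launcher would
# export before starting the background task.
build_completion_cmd() {
  local text="$1"
  local cmd=(openclaw system event --text "$text" --mode now)
  if [ -n "${ORIGIN_SESSION_KEY:-}" ]; then
    # Bind the event to the conversation that launched the task.
    cmd+=(--session-key "$ORIGIN_SESSION_KEY")
  fi
  # Print the command shell-quoted so it can be inspected before running.
  printf '%q ' "${cmd[@]}"
  printf '\n'
}
```

With no `ORIGIN_SESSION_KEY` set, the printed command is the same untargeted call used today, so the wrapper would stay backward compatible.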

## Related work already in the repo

This seems closely related to:

- #33815 — Sub-agent completion push notification to originating channel
- #35231 — add `--session-key` support to system wake/event
- #50818 — propagate `sessionKey` in exec/hooks to fix async context loss
- #43392 — websocket 1006 race condition fix
- #47711 — reload aborts in-flight subagent calls / completion reporting issues

## Suggestion

I suspect the real fix is not just transport retry. The bigger gap is that `system event` / `wake` is currently used as if it were a completion callback, but it is really an internal wake/heartbeat mechanism.

So the best fix is probably one or both of:

- explicit session targeting for wake/system-event entry points
- a first-class completion notification mechanism for async/background tasks

If useful, I can provide a more detailed repro timeline and the exact local logs / Codex transcript snippets that showed:

- successful task completion
- attempted `openclaw system event`
- failure with `gateway closed (1006 abnormal closure)`
- no proactive Telegram completion report

