[Bug]: Control UI chat can stay stuck on "Stop" after embedded run timeout on Windows #42011

@lost-abadan

Bug type

Crash (process/app exits or hangs)

Summary

In Gateway Dashboard / Control UI chat, an embedded run can appear to finish its tool/file work, but the UI remains stuck in a running state:

  • Stop remains visible
  • clicking Stop does nothing
  • gateway health still shows normal
  • the affected chat session never returns to idle

This looks like a terminal lifecycle event (end / error) failing to reach the webchat/control-ui path after a timeout or abort.

Steps to reproduce

A chat run started normally and performed tool/file operations. The UI showed those operations as completed, but the chat remained stuck in a running state.

Observed behavior:

  • Stop remained visible
  • clicking Stop had no effect
  • gateway health still showed normal
  • the chat content already showed completed tool/edit actions

Observed log entry:

embedded run timeout: runId=<...> sessionId=<...> timeoutMs=600000

There were also timeout-related entries such as LLM request timeouts around the same period.

### Investigation notes

From tracing the bundled code in 2026.3.8, this does not look like a simple frontend-only issue.

What I found:

- Control UI chat finalization appears to depend on lifecycle terminal events (`phase === "end"` or `"error"`)
- the embedded timeout path appears to call `abortRun(true)` / `activeSession.abort()`
- that timeout path does not appear to directly emit a terminal lifecycle event itself
- terminal lifecycle emission seems to depend on the embedded subscription path receiving `agent_end`
- once `agentRunStarted === true` in webchat/control-ui, the UI path appears to rely on lifecycle terminal events and does not seem to have a fallback finalization path if those events never arrive
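
To make the dependency described in these notes concrete, here is a minimal sketch of a UI run state that only returns to idle on a terminal lifecycle phase. The types and the `reduceLifecycle` / `replay` helpers are hypothetical and written for illustration; they are not taken from the OpenClaw source. Only the `phase` values and the `agentRunStarted` flag mirror what I saw while tracing.

```typescript
// Hypothetical model of the Control UI run state; not the real implementation.
type LifecyclePhase = "start" | "end" | "error";

interface LifecycleEvent {
  runId: string;
  phase: LifecyclePhase;
}

interface ChatRunState {
  agentRunStarted: boolean;
  running: boolean; // true means the UI shows Stop
}

function reduceLifecycle(state: ChatRunState, event: LifecycleEvent): ChatRunState {
  if (event.phase === "start") {
    return { agentRunStarted: true, running: true };
  }
  // Only a terminal phase ("end" or "error") returns the chat to idle.
  return { agentRunStarted: state.agentRunStarted, running: false };
}

// Replays a sequence of lifecycle events against an initially idle chat.
function replay(events: LifecycleEvent[]): ChatRunState {
  return events.reduce(reduceLifecycle, { agentRunStarted: false, running: false });
}

// Normal run: start followed by end, so the chat goes back to idle.
const healthy = replay([
  { runId: "r1", phase: "start" },
  { runId: "r1", phase: "end" },
]);

// Timed-out run where no terminal event arrives: the chat stays running.
const stuck = replay([{ runId: "r1", phase: "start" }]);
```

In this model nothing other than a lifecycle event can change `running`, which matches the symptom: if the terminal event is lost, the state has no other way out.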

### Current hypothesis

There is an edge case where an embedded run times out or aborts after startup, but no terminal lifecycle event reaches Control UI.

That leaves the webchat/control-ui run stuck in a running state even though the run has already ended internally.
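
One way this hypothesis could play out is an ordering problem: if the subscription is torn down before the aborted session delivers `agent_end`, the lifecycle `end` is never produced. The sketch below simulates that ordering with a hypothetical `FakeSession` class; it is an assumption about the failure mode, not a confirmed trace of the real code.

```typescript
// Hypothetical stand-in for the embedded session and its subscription.
type SessionEvent = "agent_start" | "agent_end";
type Listener = (event: SessionEvent) => void;

class FakeSession {
  private listeners: Listener[] = [];

  // Registers a listener and returns an unsubscribe function.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  emit(event: SessionEvent): void {
    // Iterate over a copy so unsubscribing during emit is safe.
    for (const listener of this.listeners.slice()) listener(event);
  }
}

// Returns the lifecycle phases that the UI side would have observed.
function runWithTimeout(unsubscribeBeforeEnd: boolean): string[] {
  const lifecycle: string[] = [];
  const session = new FakeSession();
  const unsubscribe = session.subscribe((event) => {
    lifecycle.push(event === "agent_start" ? "start" : "end");
  });

  session.emit("agent_start");
  // Timeout fires here: the run is aborted and cleanup begins.
  if (unsubscribeBeforeEnd) unsubscribe();
  session.emit("agent_end"); // late agent_end from the aborted session
  if (!unsubscribeBeforeEnd) unsubscribe();
  return lifecycle;
}

const orderedCleanup = runWithTimeout(false);
const earlyUnsubscribe = runWithTimeout(true);
```

With ordered cleanup the observer sees both `start` and `end`; with the early unsubscribe it sees only `start`, which is the stuck state described above.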

### What this does NOT look like

Based on local tracing, this does not currently look like:

- a simple early-return bug in `dispatchInboundMessage()`
- a frontend-only rendering issue
- an indefinitely blocked cleanup in `flushPendingToolResultsAfterIdle()` (that cleanup appears bounded)

### Possible areas to inspect

- embedded timeout/abort path in `runEmbeddedAttempt(...)` / `runEmbeddedPiAgent(...)`
- whether `activeSession.abort()` can bypass `agent_end`
- whether `subscribeEmbeddedPiSession(...)` can unsubscribe before terminal lifecycle is emitted
- whether webchat/control-ui should defensively finalize a started run when `dispatchInboundMessage()` settles but no terminal lifecycle event was observed
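
The last item, defensive finalization, could look roughly like the sketch below. `RunTracker`, `onDispatchSettled`, and `simulate` are hypothetical names, and I am assuming the caller can tell whether the dispatch ended because of an abort. Treating a missing terminal event without an abort as `error` is a judgment call, not something the current code prescribes.

```typescript
type RunStatus = "idle" | "running" | "completed" | "aborted" | "error";

// Hypothetical UI-side tracker with a fallback finalization path.
class RunTracker {
  status: RunStatus = "idle";

  onLifecycle(phase: "start" | "end" | "error"): void {
    if (phase === "start") {
      this.status = "running";
    } else if (this.status === "running") {
      this.status = phase === "end" ? "completed" : "error";
    }
  }

  // Assumed hook: called once the dispatch call has settled.
  onDispatchSettled(outcome: { aborted: boolean }): void {
    // A terminal lifecycle event already finalized the run: nothing to do.
    if (this.status !== "running") return;
    // Fallback: no terminal event was observed, so finalize here.
    this.status = outcome.aborted ? "aborted" : "error";
  }
}

// Feeds lifecycle phases to a fresh tracker, then settles the dispatch.
function simulate(phases: Array<"start" | "end" | "error">, aborted: boolean): RunStatus {
  const tracker = new RunTracker();
  for (const phase of phases) tracker.onLifecycle(phase);
  tracker.onDispatchSettled({ aborted });
  return tracker.status;
}

const normalRun: RunStatus = simulate(["start", "end"], false);
const timedOutRun: RunStatus = simulate(["start"], true);
```

The fallback only acts when the run is still marked as running, so it does not override a terminal state that arrived through the normal lifecycle path.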

### Repro hints

In my case this happened in Gateway Dashboard chat after a long-running embedded run that eventually timed out. The UI had already shown completed tool/edit actions, but the run never left the Stop state.


### Expected behavior

After a timeout or abort, the Control UI should always receive a terminal state and the run should finalize as one of:

- completed
- aborted
- error

The UI should leave the `Stop` state.
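
On the emitting side, one possible way to get this behavior is to let the timeout path emit the terminal state itself, guarded so that a late `agent_end` cannot produce a second one. This is a sketch under that assumption; `createTerminalEmitter` and the event shape are hypothetical and do not exist in the codebase as far as I know.

```typescript
type TerminalState = "completed" | "aborted" | "error";

interface TerminalEvent {
  runId: string;
  state: TerminalState;
}

// Hypothetical helper: forwards at most one terminal event per run to the sink.
function createTerminalEmitter(
  runId: string,
  sink: (event: TerminalEvent) => void,
): (state: TerminalState) => boolean {
  let emitted = false;
  return (state: TerminalState): boolean => {
    if (emitted) return false; // a terminal state was already delivered
    emitted = true;
    sink({ runId, state });
    return true;
  };
}

const received: TerminalEvent[] = [];
const emitTerminal = createTerminalEmitter("run-1", (event) => {
  received.push(event);
});

// Timeout path: emit the terminal state directly instead of waiting for agent_end.
const firstAccepted = emitTerminal("aborted");
// A late agent_end arriving afterwards is ignored.
const secondAccepted = emitTerminal("completed");
```

The guard only ensures the event is sent at most once. Sending it at least once still requires every exit path (normal end, error, timeout, abort) to call the emitter.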

### Actual behavior

After timeout:

- the run can stay stuck on `Stop`
- clicking `Stop` may do nothing
- the gateway can remain otherwise healthy
- the specific run/session appears zombied in the UI

### OpenClaw version

2026.3.8

### Operating system

Windows 11

### Install method

npm

### Model

openai-codex/gpt-5.3-codex

### Provider / routing chain

chat

### Config file / key location

_No response_

### Additional provider/model setup details

See "Investigation notes" above.

### Logs, screenshots, and evidence

```shell
embedded run timeout: runId=<...> sessionId=<...> timeoutMs=600000
```

### Impact and severity

_No response_

### Additional information

[Image]

### Metadata

- Assignees: none
- Labels: `bug` (Something isn't working), `bug:crash` (Process/app exits unexpectedly or hangs)
- Milestone: none
- Development: no branches or pull requests