BUG: Completed run can remain persisted as running, blocking new input and stop #60250

@kAIborg24

Description

Bug type

Regression (worked before, now fails)

Beta release blocker

Yes

Summary

A completed interactive session can be written back to the session store with status running even though endedAt and runtimeMs are already set. When this happens, the gateway treats the conversation as still in flight, new inbound messages are not accepted, and stop/abort commands cannot be processed until the session JSON is edited manually.

Steps to reproduce

  1. Start an interactive direct-chat session on OpenClaw 2026.4.2.
  2. Use the default main session with openai-codex/gpt-5.4.
  3. Let a turn finish normally.
  4. Inspect the persisted session-store row for the active session.
  5. If the bug triggers, the row shows status: "running" while also containing terminal fields such as endedAt and runtimeMs.
  6. After that state is persisted, new inbound messages are not accepted for that conversation and stop/abort commands do not take effect until the session-store row is corrected manually.
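
The check in steps 4 and 5 can be sketched as a small predicate. This is only an illustration: the field names `status`, `endedAt`, and `runtimeMs` come from the observed row, while the helper name `isContradictoryRow` and the sample values are hypothetical and not part of OpenClaw.

```javascript
// Hypothetical helper, not an OpenClaw API: flags a row that claims to be
// running while already carrying terminal timing fields.
function isContradictoryRow(row) {
  const hasTerminalTiming = row.endedAt != null || row.runtimeMs != null;
  return row.status === "running" && hasTerminalTiming;
}

// Sample rows shaped like the one in this report (timing values are made up).
const wedged = {
  status: "running",
  endedAt: 1000,
  runtimeMs: 250,
  modelProvider: "openai-codex",
  model: "gpt-5.4",
};
const healthy = { status: "done", endedAt: 1000, runtimeMs: 250 };
const inFlight = { status: "running", endedAt: null };

console.log(isContradictoryRow(wedged));   // true
console.log(isContradictoryRow(healthy));  // false
console.log(isContradictoryRow(inFlight)); // false
```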

Expected behavior

When a run ends, the persisted session-store row should move to a terminal state such as done, failed, killed, or timeout.

A completed run must not remain marked running, and subsequent inbound messages plus stop/abort commands should continue to work normally.

Actual behavior

Observed persisted state for the main direct-chat session contained all of the following at once:

  • status: "running"
  • a non-null endedAt
  • a non-null runtimeMs
  • modelProvider: "openai-codex"
  • model: "gpt-5.4"

Because the session row remained running, the conversation became wedged: the agent could resume old work after restart, but new input was not accepted and stop/abort commands were ineffective. Manually changing the persisted row from running to done restored normal behavior.

OpenClaw version

2026.4.2

Operating system

Ubuntu 24.04.4 LTS; Linux 6.18.7-surface-1 x86_64

Install method

npm global

Model

gpt-5.4

Provider / routing chain

openai-codex -> gpt-5.4 (observed in persisted session row)

Additional provider/model setup details

Observed on the main direct-chat session.

Related Anthropic auth failures were present in the same environment, but they were not the immediate cause of this wedge. The persisted stuck row was for openai-codex/gpt-5.4.

Logs, screenshots, and evidence

Grounded evidence from the installed 2026.4.2 build:

- The persisted session-store row for the main session contained `status: "running"` together with terminal timing fields (`endedAt`, `runtimeMs`) and `modelProvider: "openai-codex"`, `model: "gpt-5.4"`.
- The accepted-command log showed no newly accepted inbound commands after the wedge state occurred.
- The session transcript itself ended with a normal final GPT-5.4 assistant message, indicating that the transcript and the persisted session-store status had diverged.
- Reviewing the installed dist build points to a write-order bug: lifecycle persistence computes the terminal status correctly, but a later reply/session persistence write can re-save the in-memory `sessionEntry` with a stale `status: "running"`.
- Relevant installed-source areas in 2026.4.2: `gateway-cli-*.js` session lifecycle persistence (`persistGatewaySessionLifecycleEvent` / `deriveGatewaySessionLifecycleSnapshot`) and `reply-*.js` final session-entry persistence that writes `...store[sessionKey], ...sessionEntry` back to disk.
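
The suspected write order can be reproduced in isolation with plain objects. This is a simplified model under stated assumptions, not the OpenClaw code: the store is an in-memory object rather than the on-disk JSON, the key `"main"` and the timing values are made up, and only the spread-merge shape `...store[sessionKey], ...sessionEntry` is taken from the installed build.

```javascript
// Simulated store; the real one is persisted to disk.
const store = {};
const sessionKey = "main"; // hypothetical key

// In-memory entry captured while the run was still in flight.
const sessionEntry = {
  status: "running",
  modelProvider: "openai-codex",
  model: "gpt-5.4",
};
store[sessionKey] = { ...sessionEntry };

// Write 1: lifecycle persistence records the terminal state.
store[sessionKey] = {
  ...store[sessionKey],
  status: "done",
  endedAt: 1000,
  runtimeMs: 250,
};

// Write 2: reply persistence merges the stale in-memory entry over the row.
// Keys present in sessionEntry win, so status flips back to "running",
// while endedAt and runtimeMs survive because sessionEntry lacks them.
store[sessionKey] = { ...store[sessionKey], ...sessionEntry };

console.log(store[sessionKey].status);    // "running"
console.log(store[sessionKey].endedAt);   // 1000
console.log(store[sessionKey].runtimeMs); // 250
```

The result matches the contradictory row described above: a `running` status alongside terminal timing fields.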

Impact and severity

High severity.

When triggered, the primary conversation becomes effectively unusable: new messages are not accepted, stop/abort commands do not work, and recovery requires manual editing of the persisted session-store JSON while the gateway is stopped. This can make the gateway appear online but operationally stuck.

Additional information

Confidence is high that the root cause is a persistence-order bug rather than a UI-only issue.

The lifecycle code in the installed 2026.4.2 build computes terminal states correctly, and the merge helper overwrites fields normally. The contradictory persisted row therefore appears to require a later post-run write that re-saves stale status: "running" from an in-memory sessionEntry.
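
One possible mitigation, sketched here purely as an assumption about how a fix might look, is to guard the merge so that a stale `running` value cannot overwrite a status that is already terminal. The helper name `mergeSessionEntry` is hypothetical, and the terminal set is limited to the states named under Expected behavior.

```javascript
// Terminal states named in this report; the real set may differ.
const TERMINAL_STATUSES = new Set(["done", "failed", "killed", "timeout"]);

// Hypothetical guarded merge: same spread semantics as before, except that
// an incoming "running" never replaces a stored terminal status.
function mergeSessionEntry(stored, incoming) {
  const merged = { ...stored, ...incoming };
  const storedIsTerminal = stored != null && TERMINAL_STATUSES.has(stored.status);
  if (storedIsTerminal && incoming.status === "running") {
    merged.status = stored.status;
  }
  return merged;
}

const persisted = { status: "done", endedAt: 1000, runtimeMs: 250 };
const staleEntry = { status: "running", model: "gpt-5.4" };

const afterStaleWrite = mergeSessionEntry(persisted, staleEntry);
console.log(afterStaleWrite.status); // "done"
console.log(afterStaleWrite.model);  // "gpt-5.4"
```

A guard this simple would also block a legitimate new run on the same session from moving the row back to `running`, so a real fix would need some way to tell a new run apart from a stale write. The sketch only shows where the stale value gets in.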

The issue was recoverable by stopping the gateway, backing up the session files, changing the affected session row from running to done, and restarting the gateway.
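
The manual correction can be expressed as a pure function over the parsed store. This sketch assumes the store parses to an object keyed by session key, which matches the `store[sessionKey]` access seen in the installed build but is otherwise unverified. The function name `repairStore` is hypothetical, and file paths and I/O are deliberately left out.

```javascript
// Hypothetical repair helper: returns a corrected copy of the store plus the
// keys that were changed. Intended to run only on a backed-up store while the
// gateway is stopped, mirroring the manual recovery described above.
function repairStore(store) {
  const next = {};
  const repaired = [];
  for (const [key, row] of Object.entries(store)) {
    if (row.status === "running" && row.endedAt != null) {
      next[key] = { ...row, status: "done" }; // same value used in the manual fix
      repaired.push(key);
    } else {
      next[key] = row;
    }
  }
  return { store: next, repaired };
}

// Made-up sample store with one wedged row and one healthy row.
const sample = {
  main: { status: "running", endedAt: 1000, runtimeMs: 250 },
  other: { status: "done", endedAt: 900, runtimeMs: 100 },
};

const result = repairStore(sample);
console.log(result.repaired);          // [ 'main' ]
console.log(result.store.main.status); // "done"
```

Returning a copy rather than mutating in place keeps the original object available for comparison before anything is written back.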
