Bug: deleteMessage (and other assertNotBusy-guarded APIs) permanently fail when a question tool part is stuck in state.status=running #27907

@omercnet

Description

When the question tool ends up in a permanently running state (state.output never written, state.time.end never set, no follow-up assistant message), the session is marked busy forever. After that, every HTTP API call gated by SessionRunState.assertNotBusy() fails — including deleteMessage — so the user has no way to delete the offending message, abort the run, or otherwise escape the session via the documented HTTP API.

Concrete error returned by the server (pasted from our log):

ERROR ... service=server error=Session ses_1d00d41caffe3C6jQ0xci17Keg is busy
  at SessionRunState.assertNotBusy
  at SessionHttpApi.deleteMessage

The HTTP method deleteMessage checks assertNotBusy and throws, but the only thing keeping the session busy is a tool part that will never complete on its own. The session becomes a dead-end. The user's only recovery path today is to manually delete the session row or the stuck part row out of ~/.local/share/opencode/opencode.db.
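To make the dead-end concrete, here is a minimal model of the guard. It is a sketch under our own assumptions, not opencode's actual code: `RunState`, `BusyError`, `markBusy`, `markIdle`, and the in-memory message map are hypothetical stand-ins that only borrow the names `assertNotBusy` and `deleteMessage` from the stack trace above.

```typescript
// Hypothetical model of the guard; not opencode's real implementation.
class BusyError extends Error {
  constructor(sessionID: string) {
    super(`Session ${sessionID} is busy`);
    this.name = "BusyError";
  }
}

class RunState {
  // Sessions that currently have an unfinished run.
  private busy = new Set<string>();

  markBusy(sessionID: string): void {
    this.busy.add(sessionID);
  }

  // In this model, only a terminal event calls markIdle.
  markIdle(sessionID: string): void {
    this.busy.delete(sessionID);
  }

  assertNotBusy(sessionID: string): void {
    if (this.busy.has(sessionID)) throw new BusyError(sessionID);
  }
}

// Guarded delete: returns true if the message existed and was removed.
function deleteMessage(
  state: RunState,
  messages: Map<string, string[]>,
  sessionID: string,
  messageID: string,
): boolean {
  state.assertNotBusy(sessionID); // throws for as long as the session is busy
  const list = messages.get(sessionID) ?? [];
  const next = list.filter((id) => id !== messageID);
  messages.set(sessionID, next);
  return next.length !== list.length;
}
```

If the terminal event never arrives, nothing in this model ever calls `markIdle`, so every guarded call keeps throwing. That matches the behaviour we observe.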

This is essentially the same class of problem as the now-closed #14014 ("Revert fails when session is busy/generating"), but instead of revert the affected API is deleteMessage, and the trigger isn't an in-flight model generation — it's a question-tool part that has been abandoned mid-flight by a prior failure path.

How we got into the stuck-running state

We found three independent occurrences today by inspecting the live ~/.local/share/opencode/opencode.db. All three follow the same signature:

  • A prior MCP tool call ended with state.status: "error", error: "Not connected" (in our case the failing MCP server was context-mode).
  • A handful of normal bash/read tool calls then ran successfully in the same session.
  • A subsequent question tool call fires.
  • The question part is created with state.status: "running" and updated once ~6 seconds later with what looks like state.time.start written, but state.output is never written, state.time.end is never set, and no follow-up assistant message is created. The SSE/SDK side seems to never deliver the "this tool finished" event to the persisted part, so the part stays running indefinitely.
  • The session is now permanently busy from the run-state's point of view, and deleteMessage (and presumably any other assertNotBusy-guarded API) throws on every call.
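The signature above can be expressed as a small predicate. This is only a sketch: the `ToolPartState` shape is inferred from the fields named in this report, the `"pending"` and `"completed"` status values are our assumption, and `isAbandoned` and `maxAgeMs` are hypothetical names rather than anything in opencode.

```typescript
// Hypothetical shape, based only on the fields named in this report.
// "pending" and "completed" are assumed status values.
interface ToolPartState {
  status: "pending" | "running" | "completed" | "error";
  output?: string;
  time?: { start?: number; end?: number };
}

// A part counts as abandoned when it is still "running", has neither output
// nor an end time, and started more than maxAgeMs milliseconds ago.
function isAbandoned(state: ToolPartState, now: number, maxAgeMs: number): boolean {
  if (state.status !== "running") return false;
  if (state.output !== undefined || state.time?.end !== undefined) return false;
  const start = state.time?.start;
  // Without a start time the age cannot be measured, so stay conservative.
  if (start === undefined) return false;
  return now - start > maxAgeMs;
}
```

A question tool can legitimately wait on the user for a long time, so any age threshold is a policy choice. The predicate is meant for surfacing candidates, not for deciding on its own that a part is dead.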

We can produce a full forensic dump for any of the three sessions if useful (part ids, message ids, timing, etc.).

Expected Behavior

A deleteMessage request on a stuck session should either:

  1. Succeed by treating an indefinitely-running tool part as cancellable (since the user is explicitly trying to delete the message that owns it), or
  2. Be allowed to force-abort the stuck run-state first (similar to how session.undo already aborts ongoing work before reverting, per #14014's expected-behavior section), or
  3. At minimum offer a documented HTTP API to force-clear the busy flag / mark abandoned tool parts as errored, so the UI and external clients (we use CodeNomad as a wrapper) can recover without touching the SQLite DB by hand.
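As a rough illustration of option 3, a force-abort could mark every still-running part as errored and then clear the busy flag. Everything here is hypothetical: the `Part` and `Session` shapes, the `forceAbort` name, and the error string are our own inventions for the sketch and do not correspond to an existing opencode API.

```typescript
// Hypothetical types and names; a sketch of option 3, not an existing opencode API.
interface Part {
  id: string;
  status: "running" | "completed" | "error";
  error?: string;
  timeEnd?: number;
}

interface Session {
  id: string;
  busy: boolean;
  parts: Part[];
}

// Marks every still-running part as errored and clears the busy flag.
// Returns the ids of the parts that were changed.
function forceAbort(session: Session, now: number): string[] {
  const changed: string[] = [];
  for (const part of session.parts) {
    if (part.status !== "running") continue;
    part.status = "error";
    part.error = "Aborted: no terminal event received";
    part.timeEnd = now;
    changed.push(part.id);
  }
  session.busy = false;
  return changed;
}
```

Returning the changed part ids would let a client such as CodeNomad show the user exactly what was cancelled before retrying deleteMessage.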

Plugins

context-mode@latest is the MCP plugin whose "Not connected" failure appears to be the upstream trigger in our reproductions, but the underlying lockout problem is in opencode itself: the run-state never garbage-collects an abandoned question tool part, and the HTTP API has no escape hatch when that happens. The same lockout would presumably occur for any tool that fails to deliver a terminal state event.

OpenCode version

v1.14.48 (reproduced today). Should still apply to v1.15.3 based on inspection of the assertNotBusy call sites — happy to re-verify on latest if asked.

Steps to reproduce

We have not yet pinned down a one-shot reproducer for getting the question part into the stuck-running state, but the secondary lockout is deterministic once you're there:

  1. Get any tool part stuck in state.status: "running" with no terminal event. In our cases this followed an MCP tool erroring with "Not connected" mid-session and then a question tool call shortly afterwards; it happens often enough that we have three live cases today.
  2. Confirm via the DB that state.output is undefined and state.time.end is missing for the offending part.
  3. Call DELETE /session/:sessionId/message/:messageId (or whichever route SessionHttpApi.deleteMessage is bound to) on any message in the session.
  4. Observe the server error "Session <id> is busy", thrown by SessionRunState.assertNotBusy when called from SessionHttpApi.deleteMessage.

Result: the session is undeletable / unrecoverable through the documented HTTP API.
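Step 3 can be scripted roughly as follows. This is a sketch, not a verified client: the route shape is copied from step 3 and may not match the real binding, and `baseUrl`, `FetchLike`, `deleteMessageUrl`, and `tryDeleteMessage` are hypothetical names. The fetch function is passed in so the helper does not depend on a particular runtime.

```typescript
// Assumption: the route shape below is taken from step 3 of this report
// and may not match what SessionHttpApi.deleteMessage is actually bound to.
type FetchLike = (
  url: string,
  init: { method: string },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Builds the DELETE URL, tolerating a trailing slash on baseUrl.
function deleteMessageUrl(baseUrl: string, sessionId: string, messageId: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/session/${encodeURIComponent(sessionId)}/message/${encodeURIComponent(messageId)}`;
}

// Sends the DELETE request and returns the status and raw body for inspection.
async function tryDeleteMessage(
  fetchFn: FetchLike,
  baseUrl: string,
  sessionId: string,
  messageId: string,
): Promise<{ ok: boolean; status: number; body: string }> {
  const res = await fetchFn(deleteMessageUrl(baseUrl, sessionId, messageId), { method: "DELETE" });
  return { ok: res.ok, status: res.status, body: await res.text() };
}
```

On a stuck session we would expect `ok` to be false and the matching "is busy" error to appear in the server log. We have not recorded which HTTP status code the server returns in that case.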

Screenshot and/or share link

N/A — server-side error. We can share part dumps if useful.

Operating System

Linux

Terminal

N/A (we hit this through CodeNomad, a third-party UI wrapper around opencode-server)
