halt() never emits an ACP frame when LLM call fails after retries — turn appears silently stuck to ACP clients #26350

@truenorth-lj

Summary

When an LLM call fails after the retry policy in session/retry.ts is exhausted, session/processor.ts halt() updates internal state (ctx.assistantMessage.error, Bus.Session.Event.Error, EventV2.SessionEvent.Step.Failed.Sync) but never emits any ACP session/update notification or other frame to the connected client.

From an ACP client's perspective the turn is silently stuck:

  • No stopReason
  • No error notification
  • No final session/update

The turn was started by session/prompt, the agent emitted some agent_message_chunks (or zero, for an immediate failure), and then nothing.

Reproduction

  1. Configure any model so that every call fails deterministically with an error the retry policy keeps retrying. The simplest option is a backend that always returns 429 with a body the AI SDK retries on, until SessionRetry's budget is exhausted; a 502 storm works the same way.
  2. Send session/prompt.
  3. Observe: client receives no terminating frame after the retry window.
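For step 1, a stub backend along the following lines may be enough. This is a minimal sketch, not part of opencode: `rateLimitedResponse` and `startAlways429` are hypothetical names, and the JSON body is a placeholder. The real body has to be whatever the AI SDK classifies as retryable for the provider in use.

```typescript
import { createServer, type Server } from "node:http"
import type { AddressInfo } from "node:net"

// Hypothetical canned response: the status is always 429, so no retry can succeed.
// The body shape is a placeholder, not a documented provider format.
export function rateLimitedResponse() {
  return {
    status: 429,
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ error: { message: "stub backend: always rate limited" } }),
  }
}

// Starts the stub on a free local port and resolves with its base URL.
export function startAlways429(port = 0): Promise<{ server: Server; url: string }> {
  const server = createServer((_req, res) => {
    const { status, headers, body } = rateLimitedResponse()
    res.writeHead(status, headers)
    res.end(body)
  })
  return new Promise((resolve) => {
    server.listen(port, "127.0.0.1", () => {
      const bound = (server.address() as AddressInfo).port
      resolve({ server, url: `http://127.0.0.1:${bound}` })
    })
  })
}
```

Pointing the configured model's base URL at the returned `url` should drive the session into the retry-exhausted path; how the base URL is set depends on the provider configuration.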

The retry policy itself is fine — the issue is purely in what happens after Effect.retry(SessionRetry.policy(...)) gives up and the retry error reaches halt() via Effect.catch(halt).
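The control flow described above can be illustrated with a plain-TypeScript analogy. It deliberately does not use Effect, and `runTurn`, `Frame`, and the `turn_end` frame kind are hypothetical names chosen for the sketch. The point is only that the failure path updates state and returns without ever calling `sendFrame`.

```typescript
// Hypothetical frame type; "turn_end" stands in for whatever terminates a turn on the wire.
type Frame = { kind: "agent_message_chunk" | "turn_end"; text?: string }

interface TurnState {
  status: "busy" | "idle"
  error?: string
}

export function runTurn(
  callModel: () => string,
  maxAttempts: number,
  sendFrame: (frame: Frame) => void,
): TurnState {
  const state: TurnState = { status: "busy" }
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const text = callModel()
      sendFrame({ kind: "agent_message_chunk", text })
      sendFrame({ kind: "turn_end" })
      state.status = "idle"
      return state
    } catch (e) {
      lastError = e // keep retrying until the budget is spent
    }
  }
  // Analogue of halt(): internal state is updated, but no frame is sent.
  state.error = lastError instanceof Error ? lastError.message : String(lastError)
  state.status = "idle"
  return state
}
```

With a `callModel` that always throws, the returned state carries the error and an idle status while the client-facing `sendFrame` callback is never invoked, which is the stuck-turn symptom.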

Expected

When halt() runs, the connected ACP client should receive a session/update (or equivalent ACP frame) that:

  • Indicates the turn ended in an error state (stopReason: "error").
  • Carries enough information for the client to render an error to the user instead of waiting forever.

Actual

halt() in packages/opencode/src/session/processor.ts, defined via Effect.fn("SessionProcessor.halt"):

const halt = Effect.fn("SessionProcessor.halt")(function* (e: unknown) {
  slog.error("process", { ... })
  const error = parse(e)
  if (MessageV2.ContextOverflowError.isInstance(error)) {
    ctx.needsCompaction = true
    yield* bus.publish(Session.Event.Error, { sessionID: ctx.sessionID, error })
    return
  }
  if (!ctx.assistantMessage.summary) {
    EventV2.run(SessionEvent.Step.Failed.Sync, { ... })
  }
  ctx.assistantMessage.error = error
  yield* bus.publish(Session.Event.Error, {
    sessionID: ctx.assistantMessage.sessionID,
    error: ctx.assistantMessage.error,
  })
  yield* status.set(ctx.sessionID, { type: "idle" })
})

Both bus.publish(Session.Event.Error, ...) and EventV2.run(SessionEvent.Step.Failed.Sync, ...) are internal to the opencode process, and neither produces an ACP wire frame. The code in packages/opencode/src/acp/agent.ts that calls connection.sessionUpdate(...) translates only a subset of bus events into ACP frames (message.part.updated, permission.asked, compact, etc.); there is no case for session.error, so the error published by halt() never reaches the client.

Suggested fix

Add a handler in packages/opencode/src/acp/agent.ts handleEvent() for event.type === "session.error" that:

  1. Reads the error payload (MessageV2.APIError carries responseHeaders / responseBody / statusCode, sufficient to characterize the failure).
  2. Calls this.connection.sessionUpdate({ sessionId, update: { sessionUpdate: ..., ... } }) to notify the client.
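The two steps above could look roughly like this. It is a sketch under stated assumptions rather than the actual agent.ts code: the `BusEvent`, `SessionErrorEvent`, and `Connection` types are hypothetical stand-ins, the inner error shape (`data.message`, `data.statusCode`) is assumed, `sessionUpdate` is simplified to a synchronous call, and the `"agent_error"` kind is a placeholder borrowed from the typed option discussed in this issue.

```typescript
// Hypothetical stand-ins for the real bus event and ACP connection types.
interface SessionErrorEvent {
  type: "session.error"
  properties: {
    sessionID: string
    error: { name: string; data: { message: string; statusCode?: number } }
  }
}
interface OtherEvent {
  type: "message.part.updated" | "permission.asked"
  properties: { sessionID: string }
}
type BusEvent = SessionErrorEvent | OtherEvent

interface Connection {
  sessionUpdate(params: { sessionId: string; update: Record<string, unknown> }): void
}

export function handleEvent(connection: Connection, event: BusEvent): void {
  switch (event.type) {
    case "session.error": {
      // Step 1: read the error payload published by halt().
      const { sessionID, error } = event.properties
      // Step 2: forward it to the ACP client so the turn visibly ends.
      connection.sessionUpdate({
        sessionId: sessionID,
        update: {
          sessionUpdate: "agent_error", // placeholder kind, not an existing ACP update
          message: error.data.message,
          statusCode: error.data.statusCode,
        },
      })
      return
    }
    default:
      return // existing cases elided in this sketch
  }
}
```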

Two design choices for the update shape:

  • Minimal: emit a session/update of an existing kind with stopReason: "error" and the error message inline. No protocol change required.
  • Typed (preferred): introduce a new agent_error SessionUpdate kind with a structured payload (type, message, retryable, optional retry-after / reset-at / etc.) so clients can render type-specific copy (rate-limit retry timer, budget reset time, context-overflow vs auth, …) instead of just a generic error string. I have a separate PR proposing this kind: feat(acp): add AgentErrorUpdate session/update kind for typed LLM error propagation #26306.
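To make the typed option concrete, here is one possible shape for the payload and a classifier that fills it in. Everything in it is an assumption for illustration: the `AgentErrorUpdate` field names, the `AgentErrorType` values, the lowercase `retry-after` header key, and the status-code mapping are not taken from #26306, which defines the actual proposal.

```typescript
// Hypothetical payload shape; the real definition belongs to the proposal in #26306.
type AgentErrorType = "rate_limit" | "auth" | "context_overflow" | "unknown"

interface AgentErrorUpdate {
  sessionUpdate: "agent_error"
  type: AgentErrorType
  message: string
  retryable: boolean
  retryAfterMs?: number
}

interface ApiErrorLike {
  message: string
  statusCode?: number
  responseHeaders?: Record<string, string>
}

export function toAgentErrorUpdate(err: ApiErrorLike): AgentErrorUpdate {
  const base = { sessionUpdate: "agent_error" as const, message: err.message }
  if (err.statusCode === 429) {
    const update: AgentErrorUpdate = { ...base, type: "rate_limit", retryable: true }
    // Only the delay-in-seconds form of retry-after is handled; HTTP dates are ignored.
    const raw = err.responseHeaders?.["retry-after"]
    if (raw !== undefined && raw.trim() !== "") {
      const seconds = Number(raw)
      if (Number.isFinite(seconds) && seconds >= 0) update.retryAfterMs = seconds * 1000
    }
    return update
  }
  if (err.statusCode === 401 || err.statusCode === 403) {
    return { ...base, type: "auth", retryable: false }
  }
  // Context overflow cannot be detected from a status code alone, so it is not produced here.
  const retryable = err.statusCode !== undefined && err.statusCode >= 500
  return { ...base, type: "unknown", retryable }
}
```

A client could then switch on `type` to choose between, say, a retry countdown and a re-authentication prompt, falling back to `message` for the unknown case.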

Happy to send a PR for whichever direction you'd prefer. My current draft goes with the typed kind from #26306; if #26306 is unwanted, the same fix lands easily on a stopReason: "error"-on-existing-kind variant.

Environment

opencode dev branch as of this issue. The halt() location and bus event names referenced above match the current source; the relevant control flow (Effect.retry(SessionRetry.policy(...)) → Effect.catch(halt)) lives in packages/opencode/src/session/processor.ts's process function.
