[Daemon] RFC: POST /prompt should be non-blocking — decouple trigger from completion #4582

@chiga0

Summary

The daemon's POST /session/:id/prompt endpoint currently holds the HTTP connection open until the entire agent turn completes (model inference + tool execution + multi-step agentic loop). This synchronous blocking design conflicts with common infrastructure timeout constraints and creates reliability issues in real-world deployments.

Current Design

Client                              Daemon
  |                                   |
  |--- POST /prompt ----------------->|
  |         (connection held open)    |  ← model inference
  |                                   |  ← tool execution
  |                                   |  ← more model calls...
  |                                   |  ← could take 2-10+ minutes
  |<-- 200 { stopReason } -----------|

Meanwhile, real-time data (assistant text chunks, tool calls, tool output) is already delivered independently via the SSE GET /session/:id/events stream. The /prompt HTTP response only carries { stopReason } — effectively just a completion signal.

Problem

In HTTP-based deployments (web IDE, remote daemon access), the request passes through standard infrastructure layers (reverse proxies, ingress controllers, load balancers). These layers commonly enforce a read or idle timeout of about 60s on regular HTTP requests; for example, nginx's proxy_read_timeout defaults to 60s. This is a widespread default, not a misconfiguration.

When an agent turn exceeds 60s:

  • The intermediate proxy returns 504 Gateway Timeout to the client
  • The daemon continues executing normally (unaware of the disconnection)
  • The client loses the stopReason completion signal
  • There is no alternative way to learn that the turn has finished, because no turn_complete event exists in the SSE protocol

SSE connections survive this timeout (heartbeat frames keep the connection from going idle, X-Accel-Buffering: no stops the proxy from buffering the stream, and the SSE route has dedicated proxy config), but regular HTTP POST requests do not, and shouldn't need to.

Design Issue

The /prompt endpoint conflates two distinct responsibilities:

  1. Trigger — "start processing this prompt" (validation, queueing)
  2. Await completion — "tell me when the turn is done and why it stopped"

Responsibility #2 is already better served by the SSE channel, which:

  • Has built-in reconnection and heartbeat mechanisms
  • Is already used for all intermediate state delivery
  • Survives proxy timeouts by design

Reference: ACP Streamable HTTP already solves this

The ACP HTTP transport (/acp, PR #4472, RFD #721) has already adopted the non-blocking pattern:

POST /acp { session/prompt } → 202 (immediate, empty body)
GET  /acp (session-scoped)   ← SSE: session/update notifications
                             ← SSE: { id, result: { stop_reason } }  (completion)

This works because:

  • POST takes <1s (no proxy timeout risk)
  • SSE has 15s heartbeat + X-Accel-Buffering: no (proxies don't kill it)
  • Completion signal travels via SSE alongside streaming data
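
The SSE properties listed above can be sketched at the wire level as follows. This is a minimal illustration, not the daemon's actual code: `formatSseEvent`, `heartbeatFrame`, and `sseHeaders` are hypothetical names, and the `end_turn` value used in the usage note is only an example stop reason. The framing itself (`event:` and `data:` fields, comment lines starting with `:`, blank-line terminator) follows the SSE specification.

```typescript
// Hypothetical helpers illustrating SSE framing; names are not from the codebase.

/** Serialize one named event as an SSE frame. */
function formatSseEvent(event: string, data: unknown): string {
  // JSON.stringify never emits raw newlines, so a single data: line is enough.
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

/**
 * A comment frame. Clients ignore it, but it counts as traffic on the
 * connection, so a proxy read timeout does not fire between real events.
 * The issue mentions a 15s interval for these.
 */
function heartbeatFrame(): string {
  return ": heartbeat\n\n";
}

/** Response headers an SSE endpoint behind nginx would typically send. */
const sseHeaders: Record<string, string> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  "X-Accel-Buffering": "no", // asks nginx not to buffer this response
};
```

For example, `formatSseEvent("turn_complete", { stopReason: "end_turn" })` yields one frame that carries the completion signal on the same stream as the streaming data.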

However, ACP HTTP is still a draft proposal with an incomplete implementation. It should NOT be treated as the migration target today. Instead, this issue proposes applying the same architectural pattern to the existing REST API surface, so both transports can run independently side by side.

Proposed Change

Apply the ACP-consistent non-blocking pattern to the existing REST API, without changing the URL surface:

1. Make POST /session/:id/prompt non-blocking

Client                              Daemon
  |                                   |
  |--- POST /prompt ----------------->|
  |<-- 202 { promptId } -------------|  ← immediate (< 1s)
  |                                   |
  |  (agent turn runs asynchronously) |

The endpoint validates the request, confirms the prompt is accepted, and returns immediately. Errors in prompt submission (invalid session, busy, malformed input) are still returned synchronously as 4xx.
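
A rough sketch of what the accept-and-return step could look like, written as a framework-free function so it stays self-contained. Everything here is an assumption for illustration: the `Session` shape, the `text` body field, the `startTurn` callback, and the specific 404/409/400 codes (the proposal only says 4xx).

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical types; the daemon's real session model is not shown in this issue.
interface Session { busy: boolean }
interface HttpResult { status: number; body: Record<string, unknown> }

/**
 * Validate and accept a prompt, then return immediately.
 * `startTurn` is called but never awaited; it stands in for the agent loop.
 */
function acceptPrompt(
  sessions: Map<string, Session>,
  sessionId: string,
  body: unknown,
  startTurn: (promptId: string, text: string) => void,
): HttpResult {
  const session = sessions.get(sessionId);
  if (!session) return { status: 404, body: { error: "unknown session" } };
  if (session.busy) return { status: 409, body: { error: "session busy" } };

  // `text` is an assumed field name for the prompt payload.
  const text = (body as { text?: unknown } | null)?.text;
  if (typeof text !== "string" || text.length === 0) {
    return { status: 400, body: { error: "malformed input" } };
  }

  const promptId = randomUUID();
  session.busy = true;
  startTurn(promptId, text); // fire and forget: completion is reported via SSE
  return { status: 202, body: { promptId } };
}
```

The point of the shape is that all synchronous failure modes are decided before the 202 is sent, so the HTTP response time no longer depends on how long the turn runs.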

2. Add turn_complete event to existing GET /session/:id/events SSE stream

Client (SSE)                        Daemon
  |                                   |
  | ... session_update events ...     |
  |<-- turn_complete { stopReason } --|  ← agent turn finished

All SSE subscribers (prompt sender + passive observers) receive this event, providing a single authoritative completion signal. This also eliminates the current 3-second inactivity heuristic that passive observers use as a workaround.
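
The fan-out described above can be illustrated with a small in-memory bus. This is a sketch under assumptions: `SessionBus`, the `SessionEvent` union, and the `promptId` field on `turn_complete` are hypothetical, and the daemon's real EventBus API is not shown in this issue.

```typescript
// Hypothetical minimal bus; not the daemon's actual EventBus.
type SessionEvent =
  | { type: "session_update"; payload: unknown }
  | { type: "turn_complete"; promptId: string; stopReason: string };

type Listener = (event: SessionEvent) => void;

class SessionBus {
  private listeners = new Set<Listener>();

  /** Register one SSE subscriber; returns an unsubscribe function. */
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  /** Deliver an event to every current subscriber, sender and observers alike. */
  publish(event: SessionEvent): void {
    this.listeners.forEach((listener) => listener(event));
  }
}
```

Because the prompt sender and passive observers are just entries in the same listener set, neither needs a special path to learn that the turn ended.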

3. SDK backward compatibility

DaemonClient.prompt() retains its Promise<PromptResult> signature. Internally, it becomes: POST (fire) → await matching turn_complete event on SSE → resolve. Callers see no breaking change.
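
One way the SDK side might be structured, as a sketch only. `TurnTracker`, its methods, and the injected `post` function are hypothetical and not the actual DaemonClient internals. The sketch also assumes `turn_complete` carries the `promptId` returned by the 202, which the proposal implies ("matching") but does not spell out.

```typescript
// Hypothetical sketch; not the actual DaemonClient implementation.
interface PromptResult { stopReason: string }
interface TurnCompleteEvent { type: "turn_complete"; promptId: string; stopReason: string }

class TurnTracker {
  private waiters = new Map<string, (result: PromptResult) => void>();
  private early = new Map<string, PromptResult>();

  /** Feed every turn_complete event from the SSE stream here. */
  onEvent(event: TurnCompleteEvent): void {
    const result: PromptResult = { stopReason: event.stopReason };
    const resolve = this.waiters.get(event.promptId);
    if (resolve) {
      this.waiters.delete(event.promptId);
      resolve(result);
    } else {
      // The event arrived before the 202 response was processed; keep it.
      this.early.set(event.promptId, result);
    }
  }

  /** Resolve once the matching turn_complete has been seen. */
  waitFor(promptId: string): Promise<PromptResult> {
    const early = this.early.get(promptId);
    if (early) {
      this.early.delete(promptId);
      return Promise.resolve(early);
    }
    return new Promise<PromptResult>((resolve) => {
      this.waiters.set(promptId, resolve);
    });
  }

  pendingCount(): number {
    return this.waiters.size;
  }
}

/** Callers still get Promise<PromptResult>; only the plumbing changes. */
async function prompt(
  post: () => Promise<{ promptId: string }>, // stands in for POST /session/:id/prompt
  tracker: TurnTracker,
): Promise<PromptResult> {
  const { promptId } = await post();
  return tracker.waitFor(promptId);
}
```

The `early` map covers the race where a very short turn completes before the client has read the 202 body. A real implementation would also need to bound or expire that map, since it would otherwise collect completions for prompts sent by other clients.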

4. Coexistence with /acp

Both transports share the same Bridge instance and EventBus. The change is purely at the REST transport layer:

                    ┌─────────────────────────────┐
                    │       Bridge + EventBus      │
                    └──────┬──────────┬────────────┘
                           │          │
              ┌────────────▼──┐  ┌────▼────────────┐
              │  REST API     │  │  ACP HTTP (/acp) │
              │  /session/*   │  │  (RFD #721)      │
              │  (this issue) │  │  (already done)   │
              └───────────────┘  └──────────────────┘

No dependency between the two; either can be enabled/disabled independently.

Components Affected

| Component | Change |
| --- | --- |
| packages/cli/src/serve/server.ts | /prompt route returns 202 immediately; on turn end, publishes turn_complete to EventBus |
| packages/acp-bridge/src/bridge.ts | Emit turn_complete / turn_error event when sendPrompt promise settles |
| packages/sdk-typescript/src/daemon/events.ts | Add turn_complete, turn_error event types |
| packages/sdk-typescript/src/daemon/DaemonClient.ts | prompt() internally awaits SSE turn_complete event instead of HTTP response |
| packages/webui/ | Remove 3s inactivity heuristic; use turn_complete event uniformly |

Additional Evidence

The passive observer (multi-tab) scenario already reveals this gap. When a client subscribes to SSE without being the prompt sender, it has no reliable way to know when the turn ends. The current webui uses a 3-second inactivity heuristic (schedulePassiveAssistantDone) — a clear workaround for the missing completion signal.
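
To make the contrast with the timer-based heuristic concrete, here is a sketch of how an observer could derive "done" purely from events. The `TurnState` shape, the `assistant_chunk` event name, and `reduceTurn` are hypothetical; only `turn_complete` and `stopReason` come from this proposal.

```typescript
// Hypothetical webui state; names other than turn_complete are illustrative.
interface TurnState { text: string; done: boolean; stopReason?: string }

type StreamEvent =
  | { type: "assistant_chunk"; text: string }
  | { type: "turn_complete"; stopReason: string };

/** Pure reducer: completion comes from an explicit event, never from a timer. */
function reduceTurn(state: TurnState, event: StreamEvent): TurnState {
  switch (event.type) {
    case "assistant_chunk":
      return { ...state, text: state.text + event.text };
    case "turn_complete":
      return { ...state, done: true, stopReason: event.stopReason };
  }
}
```

With this shape, a long pause between chunks (for example during a slow tool call) no longer risks being misread as the end of the turn.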

Discussion Points

  • Should the non-blocking behavior be opt-in (header/query param) for backward compatibility during transition?
  • If SSE disconnects during a turn, should there be a GET /session/:id/prompt-status endpoint for recovery?
  • The local CLI qwen serve scenario has no proxy timeout issue — is there value in non-blocking there too (e.g., client disconnect tolerance)?
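
On the first point, one possible opt-in mechanism (an assumption, not something this issue has decided) is the standard `Prefer: respond-async` request header from RFC 7240. A minimal sketch of the check, with `wantsAsync` as a hypothetical helper name and header keys assumed to be lowercased, as Node's `http` module does for incoming requests:

```typescript
/**
 * Returns true when the client opted in via `Prefer: respond-async`.
 * Assumes header names are already lowercased.
 */
function wantsAsync(headers: Record<string, string | undefined>): boolean {
  const prefer = headers["prefer"] ?? "";
  return prefer
    .split(",") // a Prefer header may carry several comma-separated preferences
    .map((p) => p.trim().toLowerCase())
    .some((p) => p === "respond-async" || p.startsWith("respond-async;"));
}
```

During a transition period the route could return 202 only when this returns true and keep the blocking 200 otherwise; whether that complexity is worth it is exactly the open question above.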

Prior Art

  • ACP Streamable HTTP (this repo, /acp) — already implements this exact pattern
  • OpenAI Assistants API — create run → stream events
  • GitHub Actions API — queue job → poll/webhook for result
  • Async job systems in general (Celery, Temporal, etc.): submit the work, then retrieve the result separately

The current blocking design made sense when the daemon was local-only. As it increasingly serves remote/web clients through standard HTTP infrastructure, the blocking model becomes a liability.
