Summary
The daemon's POST /session/:id/prompt endpoint currently holds the HTTP connection open until the entire agent turn completes (model inference + tool execution + multi-step agentic loop). This synchronous blocking design conflicts with common infrastructure timeout constraints and creates reliability issues in real-world deployments.
Current Design
Client                              Daemon
  |                                    |
  |--- POST /prompt ------------------>|
  |       (connection held open)       | ← model inference
  |                                    | ← tool execution
  |                                    | ← more model calls...
  |                                    | ← could take 2-10+ minutes
  |<-- 200 { stopReason } -------------|
Meanwhile, real-time data (assistant text chunks, tool calls, tool output) is already delivered independently via the SSE GET /session/:id/events stream. The /prompt HTTP response only carries { stopReason } — effectively just a completion signal.
Problem
In HTTP-based deployments (web IDE, remote daemon access), the request passes through standard infrastructure layers (reverse proxies, ingress controllers, load balancers). These layers commonly enforce a read or idle timeout of about 60s on regular HTTP requests. For example, nginx's default proxy_read_timeout and the AWS Application Load Balancer's default idle timeout are both 60 seconds. This is an industry-standard default, not a misconfiguration.
When an agent turn exceeds 60s:
- The intermediate proxy returns 504 Gateway Timeout to the client
- The daemon continues executing normally (unaware of the disconnection)
- The client loses the stopReason completion signal
- There is no alternative way to learn that the turn has finished, because no turn_complete event exists in the SSE protocol
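SSE connections avoid this timeout. In nginx, proxy_read_timeout limits the gap between two successive reads from the upstream, not the total response time, so periodic heartbeat frames keep a long-lived stream alive, while X-Accel-Buffering: no and dedicated proxy config stop the proxy from buffering events. A blocking POST /prompt, by contrast, sends nothing until the turn ends, so any turn longer than the timeout is cut off. Regular HTTP POST requests get no such exemption, and shouldn't need one.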
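For illustration, the SSE side of this can be sketched in TypeScript. The helper names (SSE_HEADERS, formatEvent, heartbeatFrame, startHeartbeat) are hypothetical, not taken from the daemon's code. The only protocol facts relied on are that SSE lines starting with ":" are comments that EventSource clients ignore, and that any bytes written to the stream reset a proxy's read timer.

```typescript
// Hypothetical SSE helpers illustrating why an event stream survives a 60s
// proxy read timeout while a silent POST does not. Not the daemon's actual code.

// Headers commonly sent so reverse proxies stream events instead of buffering them.
const SSE_HEADERS: Record<string, string> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  "X-Accel-Buffering": "no", // nginx: disable proxy buffering for this response
};

// Encode one named event. JSON.stringify without indentation never emits raw
// newlines, so a single data: line is enough.
function formatEvent(event: string, data: Record<string, unknown>): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// A comment frame: EventSource clients ignore it, but the bytes on the wire
// reset the proxy's read timer.
function heartbeatFrame(): string {
  return ": heartbeat\n\n";
}

// Write a heartbeat every intervalMs (15s mirrors the /acp transport).
// Returns a stop function to call when the client disconnects.
function startHeartbeat(write: (chunk: string) => void, intervalMs = 15_000): () => void {
  const timer = setInterval(() => write(heartbeatFrame()), intervalMs);
  return () => clearInterval(timer);
}
```

The heartbeat interval only needs to stay comfortably below the shortest proxy timeout on the path; 15s leaves ample margin under a 60s default.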
Design Issue
The /prompt endpoint conflates two distinct responsibilities:
- Trigger — "start processing this prompt" (validation, queueing)
- Await completion — "tell me when the turn is done and why it stopped"
Responsibility #2 is already better served by the SSE channel, which:
- Has built-in reconnection and heartbeat mechanisms
- Is already used for all intermediate state delivery
- Survives proxy timeouts by design
Reference: ACP Streamable HTTP already solves this
The ACP HTTP transport (/acp, PR #4472, RFD #721) has already adopted the non-blocking pattern:
POST /acp { session/prompt }   → 202 (immediate, empty body)
GET  /acp (session-scoped)     ← SSE: session/update notifications
                               ← SSE: { id, result: { stop_reason } } (completion)
This works because:
- POST takes <1s (no proxy timeout risk)
- SSE has a 15s heartbeat plus X-Accel-Buffering: no (proxies don't kill it)
- Completion signal travels via SSE alongside streaming data
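On the /acp transport, progress and completion share one SSE stream and are told apart by JSON-RPC id: a notification carries no id, while the completion is a response echoing the id of the POSTed request. A minimal TypeScript sketch of that routing follows. The message shapes come from the flow above (the stop_reason field name should be checked against RFD #721), and collectTurn is a hypothetical helper.

```typescript
// Sketch of how a /acp client separates progress from completion on one SSE stream.
// Shapes follow the flow described in this issue (JSON-RPC 2.0).

interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string; // e.g. "session/update"
  params?: unknown;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result: { stop_reason: string };
}

type AcpSseMessage = JsonRpcNotification | JsonRpcResponse;

// A notification has no id; the completion echoes the id of the POSTed request.
function isResponseTo(msg: AcpSseMessage, requestId: number | string): msg is JsonRpcResponse {
  return "id" in msg && msg.id === requestId;
}

// Consume messages in arrival order until the completion for requestId appears.
function collectTurn(
  messages: Iterable<AcpSseMessage>,
  requestId: number | string,
): { updates: JsonRpcNotification[]; stopReason?: string } {
  const updates: JsonRpcNotification[] = [];
  for (const msg of messages) {
    if (isResponseTo(msg, requestId)) {
      return { updates, stopReason: msg.result.stop_reason };
    }
    // Responses to other requests are skipped; only notifications are progress.
    if (!("id" in msg)) updates.push(msg);
  }
  return { updates }; // stream ended before completion (e.g. disconnect)
}
```

The REST change proposed here mirrors the same idea, with a promptId from the 202 playing the role of the JSON-RPC id.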
However, ACP HTTP is still a draft proposal with an incomplete implementation. It should NOT be treated as the migration target today. Instead, this issue proposes applying the same architectural pattern to the existing REST API surface, so both transports can run independently side by side.
Proposed Change
Apply the ACP-consistent non-blocking pattern to the existing REST API, without changing the URL surface:
1. Make POST /session/:id/prompt non-blocking
Client                              Daemon
  |                                    |
  |--- POST /prompt ------------------>|
  |<-- 202 { promptId } ---------------| ← immediate (< 1s)
  |                                    |
  |  (agent turn runs asynchronously)  |
The endpoint validates the request, confirms the prompt is accepted, and returns immediately. Errors in prompt submission (invalid session, busy, malformed input) are still returned synchronously as 4xx.
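A framework-agnostic sketch of such a handler, assuming hypothetical Session, runTurn and publish stand-ins rather than the real server.ts types. The status codes for an unknown session (404) and a busy session (409) are illustrative choices; the point is that validation stays synchronous while the turn itself is started without being awaited.

```typescript
// Hypothetical sketch of a non-blocking /session/:id/prompt handler.

type TurnEvent =
  | { type: "turn_complete"; promptId: string; stopReason: string }
  | { type: "turn_error"; promptId: string; message: string };

interface Session {
  busy: boolean;
  // Runs the whole agentic loop; resolves with the stop reason.
  runTurn(prompt: string): Promise<string>;
}

interface HttpResult {
  status: number;
  body: Record<string, unknown>;
}

let promptCounter = 0;

function handlePrompt(
  sessions: Map<string, Session>,
  sessionId: string,
  body: unknown,
  publish: (ev: TurnEvent) => void,
): HttpResult {
  // Submission errors are still reported synchronously as 4xx.
  const session = sessions.get(sessionId);
  if (!session) return { status: 404, body: { error: "unknown session" } };
  if (session.busy) return { status: 409, body: { error: "session busy" } };
  const prompt = (body as { prompt?: unknown } | null | undefined)?.prompt;
  if (typeof prompt !== "string" || prompt.length === 0) {
    return { status: 400, body: { error: "malformed input" } };
  }

  const promptId = `prompt-${++promptCounter}`;
  session.busy = true;
  // Fire and forget: the HTTP response does not wait for the turn.
  void session.runTurn(prompt).then(
    (stopReason) => {
      session.busy = false; // clear before publishing so a follow-up prompt is accepted
      publish({ type: "turn_complete", promptId, stopReason });
    },
    (err: unknown) => {
      session.busy = false;
      publish({ type: "turn_error", promptId, message: String(err) });
    },
  );
  return { status: 202, body: { promptId } };
}
```

Because the session is marked idle before turn_complete is published, a client that reacts to the event by sending its next prompt should not run into a spurious busy error.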
2. Add turn_complete event to existing GET /session/:id/events SSE stream
Client (SSE)                        Daemon
  |                                    |
  |    ... session_update events ...   |
  |<-- turn_complete { stopReason } ---| ← agent turn finished
All SSE subscribers (prompt sender + passive observers) receive this event, providing a single authoritative completion signal. This also eliminates the current 3-second inactivity heuristic that passive observers use as a workaround.
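The event additions and the fan-out can be sketched as follows. The promptId field on turn_complete / turn_error is an assumption (it would let a sender match its own turn against the promptId from the 202), and EventBus here is a hypothetical minimal bus, not the daemon's actual class.

```typescript
// Sketch of the proposed SSE event union and fan-out to every subscriber.

type SessionEvent =
  | { type: "session_update"; update: unknown }
  | { type: "turn_complete"; promptId: string; stopReason: string }
  | { type: "turn_error"; promptId: string; message: string };

type Listener = (ev: SessionEvent) => void;

class EventBus {
  private listeners = new Set<Listener>();

  // Each open SSE connection registers one listener; the returned function unsubscribes.
  subscribe(fn: Listener): () => void {
    this.listeners.add(fn);
    return () => this.listeners.delete(fn);
  }

  // The prompt sender and passive observers (other tabs) receive the same event.
  publish(ev: SessionEvent): void {
    for (const fn of this.listeners) fn(ev);
  }
}

// True for the two events that end a turn.
function isTurnEnd(ev: SessionEvent): ev is Extract<SessionEvent, { promptId: string }> {
  return ev.type === "turn_complete" || ev.type === "turn_error";
}
```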
3. SDK backward compatibility
DaemonClient.prompt() retains its Promise<PromptResult> signature. Internally, it becomes: POST (fire) → await matching turn_complete event on SSE → resolve. Callers see no breaking change.
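One way prompt() could keep its Promise<PromptResult> signature, sketched under assumptions: post and subscribe are hypothetical injection points standing in for fetch() and the SSE reader, and turn-end events are assumed to carry the promptId returned by the 202. Two details matter. The SSE subscription is opened before the POST, and events that arrive before the promptId is known are buffered, because a short turn can finish before the 202 body has been read.

```typescript
// Hypothetical sketch of DaemonClient.prompt() on top of a non-blocking POST.

type TurnEnd =
  | { type: "turn_complete"; promptId: string; stopReason: string }
  | { type: "turn_error"; promptId: string; message: string };

interface PromptResult {
  stopReason: string;
}

async function prompt(
  text: string,
  post: (text: string) => Promise<{ promptId: string }>,
  subscribe: (fn: (ev: TurnEnd) => void) => () => void,
): Promise<PromptResult> {
  const early = new Map<string, TurnEnd>(); // events seen before our promptId is known
  let myId: string | undefined;
  let resolveEnd: ((ev: TurnEnd) => void) | undefined;

  // Subscribe BEFORE posting so the completion event cannot be missed.
  const unsubscribe = subscribe((ev) => {
    if (myId === undefined) early.set(ev.promptId, ev);
    else if (ev.promptId === myId) resolveEnd?.(ev);
  });

  try {
    const { promptId } = await post(text); // 4xx submission errors reject here
    myId = promptId;
    // No await between setting myId and installing resolveEnd, so no event slips through.
    const ev = early.get(promptId) ?? (await new Promise<TurnEnd>((r) => { resolveEnd = r; }));
    if (ev.type === "turn_error") throw new Error(ev.message);
    return { stopReason: ev.stopReason };
  } finally {
    unsubscribe();
  }
}
```

With this shape a failed turn rejects the same promise as a rejected submission, so callers can keep a single try/catch.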
4. Coexistence with /acp
Both transports share the same Bridge instance and EventBus. The change is purely at the REST transport layer:
        ┌──────────────────────────────┐
        │      Bridge + EventBus       │
        └──────┬───────────────┬───────┘
               │               │
     ┌─────────▼─────┐   ┌─────▼────────────┐
     │ REST API      │   │ ACP HTTP (/acp)  │
     │ /session/*    │   │ (RFD #721)       │
     │ (this issue)  │   │ (already done)   │
     └───────────────┘   └──────────────────┘
No dependency between the two; either can be enabled/disabled independently.
Components Affected
| Component | Change |
|---|---|
| packages/cli/src/serve/server.ts | /prompt route returns 202 immediately; on turn end, publishes turn_complete to EventBus |
| packages/acp-bridge/src/bridge.ts | Emit turn_complete / turn_error event when sendPrompt promise settles |
| packages/sdk-typescript/src/daemon/events.ts | Add turn_complete, turn_error event types |
| packages/sdk-typescript/src/daemon/DaemonClient.ts | prompt() internally awaits SSE turn_complete event instead of HTTP response |
| packages/webui/ | Remove 3s inactivity heuristic; use turn_complete event uniformly |
Additional Evidence
The passive observer (multi-tab) scenario already reveals this gap. When a client subscribes to SSE without being the prompt sender, it has no reliable way to know when the turn ends. The current webui uses a 3-second inactivity heuristic (schedulePassiveAssistantDone) — a clear workaround for the missing completion signal.
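With turn_complete available, the passive-observer logic in the webui could become a pure function of the event stream. The sketch below is hypothetical (ObserverState and reduce are not existing webui names). Unlike an inactivity timer, it cannot misfire when a tool runs silently for more than 3 seconds, since only an explicit turn_complete or turn_error marks the turn as done.

```typescript
// Hypothetical observer state machine driven by explicit turn events, not a timer.

type ObserverEvent =
  | { type: "session_update" }
  | { type: "turn_complete"; stopReason: string }
  | { type: "turn_error"; message: string };

interface ObserverState {
  assistantActive: boolean;
  lastStopReason?: string;
  lastError?: string;
}

function reduce(state: ObserverState, ev: ObserverEvent): ObserverState {
  switch (ev.type) {
    case "session_update":
      // Streaming output means a turn is in progress; no timer is armed.
      return { ...state, assistantActive: true };
    case "turn_complete":
      return { assistantActive: false, lastStopReason: ev.stopReason };
    case "turn_error":
      return { assistantActive: false, lastError: ev.message };
  }
}
```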
Discussion Points
- Should the non-blocking behavior be opt-in (header/query param) for backward compatibility during the transition?
- If SSE disconnects during a turn, should there be a GET /session/:id/prompt-status endpoint for recovery?
- The local CLI qwen serve scenario has no proxy timeout issue. Is there value in non-blocking there too (e.g., client disconnect tolerance)?
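If the opt-in route is chosen, one existing convention is RFC 7240: the client sends Prefer: respond-async and the server confirms with Preference-Applied: respond-async. A TypeScript sketch of that negotiation follows; the async=1 query parameter is a hypothetical fallback, and wantsAsync / preferenceHeaders are illustrative names only.

```typescript
// Hypothetical opt-in negotiation for the transition period.

// Header names are assumed lower-cased, as Node's http module delivers them.
type HeaderMap = Record<string, string | undefined>;

function wantsAsync(headers: HeaderMap, query: URLSearchParams): boolean {
  const prefer = headers["prefer"] ?? "";
  // Prefer can list several preferences ("respond-async, wait=10"), each with
  // optional ";"-separated parameters; only the preference token is compared.
  const prefs = prefer
    .split(",")
    .map((p) => (p.split(";")[0] ?? "").trim().toLowerCase());
  return prefs.includes("respond-async") || query.get("async") === "1";
}

// Echo the applied preference so the client knows to expect 202, not 200.
function preferenceHeaders(applied: boolean): Record<string, string> {
  return applied ? { "Preference-Applied": "respond-async" } : {};
}
```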
Prior Art
- ACP Streamable HTTP (this repo, /acp): already implements this exact pattern
- OpenAI Assistants API: create run → stream events
- GitHub Actions API: queue job → poll/webhook for result
- Async job systems in general (Celery, Temporal, etc.), where submit-then-observe is the common pattern
The current blocking design made sense when the daemon was local-only. As it increasingly serves remote/web clients through standard HTTP infrastructure, the blocking model becomes a liability.