Assistant phase 2a: SSE streaming on POST /api/assistant/messages #146

@jayzalowitz

Phase 1 (#139) landed a synchronous chat completion: each reply blocks until the full LLM response arrives, typically 3–10s. SSE streaming would deliver tokens incrementally, so the typing animation the user sees is backed by real content.

Scope

  • Add LlmClient.generateStream(prompt, options) returning AsyncIterable<string> chunks. Anthropic and OpenAI both support SSE on their completion endpoints.
  • Refactor AssistantService.reply() to optionally return an async iterable instead of a string (or add a sibling replyStream()).
  • Refactor POST /api/assistant/messages to upgrade to SSE when the client requests it (Accept: text/event-stream). Backward-compatible: existing JSON path still works.
  • Web client: detect SSE, render tokens as they arrive, swap the typing-dots bubble for the streaming text bubble.
  • Persist the assistant message AFTER the stream closes (use the accumulated content).
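
A minimal sketch of how the server-side pieces above could fit together, assuming TypeScript. Only `LlmClient.generateStream(prompt, options)`, `AsyncIterable<string>`, `replyStream` and the `Accept: text/event-stream` check come from this issue. The option shape, the `token`/`done` event names, the `write`/`persist` callbacks, and `fakeLlm` are hypothetical stand-ins, not the project's actual API.

```typescript
// Assumed interface: the issue only specifies the method name and return type.
interface LlmClient {
  generateStream(prompt: string, options?: { maxTokens?: number }): AsyncIterable<string>;
}

// True when the client asked for SSE via the Accept header; otherwise the
// existing JSON path should handle the request.
function wantsEventStream(accept: string | undefined): boolean {
  if (!accept) return false;
  return accept
    .split(",")
    .some((part) => part.trim().toLowerCase().startsWith("text/event-stream"));
}

// Frame one payload as an SSE event. JSON-encoding the payload keeps newlines
// inside a token from being read as SSE line breaks.
function formatSseEvent(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Forward each chunk to the client as it arrives, then persist the accumulated
// content once the LLM stream has closed (hypothetical `write` and `persist`).
async function replyStream(
  llm: LlmClient,
  prompt: string,
  write: (frame: string) => void,
  persist: (content: string) => Promise<void>,
): Promise<string> {
  let content = "";
  for await (const chunk of llm.generateStream(prompt)) {
    content += chunk;
    write(formatSseEvent("token", { text: chunk }));
  }
  await persist(content);
  write(formatSseEvent("done", {}));
  return content;
}

// Stand-in client for local testing; a real one would wrap a provider's
// streaming endpoint.
const fakeLlm: LlmClient = {
  async *generateStream(_prompt: string) {
    yield "Hel";
    yield "lo";
  },
};
```

In a route handler, `write` would wrap the HTTP response's write method after setting `Content-Type: text/event-stream`, and `persist` would call whatever currently saves the assistant message in the JSON path. Persisting before the final `done` event is one possible ordering; it means a client that sees `done` can assume the message is stored.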

Out of scope

  • Provider-side cancellation when the user closes the page mid-stream (nice to have, but adds complexity).
  • Tool-use streaming (phase 2c).

Why

This is the single biggest UX improvement available for the assistant. With ChatGPT and Claude setting the UX baseline, users have low tolerance for waiting on chat replies.
