Assistant phase 2a: SSE streaming on POST /api/assistant/messages #146

Phase 1 (#139) lands a synchronous chat completion: replies block until the full LLM response arrives, typically 3–10s. SSE streaming would let tokens arrive incrementally, so the user sees a typing animation backed by real content.

## Scope
- Add `LlmClient.generateStream(prompt, options)` returning an `AsyncIterable<string>` of text chunks. Anthropic and OpenAI both support SSE streaming on their completion endpoints.
- Refactor `AssistantService.reply()` to optionally return an async iterable instead of a string (or add a sibling `replyStream()`).
- Refactor `POST /api/assistant/messages` to respond with SSE when the client requests it (`Accept: text/event-stream`). Backward-compatible: the existing JSON path still works for clients that do not send that header.
- Web client: detect SSE, render tokens as they arrive, swap the typing-dots bubble for the streaming text bubble.
- Persist the assistant message AFTER the stream closes (use the accumulated content).
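
A rough sketch of how the pieces above could fit together, not a proposed final design. Only `generateStream` and its `AsyncIterable<string>` return type come from this issue; `wantsSse`, `sseEvent`, `streamReply`, `createSseParser`, the `token`/`done` event names, and the `write`/`persist` callbacks are hypothetical names chosen for illustration. The client-side parser assumes the web client reads the response body via `fetch`, since the browser `EventSource` API only issues GET requests.

```typescript
// Hypothetical interface: only the method name and return type come from the issue.
interface LlmClient {
  generateStream(prompt: string, options?: Record<string, unknown>): AsyncIterable<string>;
}

// Content negotiation: stream only if the Accept header lists text/event-stream.
function wantsSse(accept: string | undefined): boolean {
  return (accept ?? "")
    .split(",")
    .some((part) => part.trim().toLowerCase().startsWith("text/event-stream"));
}

// Encode one SSE event. Multi-line data needs one `data:` field per line;
// a blank line terminates the event.
function sseEvent(event: string, data: string): string {
  const dataLines = data.split("\n").map((line) => `data: ${line}`).join("\n");
  return `event: ${event}\n${dataLines}\n\n`;
}

// Forward chunks to the response as they arrive, accumulate them, and persist
// the full message only after the LLM stream has closed.
async function streamReply(
  llm: LlmClient,
  prompt: string,
  write: (frame: string) => void,              // assumed: writes to the HTTP response
  persist: (content: string) => Promise<void>, // assumed: saves the assistant message
): Promise<string> {
  let content = "";
  for await (const chunk of llm.generateStream(prompt)) {
    content += chunk;
    // JSON-encode so newlines inside a chunk survive SSE framing.
    write(sseEvent("token", JSON.stringify(chunk)));
  }
  await persist(content);
  write(sseEvent("done", "{}"));
  return content;
}

// Client side: incremental parser fed with decoded text from the fetch body.
// Simplification: handles "\n" line endings only, not "\r\n".
function createSseParser(
  onEvent: (event: string, data: string) => void,
): (text: string) => void {
  let buffer = "";
  return (text: string): void => {
    buffer += text;
    let end = buffer.indexOf("\n\n");
    while (end !== -1) {
      const frame = buffer.slice(0, end);
      buffer = buffer.slice(end + 2);
      let event = "message";
      const data: string[] = [];
      for (const line of frame.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
      }
      onEvent(event, data.join("\n"));
      end = buffer.indexOf("\n\n");
    }
  };
}
```

In this sketch an error thrown mid-stream skips `persist`, so a partial reply is never saved; whether that is the desired behavior is left open here.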

## Out of scope
- Provider-side cancellation when the user closes the page mid-stream (nice but adds complexity).
- Tool-use streaming (phase 2c).

## Why
Single biggest UX improvement available for the assistant. Users have low tolerance for waiting on chat replies after the ChatGPT/Claude UX baseline.
