Slack long-running turns have weak progress UX and can emit misleading no-reply fallbacks #277

## Problem

Slack users sometimes see:

> :warning: I didn't manage to produce a reply. Please try rephrasing or sending your message again.

...even when Netclaw actually did produce useful output, or when the turn failed for a more specific reason.

This is not one bug. It appears to be a family of overlapping failure modes that all collapse into the same bad Slack-facing symptom: the user sees either no reply, or a misleading fallback warning.

## Why this matters

This is one of the worst UX failures in the product right now:

- the user loses confidence in whether Netclaw is working
- useful replies can be followed by a contradictory warning
- long-running Slack turns provide poor feedback
- different underlying problems are obscured by the same generic fallback message

## Structural problem

We currently have multiple independent layers that influence whether Slack shows a reply:

1. **Session actor turn completion**
   - decides whether the turn succeeded, failed, or hit a limit
   - may emit streamed text, final text, files, or errors

2. **Slack thread binding actor**
   - buffers streamed text
   - posts final text/files to Slack
   - decides whether to post a fallback warning if it thinks nothing visible was sent

3. **Persistence / recovery**
   - restored sessions can replay old journals/snapshots
   - deserialization or schema drift can cause thread/session behavior to diverge after restart

4. **Model behavior**
   - may emit:
     - text only
     - text + tool calls
     - tool calls only
     - empty completions
     - repeated tool calls after tools are disabled

5. **Progress communication**
   - there is no consistent contract for:
     - when to send a plan preamble
     - when to send progress updates
     - how to distinguish useful progress from noise

These layers do not share one consistent answer to:

- "did the user get a visible reply?"
- "did the turn complete successfully?"
- "did we fail, or did we degrade?"

That mismatch is the root structural issue.

## Known failure modes

### 1. Duplicate fallback after real streamed/buffered text
Slack can post the real reply and then still post the generic fallback warning because the adapter does not always mark the turn as "visible output already posted" after flushing buffered streamed text on `TurnCompleted`.

**Symptom**
- real answer appears
- then `I didn't manage to produce a reply` appears after it
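
A minimal sketch of the intended behavior, assuming a simplified adapter. `ThreadBindingSketch` and its method names are hypothetical stand-ins, not the real thread binding actor. The point it illustrates is that every path that posts to Slack, including the flush on `TurnCompleted`, goes through one helper that sets the visibility flag:

```python
FALLBACK = (
    ":warning: I didn't manage to produce a reply. "
    "Please try rephrasing or sending your message again."
)


class ThreadBindingSketch:
    """Hypothetical stand-in for the Slack thread binding actor."""

    def __init__(self) -> None:
        self.buffer = []                    # streamed chunks not yet posted
        self.posted = []                    # messages sent to Slack, in order
        self.visible_output_posted = False

    def on_stream_chunk(self, chunk: str) -> None:
        self.buffer.append(chunk)

    def _post(self, text: str) -> None:
        # Single posting path: the flag is set wherever a message goes out.
        self.posted.append(text)
        self.visible_output_posted = True

    def on_turn_completed(self) -> None:
        buffered = "".join(self.buffer).strip()
        self.buffer.clear()
        if buffered:
            self._post(buffered)            # the flush counts as visible output
        if not self.visible_output_posted:
            self.posted.append(FALLBACK)    # only for a truly empty turn
        self.visible_output_posted = False  # reset for the next turn
```

With this shape, a turn that streamed text ends with exactly one Slack message, and only a turn with nothing to flush falls through to the warning.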

### 2. Tool-limit turns collapsing into generic failure
A turn can do substantial useful tool work, hit the tool limit, and still end up presenting the generic fallback warning instead of a specific degraded completion.

**Symptom**
- lots of searching/fetching succeeds
- user still gets generic warning or poor fallback text

### 3. Mixed text + tool-call responses not surfaced early enough
The model may emit a short visible preamble together with tool calls, but if that text is not surfaced immediately, the user sees silence until the turn finishes.

**Symptom**
- long-running research turn appears dead
- user gets no early indication of what Netclaw is doing
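
As a rough illustration of "surface the text first", the sketch below posts any visible text from a mixed response before running its tool calls. `ModelResponse`, `handle_response`, and the two callbacks are hypothetical names chosen for the example; the real response shape and tool dispatch will differ:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ModelResponse:
    """Hypothetical shape of one model completion."""
    text: str = ""
    tool_calls: List[str] = field(default_factory=list)


def handle_response(
    response: ModelResponse,
    post_to_slack: Callable[[str], None],
    run_tool: Callable[[str], None],
) -> bool:
    """Post visible text before tool work. Returns True if text was posted."""
    posted = False
    text = response.text.strip()
    if text:
        post_to_slack(text)  # the user sees the preamble immediately
        posted = True
    for call in response.tool_calls:
        run_tool(call)       # potentially slow work happens afterwards
    return posted
```

The return value lets the caller record that visible output already exists for the turn, which matters for the fallback decision later.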

### 4. Recovery/replay incompatibilities on old sessions
Older persisted sessions can fail during replay because serialized event types no longer exist or have moved.

Example seen in logs:
- `BufferedInputAccepted` deserialization failure during replay

**Symptom**
- specific old Slack threads behave unpredictably after restart
- thread binding/session behavior may not match fresh sessions

### 5. Long-running turns provide poor progress visibility
For multi-step or research-heavy tasks, Netclaw often gives the user too little feedback while work is in progress.

Two missing behaviors stand out:

1. **No plan preamble before large jobs**
   - when the model is about to do substantial multi-tool work, it does not reliably tell the user what it plans to do first
   - users see silence or generic fallback behavior instead of a short, concrete statement of intent

2. **No meaningful progress updates during long-running jobs**
   - when a turn takes a long time, users do not reliably get milestone-style updates
   - current behavior is either silence or low-value generic messaging
   - users cannot tell whether Netclaw is actively making progress, stuck, or done

**Symptom**
- user asks for a complex task
- Slack thread sits quiet for too long, or only shows a generic placeholder
- user gets little confidence that the agent is pursuing a coherent plan

**Expected behavior**
- before substantial tool work, Netclaw should send a short user-facing preamble such as:
  - what it is going to check
  - what kind of result it is trying to produce
- for longer-running jobs, Netclaw should emit bounded, meaningful progress updates at milestones
- these should be task-focused, not internal/tool-dump chatter

## Example observed sessions

- `D0AC6CKBK5K/1773789661.799409`
  - real reply posted
  - generic fallback warning also posted
- `D0AC6CKBK5K/1773788262.773759`
  - long tool-heavy work
  - tool-limit / force-no-tools path triggered
- reminder-created and older recovered threads also showed inconsistent reply behavior

## Expected behavior

- If visible text or files were posted to Slack for the turn, **never** post the generic no-reply fallback.
- If the turn hit a known bounded limit, post a **specific degraded completion**, not a generic provider-failure warning.
- If the model emits short user-facing text before tool work, surface it promptly.
- Before substantial multi-tool work, Netclaw should send a short task-focused preamble.
- For longer-running jobs, Netclaw should emit bounded milestone updates without narrating every tool call.
- Recovery of old sessions should either:
  - succeed cleanly, or
  - fail in a contained, diagnosable way without producing misleading Slack behavior.

## Actual behavior

Different underlying causes can all produce the same Slack symptom:
- no reply
- misleading generic warning
- warning after a real answer
- confusing silence during long-running tool work

## Proposed direction

### A. Unify "visible output posted" semantics
Define one authoritative turn-level notion of whether user-visible Slack output has already been delivered.

### B. Reserve generic fallback for true empty-turn failures only
Only use `I didn't manage to produce a reply` when:
- no text was posted
- no files were uploaded
- no deterministic degraded completion was emitted
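
One way to express these conditions is a single per-turn record that every layer updates and the fallback decision reads. `TurnDelivery` and `should_post_generic_fallback` are hypothetical names for this sketch, and treating whitespace-only text as not visible is an assumption:

```python
from dataclasses import dataclass


@dataclass
class TurnDelivery:
    """Hypothetical per-turn record of what the user has actually seen."""
    turn_id: str
    texts_posted: int = 0
    files_uploaded: int = 0
    degraded_completion_posted: bool = False

    def record_text(self, text: str) -> None:
        # Assumption: whitespace-only messages do not count as visible output.
        if text.strip():
            self.texts_posted += 1

    def record_file(self) -> None:
        self.files_uploaded += 1

    def record_degraded_completion(self) -> None:
        self.degraded_completion_posted = True


def should_post_generic_fallback(delivery: TurnDelivery) -> bool:
    """True only when the turn produced nothing the user could see."""
    return (
        delivery.texts_posted == 0
        and delivery.files_uploaded == 0
        and not delivery.degraded_completion_posted
    )
```

Keeping the predicate separate from the record means the session actor, the Slack adapter, and recovery code can all write to the same state without each re-deriving the rule.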

### C. Treat bounded failures as degraded completions
Examples:
- tool-limit reached
- repeated empty post-tool finalization
- force-no-tools violation after sufficient prior work

These should produce specific visible completions, not generic fallback warnings.
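
A sketch of how the three examples could map to specific completions. The enum, the function name, and especially the message wording are placeholders invented for illustration. Posting the degraded text even when earlier output was visible is also an assumption, made so the user knows the answer may be partial:

```python
from enum import Enum
from typing import Optional

GENERIC_FALLBACK = (
    ":warning: I didn't manage to produce a reply. "
    "Please try rephrasing or sending your message again."
)


class BoundedLimit(Enum):
    TOOL_LIMIT_REACHED = "tool_limit_reached"
    EMPTY_POST_TOOL_FINALIZATION = "empty_post_tool_finalization"
    FORCE_NO_TOOLS_VIOLATION = "force_no_tools_violation"


# Placeholder wording only; real copy would need product review.
DEGRADED_MESSAGES = {
    BoundedLimit.TOOL_LIMIT_REACHED:
        "I hit my tool-use limit for this turn, so this result may be partial.",
    BoundedLimit.EMPTY_POST_TOOL_FINALIZATION:
        "I finished the tool work but could not produce a final summary.",
    BoundedLimit.FORCE_NO_TOOLS_VIOLATION:
        "I gathered information but could not finish without further tool calls.",
}


def completion_text(
    limit: Optional[BoundedLimit], visible_output_posted: bool
) -> Optional[str]:
    """Pick the closing message for a turn, or None if nothing more is needed."""
    if limit is not None:
        return DEGRADED_MESSAGES[limit]  # specific, never the generic warning
    if not visible_output_posted:
        return GENERIC_FALLBACK          # true empty-turn failure
    return None                          # normal turn with visible output
```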

### D. Add regression coverage for each failure bucket
Need targeted tests for:
- streamed buffered reply + turn complete -> no fallback warning
- mixed text + tool calls -> preamble visible early
- tool-limit path -> specific degraded text, no generic warning
- replay of legacy persisted sessions -> either compatible or explicitly handled

### E. Investigate legacy persistence compatibility
Audit session event schema drift and add a migration/compatibility plan for older persisted sessions.
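
For discussion, here is a sketch of contained replay handling. It assumes a JSON-lines journal with `type` and `payload` fields and a handler registry; the real persistence format is not described in this issue, so all of that is hypothetical. `BufferedInputAccepted` is used only as an example of a type that is no longer registered:

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

Handler = Callable[[Dict[str, Any]], str]

# Hypothetical registry of event types the current build understands.
KNOWN_EVENTS: Dict[str, Handler] = {
    "TurnStarted": lambda payload: f"started:{payload['turn_id']}",
    "TurnCompleted": lambda payload: f"completed:{payload['turn_id']}",
}


def replay(
    journal_lines: List[str],
) -> Tuple[List[str], List[Tuple[int, Optional[str]]]]:
    """Apply known events and collect unknown ones instead of raising."""
    applied = []
    skipped = []
    for index, line in enumerate(journal_lines):
        record = json.loads(line)
        event_type = record.get("type")
        handler = KNOWN_EVENTS.get(event_type)
        if handler is None:
            # Contained failure: remember what was skipped for diagnostics.
            skipped.append((index, event_type))
            continue
        applied.append(handler(record.get("payload", {})))
    return applied, skipped
```

Whether an unknown event should be skipped, migrated, or cause the whole session to be quarantined is an open decision. The sketch only shows that the outcome can be explicit and logged rather than surfacing as unpredictable Slack behavior.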

### F. Add a bounded progress-communication contract
Need a clear policy for long-running turns:

- before substantial multi-tool work, emit a short plan preamble
- during long-running work, emit bounded milestone updates
- do not narrate every tool call
- progress messages should be user-facing and task-oriented, not internal process dumps
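
The "bounded" part of this policy could look roughly like the sketch below: only milestones qualify, and updates are capped by both count and spacing. `ProgressPolicy`, its default values, and the `is_milestone` flag are assumptions for illustration. How a milestone is detected is left out entirely:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressPolicy:
    """Hypothetical per-turn budget for progress messages."""
    max_updates: int = 3                 # assumed cap per turn
    min_interval_seconds: float = 30.0   # assumed minimum spacing
    updates_sent: int = 0
    last_sent_at: Optional[float] = None

    def try_send(self, now: float, is_milestone: bool) -> bool:
        """Return True and consume budget if an update may be posted now."""
        if not is_milestone:
            return False  # never narrate individual tool calls
        if self.updates_sent >= self.max_updates:
            return False  # budget for this turn is used up
        if (
            self.last_sent_at is not None
            and now - self.last_sent_at < self.min_interval_seconds
        ):
            return False  # too soon after the previous update
        self.updates_sent += 1
        self.last_sent_at = now
        return True
```

Passing `now` in explicitly, for example from `time.monotonic()`, keeps the policy deterministic and easy to cover in the regression tests from section D.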

## Acceptance criteria

- No Slack thread posts the generic no-reply warning after a real text reply in the same turn
- Tool-limit turns produce a specific degraded completion instead of generic no-reply
- Long-running Slack turns emit a short task-focused preamble before substantial tool work
- Long-running turns can emit bounded milestone updates without spamming the thread
- Progress messages describe intent and progress, not internal tool chatter
- Legacy session replay failures are either fixed or isolated/documented with clear handling
