Route inbound approvals by SessionId so passivated/cold gateway trees don't drop the response #979

## Summary

Sibling to #939. This issue proposes a structurally different fix for the same symptom (silently dropped approval responses). It targets the proximate cause observed in production: **cascading passivation** of the channel-adapter actor tree, where the per-thread binding correctly defers its own passivation but the channel-level parent passivates anyway and takes the child with it.

The proposed change makes inbound routing symmetric with outbound routing: both addressed by `SessionId`, both tolerant of any subset of the gateway tree being cold.

## Observed incident

Reconstructed from production daemon logs of a real session. The session is parked on a pending approval (`shell_execute` for a `git commit`) and is still alive 8h later.

Trigger sequence (IDs redacted):

```
T+0      slack-gateway/<channelId>/<threadTs>
         "Slack thread idle but 1 approval(s) are pending; deferring passivation"
         ← per-thread child correctly defers ITS OWN timer

T+1.1s   slack-gateway/<channelId>
         "Slack conversation idle for 2 hours, passivating"
         ← CHANNEL-LEVEL parent passivates, no awareness of child's pending state

T+1.1s   session-manager/.../<threadTs>
         "[...slack-gateway/.../<threadTs>/StreamSupervisor-N/
            slack-thread-0-0-actorRefSource] left"

T+1.1s   Dead letter to slack-gateway/<channelId>/<threadTs>
         AbruptTerminationException
```

For the next ~5h40m the session emits `session_observer_distill_skipped` every 90s, unaware that its Slack output binding is gone. When the user finally clicks Approve in Slack:

```
T+5h40m  slack-gateway: "Routing Slack approval response for ... <callId>"
T+5h40m  slack-gateway/<channelId>: "Ignoring Slack approval response for missing thread <threadTs>"
```

Slack delivered the click correctly. Our re-spawned channel-level gateway had no in-memory record of the pending approval and dropped it.

## Why this isn't fully covered by #939

- #939's fix path is "persist pending approvals, resurrect binding actors on demand, re-post UI on recovery." That works but bundles the routing fix with persistence work.
- The proximate cause here is purely a cold-actor-tree problem within a single still-running daemon. The session actor is alive and addressable at a deterministic path the entire time. No persistence is required to deliver the response; only routing is broken.
- #939 also doesn't name the cascading-passivation bug: the per-thread deferral (logged as `Slack thread idle but N approval(s) are pending; deferring passivation`) only blocks the child's own timer, not the channel-level parent's idle timer.

## Architectural asymmetry being fixed

| Direction | Today |
|-----------|-------|
| Outbound (LLM → user) | Session actor holds the destination address; Akka.NET lazily re-spawns whatever's needed at that path. Survives passivation. |
| Inbound (user → LLM) | Channel-level gateway consults an in-memory `Dictionary<callId, perThreadChild>`. Dictionary lost on passivation → response dropped. |

The Slack `block_actions` payload and our existing button-value codec already carry everything needed to route inbound by `SessionId`:

- `channel.id` → channel
- `message.thread_ts` → thread
- `ApprovalButtonValueCodec` decode → `callId`, `optionKey`, `requesterSenderId`
- `user.id` → approver senderId (for `CanApprove`)
- `message.ts` and `response_url` → for the redraw

That tuple deterministically resolves to `session-manager/{persistenceId}` + the specific pending call. No gateway-internal lookup is required.
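
A minimal sketch of that mapping in plain C#, with no Akka.NET dependency, to show that routing can be a pure function of the payload. All type names here (`BlockActionFields`, `DecodedButton`, `InboundApproval`, `InboundApprovalRouter`) are hypothetical. The codec is passed in as a delegate because its real signature is not shown in this issue, and the `persistenceId` format in `SessionPathFor` is a placeholder: the issue does not say how it is derived from channel and thread.

```csharp
using System;

// Hypothetical flattened view of the Slack block_actions fields listed above.
public sealed record BlockActionFields(
    string ChannelId,     // channel.id
    string ThreadTs,      // message.thread_ts
    string MessageTs,     // message.ts
    string ResponseUrl,   // response_url
    string UserId,        // user.id (the approver)
    string ButtonValue);  // raw value decoded by ApprovalButtonValueCodec

// Hypothetical result of decoding the button value.
public sealed record DecodedButton(string CallId, string OptionKey, string RequesterSenderId);

// Self-contained envelope: everything the session needs, nothing the gateway must remember.
public sealed record InboundApproval(
    string SessionPath, string CallId, string OptionKey,
    string RequesterSenderId, string ApprovingSenderId,
    string ChannelId, string MessageTs, string ResponseUrl);

public static class InboundApprovalRouter
{
    // ASSUMPTION: placeholder persistenceId format; the real derivation is not given here.
    public static string SessionPathFor(string channelId, string threadTs) =>
        $"session-manager/slack-{channelId}-{threadTs}";

    // Pure function of the payload: no lookup into gateway-held state.
    public static InboundApproval Route(BlockActionFields p, Func<string, DecodedButton> decode)
    {
        var button = decode(p.ButtonValue);
        return new InboundApproval(
            SessionPathFor(p.ChannelId, p.ThreadTs),
            button.CallId, button.OptionKey, button.RequesterSenderId,
            p.UserId, p.ChannelId, p.MessageTs, p.ResponseUrl);
    }
}
```

Because `Route` reads only its arguments, it gives the same answer whether the gateway tree is warm or was re-spawned a moment ago.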

(Note: `ApprovalButtonValueCodec.MaxEncodedLength = 100` is correct as-is, because Discord's `custom_id` cap is 100 characters. An encoded value may therefore carry only a prefix of the `callId`, so the decode side must continue to tolerate prefix-matching against the pending-call set in scope. Under this proposal the match is scoped to a single session, so prefix-matching is trivially safe.)
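
A sketch of what session-scoped prefix-matching could look like. `PendingCallMatcher` is a hypothetical helper, not the existing codec logic, and it assumes the decoded value is a leading prefix of the full `callId`. Returning no match on an ambiguous prefix is a choice made for this sketch, not confirmed behavior.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class PendingCallMatcher
{
    // Returns the single pending callId that starts with the decoded (possibly truncated)
    // value, or null when nothing matches or the prefix matches more than one call.
    public static string? Resolve(IEnumerable<string> pendingCallIds, string decodedCallId)
    {
        if (string.IsNullOrEmpty(decodedCallId)) return null;

        var matches = pendingCallIds
            .Where(id => id.StartsWith(decodedCallId, StringComparison.Ordinal))
            .Take(2) // two hits are enough to know the prefix is ambiguous
            .ToList();

        return matches.Count == 1 ? matches[0] : null;
    }
}
```

With only one session's pending calls as candidates, the set passed in is small, which is what keeps an ambiguous prefix unlikely.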

## Proposed change

Move the routing boundary from the channel-level gateway to the session actor. Symmetric for Slack and Discord:

1. **Slack ingress** (`SlackConversationActor`): on a `block_actions` payload with an approval `action_id`, decode the button value, build a self-contained `ApprovalResponseReceived(sessionId, callId, optionKey, approvingSenderId, channel, messageTs, responseUrl)` and `Tell` `session-manager/{persistenceId}` directly. Stop consulting the per-thread child for routing.
2. **Discord ingress** (`DiscordConversationActor`): same shape, decode `custom_id`, build the same self-contained command, `Tell` the session.
3. **Session actor**: on `ApprovalResponseReceived`, run `CanApprove(requesterPrincipal, requesterSenderId, approvingSenderId)` (it already owns this state), resolve the pending tool call, then send a self-contained `RenderResolvedApproval(channel, messageTs, request, selectedKey, senderId)` to the slack/discord output binding. Lazy spawn is fine — the binding doesn't need any prior in-memory state for the redraw because `BuildResolvedApprovalBlocks(...)` is pure.
4. **Per-thread bindings**: drop `_pendingApprovalRequests` as a routing dependency. It can stay as a hint for "should I defer my own passivation?" but is no longer load-bearing for delivery.
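
The session-side handling in step 3 can be sketched in plain C# as follows. This models the logic only and is not an Akka.NET actor. The two message records follow the field lists in steps 1 and 3, but their concrete types are assumptions (the request is a plain string placeholder), and `PendingCall`, `SessionApprovalState`, and the delegate shape used for `CanApprove` are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Message shapes follow steps 1 and 3; field types are assumptions.
public sealed record ApprovalResponseReceived(
    string SessionId, string CallId, string OptionKey, string ApprovingSenderId,
    string Channel, string MessageTs, string ResponseUrl);

public sealed record RenderResolvedApproval(
    string Channel, string MessageTs, string Request, string SelectedKey, string SenderId);

// Hypothetical per-call state the session already owns.
public sealed record PendingCall(string Request, string RequesterPrincipal, string RequesterSenderId);

public sealed class SessionApprovalState
{
    private readonly Dictionary<string, PendingCall> _pending = new();

    // Stand-in for CanApprove(requesterPrincipal, requesterSenderId, approvingSenderId).
    private readonly Func<string, string, string, bool> _canApprove;

    public SessionApprovalState(Func<string, string, string, bool> canApprove) =>
        _canApprove = canApprove;

    public void AddPending(string callId, PendingCall call) => _pending[callId] = call;

    // Returns the render command for the output binding, or null when the response is
    // rejected (unknown call or unauthorized approver). A rejected response leaves the
    // call pending.
    public RenderResolvedApproval? Handle(ApprovalResponseReceived msg)
    {
        if (!_pending.TryGetValue(msg.CallId, out var call)) return null;
        if (!_canApprove(call.RequesterPrincipal, call.RequesterSenderId, msg.ApprovingSenderId))
            return null;

        _pending.Remove(msg.CallId); // resolve once; a repeated click finds nothing
        return new RenderResolvedApproval(
            msg.Channel, msg.MessageTs, call.Request, msg.OptionKey, msg.ApprovingSenderId);
    }
}
```

The returned command carries everything the binding needs for the redraw, so a lazily spawned binding can act on it without any prior in-memory state.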

## Near-term mitigation (cheap, narrow)

To reduce blast radius before the architectural change lands, have the channel-level `slack-gateway/{channelId}` actor (and its Discord equivalent) consult its children's pending-approval state before passivating itself. This closes the specific cascading bug observed here without touching routing, but it does not help cold-restart cases.
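
One possible shape for that check, sketched in plain C# without Akka.NET. `ChannelPassivationGuard` is hypothetical, and it assumes children push their pending-approval count to the parent. The issue only says the parent should consult its children, so push versus ask is an open choice.

```csharp
using System.Collections.Generic;

// Hypothetical bookkeeping for a channel-level gateway: each per-thread child reports
// its pending-approval count, and the parent checks the set before passivating.
public sealed class ChannelPassivationGuard
{
    private readonly Dictionary<string, int> _pendingByThread = new();

    // Called whenever a child reports a change in its pending-approval count.
    public void Report(string threadTs, int pendingApprovals)
    {
        if (pendingApprovals > 0) _pendingByThread[threadTs] = pendingApprovals;
        else _pendingByThread.Remove(threadTs);
    }

    // Called when a child stops, so a dead thread cannot block passivation forever.
    public void ChildStopped(string threadTs) => _pendingByThread.Remove(threadTs);

    // The parent's idle timer would passivate only while this is true.
    public bool MayPassivate => _pendingByThread.Count == 0;
}
```

In the incident above, the per-thread child would have reported one pending approval at T+0, so the parent's idle timer at T+1.1s would have seen `MayPassivate == false` and deferred.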

## Relationship to #939

Complementary, not redundant.

- This issue: **passivation** case, no persistence needed, makes routing tolerant of cold gateway tree within a running daemon.
- #939: **full daemon restart** case, requires persisted pending-approval state and recovery-time UI re-post.

Land this first → #939's Phase 2 ("resurrect binding on missing child") becomes unnecessary, and #939 narrows to its actual contribution: persistence of the `TaskCompletionSource`-equivalent state across process restart. The two together produce the full correctness guarantee.

## Affected files (initial scan)

| File | Change |
|------|--------|
| `src/Netclaw.Channels.Slack/SlackConversationActor.cs` | Decode + route by SessionId; remove per-thread-child routing dependency for approvals |
| `src/Netclaw.Channels.Slack/SlackThreadBindingActor.cs` | Drop `_pendingApprovalRequests` as routing dependency; keep as passivation-deferral hint |
| `src/Netclaw.Channels.Discord/DiscordConversationActor.cs` | Symmetric to Slack |
| `src/Netclaw.Channels.Discord/DiscordSessionBindingActor.cs` | Symmetric to Slack |
| `src/Netclaw.Actors/Sessions/LlmSessionActor.cs` | Handle `ApprovalResponseReceived`; emit `RenderResolvedApproval` to output binding |
| `src/Netclaw.Actors/Protocol/*` | New self-contained messages for inbound approval response and resolved-render command |

## Severity

Same as #939: production correctness. Sessions get permanently wedged with no UI signal that the click failed. Passivation happens routinely (channel-level 2h idle), so this fires far more often than full daemon restart.
