Inbox messages stuck in PENDING when receiving agent is already idle #131

## Summary

Inbox messages can get stuck in `PENDING` indefinitely when the receiving agent is already idle at the time the message is posted. This affects all providers — Kiro CLI, Claude Code, etc. — because the issue is in the delivery architecture, not in any specific provider's status detection.

Provider: observed with Kiro CLI, but the cause appears to be provider-agnostic

## Impact

* Agent-to-agent messaging silently fails — messages stay PENDING forever
* Multi-agent workflows stall waiting for callbacks that were sent but never delivered
* Requires manual intervention (resending the message) to unblock

## Reproduction

1. Start `cao-server` and a multi-agent session with 3+ agents
2. Agent A finishes work and goes idle (no more log output)
3. Agent B calls `send_message` to Agent A
4. Message stays PENDING — Agent A never receives it

This happens intermittently in long-running sessions (4-8 hours) with multiple concurrent agents. We observe it several times per session.

## Root Cause

The inbox has two delivery paths:

**Path 1 — Immediate delivery (on POST):** `POST /terminals/{id}/inbox/messages` calls `check_and_send_pending_messages(receiver_id)`, which calls `provider.get_status()`. If IDLE or COMPLETED, delivers immediately. **This is a single-shot attempt with no retry.** If `get_status()` returns a stale or incorrect status at that moment, delivery is skipped.

**Path 2 — PollingObserver:** Monitors `TERMINAL_LOG_DIR` for `.log` file changes every 5 seconds. On change → check pending → check idle → deliver. But if the agent is already idle and not producing output, the log file doesn't change, so the observer never fires again.

**The gap:** If Path 1 fails (stale status at the wrong moment) and the agent is already idle (Path 2 never triggers), the message is permanently orphaned. There is no fallback mechanism.
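The gap can be illustrated with a small, self-contained model. This is only a sketch, not the `cao-server` code: `Terminal`, `LogWatcher`, and `post_message` are hypothetical stand-ins, and the bodies of `check_and_send_pending_messages` and `get_status` are assumptions based on the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IDLE = "idle"
    COMPLETED = "completed"
    PROCESSING = "processing"


@dataclass
class Terminal:
    """Hypothetical stand-in for the receiving agent's terminal."""
    reported_status: Status = Status.IDLE   # what get_status() returns
    log_size: int = 0                       # grows only when the agent prints
    pending: list = field(default_factory=list)
    delivered: list = field(default_factory=list)

    def get_status(self) -> Status:
        return self.reported_status


def check_and_send_pending_messages(t: Terminal) -> bool:
    """Deliver all pending messages if the terminal reports IDLE or COMPLETED."""
    if t.pending and t.get_status() in (Status.IDLE, Status.COMPLETED):
        t.delivered.extend(t.pending)
        t.pending.clear()
        return True
    return False


def post_message(t: Terminal, msg: str) -> bool:
    """Path 1: queue the message and make a single delivery attempt."""
    t.pending.append(msg)
    return check_and_send_pending_messages(t)


class LogWatcher:
    """Path 2: only acts when the log changed since the previous poll."""

    def __init__(self, t: Terminal) -> None:
        self.t = t
        self.last_size = t.log_size

    def poll(self) -> bool:
        if self.t.log_size == self.last_size:
            return False  # no log change, so pending messages are never checked
        self.last_size = self.t.log_size
        return check_and_send_pending_messages(self.t)


# Agent A is really idle, but get_status() is stale at the moment of the POST.
agent_a = Terminal(reported_status=Status.PROCESSING)
watcher = LogWatcher(agent_a)

path1_delivered = post_message(agent_a, "callback from B")

# The status catches up right afterwards, but the idle agent writes no log output.
agent_a.reported_status = Status.IDLE
path2_results = [watcher.poll() for _ in range(3)]
```

In this model the message is still in `agent_a.pending` after both paths have run, even though `get_status()` now reports `IDLE`: nothing calls `check_and_send_pending_messages` again until the log changes.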

## Possible Directions

- A periodic background check for orphaned PENDING messages (similar to the existing `flow_daemon()` pattern)
- Retry logic on the immediate delivery path (e.g., a few attempts with short delays)
- A fallback poll triggered when a new message is queued but the watcher hasn't fired within N seconds
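The first two directions could look roughly like the sketch below. It is an illustration under assumptions, not a proposed patch: `deliver_with_retry` and `sweep_pending` are hypothetical names, the delivery and lookup functions are passed in as callables rather than tied to any real `cao-server` API, and the attempt count and delay are placeholder values.

```python
import time
from typing import Callable, Iterable, List


def deliver_with_retry(
    try_deliver: Callable[[], bool],
    attempts: int = 3,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_deliver up to `attempts` times, pausing between failed attempts."""
    for attempt in range(attempts):
        if try_deliver():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # give a stale status a moment to settle
    return False


def sweep_pending(
    receivers_with_pending: Callable[[], Iterable[str]],
    check_and_send: Callable[[str], bool],
) -> List[str]:
    """One pass of a periodic sweep over receivers that still have PENDING messages.

    Returns the receiver ids for which delivery succeeded on this pass.
    """
    return [rid for rid in receivers_with_pending() if check_and_send(rid)]


def run_sweeper(
    receivers_with_pending: Callable[[], Iterable[str]],
    check_and_send: Callable[[str], bool],
    interval_s: float = 30.0,
    should_stop: Callable[[], bool] = lambda: False,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run sweep_pending on a fixed interval until should_stop() returns True."""
    while not should_stop():
        sweep_pending(receivers_with_pending, check_and_send)
        sleep(interval_s)
```

The retry only narrows the timing window on the immediate path; the sweep is what bounds how long a message can stay orphaned, since it does not depend on the log file changing.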

## Related Issues

- #104: stale PROCESSING detection in Claude Code specifically (apparently fixed by PR #106)
- PR #62: added position-based status comparison to Kiro CLI / Q CLI

Both improve `get_status()` accuracy, but this issue is distinct: even with perfect status detection, the single-shot immediate delivery can miss due to timing, and there is no fallback when it does.

## Environment

* `cao-server` at commit `331e8d7` 
* macOS, Kiro CLI provider
* Observed across multiple multi-day sessions
