[Feature]: Per-session activity state (busy/idle + awaiting_user, awaiting_subagent) via gateway API + WS statechange #39127

@sab-xan

Summary

Add a gateway-owned, per-session activity system (sessions.activity.get/list plus WS statechange events) so any client can reliably show session running state with elapsed time and explicit waiting states (awaiting user, awaiting subagent), without polling and without guesswork.

Problem to solve

OpenClaw currently has no gateway-level, session-scoped way to determine whether a specific session is actively executing work, how long it has been executing, or whether progress is blocked waiting for something. The TUI can display a running indicator, but that is client-local. Other clients (Control UI, CLI, external tooling) must poll or guess. Channel-account status (channels.status) is not acceptable because it is not session-scoped and it does not cover runs initiated via chat surfaces such as Control UI chat.send.

This causes operator uncertainty and wasted time in multi-session and multi-agent workflows, especially when a session is blocked on user input or child agent completion.

Proposed solution

Implement a robust, gateway-owned per-session state model with:

A) Runtime activity (objective)
Track in-flight runs per sessionKey:

  • busy: boolean derived from activeRuns
  • activeRuns: integer, non-negative
  • busySince: ms epoch timestamp of the oldest active run, null when idle
  • lastRunActivityAt: ms epoch timestamp updated during progress
  • staleRuns: integer, non-negative (required for robustness so sessions cannot remain busy forever)
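As a rough illustration of how these fields could be derived from a set of tracked runs, here is a minimal TypeScript sketch. The `RunRecord` shape and the `deriveRuntimeActivity` helper are hypothetical and not part of OpenClaw; the sketch also assumes that runs marked stale do not count toward activeRuns, which the proposal leaves open.

```typescript
// Hypothetical record kept per in-flight run (not an existing OpenClaw type).
interface RunRecord {
  runId: string;
  startedAt: number; // ms epoch
  lastActivityAt: number; // ms epoch
  stale: boolean; // set by a watchdog when the run stops reporting progress
}

// Field names follow the proposal above.
interface RuntimeActivity {
  busy: boolean;
  activeRuns: number;
  busySince: number | null;
  lastRunActivityAt: number | null;
  staleRuns: number;
}

// Derive the snapshot from the current run records.
// `previousLastActivityAt` carries the last known progress timestamp across idle periods.
function deriveRuntimeActivity(
  runs: RunRecord[],
  previousLastActivityAt: number | null,
): RuntimeActivity {
  // Assumption: stale runs are reported separately and do not keep the session busy.
  const active = runs.filter((r) => !r.stale);
  const timestamps = runs.map((r) => r.lastActivityAt);
  if (previousLastActivityAt !== null) timestamps.push(previousLastActivityAt);
  return {
    busy: active.length > 0,
    activeRuns: active.length,
    // Oldest active run defines busySince; null when idle.
    busySince: active.length > 0 ? Math.min(...active.map((r) => r.startedAt)) : null,
    lastRunActivityAt: timestamps.length > 0 ? Math.max(...timestamps) : null,
    staleRuns: runs.length - active.length,
  };
}
```

Deriving busy from the run list, rather than storing it, keeps the flag from drifting out of sync with activeRuns.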

B) Attention state (semantic blocker, explicit)
Represent “work not fulfilled because blocked” even when no run is executing.
Fields:

  • attention.state: one of none, awaiting_user, awaiting_subagent, awaiting_approval, blocked, paused
  • attention.since: ms epoch or null
  • attention.note: optional short reason string, no sensitive content
  • attention.blockedOn: optional list of dependencies that are actually blocking, for example:
    • kind: subagent
    • childSessionKey
    • childRunId optional
    • label optional
    • startedAt optional

Hard requirement:

  • Attention state must not be inferred from message text or punctuation. It must be explicitly set and cleared by runtime or deterministic gateway rules.
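The explicit set and clear semantics could look roughly like the following TypeScript sketch. The `setAttention` and `clearAttention` helpers are hypothetical names, and preserving `since` when the same state is set again is an assumption, not something the proposal specifies.

```typescript
type AttentionState =
  | "none"
  | "awaiting_user"
  | "awaiting_subagent"
  | "awaiting_approval"
  | "blocked"
  | "paused";

// Dependency entry; "subagent" is the only kind named in the proposal.
interface BlockedOn {
  kind: string;
  childSessionKey: string;
  childRunId?: string;
  label?: string;
  startedAt?: number; // ms epoch
}

interface Attention {
  state: AttentionState;
  since: number | null; // ms epoch, null when state is "none"
  note?: string; // short reason, no sensitive content
  blockedOn?: BlockedOn[];
}

const NO_ATTENTION: Attention = { state: "none", since: null };

// Explicitly enter a waiting state. Called by runtime or gateway rules,
// never derived from message text.
function setAttention(
  current: Attention,
  state: Exclude<AttentionState, "none">,
  now: number,
  details: { note?: string; blockedOn?: BlockedOn[] } = {},
): Attention {
  // Assumption: re-setting the same state keeps the original `since`.
  const since = current.state === state ? current.since : now;
  return { state, since, ...details };
}

// Explicitly clear the waiting state.
function clearAttention(): Attention {
  return { state: "none", since: null };
}
```

Because both transitions are plain function calls, every change of attention state has an identifiable caller, which is the point of the hard requirement.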

C) Spawn relationships (not always blocking)
Spawning a child session does not imply the parent is blocked. Add a separate, bounded rollup to show delegations even when not blocking:

  • children.total, children.active, children.failed
  • children.recent: list capped to a safe size; each entry includes childSessionKey and basic status
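
A bounded rollup along these lines might be computed as in the TypeScript sketch below. The `ChildEntry` shape, the status values, and the cap of 5 are all assumptions made for illustration; the proposal only requires that the recent list be capped.

```typescript
// Hypothetical status values and child record; only childSessionKey is named in the proposal.
type ChildStatus = "active" | "done" | "failed";

interface ChildEntry {
  childSessionKey: string;
  status: ChildStatus;
  updatedAt: number; // ms epoch, used only to pick the most recent entries
}

interface ChildrenRollup {
  total: number;
  active: number;
  failed: number;
  recent: { childSessionKey: string; status: ChildStatus }[];
}

const RECENT_CAP = 5; // assumed cap, not specified by the proposal

function rollupChildren(children: ChildEntry[], cap: number = RECENT_CAP): ChildrenRollup {
  // Copy before sorting so the caller's array is left untouched.
  const recent = [...children]
    .sort((a, b) => b.updatedAt - a.updatedAt)
    .slice(0, cap)
    .map(({ childSessionKey, status }) => ({ childSessionKey, status }));
  return {
    total: children.length,
    active: children.filter((c) => c.status === "active").length,
    failed: children.filter((c) => c.status === "failed").length,
    recent,
  };
}
```

The counters cover all children while the recent list stays bounded, so the payload size does not grow with the number of delegations.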

D) Dedicated RPC surface (no sessions.list extension)
Add new RPC methods:

  1. sessions.activity.get
     Request:
       • sessionKey
     Response:
       • sessionKey
       • version (monotonic, per-session activity state)
       • runtimeActivity object
       • attention object
       • children rollup (optional but recommended)
  2. sessions.activity.list
     Request:
       • activeOnly (optional)
       • agentId (optional)
       • limit (optional)
     Response:
       • ts
       • sessions: array of sessions.activity.get-style entries
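
To make the request and response shapes concrete, here is a hedged TypeScript sketch of how sessions.activity.list filtering might behave. The `listActivity` function is hypothetical, the `agentId` field on each entry is an assumption added so the filter has something to match, and treating activeOnly as "busy or non-none attention" is one possible reading that the proposal does not define.

```typescript
// Trimmed-down entry; a real response would carry the full objects described above.
interface SessionActivityEntry {
  sessionKey: string;
  agentId: string; // assumption: not listed in the proposed response fields
  version: number;
  runtimeActivity: { busy: boolean; activeRuns: number };
  attention: { state: string };
}

interface ActivityListRequest {
  activeOnly?: boolean;
  agentId?: string;
  limit?: number;
}

interface ActivityListResponse {
  ts: number; // ms epoch at which the snapshot was taken
  sessions: SessionActivityEntry[];
}

function listActivity(
  all: SessionActivityEntry[],
  req: ActivityListRequest,
  now: number,
): ActivityListResponse {
  let sessions = all;
  if (req.agentId !== undefined) {
    sessions = sessions.filter((s) => s.agentId === req.agentId);
  }
  if (req.activeOnly) {
    // Assumption: a session waiting on something counts as active even with no run in flight.
    sessions = sessions.filter((s) => s.runtimeActivity.busy || s.attention.state !== "none");
  }
  if (req.limit !== undefined) {
    sessions = sessions.slice(0, Math.max(0, req.limit));
  }
  return { ts: now, sessions };
}
```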

E) WebSocket statechange (no polling)
Over the existing gateway WebSocket:

  • emit an event whenever session activity changes
  • event includes sessionKey and monotonic version
  • emit on activeRuns transitions, attention transitions, staleRuns transitions
  • allow clients to resync after reconnect by calling sessions.activity.list
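
On the client side, the monotonic version allows out-of-order or duplicate events to be dropped, and a reconnect can replace local state wholesale. The TypeScript sketch below shows one way this could work; `ActivityTracker` and the minimal event shape are hypothetical, and the actual event payload and name are not defined yet.

```typescript
// Minimal assumed event payload: the proposal only guarantees sessionKey and version.
interface StateChangeEvent {
  sessionKey: string;
  version: number;
}

class ActivityTracker {
  private readonly versions = new Map<string, number>();

  // Apply an event only if it is newer than what was already seen for that session.
  // Returns true when the event was accepted.
  apply(ev: StateChangeEvent): boolean {
    const seen = this.versions.get(ev.sessionKey);
    if (seen !== undefined && ev.version <= seen) return false; // stale or duplicate
    this.versions.set(ev.sessionKey, ev.version);
    return true;
  }

  // After a reconnect, replace local state with a sessions.activity.list snapshot.
  // Assumption: the snapshot is authoritative, even if versions restarted on the gateway.
  resync(snapshot: StateChangeEvent[]): void {
    this.versions.clear();
    for (const entry of snapshot) {
      this.versions.set(entry.sessionKey, entry.version);
    }
  }

  versionOf(sessionKey: string): number | undefined {
    return this.versions.get(sessionKey);
  }
}
```

A client would call `apply` for each incoming event and refresh its view only when it returns true, then call `resync` once after every reconnect.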

F) Cross-agent children
Child sessions can live under different agentIds. Relationships must be keyed by sessionKey, and sessions.activity must be able to resolve activity globally, not only within a single agent store.

Robustness and failure modes (required)

  • activeRuns must be derived from a runId set, not naive increment and decrement counters
  • stale-run watchdog or TTL is required to prevent “busy forever” when terminal events are missed
  • WS ordering and resync: include per-session monotonic version and define reconnect resync behavior
  • define behavior for runs without a sessionKey association (exclude or bucket under unknown), and test it
  • persistence rules:
    • runtimeActivity may be ephemeral across gateway restart if documented
    • attention state should be durable or deterministically reconstructed from explicit runtime signals, otherwise blockers disappear and monitoring becomes guesswork again
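
The first two requirements can be sketched together: tracking runs in a set keyed by runId makes start and finish idempotent, and a periodic sweep moves runs that stopped reporting progress into a stale bucket. This TypeScript sketch is only illustrative; `SessionRuns`, its method names, and the TTL-based sweep are assumptions about one possible design.

```typescript
class SessionRuns {
  // runId -> last activity timestamp (ms epoch)
  private readonly active = new Map<string, number>();
  private readonly stale = new Set<string>();
  private readonly staleAfterMs: number;

  constructor(staleAfterMs: number) {
    this.staleAfterMs = staleAfterMs;
  }

  // Idempotent: a duplicate start event cannot inflate the count.
  start(runId: string, now: number): void {
    this.stale.delete(runId);
    this.active.set(runId, now);
  }

  progress(runId: string, now: number): void {
    if (this.active.has(runId)) this.active.set(runId, now);
  }

  // Idempotent: a duplicate or unknown finish event cannot drive the count negative.
  finish(runId: string): void {
    this.active.delete(runId);
    this.stale.delete(runId);
  }

  // Watchdog pass: runs silent for staleAfterMs or longer stop counting as active.
  sweep(now: number): void {
    this.active.forEach((lastActivityAt, runId) => {
      if (now - lastActivityAt >= this.staleAfterMs) {
        this.active.delete(runId);
        this.stale.add(runId);
      }
    });
  }

  get activeRuns(): number {
    return this.active.size;
  }

  get staleRuns(): number {
    return this.stale.size;
  }
}
```

With plain increment and decrement counters, a single missed or duplicated terminal event corrupts the count permanently; with a set, the worst case is one run lingering until the next sweep.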

Security and access control

  • Expose only counters, timestamps, and small status strings. No message content.
  • Authorization should follow existing sessions read access rules.

Test plan

  • Start an in-flight run for a known sessionKey and verify sessions.activity.get shows busy true, activeRuns non-zero, busySince set
  • Verify it flips to idle on completion
  • Verify WS events fire on transitions and version ordering is correct
  • Simulate missing terminal event and verify stale handling prevents permanent busy
  • Verify attention state explicit set and clear semantics, including awaiting_subagent and blockedOn list updates
  • Verify cross-agent child relationships work (parent and child in different agentIds)

Definition of done

  • sessions.activity.get and sessions.activity.list implemented and documented
  • WS statechange events implemented with monotonic version and reconnect resync path
  • stale-run handling prevents permanent busy and is covered by tests
  • explicit attention states implemented, including awaiting_subagent, and covered by tests
  • children rollup exists and is bounded, and covered by tests
  • no reliance on channel-level approximations or message-text heuristics

Alternatives considered

  • Polling existing endpoints: slow, wasteful, still ambiguous
  • channels.status activeRuns: wrong scope and misses chat-surface runs
  • typing indicators or emoji reactions: helpful UX but not a session-level API

Impact

  • Affected: operators and users running multi-session and multi-agent workflows, and anyone using multiple clients (TUI plus Control UI plus messaging channels).
  • Severity: blocks reliable monitoring and troubleshooting, causes uncertainty and duplicated prompts.
  • Frequency: frequent during active use, constant for multi-agent workflows.
  • Consequence: wasted time, unclear progress, and inability to verify work fulfillment without additional probing.

Evidence/examples

  • TUI shows a running indicator, but other clients cannot.
  • Parent session can be idle while child session is running; without awaiting_subagent and children rollup, progress is not visible.

GUI representation could look like this:
[Image]

Additional information

No response

Labels

enhancement (New feature or request)