Problem Summary
When an embedded agent fails to initialize (e.g., `deactivated_workspace` error), the corresponding session enters a zombie state:
- Subsequent messages are dispatched normally by the gateway
- Gateway logs: `dispatch complete (queuedFinal=false, replies=0)`
- The agent produces zero replies — users see complete silence
- The session is never cleaned up; only manually deleting its entry from `sessions.json` restores service
Steps to Reproduce
- A group-chat session triggers embedded agent initialization
- The initialization fails with a `deactivated_workspace` error (e.g., due to workspace config issues)
- The session is left in a broken state: marked "initialized" but not actually running
- All subsequent messages dispatched to this session go unanswered; every dispatch reports `replies=0`
- Only manually deleting the session key from `sessions.json`, so that a new session is created on the next message, restores functionality
Current Workaround
Manually delete the session key from:
`/root/.openclaw/agents//sessions/sessions.json`
This forces a fresh session to be created on the next inbound message.
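Until this is fixed, the manual step can be scripted. The following is a minimal TypeScript sketch for Node.js. It assumes `sessions.json` is a single JSON object keyed by session key, which has not been checked against the actual OpenClaw schema. `deleteSessionKey` is a hypothetical helper and is not part of OpenClaw. Pass it the path shown above and the key of the zombie session. It writes to a temporary file and then renames it, so a crash mid-write cannot leave a truncated `sessions.json`.

```typescript
import { readFileSync, writeFileSync, renameSync } from "node:fs";

// Hypothetical helper (not an OpenClaw API): removes one session entry.
// Assumes sessions.json is a JSON object keyed by session key.
export function deleteSessionKey(sessionsPath: string, sessionKey: string): boolean {
  const sessions = JSON.parse(readFileSync(sessionsPath, "utf8")) as Record<string, unknown>;
  if (!Object.prototype.hasOwnProperty.call(sessions, sessionKey)) {
    return false; // key not present, nothing to delete
  }
  delete sessions[sessionKey];
  // Write to a temp file, then rename over the original so the update is all-or-nothing.
  const tmpPath = `${sessionsPath}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(sessions, null, 2));
  renameSync(tmpPath, sessionsPath);
  return true;
}
```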
Root Cause Analysis
- The `deactivated_workspace` error does not trigger session cleanup
- The session state machine does not enter a proper error handling path when agent init fails
- The gateway dispatch layer considers the message "delivered" (since it reached the agent), but the agent never actually processes it
- No `abortedLastRun` flag is set, and no automatic recovery mechanism fires
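A toy model makes the analysis above concrete. The TypeScript sketch below is not OpenClaw code. `Session`, `startEmbeddedAgent`, `initSession`, and `dispatch` are hypothetical names. The model shows that when the init error is swallowed but the session is still marked initialized, dispatch keeps "succeeding" with zero replies. This holds even after the workspace is healthy again, because nothing ever retries initialization.

```typescript
// Hypothetical, simplified model of the zombie-session failure mode.
// None of these names come from the OpenClaw codebase.
interface Agent {
  handle(message: string): string[];
}

interface Session {
  key: string;
  status: "new" | "initialized";
  agent: Agent | null;
  abortedLastRun: boolean;
}

class WorkspaceError extends Error {}

function startEmbeddedAgent(workspaceActive: boolean): Agent {
  if (!workspaceActive) throw new WorkspaceError("deactivated_workspace");
  return { handle: (m) => [`echo: ${m}`] };
}

// Buggy init path: the error is swallowed, the session is still marked
// "initialized", and abortedLastRun is never set.
function initSession(session: Session, workspaceActive: boolean): void {
  try {
    session.agent = startEmbeddedAgent(workspaceActive);
  } catch {
    // no error transition, no cleanup
  }
  session.status = "initialized";
}

// Dispatch only initializes sessions that are not yet "initialized", so a
// zombie session is reused forever and every message yields zero replies.
function dispatch(session: Session, message: string): { replies: number } {
  if (session.status !== "initialized") initSession(session, true);
  const replies = session.agent ? session.agent.handle(message) : [];
  return { replies: replies.length };
}
```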
Suggested Fixes
The session lifecycle should handle embedded agent init failures gracefully:
- Auto-mark on failure: When embedded agent init fails, set `abortedLastRun=true` on the session so the next dispatch can detect it and create a fresh session
- Session health check on dispatch: Before dispatching to an existing session, check if the previous run was aborted/failed and auto-recover rather than silently reusing a dead session
- Graceful degradation: If the session cannot be initialized after N attempts, surface an explicit error to the user instead of silently returning `replies=0`
- Auto-cleanup of zombie sessions: A background cleanup task that detects sessions with repeated `abortedLastRun=true` and removes them proactively
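The four suggestions could fit together roughly as follows. This is a minimal TypeScript sketch and is hypothetical throughout. The `Session` shape, `tryInit`, `dispatch`, `cleanupZombies`, the `failedInits` counter, and `MAX_INIT_ATTEMPTS = 3` are illustrative assumptions. Only `abortedLastRun` comes from this report.

```typescript
// Hypothetical sketch of the suggested lifecycle fixes; not OpenClaw APIs.
interface Agent {
  handle(message: string): string[];
}

interface Session {
  key: string;
  agent: Agent | null;
  abortedLastRun: boolean;
  failedInits: number;
}

const MAX_INIT_ATTEMPTS = 3; // the "N" from the suggestion; value is arbitrary

type InitFn = () => Agent; // throws on failure, e.g. deactivated_workspace

function tryInit(session: Session, init: InitFn): void {
  try {
    session.agent = init();
    session.abortedLastRun = false;
    session.failedInits = 0;
  } catch {
    // Auto-mark on failure so the next dispatch can detect the dead session.
    session.agent = null;
    session.abortedLastRun = true;
    session.failedInits += 1;
  }
}

function dispatch(session: Session, message: string, init: InitFn): string[] {
  // Health check: never silently reuse a session whose last run aborted.
  if (session.abortedLastRun || session.agent === null) {
    if (session.failedInits >= MAX_INIT_ATTEMPTS) {
      // Graceful degradation: explicit error instead of silent replies=0.
      return [`Session ${session.key} could not be initialized after ${MAX_INIT_ATTEMPTS} attempts.`];
    }
    tryInit(session, init);
  }
  if (session.agent === null) {
    return [`Agent initialization failed for session ${session.key}; will retry on the next message.`];
  }
  return session.agent.handle(message);
}

// Background cleanup: remove sessions that keep failing so they are recreated fresh.
// Deleting the current entry while iterating a Map is well-defined in JavaScript.
function cleanupZombies(sessions: Map<string, Session>): string[] {
  const removed: string[] = [];
  for (const [key, s] of sessions) {
    if (s.abortedLastRun && s.failedInits >= MAX_INIT_ATTEMPTS) {
      sessions.delete(key);
      removed.push(key);
    }
  }
  return removed;
}
```

One design choice is worth noting. Below the retry limit, the sketch still replies with a short failure notice rather than nothing, so users never see complete silence. Once the limit is reached, dispatch stops retrying. The session then stays failed until the cleanup task removes it, and the next inbound message creates a fresh session, as the manual workaround does today.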
Environment
- OpenClaw version: latest (main branch)
- Channel: Feishu group chat
- Session type: `group` (embedded agent)
Labels: bug, session, recovery, embedded-agent