Session status field values ("failed", "timeout", "done") mislead agents into spawning duplicate sessions #64103

@lykeion-dev

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

The session status field uses values like "failed", "timeout", and "done" that only reflect the last communication turn's state, but these names strongly imply the session or task has permanently ended. This causes orchestrator agents to spawn duplicate sessions instead of resuming existing ones via sessions_send.

Steps to reproduce

  1. Configure an orchestrator agent (main) with sub-agents.
  2. Spawn a sub-agent session via sessions_spawn for a long-running task.
  3. The sub-agent session encounters an error or timeout on its last API call.
  4. Observe the session's status field becomes "failed" or "timeout".
  5. The orchestrator agent sees this status and incorrectly concludes the session is dead.
  6. The orchestrator spawns a NEW session for the same task instead of using sessions_send to resume the existing one.
  7. In observed incidents, the same task had 2-4 concurrent sessions, all burning tokens on paid models.
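
The steps above can be sketched as a small simulation. This is a minimal illustration only: `FakeRuntime` and `naive_orchestrator` are hypothetical names, the `sessions_spawn` signature is assumed, and every sub-agent turn is assumed to end in a timeout, as in step 3.

```python
from dataclasses import dataclass, field

# Statuses the naive orchestrator wrongly treats as "session is dead".
MISREAD_AS_TERMINAL = {"failed", "timeout"}


@dataclass
class FakeRuntime:
    """Hypothetical in-memory stand-in for the session runtime (not the OpenClaw API)."""
    statuses: dict = field(default_factory=dict)  # session id -> last-turn status
    next_id: int = 1

    def sessions_spawn(self, task: str) -> str:
        session_id = f"s{self.next_id}"
        self.next_id += 1
        # Assumption: the session's last turn ends in a timeout (step 3).
        self.statuses[session_id] = "timeout"
        return session_id


def naive_orchestrator(runtime: FakeRuntime, task: str, rounds: int) -> int:
    """Spawn a replacement whenever the latest session looks dead; return the session count."""
    current = runtime.sessions_spawn(task)
    for _ in range(rounds):
        if runtime.statuses[current] in MISREAD_AS_TERMINAL:
            # Duplicate spawn: the earlier session is still alive and resumable.
            current = runtime.sessions_spawn(task)
    return len(runtime.statuses)


runtime = FakeRuntime()
total_sessions = naive_orchestrator(runtime, "long-running task", rounds=3)
print(total_sessions)
```

With three misread statuses in a row, the sketch ends with four sessions for one task, which matches the worst case described in this report.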

Expected behavior

Session status values should clearly communicate they are communication/turn states, not session lifecycle states. A session with status: "failed" or status: "timeout" should still be resumable via sessions_send, and the field naming or documentation should make this obvious to both human operators and AI agents.
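
As a rough sketch of the expected orchestrator behavior, the decision could look like the function below. The function name, the dict shape of a session, and the `"inspect"` fallback for statuses this report does not cover are assumptions for illustration, not OpenClaw behavior.

```python
from typing import Optional

# Per this report, these values describe the last turn only, not the session lifecycle.
LAST_TURN_STATES = {"failed", "timeout", "done"}


def next_action(existing_session: Optional[dict]) -> str:
    """Pick the next tool call for a task, preferring resume over spawn."""
    if existing_session is None:
        # No session exists for this task yet, so spawning is correct.
        return "sessions_spawn"
    if existing_session.get("status") in LAST_TURN_STATES:
        # The session keeps its accumulated context; resume it.
        return "sessions_send"
    # Assumption: anything else is checked by the caller before deciding.
    return "inspect"


print(next_action({"id": "s1", "status": "timeout"}))
```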

Actual behavior

Orchestrator agents see "failed" or "timeout" and spawn duplicate sessions for the same task. In one observed incident, a single task had 4 concurrent sessions (3 unnecessary), wasting paid-model tokens. The original session's accumulated context (file reads, analysis, partial work) is lost when a new session starts from scratch. The cascade can compound: each new session may also "fail", triggering yet another spawn.

OpenClaw version

current (2026.4)

Operating system

Ubuntu 22.04

Install method

npm global

Model

various (openrouter/z-ai/glm-5.1, dashscope/qwen3.6-plus, dashscope/deepseek-v3.2)

Provider / routing chain

openclaw -> openrouter -> various models

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Affected: All users running orchestrator agents that manage sub-agent sessions
Severity: High (causes token waste on paid models, session sprawl, context loss)
Frequency: Always when a sub-agent session's last turn results in an error or timeout
Consequence: Duplicate sessions burning 2-4x the necessary tokens; lost context from abandoned sessions; session management complexity that compounds over time

Additional information

Proposed Solutions

Option A: Rename status values (breaking change)

Rename to clearly communicate they are communication states, not task states:

  • "failed" → "interrupted"
  • "timeout" → "waiting_for_response"
  • "done" → "agent_responded"
  • "killed" → "halted"
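
One way to stage this rename is a translation table applied at the API boundary. This is a sketch only: the mapping comes from the list above, while `rename_status` and the dict payload shape are hypothetical.

```python
# Legacy value -> proposed value, taken from the Option A list.
LEGACY_TO_PROPOSED = {
    "failed": "interrupted",
    "timeout": "waiting_for_response",
    "done": "agent_responded",
    "killed": "halted",
}


def rename_status(session: dict) -> dict:
    """Return a copy of the payload with a legacy status replaced by its proposed name."""
    renamed = dict(session)  # leave the caller's dict untouched
    status = session.get("status")
    # Unknown or already-renamed values pass through unchanged.
    renamed["status"] = LEGACY_TO_PROPOSED.get(status, status)
    return renamed


print(rename_status({"id": "s1", "status": "timeout"}))
```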

Option B: Add a separate field (non-breaking)

Keep status for backward compatibility but add:

  • resumable: true/false — Whether the session can still receive sessions_send
  • taskStatus: "in_progress" | "completed" | "unknown" — The actual task state (only set by explicit declaration)
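
A minimal sketch of how a consumer might use the two proposed fields follows. The field names `resumable` and `taskStatus` come from the list above; `SessionInfo` and `should_resume` are hypothetical, and the rule "resume unless the task was declared completed" is an assumption.

```python
from dataclasses import dataclass
from typing import Literal

TaskStatus = Literal["in_progress", "completed", "unknown"]


@dataclass(frozen=True)
class SessionInfo:
    status: str                       # existing field, kept for backward compatibility
    resumable: bool                   # proposed: can the session still receive sessions_send?
    taskStatus: TaskStatus = "unknown"  # proposed: set only by explicit declaration


def should_resume(info: SessionInfo) -> bool:
    """Decide from the new fields alone, ignoring the misleading status value."""
    return info.resumable and info.taskStatus != "completed"


info = SessionInfo(status="failed", resumable=True, taskStatus="in_progress")
print(should_resume(info))
```

Because the decision no longer reads `status`, existing consumers that depend on the old values keep working while orchestrators switch to the new fields.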

Option C: Add explicit documentation in API response

Include a resumable: true field and a note field on every session:

{
  "status": "failed",
  "resumable": true,
  "note": "status reflects last turn state only; session is alive and can receive sessions_send"
}
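
A response like the one above could be produced by a small annotation step. This is a sketch under assumptions: `annotate` is a hypothetical helper, and attaching the note only when the session is resumable is a choice made here so the note text stays accurate.

```python
import json

NOTE = "status reflects last turn state only; session is alive and can receive sessions_send"


def annotate(session: dict, resumable: bool) -> dict:
    """Return a copy of the session payload with the proposed explanatory fields added."""
    annotated = dict(session)
    annotated["resumable"] = resumable
    if resumable:
        # Assumption: skip the note for non-resumable sessions, since it says "alive".
        annotated["note"] = NOTE
    return annotated


payload = annotate({"status": "failed"}, resumable=True)
print(json.dumps(payload, indent=2))
```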

Option D: Change the default agent system prompt

Add explicit guidance to the default system prompt:

> "A session's `status` field only indicates the last turn's communication state. `failed`, `timeout`, and `done` do NOT mean the session is unusable. Consider using `sessions_send` to resume an existing session before considering a new `sessions_spawn`."

Recommendation

Option A is the cleanest long-term fix.

Labels: bug (Something isn't working), bug:behavior (Incorrect behavior without a crash)
