Summary
In multi-channel + multi-session production use, we repeatedly hit session-lane backpressure and duplicate sub-task dispatches (especially when long-running tasks and `sessions_send`/process polls overlap). This causes:
- "ACK visible, body delayed/missing" behavior in chat channels
- cascading timeouts on `sessions_send`/tool calls
- repeated task launches that look like "infinite retry"
- lock conflicts in downstream scripts due to duplicated dispatch
There are adjacent issues (#14214, #13682, #17569, #7108, #16583), but we still lack a first-class, global task-idempotency/concurrency guard at the platform level.
Current behavior
- A channel session starts a long task (or nested tool loop).
- User sends follow-up messages and/or automation sends additional commands.
- The session-lane queue grows; later messages wait behind active runs.
- New runs can still be spawned for the same logical task (no built-in task-key guard).
- Downstream task runners observe lock conflicts / duplicate work.
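To make the race concrete, here is a minimal sketch of the unguarded path (all names, including `spawnRun`, are illustrative stand-ins, not real OpenClaw APIs):

```ts
// Hypothetical sketch of the current, unguarded dispatch path; spawnRun is
// an illustrative stand-in for the real launcher (e.g. sessions_spawn).
type Run = { runId: string; startedAt: number };

const activeRuns: Run[] = [];

function spawnRun(sessionKey: string, taskName: string): Run {
  // No lookup by task key: nothing checks whether a run for the same
  // logical task is already active before spawning another one.
  const run: Run = {
    runId: `${sessionKey}::${taskName}::${activeRuns.length}`,
    startedAt: Date.now(),
  };
  activeRuns.push(run);
  return run;
}

// Two triggers for the same logical task while the first run is active:
spawnRun("discord:chan-a", "sync-report");
spawnRun("discord:chan-a", "sync-report");
console.log(activeRuns.length); // 2 — duplicate work, downstream lock contention
```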
Why this is a core platform gap
This is not business-script specific: any non-trivial orchestration with long tasks + multiple message sources can reproduce it.
Proposal
Add a built-in Task Guard abstraction in the OpenClaw runtime/tools layer (a sketch follows the list below):
- `taskKey` convention: `<sessionKey>::<taskName>` (or explicit task namespace)
- Atomic `acquire(taskKey)` / `release(taskKey, status)`
- Standard conflict response: `already_running` + current `runId`/`startedAt`
- Failed-task retry policy: require explicit `approve-retry` before next acquire
- Optional TTL + stale-lock healing
- Surface guard state in session/task diagnostics (`queue_status`-like visibility)
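A minimal single-process sketch of what this surface could look like (all names are illustrative; a production version would need an atomic store shared across workers, whereas the `Map` here is in-memory only):

```ts
import { randomUUID } from "node:crypto";

type GuardEntry = { runId: string; startedAt: number; status: "running" | "failed" };

type AcquireResult =
  | { ok: true; runId: string }
  | { ok: false; reason: "already_running"; runId: string; startedAt: number }
  | { ok: false; reason: "needs_approval" }; // prior run failed, retry not approved

class TaskGuard {
  private entries = new Map<string, GuardEntry>();
  constructor(private ttlMs = 30 * 60 * 1000) {} // optional TTL for stale-lock healing

  acquire(taskKey: string): AcquireResult {
    const existing = this.entries.get(taskKey);
    if (existing?.status === "running") {
      const stale = Date.now() - existing.startedAt > this.ttlMs;
      if (!stale) {
        // Standard conflict response: who holds the guard, and since when.
        return {
          ok: false,
          reason: "already_running",
          runId: existing.runId,
          startedAt: existing.startedAt,
        };
      }
      // Stale lock: fall through and reclaim the key.
    } else if (existing?.status === "failed") {
      return { ok: false, reason: "needs_approval" }; // explicit approve-retry required
    }
    const runId = randomUUID();
    this.entries.set(taskKey, { runId, startedAt: Date.now(), status: "running" });
    return { ok: true, runId };
  }

  release(taskKey: string, status: "ok" | "failed"): void {
    if (status === "ok") {
      this.entries.delete(taskKey);
    } else {
      const entry = this.entries.get(taskKey);
      if (entry) entry.status = "failed"; // blocks re-acquire until approveRetry
    }
  }

  approveRetry(taskKey: string): void {
    this.entries.delete(taskKey); // next acquire may proceed
  }

  // queue_status-like visibility for diagnostics.
  snapshot(): Array<{ taskKey: string } & GuardEntry> {
    return [...this.entries].map(([taskKey, entry]) => ({ taskKey, ...entry }));
  }
}
```

Keying guards on `<sessionKey>::<taskName>` deduplicates per logical task rather than per message, which is what the queue-pressure scenarios need.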
Suggested integration points
- `sessions_spawn`
- high-risk tool pipelines (long exec/poll loops)
- optional guard hook in the auto-reply dispatch path before launching taskful flows (sketched below)
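One possible shape for that hook, building on the `TaskGuard` sketch above (`launchTask` and `notify` are hypothetical stand-ins for the real dispatch primitives, e.g. `sessions_spawn` and a channel reply):

```ts
const guard = new TaskGuard(); // from the sketch above

// Stand-ins for the real dispatch primitives; names are illustrative.
const launchTask = async (_taskName: string): Promise<void> => {
  /* e.g. launch via sessions_spawn */
};
const notify = (sessionKey: string, message: string): void => {
  console.log(`[${sessionKey}] ${message}`);
};

async function dispatchWithGuard(sessionKey: string, taskName: string): Promise<void> {
  const taskKey = `${sessionKey}::${taskName}`;
  const result = guard.acquire(taskKey);

  if (!result.ok) {
    if (result.reason === "already_running") {
      notify(
        sessionKey,
        `already_running: runId=${result.runId}, startedAt=${new Date(result.startedAt).toISOString()}`,
      );
    } else {
      notify(sessionKey, "Previous run failed; explicit approve-retry required before relaunch.");
    }
    return; // deterministic conflict response — no new run is spawned
  }

  try {
    await launchTask(taskName);
    guard.release(taskKey, "ok");
  } catch {
    guard.release(taskKey, "failed"); // prevents silent auto-retry on the next trigger
  }
}
```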
Acceptance criteria
- Concurrent same-key launches produce exactly one running task.
- Duplicate triggers return a deterministic `already_running` response (no new run).
- Failed tasks cannot silently auto-retry without explicit approval.
- Queue pressure scenarios no longer amplify duplicate task creation.
- Observability: operators can inspect active guards and stale guards.
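The first three criteria can be expressed as a quick check against the `TaskGuard` sketch above (sequential here for brevity; a real test would also exercise genuinely concurrent acquires):

```ts
const g = new TaskGuard();
const key = "discord:chan-a::sync-report";

// Exactly one winner for the same key.
const first = g.acquire(key);
const second = g.acquire(key);
console.assert(first.ok, "first acquire wins");
console.assert(!second.ok && second.reason === "already_running", "duplicate gets already_running");

// Failed tasks cannot silently auto-retry.
g.release(key, "failed");
const retry = g.acquire(key);
console.assert(!retry.ok && retry.reason === "needs_approval", "retry blocked until approval");

// Explicit approval re-opens the key.
g.approveRetry(key);
console.assert(g.acquire(key).ok, "acquire succeeds after approve-retry");
```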
Reproduction hints
- Multi-channel Discord setup
- Start a long-running task in one channel session
- Send repeated trigger messages/commands for the same logical task while the first run is active
- Observe the current behavior: queue delay + duplicate launches + lock contention