Multi-agent orchestration is unstable: concurrent agents add/config overwrites, session-lock failures, and detached child work #43367
Open
BingqingLyu/openclaw#612
Labels
- `P1`: High-priority user-facing bug, regression, or broken workflow.
- `clawsweeper:fix-shape-clear`: ClawSweeper found a clear likely implementation shape for this issue.
- `clawsweeper:linked-pr-open`: ClawSweeper found an open linked pull request for this issue.
- `clawsweeper:needs-live-repro`: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
- `clawsweeper:needs-maintainer-review`: ClawSweeper marked this issue as needing maintainer review before automation.
- `clawsweeper:needs-product-decision`: ClawSweeper marked this issue as needing a product or behavior decision.
- `clawsweeper:no-new-fix-pr`: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- `impact:auth-provider`: Auth, provider routing, model choice, or SecretRef resolution may break.
- `impact:message-loss`: Channel message delivery can be lost, duplicated, or misrouted.
- `impact:session-state`: Session, memory, transcript, context, or agent state can drift or corrupt.
- `issue-rating: 🐚 platinum hermit`: Good issue quality with a plausible reproduction path needing some confirmation.
Summary
I tried to orchestrate a small parallel coding batch from the OpenClaw CLI on `2026.3.8` and hit a cluster of failures that make multi-agent runs unreliable in practice:

- `openclaw agents add` appears unsafe when invoked concurrently: the config gets overwritten repeatedly and only a subset of the agents persist.
- Concurrent `openclaw agent` runs hit session lock timeouts, even with isolated agents/workspaces.
- `openai-codex` OAuth refresh races (`refresh_token_reused`).
- … `next build`, `npm install`) running without a clean handle.

This makes it hard to use OpenClaw as an orchestrator for parallel coding tasks, even when each agent has its own workspace.
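To make the `agents add` symptom concrete: one plausible mechanism (an assumption on my part; I have not checked OpenClaw's source) is an unlocked read-modify-write of the shared config. Two invocations read the same snapshot, each appends its own agent, and the last writer wins. The minimal Python sketch below shows that with a hypothetical `{"agents": [...]}` layout and hypothetical helper names, plus one possible guardrail: an exclusive `fcntl.flock` held across the whole read-modify-write (POSIX only). Note that the temp-file-plus-`os.replace` write on its own prevents torn files, not lost updates.

```python
import fcntl
import json
import os
import tempfile
import threading


def read_config(path):
    # Hypothetical layout {"agents": [...]}; not OpenClaw's real config schema.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"agents": []}


def write_config_atomic(path, cfg):
    # Temp file + os.replace: readers never see a half-written file.
    # This alone does NOT prevent lost updates between concurrent writers.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(cfg, f)
    os.replace(tmp, path)


def lost_update_demo(path):
    # Two hypothetical `agents add` calls, interleaved by hand:
    # both read the same snapshot before either one writes.
    snap_a = read_config(path)
    snap_b = read_config(path)
    snap_a["agents"].append("lane129")
    write_config_atomic(path, snap_a)
    snap_b["agents"].append("lane130")
    write_config_atomic(path, snap_b)  # last writer wins: lane129 is gone
    return read_config(path)["agents"]


def add_agent_locked(path, name):
    # Hold an exclusive advisory lock across the whole read-modify-write
    # so concurrent adds are serialized instead of overwriting each other.
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            cfg = read_config(path)
            cfg["agents"].append(name)
            write_config_atomic(path, cfg)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(lost_update_demo(os.path.join(d, "unsafe.json")))  # ['lane130']
        safe = os.path.join(d, "safe.json")
        workers = [
            threading.Thread(target=add_agent_locked, args=(safe, f"lane{n}"))
            for n in range(129, 133)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # ['lane129', 'lane130', 'lane131', 'lane132']
        print(sorted(read_config(safe)["agents"]))
```

A separate `.lock` file is used so the lock survives `os.replace` swapping out the config file itself.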
Version / environment
- 2026.3.8 (3caab92)
- `openai-codex/gpt-5.3-codex`
- `anthropic/claude-sonnet-4-6`

Reproduction
I created 4 isolated git worktrees for the same repo, then tried to create and run 4 isolated agents for parallel work.
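On the orchestrator side this is roughly "start one child per worktree, wait, clean up". A minimal Python sketch (POSIX only) of the cleanup part I would expect to work: start each run in its own session so a timeout can signal the whole process tree, not just the CLI process. `run_in_own_group` is a hypothetical helper and `argv` is a placeholder; I am deliberately not showing real `openclaw agent` flags here. It will not catch children that call `setsid` themselves or daemonize; whether OpenClaw's agent runner does that is something I have not verified.

```python
import os
import signal
import subprocess


def run_in_own_group(argv, cwd, timeout):
    """Run one agent command so its whole process tree can be cleaned up.

    Returns the exit code, or None if it timed out and its group was killed.
    """
    # start_new_session=True calls setsid() in the child, so the child leads
    # a new process group whose id equals its pid.
    proc = subprocess.Popen(argv, cwd=cwd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Signal the group, not just the pid, so grandchildren such as a
        # build or package install started by the agent are also signalled
        # (unless they moved themselves into another group).
        os.killpg(proc.pid, signal.SIGTERM)
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()
        return None


if __name__ == "__main__":
    # Stand-in for an agent run: a shell that also starts a background child.
    print(run_in_own_group(["sh", "-c", "sleep 30 & sleep 30"], cwd=".", timeout=1))  # None
```

In a real batch, `cwd` would be each lane's worktree, and the orchestrator would call this once per lane (for example from a thread pool).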
1. Concurrent agent creation
Commands like:
Observed behavior:
- `openclaw agents list --json` showed only a subset of the agents.

2. Concurrent agent runs
After recreating the agents sequentially, I launched multiple runs in parallel, for example:
I also tested `--local`.

Observed failures included:
and:
The lock files were agent-specific, for example:
- `/home/user/.openclaw/agents/lane129/sessions/...jsonl.lock`
- `/home/user/.openclaw/agents/lane130/sessions/...jsonl.lock`

Unexpected detached work
In at least one case, the CLI path reported failure or became unusable, but the underlying agent had clearly kept working in the background:
- `openclaw-agent` processes were still running
- `next build --turbopack`
- `npm install`

So from the operator perspective:
- … `kill`, worktree cleanup, agent deletion)

Expected behavior
- `agents add` should not race on the global config file.

Actual behavior
Related issues
I found partial overlap with existing issues, especially around session locks and OAuth refresh races, for example:
- #42160 Session store monolithic JSON with global lock causes ...
- #32799 Session file lock not released when holding process dies ...
- #26322 OAuth token refresh race condition causes spurious failover ...

But the scenario here is specifically the end-to-end multi-agent orchestration path: creating multiple isolated agents, launching multiple coding runs, and then observing config races, session locks, auth noise, and detached child work.
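On the lock side (#32799 is about locks that outlive their holder), one guardrail shape is a lock file that records its holder's PID: it can be reclaimed when that PID is gone, and waiters get a bounded timeout that names the holder instead of a bare timeout. This is a minimal Python sketch with hypothetical names (`acquire_lock`, `release_lock`, `LockTimeout`), not a description of how OpenClaw's `.jsonl.lock` files actually work, which I have not checked. A kernel-managed lock such as `fcntl.flock` sidesteps the stale-lock problem entirely, because the kernel releases it when the holder dies.

```python
import os
import time


class LockTimeout(Exception):
    pass


def _pid_alive(pid):
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_lock(lock_path, timeout=10.0, poll=0.05):
    """Create lock_path exclusively and record our PID in it.

    A lock whose recorded PID is no longer running is treated as stale and
    removed. Raises LockTimeout if a live holder keeps it past `timeout`.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                with open(lock_path) as f:
                    holder = int(f.read().strip() or 0)
            except (FileNotFoundError, ValueError):
                holder = 0  # vanished, unreadable, or PID not written yet: retry
            if holder > 0 and not _pid_alive(holder):
                # Stale lock from a dead process. Caveat: two waiters can both
                # decide it is stale and race here; PID reuse can also fool this.
                try:
                    os.remove(lock_path)
                except FileNotFoundError:
                    pass
                continue
            if time.monotonic() >= deadline:
                raise LockTimeout(f"{lock_path} still held by pid {holder}")
            time.sleep(poll)
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return


def release_lock(lock_path):
    os.remove(lock_path)
```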
What would help
A fix or guardrail in any of these areas would help a lot:
- `agents add` config writes

If helpful, I can also provide the exact command transcript / cleanup steps I used.
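On the `refresh_token_reused` noise: my assumption (not verified against OpenClaw or the provider) is refresh-token rotation. Each successful refresh invalidates the old refresh token, so two runs refreshing at the same moment with the same token can get one of them rejected. A minimal Python sketch of a "single refresh, shared result" guardrail; `SharedToken` and `refresh_fn` are hypothetical stand-ins, not OpenClaw APIs.

```python
import threading
import time


class SharedToken:
    """Let concurrent callers share one refresh instead of each refreshing.

    refresh_fn(refresh_token) -> (access_token, new_refresh_token, expires_at)
    is a hypothetical stand-in for a provider's OAuth refresh request.
    """

    def __init__(self, refresh_fn, access_token, refresh_token, expires_at):
        self._refresh_fn = refresh_fn
        self._access = access_token
        self._refresh = refresh_token
        self._expires_at = expires_at
        self._lock = threading.Lock()

    def get(self, skew=30.0):
        with self._lock:
            # Checked under the lock: a caller that was waiting finds the
            # token already refreshed and reuses it instead of sending the
            # (now rotated) old refresh token a second time.
            if time.time() + skew >= self._expires_at:
                self._access, self._refresh, self._expires_at = self._refresh_fn(self._refresh)
            return self._access
```

This only coordinates threads inside one process. If separate `openclaw agent` processes share one credential file, the same read-refresh-write would also need a cross-process lock (for example `fcntl.flock` on that file); how OpenClaw actually stores these credentials is an assumption here.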