Problem
When a sub-agent fails (bad config, missing dependency, logic error), there is no mechanism to prevent the parent from immediately spawning it again. This creates cascade failures where the same broken sub-agent is retried indefinitely, burning tokens and time.
OpenClaw tracks sub-agent completion status and has configurable runTimeoutSeconds, but there is no circuit breaker that says "this sub-agent has failed N times in a row — stop trying."
Evidence
- Cascade failures where a broken sub-agent was repeatedly spawned, each time failing the same way
- Token spend accumulates with each failed attempt (model boot cost, context injection, partial execution)
Workaround
I built cc-failure-guard.py, a PreToolUse hook (~80 lines Python) that:
- Intercepts
SubagentStart events before execution
- Checks if a failure marker file exists in the target workspace
- If present → blocks the spawn with feedback explaining the failure
- The marker must be manually cleared before the sub-agent can be retried
Proposed Solution
Add a native circuit breaker to the sub-agent spawn system:
- Track consecutive failures per sub-agent (by agent ID or task signature)
- Configurable threshold: after N consecutive failures, block further spawns
- Cooldown period: after threshold is hit, block for a configurable duration
- User override: allow manual reset or forced spawn that bypasses the breaker
- Notification: alert the user when a circuit breaker trips
{
"agents": {
"defaults": {
"subagents": {
"circuitBreaker": {
"enabled": true,
"maxConsecutiveFailures": 3,
"cooldownMinutes": 30,
"notifyOnTrip": true
}
}
}
}
}
Impact
Medium. Prevents token waste and user frustration from cascade failures. Especially valuable for automated/unattended pipelines.
Environment
- OpenClaw 2026.4.10 (npm, macOS)
Problem
When a sub-agent fails (bad config, missing dependency, logic error), there is no mechanism to prevent the parent from immediately spawning it again. This creates cascade failures where the same broken sub-agent is retried indefinitely, burning tokens and time.
OpenClaw tracks sub-agent completion status and has configurable
runTimeoutSeconds, but there is no circuit breaker that says "this sub-agent has failed N times in a row — stop trying."Evidence
Workaround
I built
cc-failure-guard.py, a PreToolUse hook (~80 lines Python) that:SubagentStartevents before executionProposed Solution
Add a native circuit breaker to the sub-agent spawn system:
{ "agents": { "defaults": { "subagents": { "circuitBreaker": { "enabled": true, "maxConsecutiveFailures": 3, "cooldownMinutes": 30, "notifyOnTrip": true } } } } }Impact
Medium. Prevents token waste and user frustration from cascade failures. Especially valuable for automated/unattended pipelines.
Environment