Description
When the SubAgentManager concurrency limit is reached (e.g. the default [agents] max_concurrent = 1), the DagScheduler enters a tight retry loop at 250ms intervals, flooding the log with:
ERROR zeph_core::agent: spawn_for_task failed error=concurrency limit reached (active: 1, max: 1) task_id=1
ERROR zeph_core::agent: spawn_for_task failed error=concurrency limit reached (active: 1, max: 1) task_id=1
...repeated every ~250ms indefinitely.
Root Cause
In DagScheduler::wait_event():
```rust
if self.running.is_empty() {
    tokio::time::sleep(self.deferral_backoff).await;
    return;
}
```
self.running is empty because no DAG tasks were successfully registered (all spawns failed), so the scheduler falls back to deferral_backoff = 250ms. Because DagScheduler.running and the SubAgentManager's active-agent count are tracked separately, the DagScheduler cannot tell that the sub-agent pool is occupied and keeps retrying every 250ms until the external sub-agent completes.
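For illustration, a self-contained sketch of the failure mode, assuming a tokio runtime with the time and macros features; the loop body and names below are hypothetical stand-ins for the DagScheduler/SubAgentManager interaction, not actual zeph_core code:

```rust
use std::time::Duration;

// Hypothetical stand-in for the scheduler loop: every spawn hits the
// concurrency limit, nothing is added to `running`, and the only wait
// between retries is the fixed 250ms deferral_backoff.
#[tokio::main]
async fn main() {
    let deferral_backoff = Duration::from_millis(250);
    let running: Vec<u64> = Vec::new(); // stays empty: no spawn ever succeeds
    let (active, max) = (1, 1); // an external sub-agent holds the only slot

    for _ in 0..10 {
        // stand-in for SubAgentManager::spawn_for_task()
        if active >= max {
            eprintln!("ERROR spawn_for_task failed error=concurrency limit reached (active: {active}, max: {max}) task_id=1");
        }
        // stand-in for DagScheduler::wait_event(): nothing to await but the sleep
        if running.is_empty() {
            tokio::time::sleep(deferral_backoff).await;
        }
    }
}
```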
Reproduction
- [agents] max_concurrent = 1 (the default), or 1 sub-agent already running
- Create a plan with 2+ tasks that require sub-agent spawning
- Confirm the plan
- Watch logs for flood of ERROR messages every 250ms
Config used: /tmp/testing-orch-cancel.toml (no [agents] section → max_concurrent defaults to 1)
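For reference, a minimal config sketch, assuming the [agents] table only needs the max_concurrent key referenced above; raising the limit only sidesteps the repro and does not fix the backoff behavior:

```toml
# Equivalent to the repro config: with no [agents] section, max_concurrent
# defaults to 1, so any second sub-agent spawn triggers the retry loop.
[agents]
max_concurrent = 1  # raise to 2+ to work around the log flood (not a fix)
```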
Expected
The DagScheduler should back off significantly longer (e.g., 1–5 seconds) when all spawn attempts fail due to concurrency limits, or ideally wait for a signal that a slot has freed. 250ms is too aggressive and floods logs.
Actual
Tight 250ms spin loop with ERROR log on every iteration until the session is killed.
Severity: Medium
No data loss, but the log flood and CPU waste make it harder to diagnose genuine errors.
Suggested Fix
Increase deferral_backoff from 250ms to at least 1–2s, or implement exponential backoff up to a cap. Alternatively, subscribe to SubAgentManager slot-freed events to wake the DagScheduler exactly when a spawn is possible.
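A minimal sketch of both options, assuming tokio; the function names, the shared Notify handle, and the 5s cap are illustrative choices, not the existing zeph_core API:

```rust
use std::time::Duration;
use tokio::sync::Notify;

/// Option 1: exponential backoff with a cap instead of a fixed 250ms sleep.
async fn deferral_sleep(consecutive_failures: u32) {
    // 250ms, 500ms, 1s, 2s, 4s, then capped at 5s.
    let delay = Duration::from_millis(250)
        .saturating_mul(1u32 << consecutive_failures.min(5))
        .min(Duration::from_secs(5));
    tokio::time::sleep(delay).await;
}

/// Option 2: wake exactly when the SubAgentManager frees a slot. `slot_freed`
/// would be a Notify shared with the manager, which calls `notify_one()`
/// whenever an agent finishes; the timeout is a safety net against missed
/// notifications.
async fn wait_for_slot(slot_freed: &Notify) {
    let _ = tokio::time::timeout(Duration::from_secs(5), slot_freed.notified()).await;
}
```

Either way the first retry can stay fast; the point is that repeated failures stop hammering the log at a fixed 250ms cadence.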