Gateway startup race: channels fire before anthropic plugin registers claude-cli harness #71957

## Summary

The gateway declares itself "ready" and fires channel startup before the `anthropic` plugin (which provides the `claude-cli` agent harness) has finished initializing. This causes immediate channel startup failure on every boot:

```
[gateway] ready (2 plugins: memory-core, memory-wiki; 3.6s)
[gateway/channels] channel startup failed: Error: Requested agent harness "claude-cli" 
  is not registered and PI fallback is disabled.
```

The anthropic plugin loads later (triggered lazily by `models.list`), but by then the channel has already failed. This leaves the gateway in a degraded state in which follow-up and embedded agent dispatches can fail with the same harness-not-registered error.

## Reproduction

- **Environment:** WSL2 (Linux 6.6.x), OpenClaw v2026.4.24, gateway as systemd user service
- **Config:** `claude-cli/claude-opus-4-7` as primary model (requires anthropic plugin for harness)
- **Steps:** Restart gateway (`systemctl --user restart openclaw-gateway`), observe logs
- **Result:** Channel startup fails every time. The anthropic plugin loads 60-90+ seconds later when the Control UI triggers `models.list`.

## Gateway log timeline (typical boot)

```
21:23:30 [plugins] memory-core installed bundled runtime deps in 890ms
21:23:31 [plugins] memory-wiki installed bundled runtime deps in 347ms
21:23:32 [gateway] ready (2 plugins: memory-core, memory-wiki; 3.6s)
21:23:32 [gateway/channels] channel startup failed: "claude-cli" is not registered
  ... 90+ seconds gap ...
  [models.list triggers remaining plugin init]
  [plugins] anthropic installed bundled runtime deps
  [plugins] brave installed bundled runtime deps
  ... etc ...
```

## Impact

When the harness isn't registered at channel startup:

1. **Follow-up agent dispatch fails:** after a CLI turn completes, the gateway tries to dispatch follow-up work, but the harness is deregistered. All 17 fallback models fail instantly with the same error.
2. **Cascading timeout loop:** if the CLI session goes quiet (e.g., while running a tool that produces no streaming output for 180s), the gateway kills it. The abort deregisters the harness, and every subsequent fallback attempt fails instantly. The gateway cycles through opus-4-7 → gpt-5.5 → gpt-5.4 → gemini → opus-4-6 → ..., and all 17 candidates fail in under 1 second.
3. **User sees duplicate/lost messages:** the Control UI retries sends, messages appear to vanish, and the session becomes unresponsive until a manual WS disconnect/reconnect.

## Root cause

The gateway's plugin loading has two phases:
1. **Startup plugins** (memory-core, memory-wiki) — loaded before "ready"
2. **Lazy plugins** (anthropic, brave, google, openai, xai) — loaded on-demand, triggered by `models.list`

Channel startup fires immediately after phase 1, but the primary model's harness (`claude-cli`) is provided by the `anthropic` plugin in phase 2. There's no mechanism to defer channel startup until the primary model's harness is available.
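
To make the ordering problem concrete, here is a minimal model of the two phases. It is only a sketch: `HarnessRegistry`, `loadPlugins`, `startChannel`, and the plugin-to-harness table are hypothetical names for illustration, not OpenClaw internals.

```typescript
// Hypothetical model of the load order; none of these names are OpenClaw APIs.
class HarnessRegistry {
  private ids = new Set<string>();
  register(id: string): void {
    this.ids.add(id);
  }
  has(id: string): boolean {
    return this.ids.has(id);
  }
}

// Assumed table of which plugin registers which harness.
const pluginHarnesses: Record<string, string[]> = {
  "memory-core": [],
  "memory-wiki": [],
  anthropic: ["claude-cli"],
};

function loadPlugins(registry: HarnessRegistry, plugins: string[]): void {
  for (const plugin of plugins) {
    for (const id of pluginHarnesses[plugin] ?? []) registry.register(id);
  }
}

// "claude-cli/claude-opus-4-7" -> "claude-cli"
function harnessIdOf(model: string): string {
  const slash = model.indexOf("/");
  return slash === -1 ? model : model.slice(0, slash);
}

function startChannel(registry: HarnessRegistry, primaryModel: string): string {
  const id = harnessIdOf(primaryModel);
  return registry.has(id) ? "started" : `failed: harness "${id}" is not registered`;
}

const bootRegistry = new HarnessRegistry();

// Phase 1: startup plugins only, then "ready" and channel startup.
loadPlugins(bootRegistry, ["memory-core", "memory-wiki"]);
const atReady = startChannel(bootRegistry, "claude-cli/claude-opus-4-7");

// Phase 2: the lazy load arrives after the channel has already failed.
loadPlugins(bootRegistry, ["anthropic"]);
const afterLazyLoad = startChannel(bootRegistry, "claude-cli/claude-opus-4-7");

console.log(atReady);
console.log(afterLazyLoad);
```

In this model the same `startChannel` call succeeds once phase 2 has run, which is why the failure looks like an ordering problem rather than a configuration problem.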

## Suggested fix

Any one of the following:
- **Load the primary model's plugin in phase 1** — if the configured primary model requires a specific plugin (e.g., `claude-cli/*` requires `anthropic`), ensure that plugin is loaded before declaring "ready"
- **Defer channel startup** — don't fire channel startup until the primary model's harness is registered, with a reasonable timeout
- **Retry channel startup** — if channel startup fails due to a missing harness, retry after plugin lazy-load completes
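
The first two options could look roughly like the sketch below. Everything here is an assumption for illustration: `harnessProviders`, `startupPluginsFor`, and `waitForHarness` are hypothetical names, and the only mapping taken from this report is that `claude-cli/*` requires `anthropic`.

```typescript
// Hypothetical sketch; the table and function names are assumptions, not OpenClaw APIs.
const harnessProviders: Record<string, string> = {
  "claude-cli": "anthropic", // from this report: claude-cli/* requires the anthropic plugin
};

// Option 1: add the primary model's plugin to the set loaded before "ready".
function startupPluginsFor(primaryModel: string, base: string[]): string[] {
  const slash = primaryModel.indexOf("/");
  const prefix = slash === -1 ? primaryModel : primaryModel.slice(0, slash);
  const provider = harnessProviders[prefix];
  return provider && !base.includes(provider) ? [...base, provider] : [...base];
}

// Option 2: hold channel startup until the harness is registered or a timeout expires.
// Resolves to true if the harness appeared in time, false on timeout.
async function waitForHarness(
  isRegistered: () => boolean,
  timeoutMs: number,
  pollMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (!isRegistered()) {
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  return true;
}
```

Polling keeps the sketch short; if the plugin loader can emit an event when a harness registers, waiting on that event would avoid the polling delay. A `false` result from `waitForHarness` is the point where the existing "not registered" error would still be raised.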

## Related

The harness deregistration on CLI abort is a separate issue that amplifies this one. When a CLI session is aborted (180s no-output timeout), the `claude-cli` harness deregisters, and all fallback candidates fail instantly instead of being able to spawn fresh CLI processes.
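
One way to picture the separation is to track harness registration independently of session lifetime, so an abort ends a session without removing the harness. This is a hypothetical sketch (`CliHarness`, `CliSession`, `dispatch`, and `registeredHarnesses` are invented names), not a description of how the gateway currently works.

```typescript
// Hypothetical sketch: registration and session lifetime are tracked separately.
type CliSession = { id: number; aborted: boolean };

class CliHarness {
  private nextId = 1;
  readonly sessions = new Map<number, CliSession>();

  spawn(): CliSession {
    const session: CliSession = { id: this.nextId++, aborted: false };
    this.sessions.set(session.id, session);
    return session;
  }

  // Aborting ends one session; the harness itself stays registered and usable.
  abort(session: CliSession): void {
    session.aborted = true;
    this.sessions.delete(session.id);
  }
}

const registeredHarnesses = new Map<string, CliHarness>([
  ["claude-cli", new CliHarness()],
]);

function dispatch(harnessId: string): CliSession {
  const harness = registeredHarnesses.get(harnessId);
  if (!harness) {
    throw new Error(`Requested agent harness "${harnessId}" is not registered`);
  }
  return harness.spawn();
}

const firstSession = dispatch("claude-cli");
registeredHarnesses.get("claude-cli")?.abort(firstSession); // e.g. the 180s no-output timeout
const retrySession = dispatch("claude-cli"); // still registered, so a fresh session spawns
```

With this shape, a fallback attempt after an abort reaches `spawn()` instead of failing at the registry lookup.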

## Environment

| Component | Value |
|---|---|
| OpenClaw | v2026.4.24 |
| OS | WSL2 on Windows (Linux 6.6.87.2-microsoft-standard-WSL2) |
| Gateway | loopback-only, systemd user service |
| Primary model | `claude-cli/claude-opus-4-7` |
