Summary
The gateway declares itself "ready" and fires channel startup before the anthropic plugin (which provides the claude-cli agent harness) has finished initializing. This causes immediate channel startup failure on every boot:
[gateway] ready (2 plugins: memory-core, memory-wiki; 3.6s)
[gateway/channels] channel startup failed: Error: Requested agent harness "claude-cli" is not registered and PI fallback is disabled.
The anthropic plugin loads later (triggered lazily by models.list), but by then the channel has already failed. This leaves the gateway in a degraded state where followup/embedded agent dispatches can fail with the same harness-not-registered error.
Reproduction
- Environment: WSL2 (Linux 6.6.x), OpenClaw v2026.4.24, gateway as systemd user service
- Config: claude-cli/claude-opus-4-7 as primary model (requires anthropic plugin for harness)
- Steps: Restart gateway (systemctl --user restart openclaw-gateway), observe logs
- Result: Channel startup fails every time. The anthropic plugin loads 60-90+ seconds later when the Control UI triggers models.list.
Gateway log timeline (typical boot)
21:23:30 [plugins] memory-core installed bundled runtime deps in 890ms
21:23:31 [plugins] memory-wiki installed bundled runtime deps in 347ms
21:23:32 [gateway] ready (2 plugins: memory-core, memory-wiki; 3.6s)
21:23:32 [gateway/channels] channel startup failed: "claude-cli" is not registered
... 90+ seconds gap ...
[models.list triggers remaining plugin init]
[plugins] anthropic installed bundled runtime deps
[plugins] brave installed bundled runtime deps
... etc ...
Impact
When the harness isn't registered at channel startup:
- Followup agent dispatch fails — after a CLI turn completes, the gateway tries to dispatch followup work but the harness is deregistered. All 17 fallback models fail instantly with the same error.
- Cascading timeout loop — if the CLI session happens to go quiet (e.g., running a tool that produces no streaming output for 180s), the gateway kills it. The abort deregisters the harness, and every subsequent fallback attempt fails instantly. The gateway cycles through opus-4-7 → gpt-5.5 → gpt-5.4 → gemini → opus-4-6 → ... → all 17 candidates fail in under 1 second.
- User sees duplicate/lost messages — the Control UI retries sends, messages appear to vanish, and the session becomes unresponsive until a manual WS disconnect/reconnect.
Root cause
The gateway's plugin loading has two phases:
- Phase 1, startup plugins (memory-core, memory-wiki): loaded before "ready"
- Phase 2, lazy plugins (anthropic, brave, google, openai, xai): loaded on demand, triggered by models.list
Channel startup fires immediately after phase 1, but the primary model's harness (claude-cli) is provided by the anthropic plugin in phase 2. There's no mechanism to defer channel startup until the primary model's harness is available.
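The ordering problem can be illustrated with a minimal sketch. Everything in it is hypothetical: HarnessRegistry, harnessOf, startChannel, and tryStart are stand-ins invented for illustration and are not OpenClaw APIs. The sketch only assumes that the harness name is the part of the model ID before the slash.

```typescript
// Hypothetical model of the startup ordering; none of these names are OpenClaw APIs.
class HarnessRegistry {
  private readonly harnesses = new Set<string>();

  register(name: string): void {
    this.harnesses.add(name);
  }

  has(name: string): boolean {
    return this.harnesses.has(name);
  }
}

// Assumption: the harness is the model ID prefix, e.g. "claude-cli/claude-opus-4-7" -> "claude-cli".
function harnessOf(model: string): string {
  const slash = model.indexOf("/");
  return slash === -1 ? model : model.slice(0, slash);
}

// Mirrors the reported failure: channel startup throws if the harness is missing.
function startChannel(registry: HarnessRegistry, primaryModel: string): string {
  const harness = harnessOf(primaryModel);
  if (!registry.has(harness)) {
    throw new Error(`Requested agent harness "${harness}" is not registered`);
  }
  return `channel started with ${harness}`;
}

// Converts a thrown error into a string so both outcomes can be compared side by side.
function tryStart(registry: HarnessRegistry, primaryModel: string): string {
  try {
    return startChannel(registry, primaryModel);
  } catch (err) {
    return `failed: ${err instanceof Error ? err.message : String(err)}`;
  }
}

const registry = new HarnessRegistry();
const primaryModel = "claude-cli/claude-opus-4-7";

// Phase 1 is done and "ready" fires; the anthropic plugin has not loaded yet.
const atReady = tryStart(registry, primaryModel);

// Phase 2 (lazy): the harness is registered, but the channel has already failed.
registry.register("claude-cli");
const afterLazyLoad = tryStart(registry, primaryModel);

console.log(atReady);
console.log(afterLazyLoad);
```

In this model the only difference between the failing and the succeeding call is when register runs relative to channel startup, which matches the timeline in the logs.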
Suggested fix
Any one of the following:
- Load the primary model's plugin in phase 1: if the configured primary model requires a specific plugin (e.g., claude-cli/* requires anthropic), ensure that plugin is loaded before declaring "ready".
- Defer channel startup: don't fire channel startup until the primary model's harness is registered, with a reasonable timeout.
- Retry channel startup: if channel startup fails due to a missing harness, retry after plugin lazy-load completes.
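A rough sketch of the first two options follows. It is an illustration under assumptions, not the gateway's actual code: HARNESS_PROVIDERS, startupPlugins, and HarnessGate are hypothetical names, and the harness-to-plugin mapping is hard-coded here only because the report states that claude-cli is provided by the anthropic plugin.

```typescript
// Hypothetical sketch; the mapping, names, and callbacks are assumptions, not OpenClaw APIs.

// Assumed mapping from harness name to the plugin that provides it.
const HARNESS_PROVIDERS: Record<string, string> = {
  "claude-cli": "anthropic",
};

function harnessOf(model: string): string {
  const slash = model.indexOf("/");
  return slash === -1 ? model : model.slice(0, slash);
}

// Option 1: add the primary model's provider plugin to the phase 1 list.
function startupPlugins(base: string[], primaryModel: string): string[] {
  const provider = HARNESS_PROVIDERS[harnessOf(primaryModel)];
  if (provider === undefined || base.indexOf(provider) !== -1) {
    return [...base];
  }
  return [...base, provider];
}

type Waiter = { onReady: () => void; timer: ReturnType<typeof setTimeout> };

// Option 2: run a callback once a harness is registered, or give up after a timeout.
class HarnessGate {
  private readonly registered = new Set<string>();
  private readonly waiters = new Map<string, Waiter[]>();

  register(name: string): void {
    this.registered.add(name);
    const pending = this.waiters.get(name) ?? [];
    this.waiters.delete(name);
    for (const waiter of pending) {
      clearTimeout(waiter.timer); // the timeout no longer applies
      waiter.onReady();
    }
  }

  whenRegistered(
    name: string,
    timeoutMs: number,
    onReady: () => void,
    onTimeout: () => void,
  ): void {
    if (this.registered.has(name)) {
      onReady();
      return;
    }
    const waiter: Waiter = {
      onReady,
      timer: setTimeout(() => {
        // Drop this waiter so a late registration does not start the channel twice.
        const remaining = (this.waiters.get(name) ?? []).filter((w) => w !== waiter);
        this.waiters.set(name, remaining);
        onTimeout();
      }, timeoutMs),
    };
    this.waiters.set(name, [...(this.waiters.get(name) ?? []), waiter]);
  }
}

const primaryModel = "claude-cli/claude-opus-4-7";
const plugins = startupPlugins(["memory-core", "memory-wiki"], primaryModel);

const events: string[] = [];
const gate = new HarnessGate();
gate.whenRegistered(
  harnessOf(primaryModel),
  30000, // illustrative timeout, not a recommended value
  () => events.push("channel started"),
  () => events.push("channel startup timed out"),
);
events.push("gateway ready");
gate.register("claude-cli"); // stands in for the anthropic plugin finishing its lazy load

console.log(plugins, events);
```

The first option keeps "ready" meaningful at the cost of a slower boot. The second keeps boot fast but needs a defined behaviour for the timeout case. The retry option could reuse the same gate by calling whenRegistered from the failure handler.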
Related
The harness deregistration on CLI abort is a separate issue that amplifies this one. When a CLI session is aborted (180s no-output timeout), the claude-cli harness deregisters, and all fallback candidates fail instantly instead of being able to spawn fresh CLI processes.
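One way to picture the behaviour this paragraph asks for is to track harness registration and session lifetime separately, so that aborting a session never removes the harness. The sketch below is a hypothetical model only: CliHarness, dispatch, and the numeric session IDs are invented for illustration and say nothing about how the gateway actually manages CLI processes.

```typescript
// Hypothetical model: harness registration and session lifetime are tracked separately.
class CliHarness {
  private nextId = 1;
  private readonly sessions = new Set<number>();

  // Stands in for spawning a fresh CLI process.
  spawn(): number {
    const id = this.nextId++;
    this.sessions.add(id);
    return id;
  }

  // Ends one session only; the harness itself stays registered.
  abort(id: number): void {
    this.sessions.delete(id);
  }

  activeCount(): number {
    return this.sessions.size;
  }
}

const harnesses = new Map<string, CliHarness>([["claude-cli", new CliHarness()]]);

function dispatch(harnessName: string): number {
  const harness = harnesses.get(harnessName);
  if (harness === undefined) {
    throw new Error(`Requested agent harness "${harnessName}" is not registered`);
  }
  return harness.spawn();
}

// A session stalls and is aborted (the 180s no-output timeout in the report).
const stalled = dispatch("claude-cli");
harnesses.get("claude-cli")?.abort(stalled);

// Because the harness was not removed, a fallback attempt can spawn a fresh session.
const retry = dispatch("claude-cli");

console.log(stalled, retry, harnesses.get("claude-cli")?.activeCount());
```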
Environment
| Component | Value |
| --- | --- |
| OpenClaw | v2026.4.24 |
| OS | WSL2 on Windows (Linux 6.6.87.2-microsoft-standard-WSL2) |
| Gateway | loopback-only, systemd user service |
| Primary model | claude-cli/claude-opus-4-7 |