[Bug]: OpenClaw accepts tasks but agents often do not execute them, return placeholder replies, and activity/log visibility is inconsistent

### Bug type

Regression (worked before, now fails)

## Summary

When assigning tasks in OpenClaw, the system frequently appears to accept the request, but the agent does not actually complete the task. In affected runs, the UI shows placeholder responses such as "One sec" / "let me actually test it now", and the task either appears stuck or provides little/no visible execution detail in the chat/feed.

The underlying logs suggest this is a mix of:

- upstream model failures (`API rate limit reached`, `HTTP 401 missing scopes: model.request`, `LLM request timed out`)
- gateway instability/restarts during active webchat sessions
- agent responses that look like conversational filler rather than real tool execution
- a visibility gap where backend task logs exist, but the user-facing experience makes it look like there was no real work/logging

## Environment

- OS: macOS `darwin 25.3.0`
- OpenClaw config primary model: `kimi-coding/k2p5`
- Gateway model at runtime: `kimi-coding/k2p5`
- Webchat client observed: `openclaw-control-ui webchat v2026.3.7`
- Local backend health during investigation:
  - `{"status":"healthy","timestamp":"2026-03-08T16:35:09.062Z","uptime":364.136322625,"agentSessions":{}}`

## User-facing symptoms

- Asking OpenClaw to do a task does not reliably result in actual execution.
- Instead, agents sometimes respond with placeholder text like:
  - `Still planning to check gog — let me do that now. One sec.`
  - `Let me actually check gog now instead of just saying I will. One sec.`
- Earlier runs appeared to show no useful activity, even though backend task logs existed.
- The result is that from the UI it looks like OpenClaw accepted the request but did not actually do the work.

## Reproduction

This appears intermittent, but the general repro is:

1. Start OpenClaw with `kimi-coding/k2p5` as the primary model.
2. Open the webchat UI.
3. Assign a task to an agent/workspace.
4. Observe one of the following:
   - task stalls
   - placeholder/non-executing reply
   - no meaningful progress visible in chat/feed
   - backend shows model/gateway failures

## Expected behavior

- Task assignment should either:
  - execute successfully and show meaningful progress/logs, or
  - fail clearly with a surfaced error state
- Agents should not emit conversational filler as if they are about to work unless they are actually proceeding with execution.
- If backend task logs exist, the UI should make that visible enough that the user does not conclude "nothing happened".

## Actual behavior

- Tasks are accepted, but execution is unreliable.
- Model failures occur in the background.
- Some agent responses are filler/placeholder text rather than actual completion.
- Gateway restarts/disconnects occur around active webchat sessions.
- User perception is that OpenClaw did nothing and showed no activity.

## Evidence

### 1. Gateway is running on `kimi-coding/k2p5`

From `logs/gateway.log` on 2026-03-08:

```text
2026-03-08T16:11:06.004+00:00 [gateway] agent model: kimi-coding/k2p5
2026-03-08T16:15:44.998+00:00 [gateway] agent model: kimi-coding/k2p5
2026-03-08T16:16:01.225+00:00 [gateway] agent model: kimi-coding/k2p5
```

### 2. Model/gateway failures during active use

From `logs/gateway.err.log`:

```text
2026-03-08T16:01:23.986+00:00 [agent/embedded] embedded run agent end: runId=d84b6658-b26b-47ba-8e59-6abd6fd9b219 isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-08T16:03:13.125+00:00 [agent/embedded] embedded run agent end: runId=39c24751-1ac6-483f-aa97-7ae59512b7f4 isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-08T16:09:01.233+00:00 [agent/embedded] embedded run agent end: runId=cf5e5492-92c5-4f97-a10d-ec9d16c6f3c4 isError=true error=HTTP 401: You have insufficient permissions for this operation. Missing scopes: model.request.
2026-03-08T16:10:01.115+00:00 [agent/embedded] embedded run agent end: runId=cf5e5492-92c5-4f97-a10d-ec9d16c6f3c4 isError=true error=LLM request timed out.
```
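For triage, the failures above can be tallied by type straight from `logs/gateway.err.log`. The following is a rough sketch, not OpenClaw tooling: the regex is inferred from the four lines quoted above, the bucket names are made up for illustration, and it assumes non-error lines simply omit the `error=` field.

```python
import re
from collections import Counter

# Pattern inferred from the "embedded run agent end" lines quoted in this report.
RUN_END = re.compile(
    r"^(?P<ts>\S+) \[agent/embedded\] embedded run agent end: "
    r"runId=(?P<run_id>\S+) isError=(?P<is_error>true|false)"
    r"(?: error=(?P<error>.*))?$"
)

# Hypothetical bucket names; the substrings are taken from the errors seen here.
CATEGORIES = {
    "rate_limit": "API rate limit reached",
    "auth_scope": "Missing scopes",
    "timeout": "LLM request timed out",
}


def categorize(error: str) -> str:
    """Map an error message to a bucket name, or "other" if nothing matches."""
    for name, needle in CATEGORIES.items():
        if needle in error:
            return name
    return "other"


def tally_failures(lines):
    """Count failed runs per bucket across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = RUN_END.match(line.strip())
        if match and match.group("is_error") == "true":
            counts[categorize(match.group("error") or "")] += 1
    return counts


# The four lines from the excerpt above.
SAMPLE = [
    "2026-03-08T16:01:23.986+00:00 [agent/embedded] embedded run agent end: runId=d84b6658-b26b-47ba-8e59-6abd6fd9b219 isError=true error=⚠️ API rate limit reached. Please try again later.",
    "2026-03-08T16:03:13.125+00:00 [agent/embedded] embedded run agent end: runId=39c24751-1ac6-483f-aa97-7ae59512b7f4 isError=true error=⚠️ API rate limit reached. Please try again later.",
    "2026-03-08T16:09:01.233+00:00 [agent/embedded] embedded run agent end: runId=cf5e5492-92c5-4f97-a10d-ec9d16c6f3c4 isError=true error=HTTP 401: You have insufficient permissions for this operation. Missing scopes: model.request.",
    "2026-03-08T16:10:01.115+00:00 [agent/embedded] embedded run agent end: runId=cf5e5492-92c5-4f97-a10d-ec9d16c6f3c4 isError=true error=LLM request timed out.",
]

if __name__ == "__main__":
    print(dict(tally_failures(SAMPLE)))
```

Running it against the full log file (for example `tally_failures(open("logs/gateway.err.log", encoding="utf-8"))`) should show which failure type dominates.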

### 3. Gateway restarts while webchat is connected

From `logs/gateway.log`:

```text
2026-03-08T16:11:44.852+00:00 [ws] webchat connected conn=2e046649-12fa-4a56-b141-1675d7de3c35 remote=127.0.0.1 client=openclaw-control-ui webchat v2026.3.7
2026-03-08T16:11:57.802+00:00 [ws] ⇄ res ✓ chat.send 51ms runId=bfec63f1-cb1d-4db5-a0b0-eaa4944be7cd conn=2e046649…3c35 id=403a6e55…aaf9
2026-03-08T16:15:42.221+00:00 [gateway] signal SIGTERM received
2026-03-08T16:15:42.224+00:00 [gateway] received SIGTERM; shutting down
2026-03-08T16:15:42.275+00:00 [ws] webchat disconnected code=1012 reason=service restart conn=14d4c1d0-ff76-4879-954b-e170ba264b00
```
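To check how often a restart lands on a live session, the gateway log can be replayed while tracking open webchat connections. This is a minimal sketch based only on the line shapes in the excerpt above; the function name is hypothetical and it assumes the first whitespace-separated token of each line is the timestamp.

```python
import re

# Line shapes taken from the gateway.log excerpt in this report.
CONNECT = re.compile(r"\[ws\] webchat connected conn=(\S+)")
DISCONNECT = re.compile(r"\[ws\] webchat disconnected .*conn=(\S+)")
SIGTERM = "[gateway] signal SIGTERM received"


def restarts_with_open_sessions(lines):
    """Return (timestamp, open connection ids) for each SIGTERM in the log."""
    open_conns = set()
    events = []
    for line in lines:
        timestamp = line.split(" ", 1)[0]
        if (match := CONNECT.search(line)):
            open_conns.add(match.group(1))
        elif (match := DISCONNECT.search(line)):
            # discard() tolerates connections opened before the excerpt starts.
            open_conns.discard(match.group(1))
        elif SIGTERM in line:
            events.append((timestamp, sorted(open_conns)))
    return events


SAMPLE = [
    "2026-03-08T16:11:44.852+00:00 [ws] webchat connected conn=2e046649-12fa-4a56-b141-1675d7de3c35 remote=127.0.0.1 client=openclaw-control-ui webchat v2026.3.7",
    "2026-03-08T16:15:42.221+00:00 [gateway] signal SIGTERM received",
    "2026-03-08T16:15:42.275+00:00 [ws] webchat disconnected code=1012 reason=service restart conn=14d4c1d0-ff76-4879-954b-e170ba264b00",
]

if __name__ == "__main__":
    for timestamp, conns in restarts_with_open_sessions(SAMPLE):
        print(timestamp, "open webchat connections:", conns)
```

Any SIGTERM reported with a non-empty connection list is a restart that may have interrupted an in-flight `chat.send` run.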

### 4. Task logs show repeated orchestration retries / fallback behavior

Representative stuck task log (`t38459756`, "Process bank statements for March"):

```text
Task claimed by Jerry — moved to in_progress
Orchestrator mode — sending decomposition request to Jerry
No response from orchestrator — falling back to planning spec
Orchestrator mode — sending decomposition request to Jerry
No response from orchestrator — falling back to planning spec
...
```
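The request/fallback loop above is easy to detect mechanically, which suggests the orchestrator could stop retrying and fail the task instead. A small sketch of such a check follows; the `max_cycles` threshold and function names are assumptions for illustration, not existing OpenClaw settings.

```python
# Substrings taken from the task log excerpt in this report.
REQUEST = "sending decomposition request"
FALLBACK = "No response from orchestrator"


def count_fallback_cycles(log_lines):
    """Count request -> 'no response' pairs in a task log."""
    cycles = 0
    awaiting_response = False
    for line in log_lines:
        if REQUEST in line:
            awaiting_response = True
        elif FALLBACK in line and awaiting_response:
            cycles += 1
            awaiting_response = False
    return cycles


def looks_stuck(log_lines, max_cycles=2):
    """Flag a task whose orchestrator has gone unanswered max_cycles times."""
    return count_fallback_cycles(log_lines) >= max_cycles


SAMPLE = [
    "Task claimed by Jerry — moved to in_progress",
    "Orchestrator mode — sending decomposition request to Jerry",
    "No response from orchestrator — falling back to planning spec",
    "Orchestrator mode — sending decomposition request to Jerry",
    "No response from orchestrator — falling back to planning spec",
]

if __name__ == "__main__":
    print(count_fallback_cycles(SAMPLE), looks_stuck(SAMPLE))
```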

### 5. Placeholder/non-executing agent replies

Later logs for the same task show agents returning filler text:

```text
Alex responded (68 chars)
detail="Hey! 👋

Still planning to check gog — let me do that now. One sec."

Nora responded (74 chars)
detail="Hey! Let me actually check gog now instead of just saying I will. One sec."
```

This is particularly problematic because it reads like execution is starting, but in practice these messages are not reliable evidence that useful work is happening.
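One possible guard is a heuristic that refuses to count a reply as progress when it is short, contains stalling phrases, and the run made no tool calls. The sketch below is only an illustration of that idea: the `AgentReply` type, its `tool_calls` field, the phrase list, and the length cutoff are all assumptions, and a real fix would more likely key off actual tool-execution events than text matching.

```python
import re
from dataclasses import dataclass

# Stalling phrases drawn from the replies quoted in this report (not exhaustive).
FILLER = re.compile(
    r"\b(one sec|let me (actually )?(check|test|do)|still planning)\b",
    re.IGNORECASE,
)


@dataclass
class AgentReply:
    text: str
    tool_calls: int  # hypothetical: number of tool invocations in the run


def is_placeholder(reply: AgentReply, max_chars: int = 200) -> bool:
    """True when a reply promises work but the run did nothing observable."""
    return (
        reply.tool_calls == 0
        and len(reply.text) <= max_chars
        and bool(FILLER.search(reply.text))
    )


if __name__ == "__main__":
    nora = AgentReply(
        "Hey! Let me actually check gog now instead of just saying I will. One sec.",
        tool_calls=0,
    )
    print(is_placeholder(nora))
```

A reply flagged this way could be kept out of the task's progress feed, or shown with a "no work started yet" marker.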

## Suspected root causes

- Primary model path is unstable under current account/quota conditions.
- Failover chain includes providers/profiles that can error with auth scope issues.
- Gateway restarts may interrupt active chat/task execution.
- Agent prompt/tool handoff may allow filler responses to be treated as acceptable task progress.
- UI/log surfacing may not make backend execution state obvious enough when failures happen.

## Suggested fixes

1. Surface model/provider failures directly in the UI when task execution fails.
2. Mark tasks as failed/degraded when agent runs terminate with quota/auth/timeout errors.
3. Prevent filler responses like "One sec" from being treated as meaningful task progress.
4. Improve webchat resilience around gateway restart/disconnect/reconnect events.
5. Expose backend task logs more clearly in the main user flow so failures are visible without digging.
6. Consider validating provider auth scopes and quota health before dispatching tasks.
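As an illustration of fix 2, a run's terminal error could be mapped to an explicit task status instead of leaving the task `in_progress`. This is a sketch under assumptions: the `degraded` and `failed` status names and the choice of which errors count as retryable are mine, not existing OpenClaw behavior.

```python
from enum import Enum
from typing import Optional


class TaskStatus(Enum):
    IN_PROGRESS = "in_progress"  # status name seen in the task log above
    DEGRADED = "degraded"        # hypothetical: retryable upstream problem
    FAILED = "failed"            # hypothetical: needs user action


# Substrings taken from the gateway.err.log excerpt; the grouping is an assumption.
RETRYABLE = ("API rate limit reached", "LLM request timed out")
FATAL = ("Missing scopes",)


def status_after_run(is_error: bool, error: Optional[str]) -> TaskStatus:
    """Pick the task status to surface after an embedded run ends."""
    if not is_error:
        return TaskStatus.IN_PROGRESS
    message = error or ""
    if any(needle in message for needle in FATAL):
        return TaskStatus.FAILED
    if any(needle in message for needle in RETRYABLE):
        return TaskStatus.DEGRADED
    # Unknown errors fail loudly rather than leaving the task looking alive.
    return TaskStatus.FAILED


if __name__ == "__main__":
    print(status_after_run(True, "LLM request timed out."))
```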

## Additional local factor: duplicate clients and handshake timeout spam

During investigation, the gateway error log showed a continuous stream of `handshake timeout` / `closed before connect` warnings, repeating every ~11 seconds:

```text
2026-03-08T16:36:26 [gateway/ws] handshake timeout conn=09798fa7… remote=127.0.0.1
2026-03-08T16:36:26 [gateway/ws] closed before connect conn=09798fa7… remote=127.0.0.1 … code=1000 reason=n/a
2026-03-08T16:36:37 [gateway/ws] handshake timeout conn=7001b80d… remote=127.0.0.1
...
```
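The ~11 second cadence can be confirmed by measuring the gaps between consecutive `handshake timeout` lines. A minimal sketch, assuming each line starts with an ISO 8601 timestamp as in the excerpt above (the function name is hypothetical):

```python
from datetime import datetime


def handshake_timeout_gaps(lines):
    """Return the seconds between consecutive 'handshake timeout' log lines."""
    stamps = [
        datetime.fromisoformat(line.split(" ", 1)[0])
        for line in lines
        if "handshake timeout" in line
    ]
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


SAMPLE = [
    "2026-03-08T16:36:26 [gateway/ws] handshake timeout conn=09798fa7… remote=127.0.0.1",
    "2026-03-08T16:36:26 [gateway/ws] closed before connect conn=09798fa7… remote=127.0.0.1 … code=1000 reason=n/a",
    "2026-03-08T16:36:37 [gateway/ws] handshake timeout conn=7001b80d… remote=127.0.0.1",
]

if __name__ == "__main__":
    print(handshake_timeout_gaps(SAMPLE))
```

A steady gap points to a single client retrying on a fixed timer rather than random connection failures.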

Process inspection revealed **multiple concurrent local clients** connected to `ws://127.0.0.1:18789`:

| Client | PID | Connection style | Status |
|--------|-----|-----------------|--------|
| Dashboard backend (`server.js`) | 1328 | Challenge/connect handshake | ESTABLISHED |
| **Duplicate** dashboard backend (`server.js`) | 2040 | Challenge/connect handshake | ESTABLISHED |
| Chrome / webchat UI | 1436 | Webchat protocol | ESTABLISHED |
| Cursor extension host | 2427 | Extension bridge | ESTABLISHED |
| Legacy Python bridge (`openclaw-bridge-simple.py`) | 600 | Old `ws?agent=...&token=...` direct style | Running |
| Apple WebKit networking | 5484 | Unknown | ESTABLISHED |

### Why this matters

- The **duplicate dashboard dev stack** means two `server.js` instances both try to connect to the gateway, dispatch tasks, and poll for results — potentially causing duplicate or conflicting task execution.
- The **legacy Python bridge** uses an older direct WebSocket URL format (`/ws?agent=...&token=...`) instead of the newer challenge/connect handshake protocol. This mismatch likely causes the gateway to accept the TCP connection but fail the handshake, producing the timeout spam.
- Multiple clients competing for the same agent sessions can cause:
  - Race conditions in task dispatch
  - Duplicate orchestration attempts
  - One client consuming a response meant for another
  - Gateway resource contention

### Mitigation

Stopping the duplicate dashboard processes and the legacy Python bridge reduced the handshake timeout noise:

```bash
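# PIDs are specific to this machine; 2040 is the duplicate server.js
# and 600 is the legacy Python bridge (see the client table above).
kill 1924 1942 1944 1998 2036 2040 600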
```

### Suggestion for OpenClaw

1. The gateway should log which client identity is failing the handshake (client name, auth method attempted) so users can identify the offending process.
2. Consider rejecting incompatible/legacy handshake attempts with a clear error message rather than silently timing out.
3. Document that only one dashboard backend should connect to the gateway at a time.
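Suggestion 2 could look roughly like the check below, which recognizes the legacy direct-URL style from its request target and returns an explicit error instead of waiting for a handshake that never comes. This is a sketch, not the gateway's code: the function name and error text are invented, and the only grounded detail is the `/ws?agent=...&token=...` shape described above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def legacy_handshake_error(request_target: str) -> Optional[str]:
    """Return an error message for legacy direct-URL connects, else None."""
    parts = urlsplit(request_target)
    query = parse_qs(parts.query)
    # Legacy style per this report: /ws?agent=...&token=...
    if parts.path == "/ws" and "agent" in query and "token" in query:
        return (
            "legacy direct-URL connection is not supported; "
            "use the challenge/connect handshake"
        )
    return None


if __name__ == "__main__":
    print(legacy_handshake_error("/ws?agent=example&token=example"))
```

Logging this message together with the remote address would let users identify the offending process without inspecting sockets by hand.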

## Impact

- Core task execution appears unreliable.
- Users lose trust because OpenClaw sounds like it is working while not actually completing work.
- Lack of clear surfaced failure state makes debugging harder.
- Duplicate local clients can amplify failures and make debugging significantly harder.



### Steps to reproduce

1. Open the OpenClaw gateway.
2. Open the webchat.
3. Ask it to do a particular task.
4. It says it’s doing the task, but nothing actually happens: the pulsing red border blinks for a second, then stops, and no progress is shown.

### Expected behavior

- The agent should start doing the task, with the red border pulsing as it responds.
- Or, if I refresh the page, I should see the logs of what it’s doing to complete the task.
- Right now neither of those happens.

### Actual behavior

It just says it’s looking at the task, and nothing happens.

### OpenClaw version

Version 2026.3.7

### Operating system

macOS (Darwin 25.3.0) / Tahoe 26.3.1

### Install method

_No response_

### Logs, screenshots, and evidence

```shell

```

### Impact and severity

_No response_

### Additional information

_No response_

[Bug]: OpenClaw accepts tasks but agents often do not execute them, return placeholder replies, and activity/log visibility is inconsistent #40082