[Bug]: Gateway WS self-contention still unresolved — cron tool timeouts from active sessions (#5703/#6508 circular-duped)

## Summary

Gateway WS self-contention when calling the `cron` tool from within an active LLM session is still unresolved. The original issues (#5703 and #6508) were **circular-duped shut**: each was closed as a duplicate of the other, and no fix ever landed.

- #5703 closed as dupe of #6508
- #6508 closed as dupe of #5703

The bug persists as of v2026.3.x. We hit it daily when triggering cron jobs from active sessions.

## Reproduction (still works)

1. From an active LLM session (e.g., Discord or Telegram), call the `cron` tool (`run`, `list`, or `add`).
2. The tool opens a second WS connection to the same gateway.
3. The gateway's single-threaded event loop is busy processing the current LLM turn.
4. The second WS request sits in the queue and is never processed, so it times out after 10 s.
5. **The job actually runs successfully**; only the ack times out.

```
Error: gateway timeout after 10000ms
Gateway target: ws://127.0.0.1:18789
```

Also reproducible via the CLI by running `openclaw cron run <jobId>` from within an active session.

## Root cause (unchanged from #6508)

The `cron` tool routes through a **new WS connection** to the gateway instead of using the existing session's IPC/WS channel. The gateway's Node.js event loop is occupied by the current LLM turn, so it can't respond to the second connection within the timeout window.

**This is not a resource issue**: CPU and memory usage are normal. It is purely contention on the single-threaded event loop.

## Evidence from #6508 discussion

- Internal IPC path (`server-bridge-methods`) works instantly from active sessions
- External WS connections work fine when no session is active (17ms response)
- The timeout only occurs when the same gateway is already processing an LLM turn
- `handshake=connected` in the logs confirms it is not an auth issue: the connection is established, but the gateway never responds.

## Preferred fix (from #6508 community discussion)

**Option B: In-process function calls for gateway-native tools**
- The internal `cron` tool already has an IPC path that works instantly from active sessions
- Route embedded tool calls (cron, gateway config, etc.) through in-process IPC instead of WS
- No extra connection overhead, and immune to event loop contention
- This is how some tools (e.g., `gateway.config.get`) already work internally

**Alternative: Option A.** Multiplex tool calls over the existing session WS channel instead of opening a new connection. More complex, but also viable.

## Current workaround

```bash
# Use CLI via exec tool instead of native cron tool
openclaw cron run <jobId> --timeout 3000 2>&1 || true
```

This spawns a separate process with its own event loop. The timeout error is cosmetic, since the job always runs, but it is noisy and confusing for LLM agents, which may interpret the timeout as a failure and retry.
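To keep the workaround from prompting retries, the exec call can be wrapped so the known false timeout is reported as a successful trigger. This is a minimal sketch, assuming the CLI exits non-zero on the timeout (as the `|| true` above implies) and prints the same `gateway timeout after <N>ms` text shown in the reproduction; the `cron_run_quiet` name is hypothetical. Any other error is passed through with its original exit status.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the CLI workaround. Assumes the CLI exits
# non-zero on the false timeout and prints "gateway timeout after <N>ms".
cron_run_quiet() {
  local job_id="$1" out status=0
  # Same command as the workaround; capture stdout and stderr together.
  out="$(openclaw cron run "$job_id" --timeout 3000 2>&1)" || status=$?

  if [ "$status" -eq 0 ]; then
    echo "cron job $job_id: triggered"
    return 0
  fi

  case "$out" in
    *"gateway timeout after "*ms*)
      # Known self-contention symptom: the job runs, only the ack times out.
      echo "cron job $job_id: triggered (ack timed out; do not retry)"
      return 0
      ;;
  esac

  # Any other error is a real failure: surface it unchanged.
  printf '%s\n' "$out" >&2
  return "$status"
}
```

An agent would then run `cron_run_quiet <jobId>` via the exec tool in place of the bare command. Matching on message text is brittle; it only holds as long as the CLI keeps that wording.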

## Impact

- Every `cron` tool call from an active session hits this
- Risk of duplicate jobs if the LLM retries after a false timeout
- Affects `cron.run`, `cron.list`, `cron.add`, and potentially other gateway-native tool calls under load
- Users/agents must use CLI workaround, which adds latency and error noise
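The duplicate-job risk can also be reduced on the caller side until a real fix lands. A minimal sketch, assuming job IDs are safe to use as file names: a hypothetical `cron_guard_allow` check that refuses to re-trigger the same job within a cooldown window. The state directory and the 60-second default are illustrative choices, not `openclaw` options.

```bash
#!/usr/bin/env bash
# Hypothetical retry guard: skip re-triggering a job that was triggered
# within the last CRON_GUARD_COOLDOWN seconds. Paths/defaults are assumptions.
CRON_GUARD_DIR="${CRON_GUARD_DIR:-${TMPDIR:-/tmp}/cron-trigger-guard}"
CRON_GUARD_COOLDOWN="${CRON_GUARD_COOLDOWN:-60}"  # seconds

cron_guard_allow() {
  local job_id="$1" now last marker
  mkdir -p "$CRON_GUARD_DIR"
  marker="$CRON_GUARD_DIR/$job_id"
  now="$(date +%s)"

  if [ -f "$marker" ]; then
    last="$(cat "$marker")"
    # Within the cooldown window: treat this call as a retry and refuse it.
    if [ $(( now - last )) -lt "$CRON_GUARD_COOLDOWN" ]; then
      echo "cron job $job_id: triggered $(( now - last ))s ago, skipping" >&2
      return 1
    fi
  fi

  # Record this trigger time and allow the call.
  echo "$now" > "$marker"
  return 0
}
```

It would sit in front of the workaround, for example `cron_guard_allow "$id" && openclaw cron run "$id" --timeout 3000`. The check-then-write is not atomic, so two concurrent callers can both pass; it only filters sequential retries, which is the failure mode described above.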

## References

- #6508 — Original detailed report with root cause analysis
- #5703 — Duplicate report with additional reproduction data
- #20217 — Related UX issue (cron timeouts displaying as errors)

