Event-loop starvation during context compaction causes fetch timeouts (16.9s timer delay)

## Summary

During context-overflow auto-compaction, the Node.js event loop stalls for ~17 seconds. Pending fetch operations (e.g. Telegram API calls) then fail with timeouts, and a timeout configured for 10s only fires after ~27s. This is consistent with CPU-synchronous work blocking the event loop during compaction.

## Environment

- openclaw version: latest npm (`npm info openclaw` → `git+https://github.com/openclaw/openclaw.git`)
- Node.js: 22.22.0
- Provider: `openai-codex/gpt-5.5`
- Platform: Ubuntu 22.04

## Observed sequence

```
06:31:07 WARN [context-overflow-diag]
  sessionKey=agent:main:telegram:group:...:topic:539
  provider=openai-codex/gpt-5.5
  messages=246
  compactionAttempts=0
  error=Context overflow: estimated context size exceeds safe threshold during tool loop

06:31:07 WARN context overflow detected (attempt 1/3); attempting auto-compaction for openai-codex/gpt-5.5

06:32:13 WARN [fetch-timeout]
  timeoutMs=10000
  elapsedMs=26963
  timerDelayMs=16963
  eventLoopDelayHint="timer delayed 16963ms, likely event-loop starvation"
  operation=fetchWithTimeout
  url=https://api.telegram.org/bot.../getMe

06:33:33 INFO auto-compaction succeeded for openai-codex/gpt-5.5; retrying prompt
06:33:33 INFO post-compaction guard armed for 3 attempts
```

The `timerDelayMs=16963` in your own `[fetch-timeout]` log indicates that the event loop was blocked for about 17s during compaction: the 10s fetch timer could not fire until 26,963 ms had elapsed (`timeoutMs` + `timerDelayMs` = 10000 + 16963 = `elapsedMs`).
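To illustrate how a blocked event loop produces exactly this kind of late timer, here is a minimal sketch. The helper names `measureTimerDelay` and `timerDelayMs` are hypothetical and are not taken from the openclaw code base; the busy-wait loop merely stands in for whatever synchronous work compaction performs.

```javascript
// Hypothetical repro helper, not openclaw's actual diagnostic code.
// Arms a timer, then blocks the event loop synchronously and reports
// how late the timer callback actually ran.
function measureTimerDelay(timeoutMs, blockMs) {
  return new Promise((resolve) => {
    const start = performance.now();

    setTimeout(() => {
      const elapsedMs = Math.round(performance.now() - start);
      resolve({
        timeoutMs,
        elapsedMs,
        timerDelayMs: Math.max(0, elapsedMs - timeoutMs),
      });
    }, timeoutMs);

    // Busy-wait: a stand-in for CPU-synchronous work (assumption).
    // No timer callback can run until this loop returns.
    const blockUntil = start + blockMs;
    while (performance.now() < blockUntil) {
      // intentionally empty
    }
  });
}

// The same arithmetic applied to the values from the log above.
function timerDelayMs(timeoutMs, elapsedMs) {
  return Math.max(0, elapsedMs - timeoutMs);
}
```

Calling `measureTimerDelay(100, 400).then(console.log)` should report an `elapsedMs` close to the block duration rather than the requested timeout, and `timerDelayMs(10000, 26963)` reproduces the logged delay of 16963.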

## Cascading effect

After compaction, the agent resumed but then ran two web search tool calls that both failed with MCP error -32001 (request timed out):

```
ERROR [tools] kindly-search__web_search failed: MCP error -32001: Request timed out
ERROR [tools] kindly-search__web_search failed: MCP error -32001: Request timed out
```

These may also be caused by the event loop being saturated post-compaction, or by MCP server state after the stall.

## Expected behaviour

Compaction should not block the event loop. If it involves heavy synchronous work such as JSON serialisation or preparing summarisation API calls, that work should run in a worker thread or be split up with `setImmediate` yields so pending timers can fire on time.

## Suggested fix direction

- Move compaction's CPU-heavy phase (token counting, session serialisation, summarisation request) to a worker thread or split with `setImmediate` to yield the event loop
- Alternatively, drain and re-arm pending fetch timeouts after compaction completes
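The `setImmediate` variant of the first suggestion could look roughly like the sketch below. All names (`yieldToEventLoop`, `estimateTokens`, `estimateSessionTokens`) and the four-characters-per-token estimate are assumptions made for illustration; they do not reflect how openclaw actually counts tokens or structures compaction.

```javascript
// Hypothetical helper: resolves on the next event-loop iteration, so
// timers and I/O callbacks that are due get a chance to run first.
const yieldToEventLoop = () =>
  new Promise((resolve) => setImmediate(resolve));

// Rough stand-in for token counting (assumption: ~4 characters per token).
function estimateTokens(message) {
  return Math.ceil(JSON.stringify(message).length / 4);
}

// Processes the session in small chunks and yields between chunks
// instead of walking all messages in one synchronous pass.
async function estimateSessionTokens(messages, chunkSize = 25) {
  let total = 0;
  for (let i = 0; i < messages.length; i += chunkSize) {
    for (const message of messages.slice(i, i + chunkSize)) {
      total += estimateTokens(message);
    }
    await yieldToEventLoop();
  }
  return total;
}
```

Chunking only helps when each unit of work is small. A single `JSON.stringify` over a very large session would still block for its full duration, which is the case where a worker thread is the better fit.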

## Impact

In our setup (agent-chat-telegram orchestrator driving OpenClaw as a subprocess), the stall causes the orchestrator's own 300s timeout to eventually fire and terminate the OpenClaw call, surfacing as a generic failure to the end user.
