[Bug]: Active-memory embedded sub-agent run blocks event loop, starving Telegram polling — agent goes permanently unresponsive #65517

@badmutt

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

Enabling the active-memory plugin causes Telegram polling to stall permanently because the embedded sub-agent call blocks the Node.js event loop. Disabling active-memory immediately restores responsiveness. The behavior is 100% reproducible across 5+ restart cycles.

Steps to reproduce

  1. Enable active-memory with config: agents: ["main"], model: "anthropic/claude-sonnet-4-20250514", queryMode: "recent", timeoutMs: 15000
  2. Restart gateway
  3. Send a message to the bot via Telegram DM
  4. Observe: no response, no telegram sendMessage entries in gateway log, Telegram API shows pending_update_count: 1
  5. Disable active-memory (set enabled: false), restart gateway
  6. Send same message — bot responds immediately
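For step 4, one way to read the backlog is Telegram's getWebhookInfo method, whose result includes pending_update_count. The report does not say which call produced the number, so this TypeScript sketch is an assumption; the helper names are hypothetical and `token` stands in for the redacted bot token.

```typescript
// Subset of Telegram's WebhookInfo object that this check needs.
interface WebhookInfo {
  url: string;
  pending_update_count: number;
}

// Generic envelope that Telegram Bot API responses are wrapped in.
interface TelegramResponse<T> {
  ok: boolean;
  result?: T;
  description?: string;
}

// Pure helper (hypothetical name): pull the backlog size out of a
// getWebhookInfo response body, or throw if the call failed.
function pendingUpdateCount(body: TelegramResponse<WebhookInfo>): number {
  if (!body.ok || !body.result) {
    throw new Error(`getWebhookInfo failed: ${body.description ?? "unknown error"}`);
  }
  return body.result.pending_update_count;
}

// Network wrapper (hypothetical name). Uses the global fetch of Node.js 18+.
async function fetchPendingUpdateCount(token: string): Promise<number> {
  const res = await fetch(`https://api.telegram.org/bot${token}/getWebhookInfo`);
  return pendingUpdateCount((await res.json()) as TelegramResponse<WebhookInfo>);
}
```

A count that stays above zero while the gateway reports ready is consistent with getUpdates never being served.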

Expected behavior

Active-memory's embedded run should complete or time out without blocking Telegram polling. The agent should remain responsive to incoming messages even if the sub-agent call is slow.
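A minimal sketch of that expectation, not OpenClaw's actual code: an asynchronous call is raced against a timer and aborted when the timer wins, so the caller regains control after at most `timeoutMs`. `withTimeout` and `TimeoutError` are hypothetical names introduced for illustration.

```typescript
// Error type used to distinguish a timeout from other failures.
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`operation timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Run an async operation with an upper bound on how long the caller waits.
// The operation receives an AbortSignal so it can cancel its own I/O.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // ask the operation to stop
      reject(new TimeoutError(timeoutMs)); // release the caller
    }, timeoutMs);
  });
  try {
    return await Promise.race([run(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // do not leave the timer pending after completion
  }
}
```

With the config from the reproduction steps this would be called with 15000 as `timeoutMs`. The sketch only helps when the wrapped work is genuinely asynchronous: if the embedded run does long synchronous work on the main thread, the timer itself cannot fire until that work returns, and the work would have to move off the event loop (for example into a worker thread).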

Actual behavior

The gateway boots and reports ready, and the Telegram providers start successfully, but no incoming messages are processed. getUpdates polling stalls, as confirmed by the Telegram API showing pending_update_count: 1. The gateway process enters state Dl (uninterruptible sleep, multi-threaded) at 37.8% CPU and 641MB RAM. Logs show "Polling stall detected (active getUpdates stuck for 135.04s); forcing restart." and then "embedded_run_failover_decision: decision=surface_error, failoverReason=timeout, model=claude-opus-4-6, timedOut=true, aborted=true". Restarting does not help because active-memory re-fires on session init.
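The evidence above for event-loop blocking is indirect. One way to measure it directly is Node's `monitorEventLoopDelay` from `node:perf_hooks`. The sketch below is a diagnostic that could be run inside the affected process, not something the gateway is known to ship; `startLagMonitor` and its parameters are hypothetical.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Start sampling event-loop delay and report the worst delay seen in each
// reporting window when it exceeds thresholdMs. Returns a stop function.
// Note: the reporting interval keeps the process alive until stop() is called.
function startLagMonitor(
  thresholdMs: number,
  everyMs: number,
  onLag: (maxMs: number) => void,
): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  const timer = setInterval(() => {
    const maxMs = histogram.max / 1e6; // histogram values are in nanoseconds
    if (maxMs > thresholdMs) onLag(maxMs);
    histogram.reset(); // start a fresh window
  }, everyMs);
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

If the loop is blocked by the embedded run, a monitor like this would report delays of several seconds once the loop is free again. Little or no reported lag during a stall would instead point at something other than synchronous work on the main thread.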

OpenClaw version

2026.4.11 (769908e)

Operating system

Ubuntu 24.04 (Linux 6.8.0-106-generic, x64)

Install method

npm global

Model

anthropic/claude-opus-4-6 (inherited by active-memory sub-agent), also tested with explicit anthropic/claude-sonnet-4-20250514

Provider / routing chain

openclaw -> anthropic:default (API key, direct)

Additional provider/model setup details

anthropic:default uses a direct API key (sk-ant-api03). No proxy, no OpenRouter for this path. Active-memory inherits the session model (Opus) when config.model is unset. Setting config.model to Sonnet did not resolve the issue — the embedded call still blocks the event loop.

Logs, screenshots, and evidence

Gateway log sequence after boot with active-memory enabled:

17:09:17 [gateway] ready (3 plugins: active-memory, browser, telegram; 3.6s)
17:09:49 [telegram] [default] starting provider (@Soph7aBot)
17:25:06 [telegram] Polling stall detected (active getUpdates stuck for 135.04s); forcing restart
17:25:06 [telegram] polling runner stopped (polling stall detected); restarting in 2.05s
17:25:14 [agent/embedded] Profile anthropic:default timed out. Trying next account...
17:25:14 [agent/embedded] embedded_run_failover_decision: decision=surface_error, stage=assistant, failoverReason=timeout, model=claude-opus-4-6, fallbackConfigured=false, timedOut=true, aborted=true

Network confirmed healthy during stall:
curl api.telegram.org/bot<redacted>/getMe → 200 in 0.39s
curl api.anthropic.com/v1/messages → 405 in 0.08s (expected, GET not POST)

With active-memory disabled, same config boots and responds within seconds. Tested across 5+ restart cycles — 100% reproducible both ways.

Impact and severity

Affected: Any Telegram deployment using active-memory plugin
Severity: Critical — agent is completely unresponsive with no self-recovery path
Frequency: 100% reproducible (5/5 restart cycles with active-memory enabled, 0/5 with it disabled)
Consequence: Agent cannot receive or respond to any Telegram messages while active-memory is enabled

Additional information

Related issues with the same underlying event-loop starvation pattern:

The common root cause: blocking embedded operations on the single-process Node.js event loop starve Telegram's getUpdates long-polling. Active-memory is a new trigger for this existing class of bugs.
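The starvation pattern can be reproduced in isolation. The following sketch is illustrative only and involves no OpenClaw or Telegram code: it measures how late a 10 ms timer fires, with the timer standing in for the next getUpdates poll, first next to a synchronous busy-wait and then next to an asynchronous wait of the same length. All function names are hypothetical.

```typescript
// Occupy the main thread for roughly `ms` milliseconds (stand-in for
// blocking work done inside an embedded operation).
function busyWait(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else can run on the event loop meanwhile
  }
}

// Asynchronous wait that yields to the event loop.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Schedule a 10 ms timer, run `disturb`, and resolve with how many
// milliseconds late the timer fired.
function timerLateness(disturb: () => void): Promise<number> {
  return new Promise<number>((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start - 10), 10);
    disturb();
  });
}

// Compare the two cases back to back.
async function compare(): Promise<{ blocked: number; yielded: number }> {
  const blocked = await timerLateness(() => busyWait(200));
  const yielded = await timerLateness(() => { void sleep(200); });
  return { blocked, yielded };
}
```

In the blocked case the timer cannot fire until the busy-wait returns, so it runs close to 200 ms late. In the yielded case it fires roughly on time. The same mechanism, at the scale of a multi-second embedded run, would keep the polling loop from being serviced.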


Labels: bug
