[Bug]: Active Memory Telegram preflight retains local embedding model mapping after timeout

### Bug type

Behavior bug (latency and retained process memory after completed or timed-out Active Memory preflight)

### Beta release blocker

No

### Summary

Consolidated replacement for #83773 and original #83752, with the later live profiling evidence folded into the main issue body.

On a live Linux VPS running `OpenClaw 2026.5.18 (50a2481)`, Telegram group-topic turns that trigger Active Memory preflight can sharply increase gateway parent RSS and leave it elevated after the turn completes, even when `/readyz` is healthy and OpenClaw reports `0 queued · 0 running`.

The newest profiling datapoint narrows the retained RSS from a generic gateway memory symptom to a specific retained file-backed local embedding model mapping:

```text
rss_kb=314504 pss_kb=314504 anon_kb=0 priv_clean_kb=314504 priv_dirty_kb=0 path=/home/ubuntu/.node-llama-cpp/models/hf_ggml-org_embeddinggemma-300m-qat-Q8_0.gguf
```

Important: this does **not** appear to be a local chat-model fallback. The live config showed chat routing as `openai/gpt-5.5` primary with `google/gemini-2.5-flash` fallback. The local GGUF model is used by Memory Search / `memory-core` as the local embedding backend. Active Memory runs before the normal reply, performs memory search, and that memory-search path loads or touches the local embedding model in the gateway parent process. After the Active Memory timeout, that mapping remains resident while the gateway is otherwise idle.

### Steps to reproduce

1. Run OpenClaw `2026.5.18` as a systemd user gateway with Telegram and Active Memory enabled.
2. Use a Telegram group topic/session where Active Memory is allowed for group/channel style sessions.
3. Configure Active Memory with `queryMode: "message"`, `timeoutMs: 5000`, `setupGraceTimeoutMs: 5000`, and `allowedChatTypes: ["direct", "group", "channel"]`.
4. Restart the gateway cleanly and wait for `/readyz`.
5. Record gateway parent RSS, RssAnon, RssFile, PSS, child RSS, cgroup memory, and OpenClaw task pressure while idle.
6. Send `/active-memory on` in the Telegram topic.
7. Send one short normal Telegram message in that topic.
8. Wait for the reply and then leave the gateway idle.
9. Re-check `/readyz`, task pressure, parent RSS/RssAnon/RssFile/PSS, child RSS, and top `smaps` mappings.
10. Compare against the clean post-restart baseline and/or repeat with Active Memory disabled in the same topic.
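
The parent-process readings in steps 5 and 9 can be scripted. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/status` exposes `VmRSS`, `RssAnon`, and `RssFile`; the helper names `parse_status` and `read_status` are illustrative and not part of OpenClaw. PSS is not covered here because it comes from `smaps_rollup` rather than `status`.

```python
from pathlib import Path

# Fields of /proc/<pid>/status that this report tracks (memory values are in kB).
FIELDS = ("VmRSS", "RssAnon", "RssFile", "VmData", "VmSwap", "Threads")


def parse_status(text: str) -> dict[str, int]:
    """Extract the tracked numeric fields from the text of /proc/<pid>/status."""
    out: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in FIELDS:
            # The value is the first token; a trailing "kB" unit is ignored.
            out[key] = int(rest.split()[0])
    return out


def read_status(pid: int) -> dict[str, int]:
    """Read and parse the status file of a live process (Linux only)."""
    return parse_status(Path(f"/proc/{pid}/status").read_text())
```

Calling `read_status(<gateway-pid>)` once at the idle baseline and again after the turn gives two dictionaries that can be compared field by field.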

### Expected behavior

Completed Telegram turns should not leave the gateway retaining hundreds of MB of extra RSS after the system is idle.

If Active Memory times out, it should release/clean up transient recall resources it owns and degrade the reply path without leaving a high retained RSS footprint. If the local memory-search embedding model is intentionally cached, that should be explicit and bounded so operators do not see an unexpected ~300 MB retained mapping after a timed-out Telegram preflight.

### Actual behavior

The affected VPS repeatedly showed:

- clean post-restart gateway parent RSS around 430-570 MB after settling;
- Active Memory Telegram turns increasing parent RSS to around 1.0-1.08 GB;
- `/readyz` healthy and task pressure `0 queued · 0 running` while RSS stayed elevated;
- a clean restart bringing the gateway back to the lower baseline;
- during the newest controlled `message` / 5s test, the largest retained mapping after the timeout was the local Memory Search embedding GGUF file.

### OpenClaw version

`OpenClaw 2026.5.18 (50a2481)`

### Operating system

Ubuntu 24.04.3 LTS, Linux `6.17.0-1011-oracle`, `aarch64`

### Install method

System-global npm install:

```text
node=v22.22.0
npm=10.9.4
install root=/usr/lib/node_modules/openclaw
service=/home/ubuntu/.config/systemd/user/openclaw-gateway.service
ExecStart=/usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789
```

### Model and routing

Normal chat/model routing from the live gateway config:

```text
agents.defaults.model.primary: openai/gpt-5.5
agents.defaults.model.fallbacks[0]: google/gemini-2.5-flash
```

No local chat/model fallback was found in the checked config.

Memory Search configuration from the same live gateway:

```text
agents.defaults.memorySearch.provider: local
agents.defaults.memorySearch.fallback: none
plugins.slots.memory: memory-core
```

`openclaw memory status --deep` confirmed the local embedding model used by Memory Search:

```text
Memory Search (main)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Sources: memory
Indexed: 240/240 files · 2184 chunks
Store: ~/.openclaw/memory/main.sqlite
Embeddings: ready
Vector store: ready
Semantic vectors: ready
Vector dims: 768
Vector path: /usr/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-arm64/vec0.so
Embedding cache: enabled (2884 entries)
Recall store: 9011 entries
```

For the `codex` agent, the same local embedding backend was configured, though with no indexed chunks at the time checked:

```text
Memory Search (codex)
Provider: local (requested: local)
Model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf
Indexed: 0/42 files · 0 chunks
Store: ~/.openclaw/memory/codex.sqlite
```

### Enabled plugins and local state size

```text
plugins list: 92 known, 11 enabled/loaded, 0 errors
enabled: active-memory, anthropic, file-transfer, google, memory-core, memory-wiki, ollama, openai, telegram, tokenjuice, codex
codex plugin source: ~/.openclaw/npm/node_modules/@openclaw/codex
codex plugin version: 2026.5.6
```

```text
~/.openclaw total: 7.7G
~/.openclaw/agents/main/sessions: 376M, 687 jsonl files
~/.openclaw/agents/codex/sessions: 368K, 10 jsonl files
~/.openclaw/agents/main/agent/codex-home/logs_2.sqlite: 714,862,592 bytes
```

### Active Memory configurations tested

Initial heavy configuration:

```json
{
  "agents": ["main"],
  "allowedChatTypes": ["direct", "group", "channel"],
  "enabled": true,
  "logging": true,
  "maxSummaryChars": 220,
  "persistTranscripts": false,
  "promptStyle": "contextual",
  "queryMode": "full",
  "setupGraceTimeoutMs": 30000,
  "timeoutMs": 30000
}
```

Reduced configuration, still reproduced:

```json
{
  "queryMode": "recent",
  "promptStyle": "balanced",
  "timeoutMs": 15000,
  "setupGraceTimeoutMs": 15000,
  "allowedChatTypes": ["direct", "group", "channel"]
}
```

Lowest-latency controlled Telegram-topic configuration, still reproduced:

```json
{
  "queryMode": "message",
  "timeoutMs": 5000,
  "setupGraceTimeoutMs": 5000,
  "allowedChatTypes": ["direct", "group", "channel"],
  "persistTranscripts": false,
  "logging": true
}
```

### Evidence: controlled profiling run with local embedding model mapping

No gateway restart, config change, hotfix, or heap snapshot was performed during this capture. A 2s sampler recorded `/proc/<gateway-pid>/status`, `smaps_rollup`, child RSS, and cgroup memory from `2026-05-18T22:16:30Z` to `2026-05-18T22:22:30Z`.

Baseline immediately before the controlled run:

```text
readyz: healthy
task pressure: 0 queued / 0 running
gateway parent PID: 1289215
parent RSS: 600848 kB
parent RssAnon: 540188 kB
parent RssFile: 60660 kB
parent PSS: 546904 kB
child Codex app-server PID: 1341301
child RSS: 46032 kB
```

Test sequence:

1. Sent `/active-memory on` in the same Telegram group topic.
2. Sent one short normal Telegram message in that topic.
3. Waited for the reply and then left the gateway idle.

Relevant journal lines, redacted to behavior and timing:

```text
22:16:45 inbound Telegram group/topic command, 17 chars
22:16:46 outbound send ok

22:16:58 inbound Telegram group/topic message, 58 chars
22:17:00 main embedded agent started
22:17:01 active-memory start timeoutMs=5000 queryChars=58 searchQueryChars=58
22:17:01 active-memory embedded run started
22:17:11 before_prompt_build handler from active-memory failed: timed out after 10000ms
22:17:12 active-memory done status=timeout elapsedMs=10236 summaryChars=0
22:17:40 Telegram sendMessage ok
```

Sampler summary:

```text
samples: 175
first_ts: 2026-05-18T22:16:30Z
last_ts: 2026-05-18T22:22:30Z

rss_kb_min: 578956 at 22:16:59
rss_kb_max: 1029036 at 22:17:12
rss_kb_last: 997392 at 22:22:30

pss_kb_min: 524332 at 22:16:59
pss_kb_max: 976556 at 22:17:12
pss_kb_last: 942823 at 22:22:30

rss_anon_kb_min: 518232 at 22:16:59
rss_anon_kb_max: 648528 at 22:17:12
rss_anon_kb_last: 616420 at 22:22:30

rss_file_kb_min: 60724 at 22:16:30
rss_file_kb_max: 380972 at 22:17:09
rss_file_kb_last: 380972 at 22:22:30

vmdata_kb_min: 610664 at 22:16:59
vmdata_kb_max: 842720 at 22:17:12
vmdata_kb_last: 809992 at 22:22:30

child_rss_kb_min/max/last: 46032
cgroup_current_bytes_min: 604041216 at 22:16:59
cgroup_current_bytes_max: 879431680 at 22:17:26
cgroup_current_bytes_last: 725692416 at 22:22:30
cgroup_peak_bytes_max/last: 944881664
```
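
For anyone repeating the capture, the min/max/last rows above can be derived from raw samples with a few lines. This is a sketch only: the `summarize` helper and the `(timestamp, value_kb)` sample shape are assumptions for illustration, not the sampler that produced this report. The three data points are taken from the `rss_file_kb` rows above; the remaining samples are not reproduced.

```python
def summarize(samples: list[tuple[str, int]]) -> dict[str, tuple[str, int]]:
    """Return the min, max, and last sample of a series given in capture order.

    On ties, min() and max() keep the earliest sample, so "max" reports the
    first time the peak value was reached.
    """
    return {
        "min": min(samples, key=lambda s: s[1]),
        "max": max(samples, key=lambda s: s[1]),
        "last": samples[-1],
    }


# Three rss_file_kb points from the summary above.
rss_file_kb = [("22:16:30", 60724), ("22:17:09", 380972), ("22:22:30", 380972)]
summary = summarize(rss_file_kb)
print(summary)
```

A series whose `last` value equals its `max`, as `rss_file_kb` does here, is the signature of memory that was acquired during the turn and never released.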

Idle state after the sampler finished:

```text
2026-05-18T22:23:02Z
readyz: healthy
task pressure: 0 queued / 0 running
parent gateway RSS: 997424 kB
parent gateway RssAnon: 616452 kB
parent gateway RssFile: 380972 kB
child Codex app-server RSS: 46032 kB
threads: 12
swap: 0
```

Top retained mapping after the timeout:

```text
rss_kb=314504 pss_kb=314504 anon_kb=0 priv_clean_kb=314504 priv_dirty_kb=0 shared_clean_kb=0 path=/home/ubuntu/.node-llama-cpp/models/hf_ggml-org_embeddinggemma-300m-qat-Q8_0.gguf
```

Other large mappings included `[heap]` around 59948 kB, `/usr/bin/node` around 59008 kB, and anonymous blocks. The 314 MB file-backed GGUF mapping was the largest single retained mapping.
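
A ranking like the one above can be reproduced from `/proc/<pid>/smaps`. The sketch below assumes the standard Linux smaps layout (one header line per mapping followed by `Key: value kB` lines); the `top_mappings` helper is illustrative and is not the script used for this report.

```python
import re
from collections import defaultdict

# Mapping header: "start-end perms offset dev inode [path]".
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$")


def top_mappings(smaps_text: str, n: int = 5) -> list[tuple[str, int]]:
    """Sum Rss (kB) per mapped path and return the n largest entries."""
    rss_by_path: dict[str, int] = defaultdict(int)
    current = None
    for line in smaps_text.splitlines():
        match = HEADER.match(line)
        if match:
            # Mappings without a path are grouped under "[anon]".
            current = match.group(1) or "[anon]"
        elif current is not None and line.startswith("Rss:"):
            rss_by_path[current] += int(line.split()[1])
    return sorted(rss_by_path.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Usage on a live host would be `top_mappings(open(f"/proc/{pid}/smaps").read())`; a file-backed model that is mapped in several segments is summed into one entry.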

Interpretation from this capture: a timed-out Active Memory preflight appears to load or touch the node-llama-cpp Memory Search embedding model in the gateway parent process; that model mapping remains resident after the Active Memory timeout, after the Telegram reply, and while `/readyz` is healthy with task pressure idle. This does not by itself prove the final fix, but it narrows the retained-RSS evidence from generic gateway RSS growth to a specific retained file-backed model mapping plus a smaller anonymous-memory increase.
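
The split between file-backed and anonymous growth follows directly from the baseline and idle-state readings in this section. A small worked calculation, using only numbers quoted above (the variable names are illustrative, and the keys use the `/proc` field names for parent RSS, RssAnon, and RssFile):

```python
# Parent readings in kB, copied from the baseline and idle-state blocks above.
baseline = {"VmRSS": 600848, "RssAnon": 540188, "RssFile": 60660}
idle_after = {"VmRSS": 997424, "RssAnon": 616452, "RssFile": 380972}
GGUF_MAPPING_KB = 314504  # rss_kb of the retained GGUF mapping


def deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-field growth in kB between two readings."""
    return {key: after[key] - before[key] for key in before}


growth = deltas(baseline, idle_after)
gguf_share = GGUF_MAPPING_KB / growth["VmRSS"]

print(growth)                # {'VmRSS': 396576, 'RssAnon': 76264, 'RssFile': 320312}
print(f"{gguf_share:.0%}")   # 79%
```

So of the roughly 396.6 MB of retained parent RSS, about 320.3 MB is file-backed and about 76.3 MB is anonymous, and the GGUF mapping alone accounts for about 79% of the total growth.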

### Evidence: original full-context observation

```text
Before clean restart on 2026.5.18:
RSS: ~1.4-1.6 GB
Memory diagnostic fired: rssBytes=1651253248 heapUsedBytes=498389504 thresholdBytes=1610612736

After clean restart:
~446 MB RSS shortly after ready
~509 MB RSS after ~90s
~570 MB RSS after ~6m45s
~566 MB RSS after ~9m27s

After one Telegram weather ask plus a follow-up log-check turn:
~1,001,404 kB RSS (~978 MiB)
readyz healthy
0 queued / 0 running
gateway process threads: 12
no child processes observed
swap: 0
```

Full-context timing:

```text
20:25:15.970 inbound Telegram message received
20:25:21.200 embedded agent started (~5.2s after inbound)
20:25:23.381 Active Memory started
20:25:40.285 Active Memory finished: 16.9s, no relevant memory
20:25:44.319 Codex task started
20:26:26.677 wttr.in curl finished in ~80ms
20:26:43.561 final answer generated
20:26:47.728 Telegram sendMessage ok
Total inbound-to-Telegram-send: ~91.8s
```
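
Breaking the timeline above into per-phase durations shows where the 91.8s went. The sketch below only re-derives intervals from the timestamps already listed; the event labels are shortened paraphrases.

```python
from datetime import datetime

# (time of day, event) pairs copied from the full-context timing block above.
EVENTS = [
    ("20:25:15.970", "inbound Telegram message"),
    ("20:25:21.200", "embedded agent started"),
    ("20:25:23.381", "Active Memory started"),
    ("20:25:40.285", "Active Memory finished"),
    ("20:25:44.319", "Codex task started"),
    ("20:26:26.677", "wttr.in curl finished"),
    ("20:26:43.561", "final answer generated"),
    ("20:26:47.728", "Telegram sendMessage ok"),
]


def parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%H:%M:%S.%f")


# Seconds elapsed between each event and the one before it.
phases = [
    (label, (parse(ts) - parse(prev_ts)).total_seconds())
    for (prev_ts, _), (ts, label) in zip(EVENTS, EVENTS[1:])
]
total = (parse(EVENTS[-1][0]) - parse(EVENTS[0][0])).total_seconds()

for label, seconds in phases:
    print(f"{seconds:7.3f}s  -> {label}")
print(f"{total:7.3f}s  total")
```

In this run Active Memory itself accounts for about 16.9s of the blocking pre-reply time, and the longest single interval (about 42.4s) falls between the Codex task start and the `wttr.in` curl completing.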

### Evidence: `recent` mode still reproduced

After switching from `full/contextual/30000ms` to `recent/balanced/15000ms` while keeping group/channel allowed:

```text
Clean post-restart baseline:
~447 MB RSS shortly after ready
~495 MB RSS after ~90s
readyz healthy
0 queued / 0 running
```

Then one Telegram weather ask plus one log-check follow-up:

```text
20:39:18 inbound Telegram weather message
20:39:26 active-memory start timeoutMs=15000 queryChars=58 searchQueryChars=58
20:39:47 active-memory done status=ok elapsedMs=21269 summaryChars=131
20:40:55 Telegram sendMessage ok

20:41:00 inbound Telegram log-check follow-up
20:41:03 active-memory start timeoutMs=15000 queryChars=593 searchQueryChars=288
20:41:21 active-memory done status=ok elapsedMs=18058 summaryChars=181
```

RSS after those turns:

```text
PID     ELAPSED RSS     VSZ      %MEM %CPU CMD
1249003 04:24   1080144 44832448 4.3  37.2 /usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789
VmRSS:  1080144 kB
RssAnon: 699168 kB
RssFile: 380976 kB
readyz healthy
0 queued / 0 running
```

### Evidence: `message` mode with 5s timeout still reproduced

After tuning to `queryMode: "message"`, `timeoutMs: 5000`, `setupGraceTimeoutMs: 5000`, group/channel still allowed:

```text
2026-05-18T21:01:19.922Z inbound Telegram group/topic message, 19 chars
2026-05-18T21:01:21.727Z main embedded agent started
2026-05-18T21:01:22.577Z active-memory start timeoutMs=5000 queryChars=19
2026-05-18T21:01:32.577Z hook failed: timed out after 10000ms
2026-05-18T21:01:33.941Z active-memory done status=timeout elapsedMs=10016 summaryChars=0
```

RSS moved from roughly 590-607 MB before this turn to a peak around 1.0-1.07 GB during/after the Active Memory timeout.

Then `/active-memory off` was sent in the same Telegram topic:

```text
2026-05-18T21:01:41.064Z inbound Telegram group/topic message, 18 chars
2026-05-18T21:01:42.244Z outbound send ok
```

The follow-up weather-style request in the same topic did not show Active Memory hook/log lines:

```text
2026-05-18T21:01:58.576Z inbound Telegram group/topic message, 58 chars
2026-05-18T21:02:00.205Z main embedded agent started
no active-memory start/done lines for this request
2026-05-18T21:02:33.492Z Telegram sendMessage ok
```

Inbound-to-send was about 34.9s with Active Memory disabled for the topic, versus about 77.7s in the prior `message` / 5s Active Memory test and about 91.8s before tuning.
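
The 34.9s figure can be re-derived from the journal timestamps above, and the other two totals compared against it. This is arithmetic on the reported numbers only; each figure comes from a single turn, so the differences are indicative rather than a controlled benchmark.

```python
from datetime import datetime


def seconds_between(start_iso: str, end_iso: str) -> float:
    """Elapsed seconds between two ISO-8601 UTC timestamps with a 'Z' suffix."""
    def to_dt(value: str) -> datetime:
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return (to_dt(end_iso) - to_dt(start_iso)).total_seconds()


# Active Memory disabled: inbound message to Telegram sendMessage ok.
disabled_s = seconds_between("2026-05-18T21:01:58.576Z", "2026-05-18T21:02:33.492Z")

# Totals quoted in the text for the Active Memory-enabled runs.
message_mode_s = 77.7
full_mode_s = 91.8

print(f"disabled: {disabled_s:.1f}s")
print(f"message/5s overhead: {message_mode_s - disabled_s:.1f}s")
print(f"full/30s overhead:   {full_mode_s - disabled_s:.1f}s")
```

On these single samples, enabling Active Memory in `message` mode roughly doubled inbound-to-send latency compared with the disabled turn.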

### Evidence: retained high RSS until restart

At `2026-05-18T21:05:39Z` after the Active Memory timeout tests:

```text
gateway parent RSS: 1,020,480 kB (~996 MiB)
process tree RSS: 1,179,660 kB (~1.13 GiB)
children:
  gateway parent: 1,020,480 kB
  codex app-server node child: 46,028 kB
  codex native app-server child: 113,152 kB
OpenClaw task pressure: 0 queued · 0 running
readyz healthy
```

After restarting an idle gateway:

```text
immediately after restart: parent RSS 697,624 kB, service peak 667.1M
~90s after restart: parent RSS 483,656 kB, systemd service memory 428.3M, readyz healthy
```

So the high value was not the normal clean-start baseline on this host. It was retained runtime state after the Telegram/Active Memory tests, and a clean restart brought it back to the 430-480 MB range.

### Current-code notes from previous ClawSweeper review

A previous ClawSweeper review on #83773 noted these source-level facts against current main at the time:

- Active Memory computes the embedded recall run timeout/watchdog as `config.timeoutMs + config.setupGraceTimeoutMs`, matching the observed `5000ms + 5000ms` path surfacing as a `10000ms` timeout.
- A hook timeout does not by itself cancel the underlying plugin work: timed-out modifying hooks are logged and skipped, but the work they started keeps running, so cleanup must come from Active Memory and embedded-run abort handling.
- The prompt-build hook is fail-open; replies continue while latency and RSS are the problem.
- Comparing v2026.5.18 to current main showed no Active Memory behavior change that would obviously resolve retained RSS.
- The adjacent latency PR #73667 was draft/conflicting and did not prove this retained-RSS failure mode.
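
The first two points can be illustrated in isolation. The sketch below is a Python analogy, not OpenClaw's hook code: `effective_timeout_ms`, `slow_recall`, and `run_with_hook_timeout` are hypothetical names, and a worker thread stands in for the embedded recall run. It shows the additive timeout and the difference between a caller that merely stops waiting and one that also signals cancellation.

```python
import concurrent.futures
import threading
import time


def effective_timeout_ms(timeout_ms: int, setup_grace_timeout_ms: int) -> int:
    """Watchdog budget as described in the review notes: the two values are summed."""
    return timeout_ms + setup_grace_timeout_ms


def slow_recall(done: threading.Event, cancel: threading.Event) -> None:
    """Stand-in for recall work (~0.2s) that checks a cancel flag between steps."""
    for _ in range(20):
        if cancel.is_set():
            return  # cooperative abort: stop early and skip the final step
        time.sleep(0.01)
    done.set()


def run_with_hook_timeout(timeout_s: float, signal_cancel: bool) -> tuple[str, bool]:
    """Wait up to timeout_s for the work; report (status, work_ran_to_completion)."""
    done, cancel = threading.Event(), threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_recall, done, cancel)
        try:
            future.result(timeout=timeout_s)
            status = "ok"
        except concurrent.futures.TimeoutError:
            status = "timeout"  # the caller gives up; the worker is still running
            if signal_cancel:
                cancel.set()
    # Leaving the with-block waits for the worker thread to exit.
    return status, done.is_set()


print(effective_timeout_ms(5000, 5000))     # 10000
print(run_with_hook_timeout(0.05, False))   # ('timeout', True)
print(run_with_hook_timeout(0.05, True))    # ('timeout', False)
```

Without an explicit cancel signal the timed-out work still runs to completion and keeps whatever it loaded, which is the behavior the review notes describe for the hook layer.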

### Impact and severity

Affected: live gateways using Telegram group topics plus Active Memory on persistent conversations, especially where Active Memory is allowed for group/channel sessions and Memory Search uses the local embedding backend.

Severity: Medium. The gateway remained healthy on this VPS because the host has enough RAM, but RSS crossed OpenClaw's own diagnostic threshold before restart and can grow back quickly after user-visible turns.

Frequency: Observed repeatedly as high peaks across multiple recent versions on this VPS. The most recent controlled run reproduced the RSS jump and retained local embedding model mapping with a single Telegram message after `/active-memory on`.

Consequence: higher steady-state memory footprint, possible memory pressure on smaller hosts, and slow Telegram replies because Active Memory is a blocking pre-reply step.

### Related / not duplicate notes

- Supersedes #83773 and #83752.
- Related open memory issue mentioned by ClawSweeper: #69451, but this report has a narrower Telegram + Active Memory + local Memory Search embedding trigger and should not be closed as a duplicate of session-file memory growth without further proof.
- Adjacent open PR found during contributor duplicate scan: #73667 (`Bound active-memory recall latency and jitter QMD startup`). It was draft/conflicting, and ClawSweeper flagged both a timeout regression and a lack of real behavior proof, so it should not currently be treated as the canonical fix for this report.

### What would help validate a fix

A good fix/proof should ideally capture or answer the following:

- RSS, PSS, RssAnon, RssFile, heapUsed, external, arrayBuffers, active handles, child RSS, and task pressure before and after idle;
- Active Memory enabled vs disabled in the same Telegram topic;
- `queryMode: message`, `recent`, and ideally `full` if safe;
- whether summing the configured `timeoutMs` and `setupGraceTimeoutMs` is intentional or accidentally doubles the user-visible timeout;
- whether timed-out Active Memory recall work is actually cancelled or merely skipped by the hook layer;
- whether local Memory Search embedding resources are intentionally cached in the gateway parent and, if so, whether there is a configurable/bounded unload or cache policy;
- whether the retained `embeddinggemma-300m-qat-Q8_0.gguf` mapping is released after completed/timed-out Active Memory recall runs, so that RssFile returns near the idle baseline.
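
A before/after comparison of this kind can be reduced to a simple pass/fail check. The sketch below is one possible shape, not an agreed acceptance test: the `retained_metrics` helper and the 10% tolerance are assumptions, and the sample readings are the baseline and idle-state values from the controlled profiling run.

```python
def retained_metrics(
    baseline: dict[str, int],
    after_idle: dict[str, int],
    tolerance: float = 0.10,  # assumed threshold; tune per host
) -> dict[str, int]:
    """Return the growth in kB for metrics that stay above baseline * (1 + tolerance)."""
    return {
        name: after_idle[name] - base
        for name, base in baseline.items()
        if after_idle[name] > base * (1 + tolerance)
    }


# Readings in kB from the controlled profiling run in this report.
baseline = {"RssAnon": 540188, "RssFile": 60660, "ChildRss": 46032}
after_idle = {"RssAnon": 616452, "RssFile": 380972, "ChildRss": 46032}

print(retained_metrics(baseline, after_idle))
# {'RssAnon': 76264, 'RssFile': 320312}
```

A fix that releases the model mapping would be expected to drop `RssFile` out of this result, while an intentional cache would keep it there and should then be documented and bounded.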
