Bound active-memory recall latency and jitter QMD startup by 0xCNAI · Pull Request #73667 · openclaw/openclaw

0xCNAI · 2026-04-28T16:19:54Z

Summary

This patch applies the active-memory reliability approach proposed by @kinthaiofficial in #72015:

reduce Active Memory's default recall deadline from 15s to 3s
keep the before_prompt_build hook deadline-bounded so memory recall degrades instead of blocking normal replies
remove the default 30s setup grace from the critical path
stagger QMD startup initialization with a small jitter so multi-agent boot does not stampede memory backends
update Active Memory docs to recommend low-latency recall budgets

The existing QMD search manager cache is already keyed per agent, so this keeps the change small and focuses on the blocking failure mode plus startup herd behavior.

Tests

pnpm exec vitest run extensions/active-memory/index.test.ts src/gateway/server-startup-memory.test.ts
pnpm tsgo:extensions:test
pnpm tsgo:core:test

Fixes #72015

greptile-apps · 2026-04-28T16:22:18Z

Greptile Summary

This PR reduces the Active Memory recall deadline from 15 s to 3 s, removes the 30 s setup grace from the critical path, and adds per-agent jitter to stagger QMD startup. The jitter and timeout-reduction changes are sound, but the new beforePromptBuildTimeoutMs formula hard-codes DEFAULT_TIMEOUT_MS (3 s) instead of the maximum allowed recall timeout (120 s).

Any agent configured with config.timeoutMs > ~3000 — including the "full" query mode that the docs in this same PR recommend at "5,000 ms or higher" — will have its before_prompt_build hook cancelled at 4 s, silently dropping memory recall, before the internal watchdog fires.
The test was updated to assert the new 4,000 ms value but does not cover custom timeoutMs values above the default, masking the regression.

Confidence Score: 3/5

Not safe to merge as-is — the hook envelope silently drops recall for any agent using a custom timeout above 3 s.

A single P1 defect in index.ts means the hook timeout regression will affect any user who follows the docs guidance for full query mode (5,000 ms+), making recall silently fail. The latency reduction and jitter logic are otherwise correct, but the P1 pulls the score below the 4/5 ceiling.

extensions/active-memory/index.ts line 2434 — beforePromptBuildTimeoutMs computation needs to account for the maximum configured timeoutMs.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: extensions/active-memory/index.ts
Line: 2434

Comment:
**Hook timeout uses `DEFAULT_TIMEOUT_MS` instead of the per-agent configured value**

`beforePromptBuildTimeoutMs` is now `DEFAULT_TIMEOUT_MS + setupGraceTimeoutMs + 1_000 = 4_000 ms`. The internal recall watchdog fires at `config.timeoutMs + setupGraceTimeoutMs` (line 2193), which uses the _per-agent_ `config.timeoutMs`, clamped up to 120,000 ms. Any agent configured with `timeoutMs > 3_000` (e.g., the docs update in this same PR recommends `5_000` or higher for `full` query mode) will have its hook cancelled at 4 s before the recall's own watchdog can fire, causing silent recall loss. The previous formula `120_000 + setupGraceTimeoutMs` was sized to always outlast the maximum allowed recall timeout.

To preserve the invariant, the hook envelope should wrap around the maximum possible recall duration, not the default:
```
// keep the hook envelope at the cap so per-agent timeoutMs up to 120 s still fits
const beforePromptBuildTimeoutMs = 120_000 + setupGraceTimeoutMs + 1_000;
```
If the goal is to reduce the _default_ latency without breaking custom high-timeout configurations, consider computing this per-session from the resolved `config.timeoutMs`, or at least use `Math.max(resolvedAgentTimeoutMs, DEFAULT_TIMEOUT_MS)` before adding the buffer.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/gateway/server-startup-memory.ts
Line: 51

Comment:
**`<= 0` guard is misleading for an array index**

`params.index` is the `.map()` callback index, which is always `>= 0`, so the `<= 0` condition is functionally identical to `=== 0` but implies negative indices are possible. Using strict equality makes the intent clearer.

```suggestion
  if (params.index === 0 || params.total <= 1) {
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix(active-memory): bound recall latency..." | Re-trigger Greptile}

greptile-apps · 2026-04-28T16:22:22Z

    });

-    const beforePromptBuildTimeoutMs = 120_000 + setupGraceTimeoutMs;
+    const beforePromptBuildTimeoutMs = DEFAULT_TIMEOUT_MS + setupGraceTimeoutMs + 1_000;


Hook timeout uses DEFAULT_TIMEOUT_MS instead of the per-agent configured value

beforePromptBuildTimeoutMs is now DEFAULT_TIMEOUT_MS + setupGraceTimeoutMs + 1_000 = 4_000 ms. The internal recall watchdog fires at config.timeoutMs + setupGraceTimeoutMs (line 2193), which uses the per-agent config.timeoutMs, clamped up to 120,000 ms. Any agent configured with timeoutMs > 3_000 (e.g., the docs update in this same PR recommends 5_000 or higher for full query mode) will have its hook cancelled at 4 s before the recall's own watchdog can fire, causing silent recall loss. The previous formula 120_000 + setupGraceTimeoutMs was sized to always outlast the maximum allowed recall timeout.

To preserve the invariant, the hook envelope should wrap around the maximum possible recall duration, not the default:

// keep the hook envelope at the cap so per-agent timeoutMs up to 120 s still fits const beforePromptBuildTimeoutMs = 120_000 + setupGraceTimeoutMs + 1_000;

If the goal is to reduce the default latency without breaking custom high-timeout configurations, consider computing this per-session from the resolved config.timeoutMs, or at least use Math.max(resolvedAgentTimeoutMs, DEFAULT_TIMEOUT_MS) before adding the buffer.

Prompt To Fix With AI

This is a comment left during a code review. Path: extensions/active-memory/index.ts Line: 2434 Comment: **Hook timeout uses `DEFAULT_TIMEOUT_MS` instead of the per-agent configured value** `beforePromptBuildTimeoutMs` is now `DEFAULT_TIMEOUT_MS + setupGraceTimeoutMs + 1_000 = 4_000 ms`. The internal recall watchdog fires at `config.timeoutMs + setupGraceTimeoutMs` (line 2193), which uses the _per-agent_ `config.timeoutMs`, clamped up to 120,000 ms. Any agent configured with `timeoutMs > 3_000` (e.g., the docs update in this same PR recommends `5_000` or higher for `full` query mode) will have its hook cancelled at 4 s before the recall's own watchdog can fire, causing silent recall loss. The previous formula `120_000 + setupGraceTimeoutMs` was sized to always outlast the maximum allowed recall timeout. To preserve the invariant, the hook envelope should wrap around the maximum possible recall duration, not the default: ``` // keep the hook envelope at the cap so per-agent timeoutMs up to 120 s still fits const beforePromptBuildTimeoutMs = 120_000 + setupGraceTimeoutMs + 1_000; ``` If the goal is to reduce the _default_ latency without breaking custom high-timeout configurations, consider computing this per-session from the resolved `config.timeoutMs`, or at least use `Math.max(resolvedAgentTimeoutMs, DEFAULT_TIMEOUT_MS)` before adding the buffer. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-28T16:22:24Z

+}
+
+function resolveStartupJitterMs(params: { index: number; total: number }): number {
+  if (params.index <= 0 || params.total <= 1) {


<= 0 guard is misleading for an array index

params.index is the .map() callback index, which is always >= 0, so the <= 0 condition is functionally identical to === 0 but implies negative indices are possible. Using strict equality makes the intent clearer.

Suggested change

if (params.index <= 0 || params.total <= 1) {

if (params.index === 0 || params.total <= 1) {

Prompt To Fix With AI

This is a comment left during a code review. Path: src/gateway/server-startup-memory.ts Line: 51 Comment: **`<= 0` guard is misleading for an array index** `params.index` is the `.map()` callback index, which is always `>= 0`, so the `<= 0` condition is functionally identical to `=== 0` but implies negative indices are possible. Using strict equality makes the intent clearer. ```suggestion if (params.index === 0 || params.total <= 1) { ``` How can I resolve this? If you propose a fix, please make it concise.

clawsweeper · 2026-04-28T16:23:38Z

Codex review: needs real behavior proof before merge.

Summary
The PR lowers Active Memory timeout/setup-grace defaults, resizes the before_prompt_build hook timeout, adds jittered parallel QMD startup initialization, and updates Active Memory docs/tests.

Reproducibility: yes. for the PR regression by source inspection: configure Active Memory on the branch with timeoutMs above 3000 ms and the fixed 4000 ms hook timeout can fire before the configured recall budget. I did not run a live gateway canary in this read-only review.

Real behavior proof
Needs real behavior proof before merge: The PR body lists tests/typechecks only; the contributor should add redacted terminal output, logs, live output, a linked artifact, or a recording from a real Active Memory/QMD setup before merge, then update the PR body to trigger re-review or ask for @clawsweeper re-review.

Next step before merge
Manual review is needed because the external PR is draft/conflicting, lacks real behavior proof, includes product/default timing policy, and has a concrete timeout regression.

Security
Cleared: The diff only changes Active Memory/Gateway timing behavior, docs, and tests; it adds no dependencies, workflows, scripts, permissions, downloads, package resolution, publishing, or secret handling.

Review findings

[P1] Size the hook timeout from the configured recall budget — extensions/active-memory/index.ts:2434

Review details

Best possible solution:

Rework or replace the branch so hook budgets preserve configured recall timeouts, build on current-main setup-grace and QMD lazy-start semantics, and leave broader nonblocking/default-policy work on the canonical reliability tracker.

Do we have a high-confidence way to reproduce the issue?

Yes, for the PR regression by source inspection: configure Active Memory on the branch with timeoutMs above 3000 ms and the fixed 4000 ms hook timeout can fire before the configured recall budget. I did not run a live gateway canary in this read-only review.

Is this the best way to solve the issue?

No, not as proposed. The reliability direction is plausible, but the hook envelope must be sized from the resolved configured timeout or cap, and the QMD/default-latency policy should be reconciled with current main before merge.

Full review comments:

[P1] Size the hook timeout from the configured recall budget — extensions/active-memory/index.ts:2434
This branch registers the Active Memory before_prompt_build hook with a default-derived 4s timeout while config.timeoutMs still accepts larger values and the docs recommend budgets above the default. The hook runner enforces the registration timeout before the plugin watchdog can return, so custom high-timeout recall can be silently dropped; derive the envelope from the resolved configured timeout or cap and cover a custom timeout above the default.
Confidence: 0.93

Overall correctness: patch is incorrect
Overall confidence: 0.91

What I checked:

PR diff timeout regression: The PR changes beforePromptBuildTimeoutMs to DEFAULT_TIMEOUT_MS + setupGraceTimeoutMs + 1_000, which becomes 4s after lowering the default to 3s while per-agent timeoutMs remains configurable above that. (extensions/active-memory/index.ts:2434, 0071f6ca13f7)
Current main preserves configured hook budget: Current main registers before_prompt_build with config.timeoutMs + config.setupGraceTimeoutMs, so the hook envelope tracks the resolved configured recall budget rather than a fixed default-derived value. (extensions/active-memory/index.ts:3026, 48b4e5b3614f)
Hook runner enforces registration timeout: Modifying hooks use withHookTimeout around the handler promise, so an undersized before_prompt_build registration timeout can skip the Active Memory result before the plugin's own watchdog resolves. (src/plugins/hooks.ts:590, 48b4e5b3614f)
Current tests cover high configured budgets: Current main asserts a 90,000 ms Active Memory timeout registers the hook with 90,000 ms and that setupGraceTimeoutMs is added when configured. (extensions/active-memory/index.test.ts:337, 48b4e5b3614f)
Current QMD startup behavior differs materially: Current main gates QMD boot sync behind startup opt-in, defers non-explicit agents in multi-agent configs, and initializes/syncs managers sequentially with purpose "cli" rather than the PR's Promise.all jitter path. (src/gateway/server-startup-memory.ts:56, 48b4e5b3614f)
Live PR state and proof gap: Live GitHub metadata shows the PR is open, draft, mergeable=CONFLICTING, references the still-open reliability issue, and the body lists tests/typechecks but no real Active Memory/QMD runtime proof. (0071f6ca13f7)

Likely related people:

Takhoffman: Recent merged history includes Active Memory custom recall tools and bundled recall-tool work on the same plugin/docs/test surface. (role: Active Memory recall contributor; confidence: high; commits: 2f2602508511, f256eeba431b; files: extensions/active-memory/index.ts, extensions/active-memory/index.test.ts, docs/concepts/active-memory.md)
steipete: Recent commits touch Active Memory timeout/setup-grace behavior, tests, and QMD gateway startup lazy/boot-sync behavior central to this PR. (role: recent Active Memory and QMD startup contributor; confidence: high; commits: 59449d7f19cc, 2b811fe6d9e3, 30e2654ac2c8; files: extensions/active-memory/index.ts, extensions/active-memory/index.test.ts, src/gateway/server-startup-memory.ts)
vincentkoc: History links this account to deferring implicit QMD startup and Active Memory docs/runtime-config work near the same reliability surface. (role: adjacent timeout and QMD startup contributor; confidence: medium; commits: 732a5842ee99, c357235fe6e6; files: src/gateway/server-startup-memory.ts, docs/concepts/active-memory.md)
vignesh07: Earlier QMD backend hardening overlaps the boot-storm side of the PR's startup behavior. (role: QMD backend and startup contributor; confidence: medium; commits: a305dfe62601; files: src/gateway/server-startup-memory.ts)

Remaining risk / open question:

The branch is draft and merge-conflicting, so conflict repair may materially change the reviewed diff.
The PR body has tests/typechecks only and no real Active Memory/QMD runtime proof.
The default timeout reduction and broad QMD startup scheduling behavior require maintainer product judgment beyond a mechanical bug fix.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 48b4e5b3614f.

openclaw-barnacle · 2026-05-29T04:52:36Z

This assigned pull request has been automatically marked as stale after being open for 27 days.
Please add updates or it will be closed.

barnacle-openclaw · 2026-05-30T05:07:06Z

This assigned pull request has been automatically marked as stale after being open for 27 days.
Please add updates or it will be closed.

openclaw-barnacle · 2026-06-07T05:07:23Z

Closing due to inactivity.
If you believe this PR should be revived, post in #clawtributors on Discord to talk to a maintainer.
That channel is the escape hatch for high-quality PRs that get auto-closed.

fix(active-memory): bound recall latency and jitter qmd startup

0071f6c

openclaw-barnacle Bot added docs Improvements or additions to documentation gateway Gateway runtime size: XS labels Apr 28, 2026

greptile-apps Bot reviewed Apr 28, 2026

View reviewed changes

0xCNAI marked this pull request as draft April 28, 2026 16:24

M-Lietz mentioned this pull request Apr 28, 2026

[Bug] Active Memory plugin times out on every run (15s timeout, 0 chars output) - v2026.4.26 #73306

Closed

This was referenced May 7, 2026

active-memory recall subagent can deadlock on the main lane inside before_prompt_build #79026

Closed

[Performance] active-memory plugin can block chat replies up to ~5s #69182

Closed

clawsweeper Bot mentioned this pull request May 15, 2026

Active Memory: add single-shot mode (no embedded agent loop) for low-latency preflight injection #72359

Open

This was referenced May 18, 2026

[Bug]: Active Memory Telegram preflight exceeds timeout and retains gateway RSS after idle #83773

Closed

[Bug]: Active Memory Telegram preflight retains local embedding model mapping after timeout #83792

Closed

osolmaz self-assigned this May 28, 2026

openclaw-barnacle Bot added stale Marked as stale due to inactivity triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 29, 2026

barnacle-openclaw Bot removed the stale Marked as stale due to inactivity label May 30, 2026

barnacle-openclaw Bot added the stale Marked as stale due to inactivity label May 30, 2026

openclaw-barnacle Bot closed this Jun 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bound active-memory recall latency and jitter QMD startup#73667

Bound active-memory recall latency and jitter QMD startup#73667
0xCNAI wants to merge 1 commit into
openclaw:mainfrom
0xCNAI:fix/active-memory-deadline-jitter

0xCNAI commented Apr 28, 2026

Uh oh!

greptile-apps Bot commented Apr 28, 2026

Uh oh!

greptile-apps Bot Apr 28, 2026

Uh oh!

greptile-apps Bot Apr 28, 2026

Uh oh!

clawsweeper Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

openclaw-barnacle Bot commented May 29, 2026

Uh oh!

barnacle-openclaw Bot commented May 30, 2026

Uh oh!

openclaw-barnacle Bot commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	if (params.index <= 0 \|\| params.total <= 1) {
	if (params.index === 0 \|\| params.total <= 1) {

Uh oh!

Conversation

0xCNAI commented Apr 28, 2026

Summary

Tests

Uh oh!

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

Confidence Score: 3/5

Uh oh!

greptile-apps Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

clawsweeper Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openclaw-barnacle Bot commented May 29, 2026

Uh oh!

barnacle-openclaw Bot commented May 30, 2026

Uh oh!

openclaw-barnacle Bot commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

clawsweeper Bot commented Apr 28, 2026 •

edited

Loading