fix(active-memory): fast-fail stalled recall paths by codexGW · Pull Request #75761 · openclaw/openclaw

codexGW · 2026-05-01T17:30:49Z

Summary

Fast-fail active-memory recall when the embedded subagent reports a clean memory_search result with results: [] and debug.hits === 0.
Fast-fail unavailable Memory Search output without treating unrelated memory_get misses as terminal recall failures.
Enforce the configured active-memory recall budget as the user-visible hard timeout, then abort and swallow late embedded subagent rejections.
Treat embedded harness timeout boilerplate as timeout/no recall instead of injecting it as useful memory context.
Add focused regressions for zero-hit Memory Search, unavailable Memory Search, non-terminal Memory Get misses, timeout boilerplate, and hard timeout budget behavior.

Root Cause

Active-memory recall waited for the embedded subagent timeout even after Memory Search had already proven there were no hits or was unavailable. Separately, the plugin waited for the setup-grace watchdog before resolving timeout and could normalize the embedded harness timeout message as an ok memory summary.

Validation

pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/active-memory/index.ts extensions/active-memory/index.test.ts
pnpm exec oxlint extensions/active-memory/index.ts extensions/active-memory/index.test.ts
pnpm test extensions/active-memory/index.test.ts
pnpm changed:lanes --json
pnpm check:changed
Live Samantha Discord thread proof after rebuild/restart: active-memory logged status=timeout, elapsedMs=5040, summaryChars=0 for timeoutMs=5000; no <active_memory_plugin> context or timeout boilerplate was injected into the main prompt.

Notes

Testbox was not run because the Blacksmith CLI was not authenticated on this machine; previous local auth checks reported no authenticated organization.
QMD semantic search quality is intentionally left for a separate follow-up; the live proof still showed QMD returning zero hits while direct memory_get found useful file memory.

clawsweeper · 2026-05-01T17:32:12Z

Codex review: needs changes before merge.

Summary
The PR updates Active Memory to observe formatted Memory Search tool output, resolve empty or unavailable recall early, classify timeout boilerplate as no recall, change recall timeout budgeting, add regressions, and add a changelog entry.

Reproducibility: yes. On current main, Active Memory does not observe formatted tool output and its watchdog uses timeoutMs + setupGraceTimeoutMs; the PR's mocked regressions cover the zero-hit, unavailable, memory_get, boilerplate, and hard-timeout paths.

Next step before merge
Automerge is enabled and the remaining actionable blockers are narrow: resolve the dirty head, update docs for the setup-grace semantics, and get exact-head checks green.

Security
Cleared: The diff only touches Active Memory plugin code, focused tests, and changelog text; it does not add dependencies, workflows, package-resolution changes, credential handling, or release scripts.

Review findings

[P2] Document the new setup grace semantics — extensions/active-memory/index.ts:2492

Review details

Best possible solution:

Rebase this PR onto current main, keep the plugin-local fast-fail and timeout-boilerplate handling, and align the setup-grace contract in code, tests, docs, and changelog before exact-head checks gate merge.

Do we have a high-confidence way to reproduce the issue?

Yes. On current main, Active Memory does not observe formatted tool output and its watchdog uses timeoutMs + setupGraceTimeoutMs; the PR's mocked regressions cover the zero-hit, unavailable, memory_get, boilerplate, and hard-timeout paths.

Is this the best way to solve the issue?

Partly. The callback-based fast-fail is the narrowest maintainable implementation, but the hard-timeout/setup-grace semantics need docs alignment and conflict repair before this is the best mergeable form.

Full review comments:

[P2] Document the new setup grace semantics — extensions/active-memory/index.ts:2492
This changes the Active Memory watchdog to fire at timeoutMs even when setupGraceTimeoutMs is configured, but current public docs still describe setup grace as extra setup budget before recall timeout expiry. Operators tuning this config would get the wrong contract after merge, so update docs/concepts/active-memory.md with the new lifecycle and budget semantics in this PR.
Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.84

Acceptance criteria:

pnpm exec oxfmt --check --threads=1 CHANGELOG.md docs/concepts/active-memory.md extensions/active-memory/index.ts extensions/active-memory/index.test.ts
pnpm exec oxlint extensions/active-memory/index.ts extensions/active-memory/index.test.ts
pnpm test extensions/active-memory/index.test.ts
pnpm changed:lanes --json
pnpm check:changed in Testbox or exact-head CI after conflict repair

What I checked:

current-main-lifecycle-docs: The Active Memory docs define the feature as a plugin-owned blocking memory sub-agent that runs before the main reply and injects hidden context only when useful recall is produced. Public docs: docs/concepts/active-memory.md. (docs/concepts/active-memory.md:1, ac607044f1ff)
current-main-does-not-observe-tool-output: Current main calls runEmbeddedPiAgent for Active Memory with memory tools and an abort signal, but without shouldEmitToolOutput or onToolResult, so a zero-hit Memory Search result cannot currently terminate recall early. (extensions/active-memory/index.ts:2126, ac607044f1ff)
existing-runner-callback-seam: The embedded-runner params already expose shouldEmitToolOutput and onToolResult, so the PR's callback approach stays within the existing runtime seam. (src/agents/pi-embedded-runner/run/params.ts:139, ac607044f1ff)
current-main-setup-grace-contract: Current main extends the Active Memory watchdog by setupGraceTimeoutMs, and the public docs describe that key as extra setup budget before recall timeout expiry. (extensions/active-memory/index.ts:2286, ac607044f1ff)
pr-fast-fail-implementation: The PR adds Memory Search output parsing, passes shouldEmitToolOutput and onToolResult into the embedded recall run, resolves terminal empty/unavailable results, and aborts the embedded run after terminal Memory Search output. (extensions/active-memory/index.ts:2262, bf5f4046d678)
pr-timeout-semantics-doc-gap: The PR changes the watchdog to fire at params.config.timeoutMs instead of timeoutMs + setupGraceTimeoutMs, but it does not update the Active Memory documentation for the new setup-grace semantics. (extensions/active-memory/index.ts:2492, bf5f4046d678)

Likely related people:

steipete: Recent Active Memory commits include making setup grace explicit and stabilizing the timeout test surface now touched by this PR. (role: recent maintainer / setup-grace owner; confidence: high; commits: 59449d7f19cc, d85d782a0a27, d068cb960de4; files: extensions/active-memory/index.ts, extensions/active-memory/index.test.ts, docs/concepts/active-memory.md)
Takhoffman: Prior merged work added the bundled recall-tool path and QMD debug surface that this PR's Memory Search fast-fail behavior builds on; Tak also asked the lifecycle question in this PR. (role: active-memory recall feature owner / maintainer requester; confidence: high; commits: f256eeba431b, 885209ed0330; files: extensions/active-memory/index.ts, docs/concepts/active-memory.md, extensions/memory-core/src/tools.ts)
vincentkoc: Recent history includes Active Memory docs work and adjacent hook/startup fallback behavior, which overlaps the setup-grace documentation and lifecycle explanation needed here. (role: docs and adjacent timeout maintainer; confidence: medium; commits: c357235fe6e6, d686e6f876b4, d1f91b52fa67; files: docs/concepts/active-memory.md, extensions/active-memory/index.ts)
luyao618: History links the current hard-deadline Promise.race behavior to a prior merged Active Memory timeout fix, which is the same watchdog surface changed by this PR. (role: active-memory hard-timeout introducer; confidence: medium; commits: fbefbf05bd74; files: extensions/active-memory/index.ts, extensions/active-memory/index.test.ts)

Remaining risk / open question:

The branch is currently dirty against main, so conflict resolution may materially affect the reviewed diff.
Exact-head checks are failing in four aggregate lanes and need repair or rerun proof before automerge.
The setup-grace behavior now conflicts with current public docs unless the docs are updated or the timeout change is revised.

Codex review notes: model gpt-5.5, reasoning high; reviewed against ac607044f1ff.

codexGW · 2026-05-02T00:29:02Z

Combined live verification with #75860 also applied locally:

Runtime: local combined branch codex/qmd-active-memory-combined-test at d73a9da (#75860 head 4b123260 + this PR head e850a6b).
Fresh Discord thread test in #samantha at 2026-05-01 17:22 PDT.
Active-memory log for the run: start timeoutMs=5000 then done status=timeout elapsedMs=5041 summaryChars=0.
The assistant-visible runtime context did not include an <active_memory_plugin> block and did not inject the timeout boilerplate as memory context.
The literal phrase Request timed out before a response was generated appeared only in Gary's test prompt asking Samantha to check for it, not as plugin output.

This confirms the important behavior for this PR under the combined runtime: timed-out active-memory recall is classified as timeout/empty and does not leak harness timeout prose into the main prompt.

Takhoffman · 2026-05-02T15:17:42Z

@clawsweeper explain to me how this change would fit into the overall life cycle of active memory and how this pr if merged would change that life cycle

clawsweeper · 2026-05-02T15:18:31Z

🦞🦞
ClawSweeper is taking a look at your question.

I asked ClawSweeper to answer this maintainer mention in the next review comment. Tiny claws, bounded scope: this is a read-only assist pass unless it produces one of the existing structured safe-action markers.

Request: explain to me how this change would fit into the overall life cycle of active memory and how this pr if merged would change that life cycle

Takhoffman · 2026-05-02T15:41:59Z

@clawsweeper status

clawsweeper · 2026-05-02T15:42:43Z

🦞🦞
ClawSweeper status:

Current PR: 75761
Branch: fix/active-memory-empty-search-fast-fail
ClawSweeper PR: no
Automerge: not-enabled
Automerge job: jobs/openclaw/inbox/automerge-openclaw-openclaw-75761.md
Merge state: DIRTY
Checks: SUCCESS:67 SKIPPED:8 FAILURE:4
Check blockers: check-lint:FAILURE, check-additional-extension-bundled:FAILURE, check:FAILURE, check-additional:FAILURE

codexGW · 2026-05-02T16:44:55Z

@Takhoffman This PR does not change when Active Memory runs: it still starts during before_prompt_build, asks the embedded recall subagent to search memory, and injects <active_memory_plugin> only when the subagent produces useful recall.

The lifecycle change is in terminal cases. Before this PR, if the embedded subagent ran memory_search and got a clean zero-hit result, Active Memory could still wait for the subagent to finish or for the outer timeout. If the embedded runner timed out, its timeout boilerplate could also be normalized as if it were useful recall.

With this PR, Active Memory observes the memory_search tool result through the existing embedded-runner callback. A clean zero-hit result (results: [], debug.hits === 0, and no error/disabled/unavailable state) resolves recall immediately as empty and aborts the embedded run. A plugin-owned timeoutMs budget also resolves as timeout/no-recall without injecting timeout text. Setup grace remains cleanup/watchdog grace, not extra prompt-visible recall time.

So useful recall is unchanged; the change is that empty and timed-out recall become terminal earlier and do not add misleading <active_memory_plugin> context.

Takhoffman · 2026-05-02T17:08:13Z

@clawsweeper automerge

clawsweeper · 2026-05-02T17:09:25Z

🦞🦞
ClawSweeper automerge is enabled.

Head: bf5f4046d678
Label: clawsweeper:automerge
Action: repair worker queued. Run: https://api.github.com/repos/openclaw/clawsweeper/actions/runs/25257253992
Flow: review this head, repair/rebase only if needed, then re-review the exact repaired head before merge.

Draft PRs stay fix-only until GitHub marks them ready for review. Pause with /clawsweeper stop.

Automerge progress:

2026-05-02 17:08:13 UTC review queued [`bf5f4046d678`](https://github.com/openclaw/openclaw/commit/bf5f4046d678c6e8009f6c06c38d822b8e48cb95) (queued)
2026-05-02 17:09:24 UTC repair queued [`bf5f4046d678`](https://github.com/openclaw/openclaw/commit/bf5f4046d678c6e8009f6c06c38d822b8e48cb95) (autonomous) Run: https://github.com/openclaw/clawsweeper/actions/runs/25257253992
2026-05-02 17:38:19 UTC review requested repair [`bf5f4046d678`](https://github.com/openclaw/openclaw/commit/bf5f4046d678c6e8009f6c06c38d822b8e48cb95) (structured ClawSweeper marker: fix-required (finding=review-feedback sha=bf5f40...)
2026-05-02 17:36:04 UTC review queued [`bf5f4046d678`](https://github.com/openclaw/openclaw/commit/bf5f4046d678c6e8009f6c06c38d822b8e48cb95) (queued)

clawsweeper · 2026-05-02T17:25:56Z

🦞🦞
ClawSweeper is pausing this repair loop for human review.

Source: clawsweeper[bot]
Reason: structured ClawSweeper verdict: needs-human (sha=bf5f4046d678c6e8009f6c06c38d822b8e48cb95)

I added clawsweeper:human-review and left the final call with a maintainer.

Takhoffman · 2026-05-02T17:29:17Z

@clawsweeper why was automerge stopped

Takhoffman · 2026-05-02T17:36:04Z

@clawsweeper automerge

clawsweeper · 2026-05-02T17:38:35Z

ClawSweeper 🐠 reef update

Thanks for the work on this. ClawSweeper opened a replacement PR only because the source branch was not writable from the available bot permissions. branch tides, not contributor blame.

Replacement PR: #76183
This source PR is being closed only under the explicit source-close setting for this ClawSweeper run.
The original contribution stays credited in the replacement PR context.

fish notes: model gpt-5.5, reasoning high; reviewed against 3a55e61.

codexGW · 2026-05-02T18:15:13Z

Update on the source branch: the closed PR UI still shows the old head SHA, but codexGW:fix/active-memory-empty-search-fast-fail has been updated.

Current source branch tip: 634ca812a3
Compare against main: ahead by 3 / behind by 0
main...codexGW:fix/active-memory-empty-search-fast-fail

What changed since the stale PR head:

Rebased onto current main.
Clarified Active Memory docs: setupGraceTimeoutMs is hook setup/cleanup grace, not extra prompt-visible recall budget.
Fixed the fast-fail parser so fallback/supplement memory_search results are preserved even when debug.hits === 0.
Added a regression test for that fallback-results case.

Local validation passed:

pnpm test extensions/active-memory/index.test.ts -> 101 passed
pnpm check:changed -> passed

openclaw-barnacle Bot added the size: M label May 1, 2026

codexGW force-pushed the fix/active-memory-empty-search-fast-fail branch from 37e278f to e850a6b Compare May 1, 2026 21:51

openclaw-barnacle Bot added size: L and removed size: M labels May 1, 2026

codexGW changed the title ~~fix(active-memory): fast-fail zero-hit memory search~~ fix(active-memory): fast-fail stalled recall paths May 1, 2026

codexGW marked this pull request as ready for review May 1, 2026 22:01

codexGW mentioned this pull request May 2, 2026

fix(memory): improve QMD recall for channel queries #75860

Open

codexGW force-pushed the fix/active-memory-empty-search-fast-fail branch 6 times, most recently from 0db7c10 to bf010d7 Compare May 2, 2026 02:46

fix(active-memory): fast-fail stalled recall paths

bf5f404

codexGW force-pushed the fix/active-memory-empty-search-fast-fail branch from bf010d7 to bf5f404 Compare May 2, 2026 02:52

clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label May 2, 2026

clawsweeper Bot added the clawsweeper:human-review Needs maintainer review before ClawSweeper can continue label May 2, 2026

clawsweeper Bot removed the clawsweeper:human-review Needs maintainer review before ClawSweeper can continue label May 2, 2026

clawsweeper Bot mentioned this pull request May 2, 2026

fix(active-memory): fast-fail stalled recall paths #76183

Merged

clawsweeper Bot closed this May 2, 2026

Uh oh!

Conversation

codexGW commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Validation

Notes

Uh oh!

clawsweeper Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codexGW commented May 2, 2026

Uh oh!

Takhoffman commented May 2, 2026

Uh oh!

clawsweeper Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Takhoffman commented May 2, 2026

Uh oh!

clawsweeper Bot commented May 2, 2026

Uh oh!

codexGW commented May 2, 2026

Uh oh!

Takhoffman commented May 2, 2026

Uh oh!

clawsweeper Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clawsweeper Bot commented May 2, 2026

Uh oh!

Takhoffman commented May 2, 2026

Uh oh!

Takhoffman commented May 2, 2026

Uh oh!

clawsweeper Bot commented May 2, 2026

Uh oh!

codexGW commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codexGW commented May 1, 2026 •

edited

Loading

clawsweeper Bot commented May 1, 2026 •

edited

Loading

clawsweeper Bot commented May 2, 2026 •

edited

Loading

clawsweeper Bot commented May 2, 2026 •

edited

Loading