Skip to content

fix(active-memory): fast-fail stalled recall paths#75761

Closed
codexGW wants to merge 1 commit intoopenclaw:mainfrom
codexGW:fix/active-memory-empty-search-fast-fail
Closed

fix(active-memory): fast-fail stalled recall paths#75761
codexGW wants to merge 1 commit intoopenclaw:mainfrom
codexGW:fix/active-memory-empty-search-fast-fail

Conversation

@codexGW
Copy link
Copy Markdown
Contributor

@codexGW codexGW commented May 1, 2026

Summary

  • Fast-fail active-memory recall when the embedded subagent reports a clean memory_search result with results: [] and debug.hits === 0.
  • Fast-fail unavailable Memory Search output without treating unrelated memory_get misses as terminal recall failures.
  • Enforce the configured active-memory recall budget as the user-visible hard timeout, then abort and swallow late embedded subagent rejections.
  • Treat embedded harness timeout boilerplate as timeout/no recall instead of injecting it as useful memory context.
  • Add focused regressions for zero-hit Memory Search, unavailable Memory Search, non-terminal Memory Get misses, timeout boilerplate, and hard timeout budget behavior.

Root Cause

Active-memory recall waited for the embedded subagent timeout even after Memory Search had already proven there were no hits or was unavailable. Separately, the plugin waited for the setup-grace watchdog before resolving timeout and could normalize the embedded harness timeout message as an ok memory summary.

Validation

  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/active-memory/index.ts extensions/active-memory/index.test.ts
  • pnpm exec oxlint extensions/active-memory/index.ts extensions/active-memory/index.test.ts
  • pnpm test extensions/active-memory/index.test.ts
  • pnpm changed:lanes --json
  • pnpm check:changed
  • Live Samantha Discord thread proof after rebuild/restart: active-memory logged status=timeout, elapsedMs=5040, summaryChars=0 for timeoutMs=5000; no <active_memory_plugin> context or timeout boilerplate was injected into the main prompt.

Notes

  • Testbox was not run because the Blacksmith CLI was not authenticated on this machine; previous local auth checks reported no authenticated organization.
  • QMD semantic search quality is intentionally left for a separate follow-up; the live proof still showed QMD returning zero hits while direct memory_get found useful file memory.

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 1, 2026

Codex review: needs changes before merge.

Summary
The PR updates Active Memory to observe formatted Memory Search tool output, resolve empty or unavailable recall early, classify timeout boilerplate as no recall, change recall timeout budgeting, add regressions, and add a changelog entry.

Reproducibility: yes. On current main, Active Memory does not observe formatted tool output and its watchdog uses timeoutMs + setupGraceTimeoutMs; the PR's mocked regressions cover the zero-hit, unavailable, memory_get, boilerplate, and hard-timeout paths.

Next step before merge
Automerge is enabled and the remaining actionable blockers are narrow: resolve the dirty head, update docs for the setup-grace semantics, and get exact-head checks green.

Security
Cleared: The diff only touches Active Memory plugin code, focused tests, and changelog text; it does not add dependencies, workflows, package-resolution changes, credential handling, or release scripts.

Review findings

  • [P2] Document the new setup grace semantics — extensions/active-memory/index.ts:2492
Review details

Best possible solution:

Rebase this PR onto current main, keep the plugin-local fast-fail and timeout-boilerplate handling, and align the setup-grace contract in code, tests, docs, and changelog before exact-head checks gate merge.

Do we have a high-confidence way to reproduce the issue?

Yes. On current main, Active Memory does not observe formatted tool output and its watchdog uses timeoutMs + setupGraceTimeoutMs; the PR's mocked regressions cover the zero-hit, unavailable, memory_get, boilerplate, and hard-timeout paths.

Is this the best way to solve the issue?

Partly. The callback-based fast-fail is the narrowest maintainable implementation, but the hard-timeout/setup-grace semantics need docs alignment and conflict repair before this is the best mergeable form.

Full review comments:

  • [P2] Document the new setup grace semantics — extensions/active-memory/index.ts:2492
    This changes the Active Memory watchdog to fire at timeoutMs even when setupGraceTimeoutMs is configured, but current public docs still describe setup grace as extra setup budget before recall timeout expiry. Operators tuning this config would get the wrong contract after merge, so update docs/concepts/active-memory.md with the new lifecycle and budget semantics in this PR.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.84

Acceptance criteria:

  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md docs/concepts/active-memory.md extensions/active-memory/index.ts extensions/active-memory/index.test.ts
  • pnpm exec oxlint extensions/active-memory/index.ts extensions/active-memory/index.test.ts
  • pnpm test extensions/active-memory/index.test.ts
  • pnpm changed:lanes --json
  • pnpm check:changed in Testbox or exact-head CI after conflict repair

What I checked:

  • current-main-lifecycle-docs: The Active Memory docs define the feature as a plugin-owned blocking memory sub-agent that runs before the main reply and injects hidden context only when useful recall is produced. Public docs: docs/concepts/active-memory.md. (docs/concepts/active-memory.md:1, ac607044f1ff)
  • current-main-does-not-observe-tool-output: Current main calls runEmbeddedPiAgent for Active Memory with memory tools and an abort signal, but without shouldEmitToolOutput or onToolResult, so a zero-hit Memory Search result cannot currently terminate recall early. (extensions/active-memory/index.ts:2126, ac607044f1ff)
  • existing-runner-callback-seam: The embedded-runner params already expose shouldEmitToolOutput and onToolResult, so the PR's callback approach stays within the existing runtime seam. (src/agents/pi-embedded-runner/run/params.ts:139, ac607044f1ff)
  • current-main-setup-grace-contract: Current main extends the Active Memory watchdog by setupGraceTimeoutMs, and the public docs describe that key as extra setup budget before recall timeout expiry. (extensions/active-memory/index.ts:2286, ac607044f1ff)
  • pr-fast-fail-implementation: The PR adds Memory Search output parsing, passes shouldEmitToolOutput and onToolResult into the embedded recall run, resolves terminal empty/unavailable results, and aborts the embedded run after terminal Memory Search output. (extensions/active-memory/index.ts:2262, bf5f4046d678)
  • pr-timeout-semantics-doc-gap: The PR changes the watchdog to fire at params.config.timeoutMs instead of timeoutMs + setupGraceTimeoutMs, but it does not update the Active Memory documentation for the new setup-grace semantics. (extensions/active-memory/index.ts:2492, bf5f4046d678)

Likely related people:

  • steipete: Recent Active Memory commits include making setup grace explicit and stabilizing the timeout test surface now touched by this PR. (role: recent maintainer / setup-grace owner; confidence: high; commits: 59449d7f19cc, d85d782a0a27, d068cb960de4; files: extensions/active-memory/index.ts, extensions/active-memory/index.test.ts, docs/concepts/active-memory.md)
  • Takhoffman: Prior merged work added the bundled recall-tool path and QMD debug surface that this PR's Memory Search fast-fail behavior builds on; Tak also asked the lifecycle question in this PR. (role: active-memory recall feature owner / maintainer requester; confidence: high; commits: f256eeba431b, 885209ed0330; files: extensions/active-memory/index.ts, docs/concepts/active-memory.md, extensions/memory-core/src/tools.ts)
  • vincentkoc: Recent history includes Active Memory docs work and adjacent hook/startup fallback behavior, which overlaps the setup-grace documentation and lifecycle explanation needed here. (role: docs and adjacent timeout maintainer; confidence: medium; commits: c357235fe6e6, d686e6f876b4, d1f91b52fa67; files: docs/concepts/active-memory.md, extensions/active-memory/index.ts)
  • luyao618: History links the current hard-deadline Promise.race behavior to a prior merged Active Memory timeout fix, which is the same watchdog surface changed by this PR. (role: active-memory hard-timeout introducer; confidence: medium; commits: fbefbf05bd74; files: extensions/active-memory/index.ts, extensions/active-memory/index.test.ts)

Remaining risk / open question:

  • The branch is currently dirty against main, so conflict resolution may materially affect the reviewed diff.
  • Exact-head checks are failing in four aggregate lanes and need repair or rerun proof before automerge.
  • The setup-grace behavior now conflicts with current public docs unless the docs are updated or the timeout change is revised.

Codex review notes: model gpt-5.5, reasoning high; reviewed against ac607044f1ff.

@codexGW codexGW force-pushed the fix/active-memory-empty-search-fast-fail branch from 37e278f to e850a6b Compare May 1, 2026 21:51
@codexGW codexGW changed the title fix(active-memory): fast-fail zero-hit memory search fix(active-memory): fast-fail stalled recall paths May 1, 2026
@codexGW codexGW marked this pull request as ready for review May 1, 2026 22:01
@codexGW
Copy link
Copy Markdown
Contributor Author

codexGW commented May 2, 2026

Combined live verification with #75860 also applied locally:

  • Runtime: local combined branch codex/qmd-active-memory-combined-test at d73a9da (#75860 head 4b123260 + this PR head e850a6b).
  • Fresh Discord thread test in #samantha at 2026-05-01 17:22 PDT.
  • Active-memory log for the run: start timeoutMs=5000 then done status=timeout elapsedMs=5041 summaryChars=0.
  • The assistant-visible runtime context did not include an <active_memory_plugin> block and did not inject the timeout boilerplate as memory context.
  • The literal phrase Request timed out before a response was generated appeared only in Gary's test prompt asking Samantha to check for it, not as plugin output.

This confirms the important behavior for this PR under the combined runtime: timed-out active-memory recall is classified as timeout/empty and does not leak harness timeout prose into the main prompt.

@codexGW codexGW force-pushed the fix/active-memory-empty-search-fast-fail branch 6 times, most recently from 0db7c10 to bf010d7 Compare May 2, 2026 02:46
@codexGW codexGW force-pushed the fix/active-memory-empty-search-fast-fail branch from bf010d7 to bf5f404 Compare May 2, 2026 02:52
@Takhoffman
Copy link
Copy Markdown
Contributor

@clawsweeper explain to me how this change would fit into the overall life cycle of active memory and how this pr if merged would change that life cycle

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 2, 2026

🦞🦞
ClawSweeper is taking a look at your question.

I asked ClawSweeper to answer this maintainer mention in the next review comment. Tiny claws, bounded scope: this is a read-only assist pass unless it produces one of the existing structured safe-action markers.

Request: explain to me how this change would fit into the overall life cycle of active memory and how this pr if merged would change that life cycle

@Takhoffman
Copy link
Copy Markdown
Contributor

@clawsweeper status

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 2, 2026

🦞🦞
ClawSweeper status:

  • Current PR: 75761
  • Branch: fix/active-memory-empty-search-fast-fail
  • ClawSweeper PR: no
  • Automerge: not-enabled
  • Automerge job: jobs/openclaw/inbox/automerge-openclaw-openclaw-75761.md
  • Merge state: DIRTY
  • Checks: SUCCESS:67 SKIPPED:8 FAILURE:4
  • Check blockers: check-lint:FAILURE, check-additional-extension-bundled:FAILURE, check:FAILURE, check-additional:FAILURE

@codexGW
Copy link
Copy Markdown
Contributor Author

codexGW commented May 2, 2026

@Takhoffman This PR does not change when Active Memory runs: it still starts during before_prompt_build, asks the embedded recall subagent to search memory, and injects <active_memory_plugin> only when the subagent produces useful recall.

The lifecycle change is in terminal cases. Before this PR, if the embedded subagent ran memory_search and got a clean zero-hit result, Active Memory could still wait for the subagent to finish or for the outer timeout. If the embedded runner timed out, its timeout boilerplate could also be normalized as if it were useful recall.

With this PR, Active Memory observes the memory_search tool result through the existing embedded-runner callback. A clean zero-hit result (results: [], debug.hits === 0, and no error/disabled/unavailable state) resolves recall immediately as empty and aborts the embedded run. A plugin-owned timeoutMs budget also resolves as timeout/no-recall without injecting timeout text. Setup grace remains cleanup/watchdog grace, not extra prompt-visible recall time.

So useful recall is unchanged; the change is that empty and timed-out recall become terminal earlier and do not add misleading <active_memory_plugin> context.

@Takhoffman
Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label May 2, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 2, 2026

🦞🦞
ClawSweeper automerge is enabled.

Draft PRs stay fix-only until GitHub marks them ready for review. Pause with /clawsweeper stop.

Automerge progress:

  • 2026-05-02 17:08:13 UTC review queued [`bf5f4046d678`](https://github.com/openclaw/openclaw/commit/bf5f4046d678c6e8009f6c06c38d822b8e48cb95) (queued)
  • 2026-05-02 17:09:24 UTC repair queued [`bf5f4046d678`](https://github.com/openclaw/openclaw/commit/bf5f4046d678c6e8009f6c06c38d822b8e48cb95) (autonomous) Run: https://github.com/openclaw/clawsweeper/actions/runs/25257253992
  • 2026-05-02 17:38:19 UTC review requested repair [`bf5f4046d678`](https://github.com/openclaw/openclaw/commit/bf5f4046d678c6e8009f6c06c38d822b8e48cb95) (structured ClawSweeper marker: fix-required (finding=review-feedback sha=bf5f40...)
  • 2026-05-02 17:36:04 UTC review queued [`bf5f4046d678`](https://github.com/openclaw/openclaw/commit/bf5f4046d678c6e8009f6c06c38d822b8e48cb95) (queued)

@clawsweeper clawsweeper Bot added the clawsweeper:human-review Needs maintainer review before ClawSweeper can continue label May 2, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 2, 2026

🦞🦞
ClawSweeper is pausing this repair loop for human review.

Source: clawsweeper[bot]
Reason: structured ClawSweeper verdict: needs-human (sha=bf5f4046d678c6e8009f6c06c38d822b8e48cb95)

I added clawsweeper:human-review and left the final call with a maintainer.

@Takhoffman
Copy link
Copy Markdown
Contributor

@clawsweeper why was automerge stopped

@clawsweeper clawsweeper Bot removed the clawsweeper:human-review Needs maintainer review before ClawSweeper can continue label May 2, 2026
@Takhoffman
Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 2, 2026

ClawSweeper 🐠 reef update

Thanks for the work on this. ClawSweeper opened a replacement PR only because the source branch was not writable from the available bot permissions. branch tides, not contributor blame.

Replacement PR: #76183
This source PR is being closed only under the explicit source-close setting for this ClawSweeper run.
The original contribution stays credited in the replacement PR context.

fish notes: model gpt-5.5, reasoning high; reviewed against 3a55e61.

@clawsweeper clawsweeper Bot closed this May 2, 2026
@codexGW
Copy link
Copy Markdown
Contributor Author

codexGW commented May 2, 2026

Update on the source branch: the closed PR UI still shows the old head SHA, but codexGW:fix/active-memory-empty-search-fast-fail has been updated.

Current source branch tip: 634ca812a3
Compare against main: ahead by 3 / behind by 0
main...codexGW:fix/active-memory-empty-search-fast-fail

What changed since the stale PR head:

  • Rebased onto current main.
  • Clarified Active Memory docs: setupGraceTimeoutMs is hook setup/cleanup grace, not extra prompt-visible recall budget.
  • Fixed the fast-fail parser so fallback/supplement memory_search results are preserved even when debug.hits === 0.
  • Added a regression test for that fallback-results case.

Local validation passed:

  • pnpm test extensions/active-memory/index.test.ts -> 101 passed
  • pnpm check:changed -> passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants