feat: add v1 agent-scoped memory scope for LCM tools by jacoblyles · Pull Request #2 · Martian-Engineering/lossless-claw

jacoblyles · 2026-03-01T13:13:49Z

Summary

- Add initial agent-scoped memory plugin helpers under src/plugins/agent-memory-scope
- Extend scope resolution and retrieval/store paths to support same-agent multi-conversation lookups
- Add/adjust tests for agent scope and expand-query behavior

Test

npx vitest run test/agent-memory-scope.test.ts test/lcm-tools.test.ts test/lcm-expand-query-tool.test.ts

Notes

- Default behavior remains unchanged when scope is omitted
- A follow-up can wire runtime config knobs (allowAgentScope, maxAgentConversations) from OpenClaw
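
To make the Notes concrete, here is a minimal sketch of how an omitted scope could keep the old single-conversation behavior while an explicit agent scope widens the lookup. Everything except the knob names allowAgentScope and maxAgentConversations is hypothetical: the types, the function name, the choice to fall back silently when agent scope is disabled, and the most-recent-first ordering are assumptions for illustration, not the code in this PR.

```typescript
// Hypothetical sketch. Only allowAgentScope and maxAgentConversations are
// names taken from the PR notes; all other names are illustrative.
type MemoryScope = "conversation" | "agent";

interface ScopeConfig {
  allowAgentScope: boolean;
  maxAgentConversations: number;
}

interface ConversationRef {
  id: string;
  agentId: string;
  updatedAt: number; // epoch milliseconds
}

function resolveConversationIds(
  current: ConversationRef,
  all: ConversationRef[],
  scope: MemoryScope | undefined,
  config: ScopeConfig,
): string[] {
  // Omitted scope (or agent scope disabled): behave exactly as before.
  if (scope !== "agent" || !config.allowAgentScope) {
    return [current.id];
  }
  // Agent scope: add the most recently updated conversations of the same
  // agent, keeping the total at or below maxAgentConversations.
  const siblings = all
    .filter((c) => c.agentId === current.agentId && c.id !== current.id)
    .sort((a, b) => b.updatedAt - a.updatedAt)
    .slice(0, Math.max(0, config.maxAgentConversations - 1));
  return [current.id, ...siblings.map((c) => c.id)];
}
```

The current conversation always comes first, so a caller that only reads the first id sees no change in behavior.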

Opus subagent analysis of v4.1 baseline (333 blocks) vs v4.2 stubs (689 blocks) at the same 258K-token budget recommended four mitigations to address moderate-risk findings:

1. Recency cue [t-NNm] on turn headers
2. Semantic stub wrapping <lcm-stub> XML tags
3. Empty-assistant collapsing
4. Resolution markers at completion boundaries

Applied first-principles-architectural-decision skill (research, run-the-system, where-it-lives diagrams, adversarial debate) before building any of them.

Verdict: REJECT ALL FOUR. Each fails on a specific load-bearing constraint:

- #1 fails on prefix-cache stability (clock-based tag changes the rendered string on every assemble, invalidating the cache that v4.2's whole value proposition relies on). User timestamps already exist inline.
- #2 fails on "novelty has cost, format already works": the existing [LCM Tool Output: file_xxx | …] bracket form is correctly parsed by Opus in live tests (drilldown via lcm_describe works on Option F format). Replacing a working v4.1-trained format with a novel XML form is unjustified churn.
- #3 fails on Anthropic/OpenAI wire contract. The "empty assistants" contain tool_use blocks (required to live in assistant turns; paired with tool_results by toolCallId). Dropping them would break pairing; providers reject orphan tool_results.
- #4 fails on detection signal. No reliable way to mark "work completed": user phrases like "go ahead" / "yes" / "keep digging" oscillate. False positives are strictly worse than no marker (license premature stubbing).

Adversarial debate at ≥95% confidence target on each. AGAINST won on all four.

Decision record committed for future operators who hit similar moderate-risk findings and reach for similar mitigations.

Final v4.2 shipping shape: Options C + D + F at commit e309bed. Architecturally additive, reversible, default-off. Empirically: 333→689 items at same budget; Opus drills down correctly; no confabulation observed.
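
The wire-contract objection to mitigation #3 can be illustrated with a small sketch. The message and block shapes below are simplified and hypothetical (real Anthropic and OpenAI payloads differ in field names and structure), and both function names are invented for illustration. The sketch only shows the mechanism: once an assistant turn that carries nothing but a tool_use block is dropped, the matching tool_result no longer has a partner.

```typescript
// Hypothetical, simplified shapes; not the provider wire formats.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; toolCallId: string }
  | { type: "tool_result"; toolCallId: string };

interface Message {
  role: "user" | "assistant";
  blocks: Block[];
}

// Returns the toolCallIds of tool_result blocks with no earlier tool_use
// in an assistant turn.
function findOrphanToolResults(messages: Message[]): string[] {
  const seenToolUse = new Set<string>();
  const orphans: string[] = [];
  for (const message of messages) {
    for (const block of message.blocks) {
      if (block.type === "tool_use" && message.role === "assistant") {
        seenToolUse.add(block.toolCallId);
      } else if (block.type === "tool_result" && !seenToolUse.has(block.toolCallId)) {
        orphans.push(block.toolCallId);
      }
    }
  }
  return orphans;
}

// The rejected mitigation, naively applied: drop assistant turns that have
// no non-empty text block, even if they carry tool_use blocks.
function collapseEmptyAssistants(messages: Message[]): Message[] {
  return messages.filter(
    (m) =>
      m.role !== "assistant" ||
      m.blocks.some((b) => b.type === "text" && b.text.trim() !== ""),
  );
}
```

Running findOrphanToolResults before and after collapseEmptyAssistants on a transcript with a text-free tool call shows the pairing break that the decision record describes.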

…pattern

Wire #2 of 3 for the agent context-management architecture (Wave-14).

# What this lands

Tools that could push context over budget now run a pre-call gate BEFORE doing work: estimate the result size; if (currentTokens + estimated) / tokenBudget > REFUSAL_THRESHOLD (0.92), return a structured `{ok: false, needsCompact: true, ...}` payload instead. Agent reads, calls lcm_compact, retries: the natural negotiation pattern.

Without this layer, an agent at 78% context calling `lcm_describe expandMessages=true expandMessagesLimit=20` (estimated 13K tokens) lands at ~84% AT BEST, but worst-case messages can saturate the result-cap and push past 100%, causing context_length_exceeded errors mid-turn.

# Tools wired

PRE-CHECK ENFORCED (7):
- lcm_grep (5 modes)
- lcm_semantic_recall
- lcm_describe (HIGHEST priority: biggest blow-up risk per Agent C)
- lcm_expand_query
- lcm_get_entity
- lcm_search_entities
- lcm_compact (small footprint; included for uniform agent UX)

NOT WIRED (intentionally; self-protecting or out-of-scope):
- lcm_synthesize_around: internal 50K source cap; prompt-bounded output ~2-3K. Per Agent B, can't blow context.
- lcm_expand: sub-agent-only, has its own grant ledger

# Files

NEW:
- `src/plugin/needs-compact-gate.ts` (~190 LOC): REFUSAL_THRESHOLD constant (0.92, calibrated against real DB), per-tool `estimateResultTokens(toolName, params)` formulas, the `evaluateNeedsCompactGate` core logic, and a `runWithTokenGate` wrapper helper that tools use to compose pre-check + post-call cache accumulation.
- `test/v41-needs-compact-gate.test.ts` (~120 LOC): 19 tests covering per-tool estimator math, refusal logic, suggested-action narrowing, bypass-on-missing-telemetry, and threshold boundary cases.

EDITED (each ~5-10 LOC of changes):
- src/tools/lcm-grep-tool.ts: gate at top of execute, tap on returns
- src/tools/lcm-describe-tool.ts: gate + tap on final return
- src/tools/lcm-semantic-recall-tool.ts: runWithTokenGate wrapper
- src/tools/lcm-expand-query-tool.ts: wrapper
- src/tools/lcm-get-entity-tool.ts: wrapper
- src/tools/lcm-search-entities-tool.ts: wrapper
- src/plugin/index.ts: pass `getRuntimeContext` to all 7 tool factories
- src/plugin/token-state.ts: add `tapResultForTokenAccounting` helper

# How the agent experience works

```
Agent: lcm_describe id=sum_xxx expandMessages=true expandMessagesLimit=30
Tool gate:
  estimatedResultTokens = 10000 (capped)
  currentRatio = 0.78
  projectedRatio = (156000 + 10000) / 200000 = 0.83
  → BELOW 0.92 → run normally

Agent: lcm_describe id=sum_yyy expandMessages=true expandMessagesLimit=30
Tool gate:
  currentRatio = 0.89  // accumulated from previous result
  projectedRatio = 0.94
  → OVER 0.92 → REFUSE
Tool returns:
  {
    ok: false,
    needsCompact: true,
    reason: "context-overflow-prevention",
    currentRatio: 0.89,
    estimatedResultTokens: 10000,
    projectedRatio: 0.94,
    note: "Serving this call would push context to 94% of budget...",
    suggested_actions: [
      "lcm_compact then retry with same params",
      "retry with expandMessagesLimit=15"
    ]
  }

Agent: reads, calls lcm_compact, retries. Now at 70%; call succeeds.
```

# Threshold (0.92) calibration

Wave-14 Agent A sampled Eva's live DB (3,904 leaves, 414 condensed, 315K messages). Per-tool result hard cap is 10K tokens (MAX_RESULT_CHARS / 4).

With 200K context:
- 0.95 cushion → 10K headroom = zero margin (one capped call → 100%)
- 0.92 cushion → 16K headroom = one capped call + agent response
- Lower thresholds → over-refusal on safe calls

# Per-tool estimator confidence

(Per Wave-14 Agent C calibration against actual format strings)

- lcm_grep regex/full_text/hybrid/semantic: 90%
- lcm_grep verbatim: 60% (variable per-message size)
- lcm_semantic_recall: 90%
- lcm_describe (no expand): 70%
- lcm_describe (expand flags): 60% (high subtree variance)
- lcm_get_entity / lcm_search_entities: 90%
- lcm_expand_query: 80%

Estimator capped at HARD_CAP_TOKENS (10K) regardless of natural estimate, which protects against under-estimation. Tools that return less than estimated just have headroom; tools with bad estimates get their natural cap protection.

# Verification

- 1592/1592 tests passing (1573 baseline + 19 new gate tests)
- 7/7 release-readiness preflight checks pass
- 330 TS errors (under 700 baseline; PR introduced none)

# What's next (Commit 3 of 3)

Synchronous compaction at critical pressure (`afterTurn` deferred-mode drain runs sync at >0.85 currentRatio). System-level safety net behind the agent-driven layers.
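
The pre-call check described in this commit message reduces to a few lines of arithmetic. The sketch below is an illustration only, not the contents of `src/plugin/needs-compact-gate.ts`: the function name `evaluateGateSketch`, the input shape, and the exact bypass condition are assumptions, while the 0.92 threshold, the 10K cap, the bypass on missing telemetry, and the refusal payload fields come from the message itself.

```typescript
// Illustrative sketch. The constants match the values quoted in the commit
// message; the function name and input shape are hypothetical.
const REFUSAL_THRESHOLD = 0.92;
const HARD_CAP_TOKENS = 10_000;

interface GateInput {
  currentTokens?: number; // undefined when telemetry is missing
  tokenBudget?: number;
  estimatedResultTokens: number;
}

type GateDecision =
  | { ok: true }
  | {
      ok: false;
      needsCompact: true;
      reason: "context-overflow-prevention";
      currentRatio: number;
      estimatedResultTokens: number;
      projectedRatio: number;
    };

function evaluateGateSketch(input: GateInput): GateDecision {
  const { currentTokens, tokenBudget } = input;
  // Assumed bypass rule: without usable telemetry, let the call through.
  if (currentTokens === undefined || tokenBudget === undefined || tokenBudget <= 0) {
    return { ok: true };
  }
  // The estimate is capped, mirroring the per-tool result hard cap.
  const estimated = Math.min(input.estimatedResultTokens, HARD_CAP_TOKENS);
  const currentRatio = currentTokens / tokenBudget;
  const projectedRatio = (currentTokens + estimated) / tokenBudget;
  if (projectedRatio > REFUSAL_THRESHOLD) {
    return {
      ok: false,
      needsCompact: true,
      reason: "context-overflow-prevention",
      currentRatio,
      estimatedResultTokens: estimated,
      projectedRatio,
    };
  }
  return { ok: true };
}
```

With the figures from the walkthrough (200000-token budget, 10000-token estimate), 156000 tokens in use projects to 0.83 and the call proceeds, while 178000 tokens in use (a 0.89 ratio) projects to 0.94 and the sketch refuses.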

100yenadmin · 2026-05-30T17:05:18Z

@jacoblyles triage pass update: I marked this priority:P3 enhancement/linked-pr/stale-check. The P0 delegated-retrieval leakage path was fixed by #768, so this older agent-scoped memory branch may now be partially superseded. Can you confirm whether there is remaining scope behavior here that #768 did not cover?

Add v1 agent-scoped memory scope for LCM tools

c04a1c6

oguzbilgic mentioned this pull request Mar 10, 2026

Feature: Auto-transplant DAG on session reset (bridge the 4 AM amnesia wall) #34

Closed

fuyizheng3120 mentioned this pull request Apr 16, 2026

Empty assistant messages cause 'assistant message prefill' API rejection #445

Closed

rmarr mentioned this pull request Apr 29, 2026

Quoting marker text in customInstructions triggers reverse-priming, dropping doctor apply repair rate to 0% #543

Open

ChrisBot2026 mentioned this pull request May 5, 2026

Missing command:reset / command:new hook leaves stale LCM conversation rows after session reset #612

Closed

100yenadmin mentioned this pull request May 5, 2026

feat(lcm): v4.1 —LCM V2 (replaces #516; companion #616 deferred) #613

Open

100yenadmin added the enhancement, priority:P3, stale-check, and linked-pr labels May 30, 2026

jacoblyles wants to merge 1 commit into Martian-Engineering:main from jacoblyles:feature/agent-memory-scope-v1