feat: add v1 agent-scoped memory scope for LCM tools #2

Open
jacoblyles wants to merge 1 commit into Martian-Engineering:main from jacoblyles:feature/agent-memory-scope-v1

Conversation

jacoblyles commented Mar 1, 2026

Summary

  • add initial agent-scoped memory plugin helpers under src/plugins/agent-memory-scope
  • extend scope resolution and retrieval/store paths to support same-agent multi-conversation lookups
  • add/adjust tests for agent scope and expand-query behavior

Test

  • npx vitest run test/agent-memory-scope.test.ts test/lcm-tools.test.ts test/lcm-expand-query-tool.test.ts

Notes

  • default behavior remains unchanged when scope is omitted
  • follow-up can wire runtime config knobs (allowAgentScope, maxAgentConversations) from OpenClaw

100yenadmin referenced this pull request in 100yenadmin/lossless-claw May 7, 2026
Opus subagent analysis of v4.1 baseline (333 blocks) vs v4.2 stubs (689
blocks) at the same 258K-token budget recommended four mitigations to
address moderate-risk findings:

1. Recency cue [t-NNm] on turn headers
2. Semantic stub wrapping <lcm-stub> XML tags
3. Empty-assistant collapsing
4. Resolution markers at completion boundaries

Applied first-principles-architectural-decision skill (research,
run-the-system, where-it-lives diagrams, adversarial debate) before
building any of them. Verdict: REJECT ALL FOUR. Each fails on a
specific load-bearing constraint:

- #1 fails on prefix-cache stability (clock-based tag changes the
  rendered string on every assemble, invalidating the cache that v4.2's
  whole value proposition relies on). User timestamps already exist
  inline.

- #2 fails on "novelty has cost, format already works" — the existing
  [LCM Tool Output: file_xxx | …] bracket form is correctly parsed by
  Opus in live tests (drilldown via lcm_describe works on Option F
  format). Replacing a working v4.1-trained format with a novel XML
  form is unjustified churn.

- #3 fails on Anthropic/OpenAI wire contract. The "empty assistants"
  contain tool_use blocks (required to live in assistant turns; paired
  with tool_results by toolCallId). Dropping them would break
  pairing — providers reject orphan tool_results.

- #4 fails on detection signal. No reliable way to mark "work
  completed" — user phrases like "go ahead" / "yes" / "keep digging"
  oscillate. False positives are strictly worse than no marker
  (license premature stubbing).
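
The pairing constraint behind the rejection of #3 can be illustrated with a small sketch. The `Turn` and `Block` shapes below are simplified, hypothetical stand-ins rather than the real provider wire formats or this repository's types. The sketch only shows that filtering out text-less assistant turns leaves a tool_result whose toolCallId no longer has a matching tool_use.

```typescript
// Hypothetical, simplified message shapes for illustration only.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; toolCallId: string }
  | { type: "tool_result"; toolCallId: string };

interface Turn {
  role: "user" | "assistant";
  blocks: Block[];
}

// Returns the toolCallIds of tool_result blocks that have no earlier tool_use.
function orphanToolResults(turns: Turn[]): string[] {
  const seenToolUse = new Set<string>();
  const orphans: string[] = [];
  for (const turn of turns) {
    for (const block of turn.blocks) {
      if (block.type === "tool_use") {
        seenToolUse.add(block.toolCallId);
      } else if (block.type === "tool_result" && !seenToolUse.has(block.toolCallId)) {
        orphans.push(block.toolCallId);
      }
    }
  }
  return orphans;
}

const turns: Turn[] = [
  { role: "user", blocks: [{ type: "text", text: "find the config file" }] },
  { role: "assistant", blocks: [{ type: "tool_use", toolCallId: "call_1" }] },
  { role: "user", blocks: [{ type: "tool_result", toolCallId: "call_1" }] },
  { role: "assistant", blocks: [{ type: "text", text: "Found it." }] },
];

// "Empty" here means an assistant turn with no text blocks (an assumed definition).
const collapsed = turns.filter(
  (t) => !(t.role === "assistant" && t.blocks.every((b) => b.type !== "text")),
);

console.log(orphanToolResults(turns));     // pairing intact before collapsing
console.log(orphanToolResults(collapsed)); // call_1 is orphaned after collapsing
```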

Adversarial debate at ≥95% confidence target on each. AGAINST won on
all four. Decision record committed for future operators who hit
similar moderate-risk findings and reach for similar mitigations.

Final v4.2 shipping shape: Options C + D + F at commit e309bed.
Architecturally additive, reversible, default-off. Empirically:
333→689 items at same budget; Opus drills down correctly; no
confabulation observed.
100yenadmin referenced this pull request in 100yenadmin/lossless-claw May 7, 2026
…pattern

Wire #2 of 3 for the agent context-management architecture (Wave-14).

# What this lands

Tools that could push context over budget now run a pre-call gate
BEFORE doing work: estimate the result size; if (currentTokens +
estimated) / tokenBudget > REFUSAL_THRESHOLD (0.92), return a
structured `{ok: false, needsCompact: true, ...}` payload instead.
Agent reads, calls lcm_compact, retries — the natural negotiation
pattern.

Without this layer, an agent at 78% context calling
`lcm_describe expandMessages=true expandMessagesLimit=20` (estimated
13K tokens) lands at ~84% AT BEST — but worst-case messages can
saturate the result-cap and push past 100%, causing
context_length_exceeded errors mid-turn.
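
As a rough illustration of the pre-call check described above, here is a minimal sketch. It is not the code in `src/plugin/needs-compact-gate.ts`: the function name `evaluateGateSketch`, the `GateInput` shape, and the pass-through on missing telemetry are assumptions made for the example. Only the 0.92 threshold, the ratio formula, and the `ok` / `needsCompact` payload fields come from the description.

```typescript
// Sketch under stated assumptions; not the PR's actual implementation.
const REFUSAL_THRESHOLD = 0.92;

interface GateInput {
  currentTokens?: number; // tokens already in context (telemetry may be missing)
  tokenBudget?: number;   // total context budget
  estimatedResultTokens: number;
}

type GateDecision =
  | { ok: true }
  | {
      ok: false;
      needsCompact: true;
      currentRatio: number;
      estimatedResultTokens: number;
      projectedRatio: number;
    };

function evaluateGateSketch(input: GateInput): GateDecision {
  const { currentTokens, tokenBudget, estimatedResultTokens } = input;
  // Assumed behavior: without telemetry the gate cannot project, so it lets the call run.
  if (currentTokens === undefined || tokenBudget === undefined || tokenBudget <= 0) {
    return { ok: true };
  }
  const projectedRatio = (currentTokens + estimatedResultTokens) / tokenBudget;
  if (projectedRatio > REFUSAL_THRESHOLD) {
    return {
      ok: false,
      needsCompact: true,
      currentRatio: currentTokens / tokenBudget,
      estimatedResultTokens,
      projectedRatio,
    };
  }
  return { ok: true };
}

// The 78% case from the text: 156K in context plus an estimated 13K result.
console.log(evaluateGateSketch({ currentTokens: 156_000, tokenBudget: 200_000, estimatedResultTokens: 13_000 }));
// A later call at 89% with a 10K estimate projects to 94% and is refused.
console.log(evaluateGateSketch({ currentTokens: 178_000, tokenBudget: 200_000, estimatedResultTokens: 10_000 }));
```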

# Tools wired

PRE-CHECK ENFORCED (7):
- lcm_grep (5 modes)
- lcm_semantic_recall
- lcm_describe (HIGHEST priority — biggest blow-up risk per Agent C)
- lcm_expand_query
- lcm_get_entity
- lcm_search_entities
- lcm_compact (small footprint; included for uniform agent UX)

NOT WIRED (intentionally — self-protecting or out-of-scope):
- lcm_synthesize_around: internal 50K source cap; prompt-bounded
  output ~2-3K. Per Agent B, can't blow context.
- lcm_expand: sub-agent-only, has its own grant ledger

# Files

NEW:
- `src/plugin/needs-compact-gate.ts` (~190 LOC) — REFUSAL_THRESHOLD
  constant (0.92 — calibrated against real DB), per-tool
  `estimateResultTokens(toolName, params)` formulas, the
  `evaluateNeedsCompactGate` core logic, and a `runWithTokenGate`
  wrapper helper that tools use to compose pre-check + post-call
  cache accumulation.
- `test/v41-needs-compact-gate.test.ts` (~120 LOC) — 19 tests covering
  per-tool estimator math, refusal logic, suggested-action narrowing,
  bypass-on-missing-telemetry, and threshold boundary cases.

EDITED (each ~5-10 LOC of changes):
- src/tools/lcm-grep-tool.ts — gate at top of execute, tap on returns
- src/tools/lcm-describe-tool.ts — gate + tap on final return
- src/tools/lcm-semantic-recall-tool.ts — runWithTokenGate wrapper
- src/tools/lcm-expand-query-tool.ts — wrapper
- src/tools/lcm-get-entity-tool.ts — wrapper
- src/tools/lcm-search-entities-tool.ts — wrapper
- src/plugin/index.ts — pass `getRuntimeContext` to all 7 tool factories
- src/plugin/token-state.ts — add `tapResultForTokenAccounting` helper

# How the agent experience works

```
Agent: lcm_describe id=sum_xxx expandMessages=true expandMessagesLimit=30

Tool gate:
  estimatedResultTokens = 10000 (capped)
  currentRatio = 0.78
  projectedRatio = (156000 + 10000) / 200000 = 0.83 → BELOW 0.92 → run normally

Agent: lcm_describe id=sum_yyy expandMessages=true expandMessagesLimit=30

Tool gate:
  currentRatio = 0.89  // accumulated from previous result
  projectedRatio = 0.94 → OVER 0.92 → REFUSE

Tool returns:
{
  ok: false,
  needsCompact: true,
  reason: "context-overflow-prevention",
  currentRatio: 0.89,
  estimatedResultTokens: 10000,
  projectedRatio: 0.94,
  note: "Serving this call would push context to 94% of budget...",
  suggested_actions: [
    "lcm_compact then retry with same params",
    "retry with expandMessagesLimit=15"
  ]
}

Agent: reads, calls lcm_compact, retries. Now at 70% — call succeeds.
```
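
The negotiation in the transcript above could be driven by a loop like the following sketch. Everything here is hypothetical: `callWithCompactRetry`, `fakeDescribe`, and `fakeCompact` are stand-ins that simulate the tool and the compaction step in memory, and the 5% cost per call simply mirrors the 10K / 200K figures in the transcript.

```typescript
// Hypothetical agent-side loop; the callbacks stand in for real tool calls.
type ToolResult =
  | { ok: true; content: string }
  | { ok: false; needsCompact: true; suggested_actions: string[] };

function callWithCompactRetry(
  callTool: () => ToolResult,
  compact: () => void,
  maxRetries = 1,
): ToolResult {
  let result = callTool();
  for (let attempt = 0; attempt < maxRetries && !result.ok; attempt++) {
    compact();           // free context first (lcm_compact in the transcript)
    result = callTool(); // then retry with the same params
  }
  return result;
}

// Simulated context pressure: starts at 89%, each call would add 5%.
let ratio = 0.89;

const fakeDescribe = (): ToolResult =>
  ratio + 0.05 > 0.92
    ? {
        ok: false,
        needsCompact: true,
        suggested_actions: ["lcm_compact then retry with same params"],
      }
    : { ok: true, content: "expanded messages" };

const fakeCompact = (): void => {
  ratio = 0.7; // assumed post-compaction ratio, as in the transcript
};

const outcome = callWithCompactRetry(fakeDescribe, fakeCompact);
console.log(outcome.ok, ratio);
```

Bounding the loop with `maxRetries` is a design choice of this sketch, so that a compaction that frees too little context cannot cause an endless refuse-and-retry cycle.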

# Threshold (0.92) calibration

Wave-14 Agent A sampled Eva's live DB (3,904 leaves, 414 condensed,
315K messages). Per-tool result hard cap is 10K tokens
(MAX_RESULT_CHARS / 4). With 200K context:
- 0.95 cushion → 10K headroom = zero margin (one capped call → 100%)
- 0.92 cushion → 16K headroom = one capped call + agent response
- Lower thresholds → over-refusal on safe calls
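
The headroom figures above follow from simple arithmetic, sketched here. `headroomTokens` and the constant names are made up for this example. Only the 200K budget, the 10K per-call cap, and the two thresholds come from the text.

```typescript
// Illustrative arithmetic only; names are assumptions for this sketch.
const TOKEN_BUDGET = 200_000;
const PER_CALL_CAP = 10_000; // the 10K per-tool result cap from the text

function headroomTokens(threshold: number, budget: number = TOKEN_BUDGET): number {
  // Round to avoid floating-point residue in budget * (1 - threshold).
  return Math.round(budget * (1 - threshold));
}

for (const threshold of [0.95, 0.92]) {
  const headroom = headroomTokens(threshold);
  const marginAfterCappedCall = headroom - PER_CALL_CAP;
  console.log(
    `threshold ${threshold}: headroom ${headroom}, margin after one capped call ${marginAfterCappedCall}`,
  );
}
```

At 0.95 the margin after one capped call is zero, while 0.92 leaves 6K tokens, which is the room the text sets aside for the agent's own response.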

# Per-tool estimator confidence

(Per Wave-14 Agent C calibration against actual format strings)
- lcm_grep regex/full_text/hybrid/semantic — 90%
- lcm_grep verbatim — 60% (variable per-message size)
- lcm_semantic_recall — 90%
- lcm_describe (no expand) — 70%
- lcm_describe (expand flags) — 60% (high subtree variance)
- lcm_get_entity / lcm_search_entities — 90%
- lcm_expand_query — 80%

Estimator capped at HARD_CAP_TOKENS (10K) regardless of natural
estimate — protects against under-estimation. Tools that return less
than estimated just have headroom; tools with bad estimates get
their natural cap protection.
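
A minimal sketch of the capping step, assuming a simple linear per-tool formula. `naturalEstimate`, its parameters, and the 40,000 value for `MAX_RESULT_CHARS` are assumptions: the value is only implied by "MAX_RESULT_CHARS / 4" equalling 10K tokens, and the real `estimateResultTokens` formulas are not shown in this description.

```typescript
// Assumed value, implied by the text's "MAX_RESULT_CHARS / 4" = 10K tokens.
const MAX_RESULT_CHARS = 40_000;
const HARD_CAP_TOKENS = MAX_RESULT_CHARS / 4;

// Hypothetical per-tool formula: fixed overhead plus a per-item cost.
function naturalEstimate(itemCount: number, tokensPerItem: number, overhead = 200): number {
  return overhead + itemCount * tokensPerItem;
}

// Clamp the natural estimate into [0, HARD_CAP_TOKENS].
function cappedEstimate(natural: number): number {
  return Math.min(Math.max(0, Math.ceil(natural)), HARD_CAP_TOKENS);
}

// 20 expanded messages at an assumed 640 tokens each exceeds the cap.
const large = naturalEstimate(20, 640);
console.log(large, cappedEstimate(large));

// A small lookup stays at its natural estimate.
const small = naturalEstimate(3, 150);
console.log(small, cappedEstimate(small));
```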

# Verification

- 1592/1592 tests passing (1573 baseline + 19 new gate tests)
- 7/7 release-readiness preflight checks pass
- 330 TS errors (under 700 baseline; PR introduced none)

# What's next (Commit 3 of 3)

Synchronous compaction at critical pressure (`afterTurn` deferred-mode
drain runs sync at >0.85 currentRatio). System-level safety net
behind the agent-driven layers.
100yenadmin added the enhancement, priority:P3, stale-check, and linked-pr labels May 30, 2026
100yenadmin (Collaborator) commented

@jacoblyles triage pass update: I marked this priority:P3 enhancement/linked-pr/stale-check. The P0 delegated-retrieval leakage path was fixed by #768, so this older agent-scoped memory branch may now be partially superseded. Can you confirm whether there is remaining scope behavior here that #768 did not cover?
