feat(lcm): v4.1 —LCM V2 (replaces #516; companion #616 deferred) by 100yenadmin · Pull Request #613 · Martian-Engineering/lossless-claw

100yenadmin · 2026-05-05T21:49:23Z

LCM v2 (iteration experiments v4.1) — Lossless Agent Memory

77 commits · 1502 tests passing · 10 audit waves closed · live-DB verified against the user's 2.6 GB / 4187-leaf corpus

Replaces the rollup approach from #516. Companion draft #616 preserves cut features.

This PR rebuilds LCM the way a person actually remembers: keep the raw conversation forever, embed it for similarity search, and synthesize fresh views on demand. The v3 lcm_recent rollup approach (summaries-of-summaries-of-summaries) is removed because it produced repetitive, lossy output that got worse the further back you looked.

After merge, the agent can answer every one of these without operator intervention:

| Question | Answer path |
| --- | --- |
| "What did we work on yesterday?" | `lcm_synthesize_around` with `period: 'yesterday'` (timezone-aware) |
| "Have we ever discussed X?" | `lcm_grep --mode hybrid` (FTS + Voyage rerank, +52.5pp recall on paraphrases) |
| "What did Eva exactly say about Y?" | `lcm_grep --mode verbatim` (full message rows, no summary paraphrase) |
| "What's the history with the operator-VM customer?" | `lcm_get_entity('operator-VM customer')` |
| "Where did this synthesis claim come from?" | `lcm_describe(id, expandChildren: true)` then `lcm_expand_query` |

The operator gets /lcm health, /lcm purge (soft-suppression cascading through 10+ read paths), /lcm reconcile-session-keys, /lcm eval (recall + drift), /lcm worker lifecycle, and a real lossless raw bedrock.

TL;DR — what merges and why
The problem this solves
Architecture (with diagrams)
The 8 agent tools
Why Voyage embeddings (Phase A spike)
Worker auto-ticks
Operator commands
Cost discipline
What was CUT and why
Test infrastructure
Audit history (10 waves, ~140 bugs closed)
Verification
Migration safety
Operator setup walkthrough
Non-goals (intentional)
Related PRs

TL;DR — what merges and why

Headline numbers:

77 commits spanning ~6 weeks of design + implementation + 10 audit waves
1502 tests passing in CI; 0 PR-introduced TypeScript errors (677 total, below the 739 baseline on main; see Verification)
15,279 LOC of production code added across 42 source files
15K+ LOC of test infrastructure across 31 v4.1-tagged test files
~140 unique bugs found and closed across audit waves (most in Wave-9; Wave-10 closed the last 16)
Live-DB verified twice against the user's actual ~/.openclaw/lcm.db (2.6 GB, 4187 leaves)
8 of 9 antipattern classes from Wave-10's audit have automated detection — the 9th is partial (mutation testing on demand)

What you should care about as a maintainer:

The architecture replaces a known-bad rollup approach with a structurally lossless one. The section "The problem this solves" walks through the failure modes; "Architecture" shows the new design with diagrams.
The agent surface is 8 tools mapped to 5 question types. Each has a concrete cost profile. See The 8 agent tools.
Test rigor is exceptional: 10 audit waves found ~140 bugs in code that passed all tests at each prior wave. Wave-10 pivoted from "more audits" to "build invariant tests that catch the 9 known antipattern classes." See Test infrastructure.
Reviewer findings: 12 separate reviewer findings (unrelated to my own audits) were verified 12-for-12 real and closed in Wave-10, including a P1 timezone bug and a P1 cache-staleness bug that violated user-trust contracts.
The PR is in the strongest test-coverage shape it's been in across the entire 10-wave cycle. Time to ship; let CI catch what comes next.

The problem this solves

We shipped lcm_recent (PR #516) in v3 — the rollup tool. Plan: every period (day/week/month) gets summarized into one rollup; the rollup is what the agent reads back. Cheap, deterministic.

In production it broke in three ways:

1. Compression of compression

day rollup    = summary(7 daily user/assistant turns)
week rollup   = summary(7 day rollups)            ← summaries OF summaries
month rollup  = summary(4 week rollups)           ← summaries OF summaries OF summaries

By the time a query reached the monthly view, the same fact had been summarized three times. The model started saying "as discussed earlier" referencing a discussion that wasn't in the rollup at all — it was 3 layers down, paraphrased away.

2. No way to ask sideways questions

lcm_recent only told you about a time window. If you wanted to ask "have we ever discussed X?" — you couldn't. The rollups were time-indexed, not topic-indexed.

3. Stale-output trap

If the rollup was generated yesterday and a leaf got suppressed today, the rollup still reflected the suppressed content. Every rollup needed to be invalidated and regenerated on every leaf change — which negated the precomputed-cost savings.

The decision

Stop building rollups entirely. Build a system where:

Raw leaves stay forever (the lossless bedrock).
Similarity search (Voyage + reranker) handles "have we ever talked about X?" — straight to source, not to a rollup.
Synthesis happens on demand (lcm_synthesize_around) when an agent asks for a window, working from the original leaves rather than re-summarizing summaries.
Per-tier model dispatch: haiku for daily, sonnet for weekly, opus for monthly + verify, opus-thinking + best-of-N for yearly. We don't pay premium prices for trivial summaries OR cheap out on the hard ones.
Suppression is a first-class read-path concern: the cascade fires through 10+ read surfaces so a single suppressed_at flip makes content invisible everywhere.

That's v4.1.

Architecture

Storage pyramid (the lossless bedrock)

flowchart BT
    subgraph Bedrock["Lossless bedrock — never compressed away"]
        M[(messages<br/>raw user/assistant/tool turns<br/>+ suppressed_at flag)]
    end
    subgraph Layer1["Layer 1: deterministic chunking"]
        L[(summaries kind=leaf<br/>~600 tokens each<br/>+ suppressed_at flag)]
        M -->|leaf-summarizer| L
    end
    subgraph Layer2["Layer 2: condensed views"]
        C[(summaries kind=condensed<br/>depth 1-2<br/>+ contains_suppressed_leaves flag)]
        L -->|condensation worker| C
    end
    subgraph OnDemand["Layer 3+: synthesized on demand"]
        SC[(lcm_synthesis_cache<br/>tier-tagged, prompt-tagged<br/>rebuildable)]
        L -->|dispatchSynthesis<br/>per-tier model| SC
        C -->|dispatchSynthesis| SC
    end
    subgraph Sidecars["Sidecars built async"]
        E[(lcm_entities<br/>+ lcm_entity_mentions)]
        VEC[(vec0:<br/>lcm_embeddings_voyage4large)]
        L -.->|entity coreference worker| E
        L -.->|backfill worker → Voyage| VEC
    end

Key invariants:

Lossless: messages + leaf summaries are never byte-deleted. Suppression is a flag, not a delete.
Cache is rebuildable: lcm_synthesis_cache can be wiped without data loss; everything regenerates from leaves.
Sidecars are eventually consistent: vec0 + entities are async; gateway hot path doesn't wait on Voyage.

Agent tool routing — 5 question types → 8 tools

flowchart LR
    Q[Agent's question]
    Q --> A{What kind?}
    A -->|Time-anchored<br/>'yesterday' / 'last week'| TA[lcm_synthesize_around<br/>period mode]
    A -->|Topic-anchored<br/>'discussed X?'| TB[lcm_grep --mode hybrid<br/>OR lcm_semantic_recall]
    A -->|Verbatim<br/>'exact wording'| TC[lcm_grep --mode verbatim]
    A -->|Pattern entity<br/>'history of X'| TD[lcm_get_entity<br/>+ lcm_search_entities]
    A -->|Drilldown<br/>'where did this come from?'| TE[lcm_describe<br/>+ lcm_expand_query]
    TA -->|tier dispatch<br/>haiku/sonnet/opus/thinking| LLM[LLM]
    TB -->|FTS + Voyage rerank<br/>+52.5pp paraphrase recall| Hits[Ranked hits]
    TC -->|Full message rows<br/>cap 20| Quotes[Verbatim quotes]
    TD -->|Read-only DB query| Entity[Entity record + mentions]
    TE -->|Sub-agent expansion<br/>via grant ledger| Sub[Synthesized answer]

Suppression cascade (the "soft purge" mechanism)

flowchart LR
    OP[Operator: /lcm purge --reason X --apply]
    OP -->|sets suppressed_at| S[(summaries.suppressed_at)]
    OP -->|cascades to messages| MS[(messages.suppressed_at)]
    S -->|trigger| V[(vec0 metadata<br/>suppressed=1)]
    S -->|cascade DELETE| CI[(context_items)]
    S -->|cascade DELETE| CR[(lcm_synthesis_cache rows<br/>referencing suppressed leaves)]
    S -->|flag| CS[(parent condensed:<br/>contains_suppressed_leaves=1)]

    subgraph ReadPaths["10+ read paths filter suppressed_at IS NULL"]
        R1[FTS5 searches]
        R2[LIKE fallback]
        R3[CJK trigram]
        R4[Regex post-filter]
        R5[vec0 KNN pre-filter]
        R6[Semantic search JOIN]
        R7[Hybrid rerank input]
        R8[summaryStore.getById]
        R9[conversationStore.getMessageById]
        R10[Entity tools — EXISTS guard]
    end

    S -.->|filter applied| R1 & R2 & R3 & R4 & R5 & R6 & R7 & R8 & R9 & R10
    MS -.->|filter applied| R9
    V -.->|pre-filter| R5

Wave-10 reviewer fix: lcm_get_entity / lcm_search_entities now require an EXISTS (... unsuppressed mention) guard. If every mention of an entity gets purged, the entity row stops returning to the agent — closes the leak Wave-10 reviewer P2 found.
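The soft-purge behavior described above can be illustrated with a minimal in-memory sketch. It is not the PR's implementation: `SummaryRow`, `softPurge`, and `getVisibleById` are hypothetical names, and a `Map` stands in for the `summaries` table. It only shows the two halves of the contract: the purge sets a flag and marks the condensed parent, and the read path filters on that flag.

```typescript
// Hypothetical in-memory stand-in for the summaries table (assumption, not the real schema).
interface SummaryRow {
  id: string;
  parentId: string | null;
  content: string;
  suppressedAt: string | null; // ISO timestamp once purged, otherwise null
  containsSuppressedLeaves: boolean;
}

// Soft purge: set the flag on the leaf and mark its condensed parent.
// Nothing is removed from the map, mirroring "suppression is a flag, not a delete".
function softPurge(rows: Map<string, SummaryRow>, leafId: string, nowIso: string): boolean {
  const leaf = rows.get(leafId);
  if (leaf === undefined) return false;
  leaf.suppressedAt = nowIso;
  if (leaf.parentId !== null) {
    const parentRow = rows.get(leaf.parentId);
    if (parentRow !== undefined) parentRow.containsSuppressedLeaves = true;
  }
  return true;
}

// Read path: the in-memory equivalent of "WHERE suppressed_at IS NULL".
function getVisibleById(rows: Map<string, SummaryRow>, id: string): SummaryRow | undefined {
  const row = rows.get(id);
  return row !== undefined && row.suppressedAt === null ? row : undefined;
}

const summaryRows = new Map<string, SummaryRow>([
  ["sum_parent", { id: "sum_parent", parentId: null, content: "weekly view", suppressedAt: null, containsSuppressedLeaves: false }],
  ["sum_leaf", { id: "sum_leaf", parentId: "sum_parent", content: "raw detail", suppressedAt: null, containsSuppressedLeaves: false }],
]);
softPurge(summaryRows, "sum_leaf", "2026-05-05T00:00:00Z");
```

After the purge the row is still stored, but every read that goes through `getVisibleById` behaves as if it were gone.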

Synthesis dispatch (per-tier model selection)

flowchart TB
    REQ[SynthesizeRequest:<br/>tier + memoryType + sourceText + targetSummaryId]
    REQ --> ROUTE{tier?}
    ROUTE -->|daily| D[Single pass<br/>haiku-4-5]
    ROUTE -->|weekly| W[Single pass<br/>sonnet-4-5]
    ROUTE -->|monthly| M[Single pass<br/>opus-4-7<br/>+ verify_fidelity check]
    ROUTE -->|yearly| Y[Best-of-3 candidates<br/>opus-4-7-thinking<br/>+ judge picks winner]
    ROUTE -->|custom/filtered| C[Single pass<br/>sonnet-4-5]

    D & W & M & Y & C --> AUDIT[(lcm_synthesis_audit<br/>per-pass row)]
    M -->|verify pass output| VR{OK marker?}
    VR -->|missing OR HALLUCINATION/UNSUPPORTED| FLAG[hallucinationFlagged=true<br/>conservative default<br/>Wave-4 P0]
    VR -->|present| OK[hallucinationFlagged=false]

    Y -->|all 3 in parallel<br/>Promise.allSettled<br/>tolerates 1 failure| CAND[Candidates]
    CAND --> JUDGE[Judge prompt picks winner]
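The routing in the diagram above can be sketched as a lookup table plus a conservative verify check. This is an illustration under assumptions: the `Tier` union, `DispatchPlan`, `planFor`, and `isHallucinationFlagged` are hypothetical names, and the marker matching is a guess at how an "OK marker" check might look, not the PR's actual parser.

```typescript
// Tier names follow the diagram; the real TierLabel union may differ (assumption).
type Tier = "daily" | "weekly" | "monthly" | "yearly" | "custom" | "filtered";

interface DispatchPlan {
  model: string;      // model label as shown in the diagram
  candidates: number; // 1 = single pass, 3 = best-of-3
  verify: boolean;    // whether a verify_fidelity pass follows
}

const PLANS: Record<Tier, DispatchPlan> = {
  daily: { model: "haiku-4-5", candidates: 1, verify: false },
  weekly: { model: "sonnet-4-5", candidates: 1, verify: false },
  monthly: { model: "opus-4-7", candidates: 1, verify: true },
  yearly: { model: "opus-4-7-thinking", candidates: 3, verify: false },
  custom: { model: "sonnet-4-5", candidates: 1, verify: false },
  filtered: { model: "sonnet-4-5", candidates: 1, verify: false },
};

function planFor(tier: Tier): DispatchPlan {
  return PLANS[tier];
}

// Conservative default: the output is flagged unless an OK marker is present
// and no HALLUCINATION / UNSUPPORTED marker appears anywhere in it.
function isHallucinationFlagged(verifyOutput: string): boolean {
  const hasFailureMarker = /HALLUCINATION|UNSUPPORTED/.test(verifyOutput);
  const hasOkMarker = /\bOK\b/.test(verifyOutput);
  return hasFailureMarker || !hasOkMarker;
}
```

The point of the second function is the default: garbled or empty verifier output counts as flagged rather than as a pass.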

Concurrency model (gateway vs worker)

flowchart LR
    subgraph Gateway["Gateway process — hot path, sync per turn"]
        T[Turn] --> ASM[assemble pyramid]
        ASM --> LW[leaf write<br/>BEGIN IMMEDIATE]
        LW -->|enqueue async| EQ[(extraction queue)]
        LW -->|atomic| FTS[(FTS5)]
        LW -->|sync index| SUM[(summaries)]
    end

    subgraph Worker["Worker process — background"]
        BAC[backfill autostart<br/>5min interval<br/>gated on VOYAGE_API_KEY]
        EXC[entity-coref autostart<br/>60s interval<br/>gated on LCM_EXTRACTION_LLM_ENABLED]
        BAC -->|drains| EQ
        EXC -->|drains| EQ
        BAC -->|HTTP POST| VOY[Voyage API<br/>raw fetch, not SDK]
        EXC -->|LLM call| LLM[Worker LLM]
    end

    subgraph Locks["lcm_worker_lock — TTL + heartbeat"]
        L1[acquire INSERT OR IGNORE]
        L2[heartbeat every 30s<br/>WHERE expires_at > now]
        L3[release on shutdown]
    end

    BAC --> L1 & L2 & L3
    EXC --> L1 & L2 & L3

    Note["§0 invariant: NO LLM/network call inside any SQLite write tx"]

Critical invariant (verified by test/v41-concurrency-invariants.test.ts with worker_threads parallel writers): no await on an LLM/network call ever appears inside a BEGIN IMMEDIATE. Voyage HTTP calls happen OUTSIDE the transaction; results are committed in a separate write tx.
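One way to make this invariant hard to violate is to give the write transaction a synchronous callback, so an `await` cannot appear inside it. The sketch below assumes hypothetical `Db` and `WriteTx` interfaces and a hypothetical `embeddings` table; it is not the PR's storage layer.

```typescript
// Hypothetical storage interfaces (assumptions for illustration only).
interface WriteTx {
  run(sql: string, params: unknown[]): void;
}
interface Db {
  // Runs fn inside BEGIN IMMEDIATE ... COMMIT. fn is synchronous by design.
  immediate<T>(fn: (tx: WriteTx) => T): T;
}

type EmbedFn = (text: string) => Promise<number[]>;

async function embedAndStore(db: Db, embed: EmbedFn, leafId: string, text: string): Promise<void> {
  // 1. The network call runs first, with no transaction open.
  const vector = await embed(text);
  // 2. A short write transaction commits the already-fetched result.
  db.immediate((tx) => {
    tx.run("INSERT INTO embeddings (leaf_id, vector) VALUES (?, ?)", [leafId, JSON.stringify(vector)]);
  });
}
```

Because the callback passed to `immediate` returns synchronously, the write lock is held only for the duration of the insert, never for the duration of an HTTP round trip.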

The 8 agent tools

Each tool maps to one or more of the 5 question types. Cost profile is per-call against the user's typical corpus.

`lcm_grep` — multi-modal search

The Swiss-army search tool. 5 modes for 5 different jobs.

| mode | What it does | When to use | Cost |
| --- | --- | --- | --- |
| `regex` | JS regex post-filter on FTS-loaded set | Specific patterns, error codes, IDs | $0 (DB only) |
| `full_text` | FTS5 BM25-style ranked | Keyword recall, FTS-easy queries | $0 |
| `hybrid` | FTS top-K + semantic top-K + Voyage rerank merge | Paraphrase + keyword in one call | ~$0.001/query |
| `verbatim` | Full message rows (cap 20), NOT summary paraphrase | Citation, quote-back, "exact wording" | $0 |
| `semantic` | Pure embed-only KNN (no rerank) | Cheap broad recall | ~$0.0001/query |

Wave-9 P1.4 fix: verbatim mode handles CJK queries (Chinese/Japanese/Korean) via LIKE fallback. The FTS5 unicode61 tokenizer can't segment ideographs, so messages_fts MATCH '机器学习' returned 0 rows silently. Now containsCjk() detection routes CJK queries directly to LIKE substring match.
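The routing described above could look roughly like the following. The PR names `containsCjk()` but does not show its body, so the character ranges here are an assumption (Hiragana, Katakana, CJK Extension A, CJK Unified Ideographs, Hangul syllables, all in the Basic Multilingual Plane); `chooseVerbatimStrategy` and `likePattern` are hypothetical helpers.

```typescript
// Assumed ranges: Hiragana + Katakana, CJK Ext A, CJK Unified, Hangul syllables.
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

function containsCjk(query: string): boolean {
  return CJK_PATTERN.test(query);
}

type VerbatimStrategy = "fts_match" | "like_substring";

// CJK queries skip FTS5 MATCH and go straight to a LIKE substring scan.
function chooseVerbatimStrategy(query: string): VerbatimStrategy {
  return containsCjk(query) ? "like_substring" : "fts_match";
}

// Builds a LIKE pattern with wildcards escaped, intended for use with
// "... LIKE ? ESCAPE '\'" so user input cannot inject % or _.
function likePattern(query: string): string {
  const escaped = query.replace(/[\\%_]/g, (ch) => "\\" + ch);
  return "%" + escaped + "%";
}
```

Escaping `%` and `_` matters on the LIKE path because, unlike an FTS5 query, the raw user string becomes part of the match pattern.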

Example call:

await lcm_grep({
  pattern: "rebase conflict",
  mode: "hybrid",
  allConversations: true,
  limit: 10
});
// Returns ranked hits with cosineSimilarity, confidenceBand, ftsRank

`lcm_semantic_recall` — pure semantic search

Same cost profile as lcm_grep --mode semantic (~$0.0001/query); kept as a distinct tool for clarity. Returns ranked snippets with cosineSimilarity + confidenceBand.

`lcm_synthesize_around` — time-anchored synthesis (the `lcm_recent` replacement)

Three modes:

period mode: period: 'yesterday' | 'today' | 'this-week' | 'last-week' | 'this-month' | 'last-month' | 'last-Nh' | 'last-Nd'. Target is OPTIONAL. Wave-10 P1 fix: day-boundary periods (today/yesterday/etc.) are computed in the operator's local timezone (lcm.timezone), not UTC. A Bangkok operator (UTC+7) at 02:00 local asking "yesterday" gets local-yesterday, not UTC-yesterday (which would be ~17 hours off).
time mode: target: 'sum_xxx' + windowHours: N. Anchors on a known leaf's timestamp.
semantic mode: target: 'sum_xxx' OR free-text query. Top-K most-similar leaves.
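The Bangkok example in period mode can be checked with a small worked sketch. It makes a simplifying assumption: the zone is represented by a fixed UTC offset in minutes, which is valid for Bangkok (no DST) but not for zones with daylight saving. `yesterdayWindow` and `TimeWindow` are hypothetical names; how the PR resolves `lcm.timezone` is not shown in this description.

```typescript
const MINUTE_MS = 60 * 1000;
const DAY_MS = 24 * 60 * MINUTE_MS;

// Half-open window [startMs, endMs) in epoch milliseconds (UTC).
interface TimeWindow {
  startMs: number;
  endMs: number;
}

function yesterdayWindow(nowMs: number, utcOffsetMinutes: number): TimeWindow {
  const offsetMs = utcOffsetMinutes * MINUTE_MS;
  // Shift into local wall-clock space, floor to local midnight, then shift back.
  const localMidnightToday = Math.floor((nowMs + offsetMs) / DAY_MS) * DAY_MS;
  return {
    startMs: localMidnightToday - DAY_MS - offsetMs,
    endMs: localMidnightToday - offsetMs,
  };
}

// 02:00 in Bangkok on 2026-05-06 is 19:00 UTC on 2026-05-05.
const nowUtcMs = Date.UTC(2026, 4, 5, 19, 0, 0);
const bangkokYesterday = yesterdayWindow(nowUtcMs, 7 * 60);
const utcYesterday = yesterdayWindow(nowUtcMs, 0);
const skewHours = (bangkokYesterday.startMs - utcYesterday.startMs) / (60 * MINUTE_MS);
```

With these inputs the local window starts at 2026-05-04T17:00Z and the UTC window at 2026-05-04T00:00Z, so `skewHours` is 17, matching the "~17 hours off" figure.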

Output is a fresh markdown synthesis produced by the per-tier model; the full per-tier dispatch, including the custom and filtered tiers, is in Synthesis dispatch above.

Backed by lcm_synthesis_cache. Wave-10 P1 fix: the cache UNIQUE index now keys on (session_key, range_start, range_end, leaf_fingerprint, grep_filter, tier_label, prompt_id). Previously the index ignored tier_label and prompt_id, which caused two correctness bugs:

Same range/leaves with different tier silently returned wrong-tier text
registerPrompt() changing the active prompt left cache serving stale text
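Both bugs come from the same cause: two logically different requests mapped to the same cache row. A minimal sketch of the before and after key shapes, with hypothetical names (`SynthesisCacheKey`, `cacheKey`, `legacyCacheKey`) and a JSON-encoded tuple standing in for the UNIQUE index:

```typescript
// Field names mirror the index columns listed above; the types are assumptions.
interface SynthesisCacheKey {
  sessionKey: string;
  rangeStart: string;
  rangeEnd: string;
  leafFingerprint: string;
  grepFilter: string | null;
  tierLabel: string;
  promptId: string;
}

// Pre-fix shape: tier and prompt are not part of the identity.
function legacyCacheKey(k: SynthesisCacheKey): string {
  return JSON.stringify([k.sessionKey, k.rangeStart, k.rangeEnd, k.leafFingerprint, k.grepFilter]);
}

// Post-fix shape: all seven columns participate.
function cacheKey(k: SynthesisCacheKey): string {
  return JSON.stringify([
    k.sessionKey, k.rangeStart, k.rangeEnd, k.leafFingerprint, k.grepFilter, k.tierLabel, k.promptId,
  ]);
}

const dailyRequest: SynthesisCacheKey = {
  sessionKey: "s1",
  rangeStart: "2026-05-04T17:00:00Z",
  rangeEnd: "2026-05-05T17:00:00Z",
  leafFingerprint: "abc123",
  grepFilter: null,
  tierLabel: "daily",
  promptId: "prompt-v1",
};
const weeklyRequest: SynthesisCacheKey = { ...dailyRequest, tierLabel: "weekly" };
const newPromptRequest: SynthesisCacheKey = { ...dailyRequest, promptId: "prompt-v2" };
```

Under `legacyCacheKey` all three requests collide; under `cacheKey` each gets its own entry, so a tier or prompt change can no longer be served stale text.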

`lcm_describe` — drilldown by ID

Look up a specific summary or file by its ID. Returns metadata, lineage manifest (parent chain + descendant counts), source messages (for leaves), and on-demand expansion via expandChildren / expandMessages flags.

Wave-9 + Wave-10 fixes: when called from a sub-agent session, expansion now consumes the grant's token budget — previously sub-agents could drain context for free. The base summary's s.content token count is also charged (Wave-10 P1).

`lcm_expand` — sub-agent-only deep expansion

Gated behind runtimeExpansionAuthManager.grant(). Issues a token-budgeted expansion request that traverses the summary lineage and returns content under a hard cap. Main agents cannot call this directly — they go through lcm_expand_query which delegates to a sub-agent that holds the grant.

`lcm_expand_query` — main-agent wrapper for delegated expansion

Takes a free-text query + token cap. Spawns a sub-agent holding a runtime expansion grant; sub-agent runs lcm_expand repeatedly to materialize source. Returns synthesized answer with citation IDs.

Wave-9 P1.1 fix: citation-fabrication count (citedIdsRejectedAsFabricated + citedIdsExceededValidationCap) now surfaces through the ExpandQueryReply — previously computed internally but dropped at the API boundary. Agents can now distinguish "the LLM produced no citations" from "all citations were hallucinated and rejected."

`lcm_get_entity` — entity catalog lookup by canonical name

Returns entity record + mention list with summary IDs. Mentions are filtered through summaries.suppressed_at IS NULL.

Wave-10 reviewer P2 fix: requires EXISTS (... unsuppressed mention) for the entity itself. When every mention of an entity is suppressed via purge, the entity row no longer leaks canonical_text / alternate_surfaces / metadata. The "not found" branch is intentionally indistinguishable between "no such entity" and "all mentions suppressed" so an attacker can't infer existence by querying.
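The guard and the indistinguishable "not found" branch can be sketched in memory as follows. This is an illustration, not the PR's SQL: `Entity`, `Mention`, `EntityLookup`, and `getEntity` are hypothetical, and a `Set` of suppressed summary IDs stands in for the `suppressed_at IS NULL` join.

```typescript
interface Entity {
  id: string;
  canonicalText: string;
}
interface Mention {
  entityId: string;
  summaryId: string;
}
type EntityLookup =
  | { found: true; entity: Entity; mentionSummaryIds: string[] }
  | { found: false };

function getEntity(
  canonicalText: string,
  entities: Entity[],
  mentions: Mention[],
  suppressedSummaryIds: Set<string>,
): EntityLookup {
  const entity = entities.find((e) => e.canonicalText === canonicalText);
  if (entity === undefined) return { found: false };
  const visible = mentions
    .filter((m) => m.entityId === entity.id && !suppressedSummaryIds.has(m.summaryId))
    .map((m) => m.summaryId);
  // EXISTS guard: with no unsuppressed mention, return the same shape as "no such entity".
  if (visible.length === 0) return { found: false };
  return { found: true, entity, mentionSummaryIds: visible };
}

const knownEntities: Entity[] = [{ id: "ent_1", canonicalText: "operator-VM customer" }];
const knownMentions: Mention[] = [
  { entityId: "ent_1", summaryId: "sum_a" },
  { entityId: "ent_1", summaryId: "sum_b" },
];
```

Returning the identical `{ found: false }` value in both branches is what keeps a caller from inferring that a purged entity ever existed.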

`lcm_search_entities` — entity catalog browse by query

Substring/prefix/exact match across canonical_text. Same suppression guard.

Why Voyage embeddings

Phase A spike data (real eval, not gut feel)

A 31-query stratified eval against the user's snapshot DB measured per-stratum lift:

| Stratum | n | FTS-only | Hybrid (Voyage rerank-2.5) | Lift |
| --- | --- | --- | --- | --- |
| FTS-easy | 14 | 40.5% | 69.0% | +28.5pp |
| FTS-medium | 9 | not graded | not graded | n/a |
| Paraphrastic | 8 | 5.0% | 57.5% | +52.5pp |

The +52.5pp paraphrastic lift was the threshold (decision gate was ≥30pp). Hybrid mode is the answer to "have we ever discussed X?" — paraphrase coverage is the differentiator.
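The lift figures and the decision gate are plain arithmetic on the table above; a short sketch makes the calculation explicit. `StratumResult`, `liftPp`, and `passesGate` are hypothetical names, and the 30pp default comes from the gate quoted above.

```typescript
interface StratumResult {
  name: string;
  n: number;
  ftsOnlyPct: number;
  hybridPct: number;
}

// Lift in percentage points, rounded to one decimal to avoid float noise.
function liftPp(r: StratumResult): number {
  return Math.round((r.hybridPct - r.ftsOnlyPct) * 10) / 10;
}

function passesGate(r: StratumResult, thresholdPp: number = 30): boolean {
  return liftPp(r) >= thresholdPp;
}

const ftsEasy: StratumResult = { name: "FTS-easy", n: 14, ftsOnlyPct: 40.5, hybridPct: 69.0 };
const paraphrastic: StratumResult = { name: "Paraphrastic", n: 8, ftsOnlyPct: 5.0, hybridPct: 57.5 };
```

Here `liftPp(paraphrastic)` is 52.5 and clears the gate, while `liftPp(ftsEasy)` is 28.5 and would not have cleared it on its own.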

Spike cost: $0.58 total (one-time eval).

Why voyage-4-large + rerank-2.5

voyage-4-large: highest-quality general-domain model in the Voyage lineup (1024-dim, $0.18/1M tokens).
rerank-2.5: pairwise reranker that takes the top-K from FTS + semantic and produces a final ranking. The +52.5pp paraphrastic lift came from the rerank step, not from the embedding.
Single model: no hybrid embedding stack. We considered voyage-3-lite but its vector space is incompatible with voyage-4-large; mixing required dual-corpus storage.

Why not OpenAI / Cohere / local

OpenAI text-embedding-3-large: comparable quality but no first-party reranker. Adding Cohere rerank introduces a second vendor dependency.
Cohere embed-english-v3: lower quality on the user's eval set.
Local embedding (e.g., bge-large): runs on user's machine but no rerank-2.5-quality local reranker exists. Quality lift was the driver.

Cost reality

Backfilling Eva's 4187-leaf corpus: ~$1 one-time.
Per-query hybrid (with rerank): ~$0.001.
Per-query semantic-only (no rerank): ~$0.0001.
Per-leaf incremental embed: ~$0.000045.

A year of heavy use is ~$5-10 in Voyage costs. Not a budget concern.

Worker auto-ticks

Backfill autostart

Triggers: gateway init detects non-zero pending docs.
Gated on: VOYAGE_API_KEY present + sqlite-vec loaded + active embedding profile registered.
Cadence: 5-minute interval, perTickLimit=200, 0.5 RPS rate limit (Voyage policy + 9.5% tokenizer-inflation margin).
Self-stop: 3 consecutive failures (e.g., Voyage outage) auto-stops the loop. Operator restarts via /lcm worker tick embedding-backfill.
Wave-9 fix: auto-stop now also fires on all-skipped ticks (over-cap leaves), not just on hard failures.
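The self-stop rule can be sketched as a small counter. One assumption is labeled here because the description does not spell it out: all-skipped ticks are counted toward the same three-in-a-row threshold as hard failures. `TickOutcome` and `createAutoStop` are hypothetical names.

```typescript
type TickOutcome = "embedded" | "failed" | "all_skipped";

// Returns a recorder; calling it with a tick outcome yields true when the loop should stop.
function createAutoStop(maxConsecutive: number = 3): (outcome: TickOutcome) => boolean {
  let consecutiveUnproductive = 0;
  return (outcome: TickOutcome): boolean => {
    if (outcome === "embedded") {
      // Any productive tick resets the streak.
      consecutiveUnproductive = 0;
      return false;
    }
    // Assumption: "failed" and "all_skipped" both extend the same streak.
    consecutiveUnproductive += 1;
    return consecutiveUnproductive >= maxConsecutive;
  };
}
```

A caller would invoke the recorder once per tick and cancel its interval timer on the first `true`; the operator restart then goes through `/lcm worker tick embedding-backfill` as described above.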

Entity coreference autostart

Default-on, opt-out via LCM_EXTRACTION_LLM_ENABLED=false.
Cadence: 60-second interval, drains lcm_extraction_queue.
Per-row SAVEPOINT (Wave-7 P0): one bad surface in a batch doesn't ROLLBACK the whole leaf.
Dead-letter: max 5 attempts per row (Wave-10 fix: pending count predicate now matches selector exactly — previously could spin forever on suppressed/dead-letter rows).

Lock semantics

lcm_worker_lock table: INSERT OR IGNORE on PK gives single-flight; heartbeat every 30s with WHERE expires_at > now prevents stealing live locks. TTL + heartbeat verified against parallel writers via worker_threads (test/v41-concurrency-invariants.test.ts).
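The lock semantics can be modeled in memory to show why the heartbeat predicate matters. This is a sketch, not the PR's SQL: `WorkerLockTable` is hypothetical, a `Map` stands in for the table, and clearing an expired row before the insert attempt is an assumption about how the TTL is enforced.

```typescript
interface LockRow {
  owner: string;
  expiresAt: number; // epoch ms
}

class WorkerLockTable {
  private rows = new Map<string, LockRow>();

  // Models INSERT OR IGNORE on the primary key: a second insert is a no-op.
  acquire(lockName: string, owner: string, nowMs: number, ttlMs: number): boolean {
    const existing = this.rows.get(lockName);
    // Assumption: expired rows are cleared before the insert attempt.
    if (existing !== undefined && existing.expiresAt <= nowMs) this.rows.delete(lockName);
    if (this.rows.has(lockName)) return false;
    this.rows.set(lockName, { owner, expiresAt: nowMs + ttlMs });
    return true;
  }

  // Models UPDATE ... WHERE owner = ? AND expires_at > now.
  // An expired or reassigned lock cannot be revived by its old holder.
  heartbeat(lockName: string, owner: string, nowMs: number, ttlMs: number): boolean {
    const row = this.rows.get(lockName);
    if (row === undefined || row.owner !== owner || row.expiresAt <= nowMs) return false;
    row.expiresAt = nowMs + ttlMs;
    return true;
  }

  release(lockName: string, owner: string): void {
    const row = this.rows.get(lockName);
    if (row !== undefined && row.owner === owner) this.rows.delete(lockName);
  }
}
```

A worker that sees `heartbeat` return `false` should treat the lock as lost and stop writing, since another worker may already hold it.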

Operator commands

| Command | What it does | Owner-gated? |
| --- | --- | --- |
| `/lcm status` | Plugin / DB / current-conversation status | No (read-only) |
| `/lcm health` | v4.1 subsystem health (embeddings / workers / synthesis / eval / suppression / over-cap leaves) | No (read-only) |
| `/lcm worker [status\|tick embedding-backfill]` | Worker management; `tick` runs one backfill cycle | status: no, tick: yes |
| `/lcm reconcile-session-keys [--list-candidates\|--apply ...]` | Merge legacy session keys | --apply: yes |
| `/lcm eval [--baseline\|--mode hybrid\|...]` | Recall + drift report; mutates `lcm_eval_run` | yes (Wave-10) |
| `/lcm purge [--reason ... --apply]` | Soft-purge cascade | yes |
| `/lcm backup` | Timestamped DB backup | No |
| `/lcm rotate` | Compact session transcript while preserving LCM identity | No |
| `/lcm doctor [clean [apply ...]\|apply]` | Broken-summary scanning / repair | No (analysis) |

Wave-9 P0 fix + Wave-10 reviewer P1 fix: every destructive command requires senderIsOwner=true. Previously only /lcm purge had the gate; Wave-9 added it to /lcm reconcile-session-keys --apply (cross-session data theft vector). Wave-10 added it to /lcm eval (the reviewer correctly challenged Wave-9's READ_ONLY classification — eval mutates lcm_eval_run and may use Voyage in hybrid mode). The authorization-invariant test (test/v41-authorization-invariants.test.ts) statically scans lcm-command.ts for new cases and FAILS at test time if a destructive case is added without the gate.
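The gate and the "fail if unclassified" idea can be sketched as a classification table with a fail-closed default. The command keys, `Classification`, and `authorize` are hypothetical; the actual structure of lcm-command.ts and the way the invariant test scans it are not shown in this description.

```typescript
type Classification = "read_only" | "destructive";

// Hypothetical classification table covering a subset of the commands listed above.
const COMMAND_CLASSIFICATION: Record<string, Classification> = {
  "status": "read_only",
  "health": "read_only",
  "worker status": "read_only",
  "worker tick": "destructive",
  "reconcile-session-keys --apply": "destructive",
  "eval": "destructive",
  "purge": "destructive",
};

function authorize(command: string, senderIsOwner: boolean): "allow" | "deny" {
  const classification: Classification | undefined = COMMAND_CLASSIFICATION[command];
  // Fail closed: a command nobody classified is treated as not allowed.
  if (classification === undefined) return "deny";
  if (classification === "destructive" && !senderIsOwner) return "deny";
  return "allow";
}
```

The fail-closed branch is the runtime counterpart of the static test: a new command that was never classified is denied rather than silently treated as read-only.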

Cost discipline

| Workload | One-time | Ongoing |
| --- | --- | --- |
| Voyage embedding backfill (4187 leaves) | ~$1 | n/a |
| New leaf embedding | n/a | ~$0.000045/leaf |
| `lcm_grep --mode hybrid` (per query) | n/a | ~$0.001 |
| `lcm_grep --mode semantic` / `lcm_semantic_recall` | n/a | ~$0.0001 |
| `lcm_grep --mode verbatim` / `regex` / `full_text` | n/a | $0 (DB only) |
| Daily synthesis (haiku-4-5) | n/a | ~$0.005 |
| Weekly synthesis (sonnet-4-5) | n/a | ~$0.05 |
| Monthly synthesis (opus-4-7 + verify) | n/a | ~$0.50 |
| Yearly synthesis (opus-thinking + best-of-3 + judge) | n/a | ~$5 |

Per-tier model dispatch is the cost lever: we don't pay opus-thinking prices for yesterday's summary, and we don't ask haiku to do yearly synthesis.

What was CUT and why

Per first-principles pass + 8 challenger agents (2026-05-06):

| Feature | Why cut | Preserved at |
| --- | --- | --- |
| Themes (3 tools + worker + schema) | Half-shipped UX worse than not shipping. Worker had no auto-tick wiring; operators couldn't manually trigger via `/lcm worker tick themes-consolidation`. Tool error message itself admitted "auto-tick is cycle-3". | Draft PR #616 |
| Procedure mining (worker + prefilter + schema) | 0% shipped. No agent tool, no LLM injection, no auto-tick. Pure dead code. | Draft PR #616 |
| Intentions (schema + prospective-extract prompt) | ZERO producer / ZERO consumer / ZERO agent tools. Schema-only artifact. | Draft PR #616 |
| `runPurge --immediate` mode | No drainer worker (~20-40h work, HIGH risk to assemble-pyramid invariants). Functionally identical to soft mode without it. | Draft PR #616 |
| `lcm_voyage_rate_state` schema | Table-only feature, ZERO production readers/writers. Per-process throttle covers single-gateway use. | Draft PR #616 |
| `lcm_purge_rebuild_queue` schema | Queue with no drainer (paired with `--immediate` cut). | Draft PR #616 |
| `lcm_describe` consolidation (entity_id / theme_id polymorphism) | 400-LOC refactor touching the canonical describe tool. After 4 final-review passes, reopens adversarial review surface for ergonomic-only gain. | Draft PR #616 |

Net diff: ~2935 LOC removed from PR. Net change after capability adds (verbatim mode, semantic mode, expandChildren flags, doc updates): ~−2605 LOC.

The companion draft PR #616 preserves each cut with full context for focused future-cycle pickup. Each cut was assessed against THE_FIVE_QUESTIONS coverage; no question type lost coverage (procedure/theme sub-cases have adequate-fallback coverage via lcm_grep --mode hybrid).

Test infrastructure

This is what makes the PR shippable. Wave-10 pivoted from "more audits" to "build automated tests that catch every known antipattern class."

8 of 9 antipattern classes have automated detection

| Antipattern | Closed by | Test file |
| --- | --- | --- |
| A1 Implementation-mirroring tests | Adversarial scenarios (37 hard scenarios) | `v41-adversarial-scenarios.test.ts` |
| A2 Per-function tests, no cross-cutting invariant | 5 invariant test files | `v41-{authorization,suppression,tool-parity,schema-drift,concurrency}-invariants.test.ts` |
| A3 Mocked-too-high tests | Mock LLM at the LLM-call seam | `fixtures/v41-mock-llm.ts` + `v41-synthesis-quality.test.ts` |
| A4 Missing edge-case fixtures | Synthetic + stress + adversarial fixtures | `fixtures/v41-{test,stress}-corpus.ts` |
| A5 Missing adversarial / negative-path tests | Adversarial scenarios + mock failure shapes | `v41-adversarial-scenarios.test.ts` + `v41-synthesis-quality.test.ts` |
| A6 Seam-between-units untested | End-to-end scenario tests via real DB | `v41-five-questions.test.ts` |
| A7 Coverage ≠ correctness | Mutation testing config (on-demand) | `stryker.config.json` |
| A8 Concurrency / TOCTOU | worker_threads parallel-writer harness | `v41-concurrency-invariants.test.ts` |
| A9 Schema / contract drift | Static-analysis test suite | `v41-schema-drift-invariants.test.ts` |

A7 (mutation testing) is partial — stryker-mutator config is checked in but not run in CI (too slow for per-PR; ~5min per file). On-demand only.

Test layer cost profile

| Layer | Cost per run | When |
| --- | --- | --- |
| Unit + invariant + scenario + synthesis-quality | ~30s | Every PR |
| Stress fixture (1500-2500 leaves) | <2s | Every PR |
| Concurrency (worker_threads parallel writers) | ~4s | Every PR |
| Live-DB harness (full 2.6 GB corpus) | ~2 min, ~$0.001 | Operator pre-merge |
| QA runner full suite | ~5-10 min, ~$0.20 | Operator pre-merge OR release gate |
| Mutation testing (per file) | ~5 min | On-demand diagnostic |

THE_FIVE_QUESTIONS as executable tests

The 25 scenarios in docs/v4.1/THE_FIVE_QUESTIONS.md are now executable against a deterministic synthetic fixture (test/fixtures/v41-test-corpus.ts, ~80 leaves with known content). Wave-10 sub-agent #3's audit found that the original 26 tests were 16 strong / 9 weak / 1 sentinel; the weak ones were strengthened to assert specific summary IDs in results.

Synthesis quality — closed via mock LLM

The single un-tested gap after Wave-9 was synthesis quality (real LLM tests are non-deterministic + cost money + need network). Wave-10 added a deterministic LlmCall mock with 10 response shapes including adversarial (fabricated citations, malformed JSON, hallucinated content). 12 synthesis-quality tests verify:

Per-tier dispatch routing (daily=haiku, weekly=sonnet, monthly=opus+verify, yearly=opus-thinking+best-of-N)
Prompt rendering (placeholder substitution into actual LLM call)
Best-of-N tolerates 1 failing candidate via Promise.allSettled
Verify-fidelity defaults to hallucinationFlagged=true on garbled output (Wave-4 P0 conservatism)
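The best-of-N tolerance in the list above can be sketched with a scripted mock at the LLM-call seam. Everything here is illustrative: `LlmCall` is named in the description but its signature is assumed, `scriptedLlm`, `bestOfN`, and `pickLongest` are hypothetical, and the synchronous judge is a stand-in for the judge prompt.

```typescript
// Assumed shape of the LLM-call seam.
type LlmCall = (prompt: string) => Promise<string>;

// Deterministic mock: replays scripted responses in call order; Error entries reject.
function scriptedLlm(script: Array<string | Error>): LlmCall {
  let index = 0;
  return async (): Promise<string> => {
    const next = script[index];
    index += 1;
    if (next === undefined) throw new Error("script exhausted");
    if (next instanceof Error) throw next;
    return next;
  };
}

async function bestOfN(
  llm: LlmCall,
  prompt: string,
  n: number,
  maxFailures: number,
  judge: (candidates: string[]) => number,
): Promise<string> {
  // All candidates run in parallel; allSettled keeps one rejection from failing the batch.
  const settled = await Promise.allSettled(Array.from({ length: n }, () => llm(prompt)));
  const candidates: string[] = [];
  for (const result of settled) {
    if (result.status === "fulfilled") candidates.push(result.value);
  }
  if (n - candidates.length > maxFailures || candidates.length === 0) {
    throw new Error("too many failed candidates");
  }
  const winner = candidates[judge(candidates)];
  if (winner === undefined) throw new Error("judge returned an out-of-range index");
  return winner;
}

// Stand-in judge for the sketch: picks the longest candidate.
const pickLongest = (candidates: string[]): number =>
  candidates.reduce((best, c, i) => (c.length > (candidates[best] ?? "").length ? i : best), 0);
```

With three candidates and `maxFailures` set to 1, a single rejected call still produces a winner, while two rejections reject the whole synthesis.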

Audit history

10 audit waves over ~6 weeks, with ~140 unique bugs found and closed. The progression validates the test-infrastructure investment:

| Wave | Type | Findings | Notable |
| --- | --- | --- | --- |
| 1 | 10 Opus agents, full PR | 27 | Initial post-implementation pass |
| 2 | 10 Opus agents | 19 | Schema + cleanup |
| 3 | 10 Opus agents | 11 | Tool surface |
| 4 | 22 Opus agents, full re-audit | 22 | Citation-fabrication validation infrastructure |
| 5 | 3 Opus agents on Wave-4 fixes | 8 | Wave-4 P0 regressions |
| 6 | 2 Opus agents | 2 | Confirmed mergeability claim |
| 7 | 22 Opus agents, full re-audit | 22 | Operator gate P0 + Voyage retry tightening |
| 8 | 22 Opus agents, full re-audit | 9 | TOCTOU race in runPurge (now fixed in Wave-9) |
| 9 | 11 Opus agents, FULL FILE CONTEXT | 78 | First wave with non-diff context; found 1 P0 + 13 P1 |
| 10 | Reviewer 12 findings (12/12 verified) + 4 sub-agents | 12 + 4 | Closed last antipatterns; built mock LLM |

Wave-10 reviewer findings: 12 separate reviewer findings, verified one-by-one before fixing (the user explicitly said "wasn't sure if verified"). Result: 12-for-12 real bugs, including the timezone P1 (Bangkok-yesterday-is-not-UTC-yesterday) and the cache-staleness P1 (prompt_id and tier_label missing from UNIQUE index).

Wave-10 sub-agent discoveries: 4 parallel sub-agents building test infrastructure ALSO found additional real bugs:

Sub-agent #2 (schema-drift): 3 P3 FK ON-DELETE missing clauses
Sub-agent #3 (adversarial scenarios): 1 fixture-FTS-circularity bug. The summaries_fts insert was using rowid, but the schema declares summary_id UNINDEXED. The original B1-B5 tests passed by accident, matching at the messages layer.

Total Wave-10 closed: 12 (reviewer) + 4 (sub-agent discoveries) = 16 source bugs, plus 89 new tests.

Verification

Test counts (final)

1502/1502 tests passing across 105 test files
31 v4.1-tagged test files (test/v41-*.test.ts)
0 PR-introduced TypeScript errors (677 errors total — fewer than the 739 baseline on main; type-tightening fixes cascaded from source changes)

Live-DB verification (real corpus)

Run twice in Wave-10 against a copy of ~/.openclaw/lcm.db (2.6 GB, 4187 leaves):

[harness] ✓ extraction tick processed 1 items
[harness] ✓ entity coref created 1 entities
[harness] ✅ ALL CHECKS PASSED. Harness DB at: /Volumes/LEXAR/lcm-tmp/lcm-harness-...db
[harness] v4.1 retrieval pipeline verified end-to-end against the real corpus.

QA runner against real DB

[qa-runner] critical failures: 0
[qa-runner] important failures: 0
[qa-runner] tool errors (uncaught): 0

(After fixing 3 test bugs in Wave-10 — they were test-data-naming issues, not source bugs. Tools correctly rejected harness leaves with non-sum_ prefix; we updated harness + QA runner to use production naming.)

Sample mutation testing

Two sample files measured (full mutation run is too slow for CI):

src/store/fts5-sanitize.ts: 82.35% mutation kill rate (well-tested utility)
src/operator/purge.ts: 67.97% mutation kill rate, 17 uncovered mutants — recently-grown workflow file with measurable gaps

The gap is the diagnostic the user predicted ("tests show green while we keep finding real bugs"). Future work uses these sample numbers to prioritize where to add tests.

Migration safety

All schema changes are additive. Re-running runLcmMigrations is idempotent (verified in tests + live-DB twice). No column drops, no type changes. Cut tables are simply not created on fresh installs; existing operator DBs that already have them keep them as no-op residue (no FK breakage, no data loss).

Wave-10 schema-drift invariant (test/v41-schema-drift-invariants.test.ts) statically validates:

Every {{placeholder}} in seeded prompts has a corresponding renderPrompt substitution
Every tier_label CHECK constraint accepts every value in the TierLabel TS union
Every operator command has matching parser + handler entries
Every FK declaration has an explicit ON DELETE clause (Wave-10 sub-agent #2 finding)
Every manifest tool has a registered factory

These run statically at test time, no DB needed. Future drift breaks the test before it breaks production.
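The first check in the list above (every placeholder has a substitution) is a good example of a drift test that needs no database. A minimal sketch, assuming double-brace placeholders with word-character names; `extractPlaceholders` and `missingSubstitutions` are hypothetical helpers, not the functions in the real test file.

```typescript
// Collects the distinct placeholder names that appear in a prompt template.
function extractPlaceholders(template: string): string[] {
  const names = new Set<string>();
  const pattern = /\{\{\s*([A-Za-z0-9_]+)\s*\}\}/g;
  let match: RegExpExecArray | null = pattern.exec(template);
  while (match !== null) {
    const captured = match[1];
    if (captured !== undefined) names.add(captured);
    match = pattern.exec(template);
  }
  return Array.from(names).sort();
}

// Returns placeholders the renderer would leave unfilled.
function missingSubstitutions(template: string, providedKeys: string[]): string[] {
  return extractPlaceholders(template).filter((key) => providedKeys.indexOf(key) === -1);
}
```

A drift test built on this would iterate over every seeded prompt and fail if `missingSubstitutions` returns a non-empty list for any of them.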

Operator setup walkthrough

# 1. Install sqlite-vec (now in optionalDependencies as of Wave-10)
npm install sqlite-vec

# 2. Configure Voyage
mkdir -p ~/.openclaw/credentials && chmod 700 ~/.openclaw/credentials
# Paste your key into the file:
echo "$VOYAGE_KEY" > ~/.openclaw/credentials/voyage-api-key
chmod 600 ~/.openclaw/credentials/voyage-api-key
export VOYAGE_API_KEY="$(cat ~/.openclaw/credentials/voyage-api-key)"

# 3. Restart the gateway. Watch the log:
tail -f ~/.openclaw/logs/gateway.log | grep -E "lcm|voyage|backfill"
# Expected within ~10s of boot:
#   [lcm] semantic infra initialized: profile=voyage4large, dim=1024
#   [lcm] backfill autostart enabled (5min cadence)
# Expected within first 5min tick:
#   [lcm] backfill tick: embedded=200 of pending=3801

# 4. Check progress (~1hr to fully embed a 4000-leaf corpus at 0.5 RPS):
/lcm health

# 5. Want it faster? Force a tick (operator only):
/lcm worker tick embedding-backfill

# 6. Once embeddedCount catches up, semantic + hybrid retrieval works.
#    Try in a chat:
#      "Use lcm_grep with mode hybrid to find anything about race conditions"
#      "Use lcm_grep with mode verbatim to quote what was said about X"
#      "Use lcm_synthesize_around with period yesterday to recap"

# 7. Soft-forget a leaf (operator-only):
/lcm purge --reason "PII removal" --summary-ids sum_xxx --apply

If VOYAGE_API_KEY is missing, the plugin still works: lcm_grep --mode hybrid returns an error that points the caller to mode='full_text' instead. The operator opts in by setting the key. (sqlite-vec is now an optionalDependency per the Wave-10 reviewer P2 fix; install it explicitly to enable semantic retrieval.)

Non-goals

What v4.1 is intentionally not:

Not RAG. The assemble() pyramid is structural (fresh tail → recent leaves → last-week condensed → last-month condensed → last-year synthesis). It does NOT do per-turn semantic retrieval into the prompt. Semantic retrieval is an agent tool the model can call when the user asks for it.
Not a rollup replacement that produces more rollups. Synthesis is on-demand via lcm_synthesize_around, not a precomputed nightly job.
Not auto-tied to themes / procedures / intentions. All three were half-shipped or fully speculative; cut from this PR (preserved in #616).
Not a replacement for hard-delete. runPurge does soft-suppression only. The DB rows remain; only suppressed_at is set. For GDPR/erasure that requires byte-level deletion, the operator must run raw SQL DELETE + VACUUM out-of-band until #616 lands the drainer worker.

Related PRs

Replaces: #516 (feat: add temporal lcm_recent rollups (1/10)). Same problem space, different architectural answer (rejected for repetition + lossy compression-of-compression). This PR closes the gap #516 was trying to fill.
Companion draft: #616 ([CUT FEATURES] [DRAFT] LCM v4.1 deferred features). Preserves themes / procedures / intentions / hard-delete drainer / voyage rate-state / lcm_describe consolidation with full context for focused future-cycle pickup.

Reviewer checklist

If you're reviewing for merge, focus on these in order:

1. The architecture decision (The problem this solves → Architecture). The whole PR rests on "stop building rollups; build a system where raw stays + synthesis is on-demand." If you disagree with that, everything below is moot.
2. Suppression cascade (architecture diagram 3). 10+ read paths must filter suppressed_at IS NULL. The invariant test (v41-suppression-invariants.test.ts) loops over every read path on the storage stores. If you find a method that returns content without consulting the filter, that's a P0.
3. Authorization (operator commands). 4 destructive commands (purge / reconcile / worker-tick / eval) are senderIsOwner gated. The invariant test (v41-authorization-invariants.test.ts) statically scans for new cases and FAILS at test time if classification is missing.
4. Synthesis dispatch (architecture diagram 4). Per-tier model selection. Verify-fidelity conservatism. Best-of-N with Promise.allSettled tolerance. All covered by v41-synthesis-quality.test.ts (12 tests, mock LLM).
5. Migration safety (Migration safety). Idempotent. Live-DB verified twice. Schema-drift invariant catches future drift.
6. Test rigor (Test infrastructure). 8 of 9 antipattern classes automated. The mock LLM closed the last big gap (synthesis quality).

If those 6 sections check out, the rest of the PR is implementation detail that the test layer will catch regressions on.

First commit of the v4.1 omnibus implementation. Smallest possible slice: introduces the cross-process concurrency model module and the `lcm_worker_lock` table that enables a sidecar worker process for cold maintenance work (condensation, extraction, embedding backfill, theme consolidation, eval, profile rebuild).

Resolves v4.1.1 amendment A9 (`last_heartbeat_at` column required by §0.5 fallback rule: gateway can take over only when BOTH `expires_at < now` AND `last_heartbeat_at < now - 300s`).

Changes:
- src/concurrency/model.ts (NEW): single source of truth for §0 invariants, busy_timeout constants, worker job-kind catalogue, and defensive assertion helpers (assertForeignKeysEnabled, assertBusyTimeoutForRole). Documents the no-LLM-in-write-tx invariant and the worker_threads heartbeat requirement (v4.1.1 A9).
- src/db/migration.ts (+25 lines): new `ensureLcmWorkerLockTable` migration step. Idempotent CREATE TABLE IF NOT EXISTS, runs after FTS setup, before the BEGIN EXCLUSIVE COMMIT.
- test/concurrency-model.test.ts (NEW, 10 tests): verifies invariant ordering (worker timeout < gateway, TTL ≥ 3× heartbeat, fallback soak > TTL), job-kind catalogue, and assertion helpers.
- test/lcm-worker-lock.test.ts (NEW, 4 tests): verifies migration creates the table with the right columns (including A9's last_heartbeat_at), is idempotent, supports basic acquire/heartbeat, and supports stale-lock GC.

Verification:
- npm run build: passes
- npm test --run: 48 files / 872 tests passing (up from 858 baseline, +14 new tests, zero regressions)
- Live DB ground-truth check: ran the new DDL against a copy of /Users/lume/.openclaw/lcm.db (2.5GB, 762 conversations, 3771 leaf summaries). Migration succeeds; existing data untouched; acquire pattern works; PK conflict throws as expected.

Notes:
- Code-as-ground-truth pivot: per the v4.1.1 plan, each commit cites the amendment(s) it resolves and is verified against live data.
- v4.1.1 A6 finding (PRAGMA foreign_keys = OFF on Eva's CLI test) partially superseded: src/db/connection.ts:configureConnection() already sets it ON for every connection that goes through the standard path. The new assertForeignKeysEnabled() is a defensive guardrail for future code paths that bypass configureConnection.
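The §0.5 fallback rule and the timing invariants named in this commit can be sketched as two small predicates. This is a minimal illustration, not the code in src/concurrency/model.ts: the interface, the constant name, and both function names are hypothetical, and timestamps are assumed to be epoch milliseconds.

```typescript
// Hypothetical view of one lcm_worker_lock row: only the two timestamp
// columns named in the commit message, converted to epoch milliseconds.
interface WorkerLockTimestamps {
  expiresAtMs: number;
  lastHeartbeatAtMs: number;
}

// 300 s comes from the fallback rule; the constant name is illustrative.
const FALLBACK_SOAK_MS = 300_000;

// The gateway may take over only when BOTH the lease has expired AND the
// heartbeat has been silent for longer than the soak window.
function gatewayMayTakeOver(lock: WorkerLockTimestamps, nowMs: number): boolean {
  const leaseExpired = lock.expiresAtMs < nowMs;
  const heartbeatStale = lock.lastHeartbeatAtMs < nowMs - FALLBACK_SOAK_MS;
  return leaseExpired && heartbeatStale;
}

// Ordering invariants the tests are described as checking:
// TTL >= 3x heartbeat interval, and fallback soak > TTL.
function lockTimingsAreConsistent(heartbeatMs: number, ttlMs: number, soakMs: number): boolean {
  return ttlMs >= 3 * heartbeatMs && soakMs > ttlMs;
}
```

Requiring both conditions means a worker whose lease lapsed but which is still heartbeating is not preempted by the gateway.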

…_feature_flags (A.02)

Resolves v4.1.1 amendments A2 (suppress_reason + superseded_by columns) and A8 (feature-flag storage). Adds the v3.1 columns the v4.1 spec depends on (session_key, suppressed_at, entity_index, contains_suppressed_leaves) since v3.1 never shipped to upstream.

Changes:
- src/db/migration.ts (+104 LOC):
  - ensureSummaryV41Columns(db): adds 7 columns to summaries via the existing PRAGMA table_info / ADD COLUMN pattern (matches ensureSummaryDepthColumn / ensureSummaryMetadataColumns / etc.):
      session_key TEXT NOT NULL DEFAULT ''  (v3.1 A1)
      suppressed_at TEXT  (v3.1 A3)
      entity_index TEXT  (v3.1 §7.2)
      contains_suppressed_leaves INTEGER NOT NULL DEFAULT 0  (v3.1 A3)
      suppress_reason TEXT  (v4.1.1 A2)
      superseded_by TEXT REFERENCES summaries ON DELETE SET NULL  (v4.1.1 A2/A4)
      leaf_summarizer_cap_was INTEGER  (v4.1)
  - ensureMessageSuppressedAtColumn(db): adds messages.suppressed_at (v3.1 A3 cascade target for lcm_quote / lcm_factcheck filtering)
  - ensureLcmFeatureFlagsTable(db): clean new table `lcm_feature_flags(flag PK, value NOT NULL, updated_at NOT NULL)`
  - lcm_worker_lock TEXT PK explicitly NOT NULL (SQLite legacy quirk allows NULL in TEXT PK columns without it).
- test/v41-summaries-columns.test.ts (NEW, 12 tests):
  - Per-column verifications (NOT NULL, default value, FK target/action)
  - lcm_feature_flags schema + basic set/read pattern
  - Legacy `lcm_migration_flags` coexistence verified

Verification:
- npm run build: passes
- npm test --run: 49 files / 884 tests passing (+12 from A.01's 872, 0 regressions)
- Live DB ground-truth check on copy of /Users/lume/.openclaw/lcm.db:
  - summaries 14 → 21 columns; 7 v4.1 cols added.
  - messages gains suppressed_at; 3774 leaves preserved.
  - lcm_worker_lock + lcm_feature_flags created.
  - Eva's legacy lcm_rollups* + lcm_migration_flags untouched.
  - 4187 summaries now have session_key='' (A.08 backfill target).

Code-as-ground-truth findings (revising v4.1.1 spec):
1. v4.1.1 A8 originally said "extend lcm_migration_flags with value column." That table doesn't exist in upstream src/; it only exists on Eva's live DB from old fork-side code. Replaced with a clean new `lcm_feature_flags` table. Eva's legacy table stays alongside, untouched.
2. v4.1.1 A6 (PRAGMA foreign_keys = OFF) is partly misleading: the codebase's src/db/connection.ts:configureConnection() already sets foreign_keys = ON for every connection through the standard path. Eva's earlier sqlite3 CLI test was using a different connection, not the production path. The new src/concurrency/model.ts already provides assertForeignKeysEnabled() as a defensive guardrail.
3. SQLite TEXT PRIMARY KEY columns do NOT auto-enforce NOT NULL (legacy behavior). Both new tables (lcm_worker_lock, lcm_feature_flags) now have explicit NOT NULL on their PK column. Caught by tests.
4. SQLite ADD COLUMN with REFERENCES requires NULL default: verified `superseded_by TEXT REFERENCES summaries(summary_id) ON DELETE SET NULL` works as ALTER TABLE ADD COLUMN (no NOT NULL allowed). Documented in ensureSummaryV41Columns docstring.
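The PRAGMA table_info / ADD COLUMN pattern described above amounts to "diff the wanted columns against what exists, then emit only the missing ALTERs". The sketch below shows that planning step in isolation. The column DDL is copied from the commit message; `ColumnSpec`, `SUMMARY_V41_COLUMNS`, and `planAddColumns` are hypothetical names, and the real migration talks to SQLite rather than taking a plain list of column names.

```typescript
// Column specs copied from the commit message; the helper itself is a
// hypothetical illustration, not the code in src/db/migration.ts.
interface ColumnSpec {
  columnName: string;
  ddl: string;
}

const SUMMARY_V41_COLUMNS: ColumnSpec[] = [
  { columnName: "session_key", ddl: "TEXT NOT NULL DEFAULT ''" },
  { columnName: "suppressed_at", ddl: "TEXT" },
  { columnName: "superseded_by", ddl: "TEXT REFERENCES summaries(summary_id) ON DELETE SET NULL" },
];

// Given the column names that PRAGMA table_info reports, return only the
// ALTER statements still needed. A second run sees the new columns and
// plans nothing, which is what makes the step idempotent.
function planAddColumns(
  table: string,
  existingColumns: readonly string[],
  wanted: readonly ColumnSpec[],
): string[] {
  const have = new Set(existingColumns);
  return wanted
    .filter((spec) => !have.has(spec.columnName))
    .map((spec) => `ALTER TABLE ${table} ADD COLUMN ${spec.columnName} ${spec.ddl}`);
}
```

Note that the `superseded_by` entry carries no NOT NULL, in line with finding 4.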

… + audit (A.03)

Adds the four "support tables" the worker process and operator surface need before the heavy schema (synthesis cache, embeddings, entities, themes) lands. Each is a clean idempotent CREATE TABLE IF NOT EXISTS.

Resolves v4.1.1:
- A3, `lcm_extraction_queue`: gateway atomically inserts a queue row with every leaf write; worker drains it for entity coreference and procedure-recheck. CHECK constraint on `kind` ('entity' | 'procedure-recheck'). Indexes on pending (queued_at WHERE picked_at IS NULL) and dead-letter (attempts >= 5).
- B2 (partial), `lcm_purge_rebuild_queue`: persistent rebuild queue for `lcm_purge --immediate`. T1 fires suppression cascade + enqueues; worker drains using A4 forwarder pattern. Indexes on pending + purge_session_id.
- B3 (partial), `lcm_voyage_rate_state`: cross-process rate-limit budget for Voyage embed + rerank. SQLite serializes BEGIN IMMEDIATE naturally so gateway + worker coordinate via this shared row. CHECK constraint on bucket ('embed' | 'rerank'). Seeded with both rows idempotently (`INSERT OR IGNORE`). Spec note: HTTP call MUST happen AFTER the COMMIT; wrapping HTTP in BEGIN IMMEDIATE would serialize every gateway query embed and add 200-2000ms latency.
- §C item, `lcm_session_key_audit`: reversibility log for §2.1 step 1 re-key of 5 legacy convs. Allows operator `/lcm undo-session-key-rekey <conv_id>` if the spike's identification was wrong for any of those convs.

Changes:
- src/db/migration.ts (+90 LOC): four `runMigrationStep` blocks added inline after the v3.1+v4.1 column work from A.02
- test/v41-support-tables.test.ts (NEW, 9 tests): per-table schema verification (columns, FKs, indexes, CHECK constraints), CHECK rejection paths, idempotent re-run verification, brief-tx update pattern verification for rate state

Verification:
- npm test --run: 50 files / 893 tests passing (+9 from A.02's 884, zero regressions)
- Live DB ground-truth check on copy of /Users/lume/.openclaw/lcm.db:
  - PRE lcm_ tables: 5 (legacy lcm_migration_flags + lcm_migration_state + 3 lcm_rollups* from Eva's fork)
  - POST lcm_ tables: 9 (5 legacy preserved + 4 new)
  - voyage rate state seeded with embed + rerank rows
  - 3774 leaves preserved, 762 conversations preserved
  - Eva's lcm_rollups* untouched (out-of-scope for v4.1; v4.1 replaces its functionality via lcm_synthesis_cache landing in A.04)

Notes:
- All four FKs use the production summaries / conversations tables; CASCADE on DELETE is the right semantics (queue/audit rows are derived; if their parent is genuinely deleted, they should follow).
- Per v4.1.1 A6 (now confirmed code-side): connection.ts already enforces foreign_keys = ON, so these CASCADEs work in production.

… cache_leaf_refs + synthesis_audit (A.04)

Adds the four-table synthesis layer per v4.1 §3 + §1.3 + v4.1.1 B1/B4. Tables created in dependency order so FKs work on first run: prompt_registry → synthesis_cache (FK on prompt_id) → cache_leaf_refs (FK on cache_id) → synthesis_audit (FK on prompt_id + either summary_id or cache_id).

Resolves v4.1.1:
- B1, `lcm_synthesis_audit` schema: pass_output is NULLable (insert with NULL before LLM call, UPDATE on return). Adds `status` column ('started' | 'completed' | 'failed') for orphan-row tracking. Started-GC index supports the 1-hour orphan cleanup query.
- B4: UNIQUE lookup index on `lcm_synthesis_cache` enables cross-process single-flight via INSERT OR IGNORE pattern (loser of race reads back in-flight row, polls for status='ready').
- §3 + §1.3: prompt registry with versioning per (memory_type, tier_label, pass_kind, version) tuple. Append-only; bundle_version groups prompt sets for synchronized voice-consistency rebuild.
- §3: synthesis cache with status='building' single-flight, prompt_id FK enables prompt-selective invalidation (NEVER touches durable summaries.content rows; closes v3 design principle 4 violation that v4 had introduced).
- v3.1 A3 extension: cache_leaf_refs inverse index for proactive purge on lcm_suppress (cascades both directions: ref deleted when either cache_id OR leaf_summary_id parent is deleted).

Changes:
- src/db/migration.ts (+150 LOC): four runMigrationStep blocks, all idempotent, all in dependency order.
- test/v41-synthesis-tables.test.ts (NEW, 14 tests):
  - prompt_registry: CHECK constraint enforcement (memory_type, pass_kind), UNIQUE constraint on (memory_type, tier_label, pass_kind, version)
  - synthesis_cache: status + tier_label CHECK enforcement, INSERT OR IGNORE single-flight pattern (ON CONFLICT DO NOTHING)
  - cache_leaf_refs: bidirectional CASCADE behavior verified
  - synthesis_audit: pass_output NULLable, started→completed pattern, CHECK requiring at least one target column, started-GC index exists

Verification:
- npm test --run: 51 files / 907 tests passing (+14 from A.03's 893, zero regressions)
- Live DB ground-truth check on copy of /Users/lume/.openclaw/lcm.db:
  - PRE: 5 lcm_ tables (legacy)
  - POST A.01-A.04 cumulative: 15 lcm_ tables = 5 legacy preserved + 10 new (worker_lock, feature_flags, extraction_queue, purge_rebuild_queue, voyage_rate_state, session_key_audit, prompt_registry, synthesis_cache, cache_leaf_refs, synthesis_audit)
  - 3774 leaves preserved, 762 conversations preserved. PRAGMA foreign_keys=1.

Notes:
- DB copies for end-to-end verification moved to /Volumes/LEXAR/lcm-tmp (the live DB is 2.5GB; /tmp filled up after a few iterations).
- B4 UNIQUE index uses COALESCE(grep_filter, '') so SQLite can index the expression deterministically (NULL-grep_filter rows would otherwise not be uniquely-indexed since NULL ≠ NULL in SQL semantics).
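The B4 single-flight pattern (winner inserts a 'building' row, loser reads it back and waits for 'ready') can be sketched with an in-memory Map standing in for the table and its UNIQUE lookup index. Everything here is hypothetical illustration: the class, its methods, and the opaque `lookupKey` string are not taken from the codebase, and the polling loop a real loser would run is left out.

```typescript
type CacheStatus = "building" | "ready";

interface CacheEntry {
  cacheStatus: CacheStatus;
  content: string | null;
}

// In-memory stand-in for lcm_synthesis_cache; the Map key plays the role of
// the UNIQUE lookup index. All names here are hypothetical.
class SingleFlightCacheSketch {
  private entries = new Map<string, CacheEntry>();

  // Mirrors INSERT OR IGNORE: true means this caller inserted the
  // 'building' row and owns the synthesis; false means someone else did.
  tryClaim(lookupKey: string): boolean {
    if (this.entries.has(lookupKey)) return false;
    this.entries.set(lookupKey, { cacheStatus: "building", content: null });
    return true;
  }

  // The winner publishes its result; losers observe it via read().
  markReady(lookupKey: string, content: string): void {
    const entry = this.entries.get(lookupKey);
    if (entry === undefined) throw new Error(`no in-flight row for ${lookupKey}`);
    entry.cacheStatus = "ready";
    entry.content = content;
  }

  read(lookupKey: string): CacheEntry | undefined {
    return this.entries.get(lookupKey);
  }
}
```

The point of the pattern is that only one process pays for the LLM call per lookup key; the second caller never starts a competing synthesis.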

… (A.05)

Per v4.1 §11 + v4.1.1 (revising v4 design):
- N≥100 stratified queries (50% fts-easy, 25% fts-medium, 25% paraphrastic).
- 2× empirical SD threshold (calibrate by 5x repeated baseline runs).
- Ensemble judge (3 different model families).
- Mixed absolute+pairwise scoring per dimension.
- Drift index for cumulative regression.
- Measures BOTH retrieval_recall AND synthesis_quality (separate metrics per v4.1.1; closes the v4 gap where eval collapsed them).

Tables (dependency order):
- lcm_eval_query_set: query set registry (e.g. 'eva-baseline-v2')
- lcm_eval_query: per-query rows with stratum CHECK constraint, optional reference_summary for gold-standard comparison, must_not_regress flag for critical Eva queries
- lcm_eval_run: per-run rows with separate retrieval_recall_score AND synthesis_quality_score, ensemble judge_models JSON, noise_floor_sd for drift calibration, trigger CHECK constraint
- lcm_eval_drift: cumulative-delta drift index per query_set

All cascade via FK on query_set_id deletion.

Verified:
- 52 files / 915 tests passing (+8 from A.04, zero regressions)
- Live DB copy: 15 → 19 lcm_ tables. 3774 leaves preserved.
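To make the "2× empirical SD threshold" concrete, here is a minimal sketch of calibrating a noise floor from repeated baseline runs and flagging a candidate score against it. It assumes a sample standard deviation (n - 1 denominator) and a one-sided check on score drops; the commit message does not say which variant the eval uses, and both function names are hypothetical.

```typescript
// Sample standard deviation (n - 1 denominator). Whether the real eval uses
// the sample or population form is an assumption of this sketch.
function noiseFloorSd(baselineScores: readonly number[]): number {
  if (baselineScores.length < 2) throw new Error("need at least two baseline runs");
  const mean = baselineScores.reduce((acc, s) => acc + s, 0) / baselineScores.length;
  const sumSq = baselineScores.reduce((acc, s) => acc + (s - mean) ** 2, 0);
  return Math.sqrt(sumSq / (baselineScores.length - 1));
}

// Flags a regression only when the candidate falls below the baseline mean
// by more than sdMultiplier x the calibrated noise floor (default 2x).
function isRegression(
  baselineScores: readonly number[],
  candidateScore: number,
  sdMultiplier = 2,
): boolean {
  const mean = baselineScores.reduce((acc, s) => acc + s, 0) / baselineScores.length;
  return mean - candidateScore > sdMultiplier * noiseFloorSd(baselineScores);
}
```

With five baseline runs of 0.80, 0.82, 0.78, 0.81 and 0.79, the mean is 0.80 and the sample SD is about 0.0158, so a candidate of 0.78 sits inside the noise floor while 0.75 is flagged.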

…ions + procedures + intentions (A.06)

Per v4.1 §7 + v4.1.1 B5/B6/B7/B8/B11. Five tables for the extraction layer (entity coreference + procedures + intentions tracking).

Tables (all idempotent, dependency-ordered):
- lcm_entity_type_registry: freeform entity_type catalogue (Eva domain has session_key, config_flag, R-XXX agent IDs, error_code, etc.; no closed CHECK enum, per v4.1.1 §C).
- lcm_entities: simplified schema (no separate aliases table per v4.1.1 B5; alternate surface forms denormalized into JSON column). UNIQUE index (session_key, canonical_text COLLATE NOCASE) enables case-insensitive cross-process single-flight (B4 pattern). FK to summaries(first_seen_in_summary_id) ON DELETE SET NULL.
- lcm_entity_mentions: tracks each mention site. CASCADE on both entity_id and summary_id deletion (basis for v4.1.1 §C suppression cascade: when a leaf gets suppressed, mentions cascade-delete).
- lcm_procedures: status lifecycle ('draft'|'active'|'stale'|'archived'|'deprecated'); extraction_source distinguishes auto (clustering pipeline) from 'manual' (lcm_remember_procedure tool, v4.1.1 B8 fix for one-shot procedures).
- lcm_intentions: 3 statuses ('pending'|'fulfilled'|'cancelled' per B11); resolution_text + resolved_at columns for capture context. source_leaf_id is NULL-allowed since ON DELETE SET NULL requires it.

Verified:
- 53 files / 929 tests passing (+14 from A.05, zero regressions)
- All 5 tables created, FK + CHECK constraints enforced.

….07)

Per v4.1 §1 + v4.1.1 A5/A7. The MANAGED tables only; the vec0 virtual table itself defers to Group B (requires sqlite-vec extension load, best-effort per A7's two-transaction pattern).

- lcm_embedding_profile: model registry (model_name PK, dim, active flag, archive_after for graceful retirement). Group B startup seeds voyage-4-large after successful sqlite-vec load.
- lcm_embedding_meta: sidecar with composite PK (embedded_id, embedded_kind, embedding_model) enabling parallel rows during model-bump cutover. CHECK on embedded_kind ('summary' | 'entity' | 'theme'). FK to lcm_embedding_profile prevents orphan model refs. No FK on embedded_id (polymorphic per v4.1.1 §C item); orphan cleanup via idle pass in Group B.

Verified:
- 54 files / 934 tests passing (+5 from A.06, zero regressions)

…4.1 read patterns (A.08)

Per v4.1: adds 5 partial/composite indexes that the new retrieval + suppression + idle-rebuild paths need. All CREATE INDEX IF NOT EXISTS, all idempotent, all conditional on the v4.1 columns added by A.02.

Indexes:
- summaries_session_key_kind_latest_idx: cross-conv assemble + retrieval scope filter. Partial WHERE session_key != '' (skips pre-A.09 backfill rows so the index stays compact during the cleanup window).
- summaries_suppressed_idx: WHERE suppressed_at IS NOT NULL. Small-footprint partial index for the suppression filter on every retrieval.
- summaries_contains_suppressed_idx: WHERE contains_suppressed_leaves = 1 AND superseded_by IS NULL. §8.1 idle-rebuild candidate scan.
- messages_suppressed_idx: WHERE suppressed_at IS NOT NULL. For lcm_quote / lcm_factcheck filtering.
- conversations_session_key_v41_idx: WHERE session_key IS NOT NULL. Boosts the cross-conv JOIN path that legacy:conv_<id> session_keys use (existing conversations_session_key_active_created_idx is on the active flag too, which legacy convs don't satisfy).

Verified:
- 55 files / 942 tests passing (+7 from A.07, zero regressions)

…lowup) The optimizer picks full table scan for tiny test datasets (3 rows), not the new index — that's the right query plan for that data size, just not what the test asserted. Index PRESENCE verification (the other 6 tests in this file) covers what unit tests can; index USE in production data shape is verified by A.09's live-DB run-script.

…JOIN backfill (A.09)

Per v4.1 §2.1 (universal cleanup; per-user re-keying like Eva's 5-legacy-convs → agent:main:main is OPERATOR-DRIVEN via Group F's `/lcm reconcile-session-keys`, NOT hardcoded into upstream migration).

Three idempotent migration steps:
1. backfillConversationSessionKeys: every NULL conversations.session_key gets backfilled to 'legacy:conv_<id>'. Each re-key writes a row to lcm_session_key_audit (deterministic audit_id derived from conv_id ensures idempotent re-runs don't duplicate audit rows). Closes v4.1.1 A5 (NULL collapse to empty bucket would destroy cross-conv identity for legacy data).
2. backfillSummarySessionKeys: every summary still at the A.02 default session_key='' gets backfilled from the parent conversation via JOIN. After step 1 ran, conversations.session_key is non-NULL for all rows. Idempotent: condition is WHERE session_key = '' so already-set rows are preserved.
3. backfillForkRollupsSessionKeys: forward-compat for Eva's fork-side lcm_rollups table (created by PR Martian-Engineering#516, not in upstream src). Only touches the table if it exists AND has session_key column. No-op on fresh upstream installs.

Verified on copy of Eva's live DB (/Volumes/LEXAR/lcm-tmp/lcm-test.db):
- PRE: 762 convs, 522 NULL session_keys, 4 agent:main:main, 0 legacy:
- POST: 762 convs, 0 NULL, 4 agent:main:main preserved, 522 legacy:conv_*
- 4187 summary session_key backfills (all summaries now keyed)
- 522 audit rows recorded
- 5 legacy convs identified as having leaves (target for Eva's future `/lcm reconcile-session-keys` to merge into agent:main:main)
- 56 files / 947 tests passing (+6 from A.08, zero regressions)
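Steps 1 and 2 above can be sketched over plain arrays to show why re-running them is safe. This is an illustration under stated assumptions, not the migration code: the row shapes and function names are hypothetical, the `rekey:<id>` audit-id format is invented for the example (the commit only says the audit_id is derived from conv_id), and a Map stands in for lcm_session_key_audit.

```typescript
interface ConversationRow {
  conversationId: number;
  sessionKey: string | null;
}

interface SummaryRow {
  summaryId: string;
  conversationId: number;
  sessionKey: string;
}

// Step 1: NULL session keys become 'legacy:conv_<id>'. The audit id is
// derived from the conversation id (format is hypothetical), so a re-run
// cannot add a second audit row for the same conversation.
function backfillConversationKeys(
  conversations: ConversationRow[],
  auditLog: Map<string, string>,
): number {
  let rekeyed = 0;
  for (const conv of conversations) {
    if (conv.sessionKey !== null) continue;
    conv.sessionKey = `legacy:conv_${conv.conversationId}`;
    auditLog.set(`rekey:${conv.conversationId}`, conv.sessionKey);
    rekeyed++;
  }
  return rekeyed;
}

// Step 2: summaries still at the '' default inherit the parent's key.
// Rows that already carry a key are skipped, mirroring WHERE session_key = ''.
function backfillSummaryKeys(
  summaries: SummaryRow[],
  conversations: readonly ConversationRow[],
): number {
  const keyByConversation = new Map<number, string>();
  for (const conv of conversations) {
    if (conv.sessionKey !== null) keyByConversation.set(conv.conversationId, conv.sessionKey);
  }
  let filled = 0;
  for (const summary of summaries) {
    if (summary.sessionKey !== "") continue;
    const parentKey = keyByConversation.get(summary.conversationId);
    if (parentKey === undefined) continue;
    summary.sessionKey = parentKey;
    filled++;
  }
  return filled;
}
```

Running both functions a second time over the same data changes nothing and returns 0 from each, which is the idempotency property the migration relies on.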

… (A.10)

Per v4.1 §2.2: fixes the leaf-summarizer cap bug. The empirical-spike-agent found 543 leaves on Eva's live DB pegged at exactly 2,415 tokens (the LLM hitting the old 2400 default and producing artificially-truncated summaries).

This commit raises the default in two places that share the constant:
- src/summarize.ts:50 DEFAULT_LEAF_TARGET_TOKENS: 2400 → 4000
- src/db/config.ts:464 fallback default for pc.leafTargetTokens: 2400 → 4000

Comment added to both locations citing the empirical finding so future readers see the rationale.

Voyage embedding (Group B) supports 32K input context, so 4000-token leaves are well within budget. Average leaf on Eva's corpus is 1,167 tokens (most leaves don't approach the cap); the change only affects leaves where the source content is dense enough to need it.

Existing 543 capped leaves on Eva's DB stay as-is: regenerating them from source messages is expensive (LLM calls) and is operator-driven, not a migration step. Leaves are immutable per v3 design principle 4.

Tests:
- test/v41-leaf-cap.test.ts (NEW, 3 tests): verifies new constant + rationale comment present
- test/config.test.ts: updated existing assertion 2400 → 4000

950/950 tests passing.

Raw fetch wrapper for Voyage AI. We do NOT use the voyageai npm SDK: v0.2.1 has an ESM resolution bug confirmed during Phase A spike (see docs/projects/lcm-rollup-overhaul/voyage-spike-results.md).

Two entry points: embedTexts() and rerankCandidates(). Both:
- Send `truncation: false` so over-cap docs are surfaced as 400 errors rather than silently clipped (lossless invariant: a truncated embedding produces a vector that doesn't reflect the source, with no signal in the vector itself that anything was dropped).
- Throw typed VoyageError on every failure mode (auth/bad_request/rate_limit/server_error/network/unexpected) so callers can react appropriately. Backfill cron will use `kind` to decide whether to park, requeue, or surface to operator.
- Retry on 5xx + network errors with exponential backoff (capped 30s). NOT on 4xx (caller bug; retrying just spends quota).
- Honor Retry-After header on 429 (seconds OR HTTP-date).
- Support mock fetch injection for tests: no module-level state, no globals, no live API calls in CI.

Token budget constants exported for callers:
- MAX_TOKENS_PER_EMBED_BATCH = 80K (Voyage caps at 120K, tokenizer counts ~9.5% higher than our token_count, so 80K leaves margin).
- MAX_TOKENS_PER_EMBED_DOC = 30K (voyage-4-large per-doc cap is 32K).
- MAX_TOKENS_PER_RERANK_CALL = 600K (rerank-2.5 per-call total).

Privacy: error messages strip Voyage-echoed input from 400 responses (some Voyage 400s include the input verbatim, which could leak PII to logs that aren't supposed to see it). Raw responseBody preserved on the VoyageError for callers that need it.

Coverage: 22 tests, all mock fetch:
- embed happy path (input_type, ordering, empty input, truncation flag)
- rerank happy path (top_k, sorting, id join)
- all 6 error kinds + retry behavior
- VOYAGE_API_KEY env var resolution

Resolves: foundation for v4.1 §13 (embedding generation + reranking).

Next (B.02): per-model vec0 table creation.
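The two delay rules named above (Retry-After given as seconds or as an HTTP-date, and exponential backoff capped at 30s) can be sketched as pure functions. This is not the wrapper's actual code: the function names are hypothetical, the 500 ms base delay is an assumption (the commit only states the 30s cap), and how the wrapper combines the two for a 429 is not shown because the commit does not spell it out.

```typescript
// The 30 s cap is from the commit message; the constant name is illustrative.
const MAX_BACKOFF_MS = 30_000;

// Retry-After may be delta-seconds ("120") or an HTTP-date. Returns the
// wait in milliseconds, or null if the header is absent or unparseable.
function parseRetryAfterMs(headerValue: string | null, nowMs: number): number | null {
  if (headerValue === null) return null;
  const trimmed = headerValue.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  // A date in the past means "retry now", never a negative wait.
  return Math.max(0, dateMs - nowMs);
}

// Exponential backoff: base * 2^attempt, capped. attempt is 0-based.
// The 500 ms default base is an assumption of this sketch.
function backoffDelayMs(attempt: number, baseMs = 500): number {
  return Math.min(MAX_BACKOFF_MS, baseMs * 2 ** attempt);
}
```

Passing `nowMs` in explicitly, rather than calling `Date.now()` inside, keeps both functions deterministic under test, in the same spirit as the mock fetch injection described above.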

…(B.02)

Centralizes all sqlite-vec interaction in src/embeddings/store.ts. Callers never touch vec0 SQL directly. Reasons documented in module header, but short version:

1. sqlite-vec is best-effort. tryLoadSqliteVec() searches candidate paths (env, plugin node_modules, ~/.openclaw/extensions) and returns boolean. If false, the rest of LCM still works (FTS-only retrieval). Aligned with v4.1.1 A7 graceful-degrade amendment.
2. vec0 has class-of-column quirks that bite: INTEGER metadata cols reject JS number literals (need BigInt at the binding site), and auxiliary cols throw "illegal WHERE constraint" if filtered inside MATCH queries. Schema choice:
     embedding float[<dim>]   -- the vector
     +embedded_id text        -- AUX (never WHERE-filtered)
     embedded_kind text       -- METADATA (filterable in MATCH)
     suppressed integer       -- METADATA (filterable in MATCH)
   Empirically verified: WHERE on +embedded_kind crashes vec0; WHERE on plain `embedded_kind text` (metadata) works. Centralizing this here so future code can't accidentally pick wrong column class.
3. Profile dim is immutable. registerEmbeddingProfile() throws on mismatch. To switch dim, bump the model name (e.g. add a suffix) and run cutover; never silently change dim of an existing profile.

API surface:
- tryLoadSqliteVec(db, opts) → boolean
- vec0Version(db) → "v0.1.9" | null
- candidateVec0Paths() → string[] (for diagnostics)
- embeddingsTableName(modelName) → "lcm_embeddings_<slug>"
- embeddingsTableExists(db, modelName) → boolean
- registerEmbeddingProfile(db, modelName, dim)
- ensureEmbeddingsTable(db, modelName, dim)
- recordEmbedding(db, {modelName, embeddedId, embeddedKind, vector, suppressed?, sourceTokenCount}): vec0 INSERT + meta UPSERT
- replaceEmbedding(...): DELETE-then-INSERT (for re-embed)
- deleteEmbedding(...): for purge cascade
- markEmbeddingSuppressed(...): UPDATE metadata (works on metadata cols; would corrupt if used on PARTITION KEY per v4.1.1 finding)
- searchSimilar(db, {modelName, queryVector, k, embeddedKinds, excludeSuppressed}): KNN with default exclude-suppressed
- isEmbedded(db, {embeddedId, embeddedKind, modelName}) → boolean

Coverage: 28 tests
- 15 always-on: name validation, candidate paths, graceful degrade, profile registration with dim mismatch / bad-input rejection
- 13 vec0-gated: load extension, ensure table, record/replace/delete embedding, KNN with kind filter, KNN with suppression, mark suppressed flips visibility, two independent models per DB

The vec0-gated suite uses LCM_TEST_VEC0_PATH env var override (or defaults to /Users/lume/.openclaw/... on dev). vitest.config.ts overrides $HOME so homedir() inside tests doesn't see the dev install; this gate accommodates that.

Build: dist/index.js = 708.4kb (was 708.4kb pre-B.02: empty plugin import boundary, store module is tree-shaken from index.ts which doesn't import it yet; gateway picks up via Group B.05 leaf-time embed wire-up).

Tests: 1000 passing (was 972 before B.02; +28 new).

Resolves: foundation for v4.1 §13 (vec0 storage layer).

Next (B.03): AFTER DELETE TRIGGER on summaries → cascades suppression + deletion into vec0 (since FK from vec0 → summaries corrupts vec0).
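The "profile dim is immutable" rule from point 3 can be sketched as a small in-memory registry. This only illustrates the rule; the real registerEmbeddingProfile() takes a database handle and writes to lcm_embedding_profile, and the class and method names below are hypothetical.

```typescript
// In-memory stand-in for lcm_embedding_profile. Names are hypothetical.
class EmbeddingProfileRegistrySketch {
  private dimByModel = new Map<string, number>();

  // Registering the same (model, dim) twice is a no-op; registering an
  // existing model with a different dim throws instead of overwriting.
  register(modelName: string, dim: number): void {
    if (!Number.isInteger(dim) || dim <= 0) {
      throw new Error(`dim must be a positive integer, got ${dim}`);
    }
    const existingDim = this.dimByModel.get(modelName);
    if (existingDim !== undefined && existingDim !== dim) {
      throw new Error(
        `dim mismatch for ${modelName}: registered ${existingDim}, got ${dim}; bump the model name instead`,
      );
    }
    this.dimByModel.set(modelName, dim);
  }

  dimOf(modelName: string): number | undefined {
    return this.dimByModel.get(modelName);
  }
}
```

Throwing on mismatch matters because vectors of the old and new dimension cannot share one vec0 table; a new model name forces a separate table and an explicit cutover.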

…B.03)

Three new SQLite triggers, each with a specific job:

1. Per-model `lcm_embed_suppress_<slug>` (in src/embeddings/store.ts): AFTER UPDATE OF suppressed_at ON summaries WHEN (NEW.suppressed_at IS NULL) != (OLD.suppressed_at IS NULL) → mirrors the NULL-vs-not transition into vec0.suppressed metadata column for the corresponding embedded_id (kind='summary').
   Why a trigger: suppression can be set from any path (operator's /lcm purge, agent tool, manual SQL, future migration cleanup). A trigger guarantees the cascade by-DB rather than by-convention.
   Why metadata col + WHEN clause: the trigger fires only on actual transitions, not on every other UPDATE; vec0 metadata column is pre-filterable in KNN MATCH queries (auxiliary cols throw "illegal WHERE constraint"; verified empirically).
2. Per-model `lcm_embed_delete_<slug>` (in src/embeddings/store.ts): AFTER DELETE ON summaries → DELETE matching vec0 row.
   Why a trigger and not FK CASCADE: vec0 corrupts under FK (v4.1.1 finding from upstream review). Trigger is the only safe path to keep vec0 + summaries in sync on hard-delete.
3. Shared `lcm_embedding_meta_cleanup_summary` (in src/db/migration.ts): AFTER DELETE ON summaries → DELETE matching lcm_embedding_meta row WHERE kind='summary'.
   Why this is in migration not store: lcm_embedding_meta exists once regardless of how many vec0 model tables exist (it's a cross-model sidecar). The kind='summary' filter prevents accidental cleanup of polymorphic entity/theme rows. Entity/theme cleanup triggers will land in Groups E/G when those embeddings ship.

Per-model triggers are created idempotently when ensureEmbeddingsTable is called for a model. dropEmbeddingsTriggers() is exported for the model-archival cutover path (Group F operator surface).

Coverage: 9 new tests (3 always-on, 6 vec0-gated):
- meta-table cleanup trigger only deletes kind='summary' (entity row untouched)
- meta cleanup trigger is idempotent across re-migration
- suppression cascade NULL → not-NULL hides row from KNN
- un-suppression cascade not-NULL → NULL restores visibility
- WHEN clause skips no-op transitions (NULL → NULL, or content updates)
- delete cascade removes vec0 row + meta row
- two-model setup: cleanup hits both vec0 tables
- dropEmbeddingsTriggers stops cascade firing
- re-creating triggers is idempotent

Live-DB verification: copied Eva's lcm.db (4187 summaries, 762 conversations) to /Volumes/LEXAR; migration completes in 3.9s; meta cleanup trigger created cleanly.

Tests: 1009 passing (was 1000 before B.03; +9 new).

Resolves: v4.1 §10 suppression cascade for vec0 retrieval surfaces.

Next (B.fix): fold Group A adversarial-pass fixes (Gap 2 NULL UNIQUE on lcm_prompt_registry; Gap 7 wire concurrency assertions; Gap 9 add live-DB regression test).
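The WHEN clause on the suppression trigger, `(NEW.suppressed_at IS NULL) != (OLD.suppressed_at IS NULL)`, is easiest to read as a truth table. The sketch below restates it in TypeScript; the function names are hypothetical, and the 0/1 flag value is an assumption about how the vec0 `suppressed` metadata column is populated.

```typescript
// Mirrors the trigger's WHEN clause: true only when the row crosses the
// NULL / not-NULL boundary, in either direction.
function isSuppressionTransition(
  oldSuppressedAt: string | null,
  newSuppressedAt: string | null,
): boolean {
  return (newSuppressedAt === null) !== (oldSuppressedAt === null);
}

// Value the trigger would mirror into the vec0 'suppressed' metadata
// column (assumed here to be 0 for visible, 1 for suppressed).
function mirroredSuppressedFlag(newSuppressedAt: string | null): 0 | 1 {
  return newSuppressedAt === null ? 0 : 1;
}
```

Changing one timestamp to another (not-NULL to not-NULL) is not a transition, so re-suppressing an already suppressed row does not touch vec0 again.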

Resolves Gaps 2, 7, 9 from the Group A adversarial code review:

Gap 2 (MED): lcm_prompt_registry NULL tier_label deduplication. SQLite treats multiple NULL values as distinct in UNIQUE constraints, so the original UNIQUE(memory_type, tier_label, pass_kind, version) admits duplicate rows when tier_label IS NULL. The synthesis spec requires singletons-per-version, so add a follow-up migration step (ensureLcmPromptRegistryNullSafeUniqueIdx) that creates a COALESCE-based UNIQUE INDEX. Same pattern is already used for lcm_synthesis_cache_lookup_uniq. The original UNIQUE constraint stays (catches non-NULL collisions); the new index catches NULL collisions.

Gap 7 (LOW): wire assertForeignKeysEnabled into configureConnection. src/concurrency/model.ts already exports assertForeignKeysEnabled(db) but nothing in production calls it. Add a call after the existing PRAGMA foreign_keys = ON in src/db/connection.ts:configureConnection so any future regression that opens a connection without FK enforcement (which would silently degrade every ON DELETE CASCADE in the schema) fails fast. assertBusyTimeoutForRole wiring is intentionally deferred to Group B.05 (worker startup) per the Group A reviewer's recommendation.

Gap 9 (MED): live-DB-shape regression test. All other v41-*.test.ts files start from a fresh :memory: and run the full migration on an empty DB. None tested the migration against a partially pre-existing schema (where conversations / summaries / messages already exist with rows but lcm_* tables don't yet). The Eva-live-DB verification was one-off and not in CI. New test v41-pre-existing-schema-migration.test.ts seeds the upstream pre-v4.1 baseline shape, inserts conversations + summaries + messages, runs runLcmMigrations, and verifies: NULL session_keys are backfilled, audit rows exist, summaries.session_key is JOIN-backfilled, all 21 v4.1 tables exist, the new lcm_prompt_registry_uniq_lookup index exists, and re-runs are idempotent.
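Gap 2 comes down to how UNIQUE compares tuples that contain NULL. The sketch below models both comparison rules in TypeScript so the difference is visible without a database. The interface and function names are hypothetical, and the sample values used to exercise it are placeholders rather than real memory_type or pass_kind values.

```typescript
interface PromptVersionKey {
  memoryType: string;
  tierLabel: string | null;
  passKind: string;
  version: number;
}

// Plain UNIQUE semantics: a tuple containing NULL never collides with
// another tuple, so two rows with tier_label NULL are both admitted.
function collidesUnderPlainUnique(a: PromptVersionKey, b: PromptVersionKey): boolean {
  if (a.tierLabel === null || b.tierLabel === null) return false;
  return (
    a.memoryType === b.memoryType &&
    a.tierLabel === b.tierLabel &&
    a.passKind === b.passKind &&
    a.version === b.version
  );
}

// COALESCE(tier_label, '') semantics: NULL is folded to '' before the
// comparison, so NULL-tier duplicates now collide. Side effect: a NULL
// tier and an empty-string tier are treated as the same key.
function collidesUnderCoalescedIndex(a: PromptVersionKey, b: PromptVersionKey): boolean {
  return (
    a.memoryType === b.memoryType &&
    (a.tierLabel ?? "") === (b.tierLabel ?? "") &&
    a.passKind === b.passKind &&
    a.version === b.version
  );
}
```

Keeping both the original constraint and the COALESCE index, as the commit does, is harmless: every collision the plain rule catches is also caught by the coalesced one.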

Helper module on top of A.01's lcm_worker_lock table. Acquisition is atomic via PRIMARY KEY uniqueness on (job_kind): INSERT OR IGNORE returns 1 if we got it, 0 if someone else holds it.

API:
- acquireLock(db, jobKind, {workerId, ttlMs?, jobSessionKey?, jobMetadata?}) → boolean. GC's expired locks BEFORE acquiring (≤ datetime('now') so ttl=0 is immediately reclaimable; race-safe via INSERT OR IGNORE).
- releaseLock(db, jobKind, workerId) → boolean. Only frees if the workerId matches (prevents accidental cross-worker release).
- heartbeatLock(db, jobKind, workerId, ttlMs?) → boolean. Updates expires_at + last_heartbeat_at. Returns false if the lock was preempted (caller MUST abort to avoid double-processing).
- lockInfo(db, jobKind) → LockInfo | null. Used by /lcm health.
- generateWorkerId(role) → string. Format `<role>-<pid>-<ms>-<6hex>`.

Used by Group B.04 backfill cron (next commit) and Groups E (extraction) + G (themes consolidation) + worker scaffolding (B.05).

Coverage: 13 tests (single-process acquire/release, TTL+GC behavior, heartbeat semantics including preemption-detection, metadata round-trip, multi-kind isolation, generateWorkerId uniqueness).

Tests: 1017 → 1030 (+13).

Resolves: §0 cross-process lock primitive used by all worker jobs.

Next (B.04b): backfill cron module that uses these primitives.
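The acquire / release / heartbeat semantics listed above can be sketched with a Map keyed by job kind standing in for the lcm_worker_lock table. This is an in-memory illustration only: the class and method names are hypothetical, time is passed in explicitly as epoch milliseconds, and the real helpers take a database handle and rely on INSERT OR IGNORE for cross-process atomicity.

```typescript
interface LockRow {
  workerId: string;
  expiresAtMs: number;
  lastHeartbeatAtMs: number;
}

// In-memory stand-in for lcm_worker_lock; one row per job kind, like the
// PRIMARY KEY on (job_kind). Names are hypothetical.
class WorkerLockTableSketch {
  private rows = new Map<string, LockRow>();

  // GC an expired lock first (<= so a ttl of 0 is immediately
  // reclaimable), then insert only if no row remains.
  acquire(jobKind: string, workerId: string, ttlMs: number, nowMs: number): boolean {
    const current = this.rows.get(jobKind);
    if (current !== undefined && current.expiresAtMs <= nowMs) this.rows.delete(jobKind);
    if (this.rows.has(jobKind)) return false;
    this.rows.set(jobKind, { workerId, expiresAtMs: nowMs + ttlMs, lastHeartbeatAtMs: nowMs });
    return true;
  }

  // Only the holder may release; another worker's call is a no-op.
  release(jobKind: string, workerId: string): boolean {
    const current = this.rows.get(jobKind);
    if (current === undefined || current.workerId !== workerId) return false;
    this.rows.delete(jobKind);
    return true;
  }

  // Extends the lease. false means the lock was preempted and the caller
  // must abort its job.
  heartbeat(jobKind: string, workerId: string, ttlMs: number, nowMs: number): boolean {
    const current = this.rows.get(jobKind);
    if (current === undefined || current.workerId !== workerId) return false;
    current.expiresAtMs = nowMs + ttlMs;
    current.lastHeartbeatAtMs = nowMs;
    return true;
  }
}
```

The false return from `heartbeat` is the important signal: a worker that slept past its TTL learns on its next heartbeat that another worker now owns the job.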

…(E.spike)
Wraps ml-hclust (mljs ecosystem) for use by Group E procedure clustering.
Library choice rationale (full notes in module header):
- ESM-native (this plugin ships ESM only)
- MIT licensed, actively maintained (v4.0.0 published 2025-11-26)
- Small footprint (~48KB unpacked); esbuild tree-shakes most transitive deps. Bundle delta: 708.7KB → 709.4KB (+0.7KB; index.ts doesn't import it yet, Group E will pull it in)
- Accepts precomputed distance matrix (we pass cosine distance), so we can do Ward+cosine without hacking the lib's internal euclidean
- Cluster.cut(height) AND Cluster.group(K) both supported, satisfying both "let dendrogram decide" and "force K" use cases
Architecture choice notes:
- Ward + cosine on precomputed matrix: same approximation scipy gives you (linkage(method="ward", metric="cosine")). Mathematically loose (Ward assumes squared Euclidean) but conventional for text embeddings. Fallback method: "average" (UPGMA), which has no Euclidean assumption, if empirical eval shows wonky merges.
- Pre-normalize each vector once → cosine distance becomes (1 - dot). Halves the inner-loop cost and centralizes float-drift clamping.
- O(N^2 D) distance build + O(N^3) agnes. For N=2000 D=1024 that's roughly a few seconds in JS, comfortably within the worker-process budget.
Alternatives considered + rejected:
- hierarchical-clustering-js: 404 on npm
- density-clustering: wrong algorithm family (DBSCAN/k-means only)
- clusterfck: deprecated
- clustering-js: abandoned
API:
- clusterHierarchical({vectors, cutHeight?, numClusters?}) → ClusterResult
Coverage: 11 tests
- empty input, single vector, identical vectors, separable groups
- force-K mode, mixed-dim rejection, non-Float32Array rejection, cutHeight validation, internal coverage check
- 100-vector perf sanity (<2s)
Built (subagent: a1e8a944580405a69): research + library survey done in parallel with Group B.04 work; spec checked + tests verified before committing.
Tests: 1030 → 1041 (+11).
Resolves: foundation for Group E procedure clustering. Group E will:
(1) pre-filter leaves (structural: numbered steps / commands / explicit "how to" markers, NOT FTS verb regex)
(2) call clusterHierarchical() over voyage-4-large embeddings
(3) filter to clusters with ≥8 members + LLM-judge confidence > 0.9
(4) write to lcm_procedures with status='active'
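The pre-normalization step mentioned in the architecture notes can be sketched as follows. This is a minimal illustration, assuming the distance matrix is a plain `number[][]`; the function names are hypothetical, and the hand-off to ml-hclust is omitted because its exact call signature is not shown here.

```typescript
// Scale a vector to unit length once, so cosine similarity reduces to a dot product.
function normalize(v: Float32Array): Float32Array {
  let sumSq = 0;
  for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
  const norm = Math.sqrt(sumSq);
  const out = new Float32Array(v.length);
  if (norm === 0) return out; // a zero vector stays zero
  for (let i = 0; i < v.length; i++) out[i] = v[i] / norm;
  return out;
}

// Builds the symmetric N x N cosine-distance matrix in O(N^2 * D).
function cosineDistanceMatrix(vectors: Float32Array[]): number[][] {
  const unit = vectors.map(normalize);
  const n = unit.length;
  const dist: number[][] = Array.from({ length: n }, () => new Array<number>(n).fill(0));
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      let dot = 0;
      for (let d = 0; d < unit[i].length; d++) dot += unit[i][d] * unit[j][d];
      // Clamp to [0, 2] in one place to absorb floating-point drift.
      const dij = Math.min(2, Math.max(0, 1 - dot));
      dist[i][j] = dij;
      dist[j][i] = dij;
    }
  }
  return dist;
}
```

Only the upper triangle is computed and then mirrored, which is where the halved inner-loop cost comes from.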

…idempotent (B.04b)
Walks unembedded leaves, batches by token budget, calls Voyage, writes vec0 + meta. Designed as a single-tick API: caller (worker scheduler) invokes once per tick; the function acquires lcm_worker_lock, processes up to perTickLimit documents, releases lock, returns BackfillResult.
API:
- runBackfillTick(db, opts) → Promise<BackfillResult>
- countPendingDocs(db, args) → number (for /lcm health and tick-scheduling)
BackfillOptions covers: model + Voyage model dispatch, input_type (MUST be 'document' for backfill), API key + mock fetch, RPS pacing (default 0.5 = one call per 2s), batch token cap (default 80K), per-tick doc cap (default 200), token-count min/max (default 1 .. 30K), worker_id override (for stable IDs across ticks), onBatchComplete hook for telemetry, skipLock for tests.
BackfillResult tracks: embeddedCount, skippedOverCap (rows above the 30K cap, requiring operator attention), skipped[] (per-row failures with kind='voyage_400'/'voyage_other'/'over_cap'), perTickLimitReached (scheduler reschedules if true), lockNotAcquired (scheduler skips this tick), voyageTokensConsumed (API usage telemetry), durationMs.
Invariants:
1. NO LLM/network in any DB write tx. Each Voyage HTTP call lives OUTSIDE the per-batch transaction; rate-state UPDATE (when added in B.04c follow-up) will be a brief BEGIN IMMEDIATE that COMMITs before the HTTP call (never holds a write lock through HTTP latency).
2. Single-flight via worker lock; gateway-fallback safe.
3. Resumable: each batch's writes commit independently. Crash mid-tick loses one in-flight batch worth of Voyage spend at most. Next tick picks up still-unembedded rows.
4. Idempotent on per-row basis. SELECT pre-filters rows that already have a non-archived `lcm_embedding_meta` entry; a duplicate-write would just be a no-op via INSERT OR REPLACE.
5. Suppression-aware: rows where `summaries.suppressed_at IS NOT NULL` are excluded.
6. Per-tick failure blocklist: the failed_summary_ids set excludes failed rows from subsequent SELECTs within the same tick. Next tick re-attempts (Voyage may have recovered). Without this, a persistent 400 would spin the loop until perTickLimit.
7. Auth errors are FATAL: re-thrown so they surface to the operator. Still releases the lock via try/finally.
Heartbeat: lock heartbeat fires every batch. If preempted (heartbeat returns false), tick aborts cleanly without partial state.
Coverage: 13 tests (all vec0-gated, mock fetch, NO live API):
- basic embed-all, isEmbedded reflects state
- skip suppressed leaves (no Voyage call for them)
- idempotent on second tick (zero new Voyage calls)
- over-cap leaves filtered at SELECT (countPendingDocs verifies)
- perTickLimit caps work + perTickLimitReached flag
- 400 records skipped doc, no abort
- 401 (auth) re-thrown, lock released via finally
- 500 records skipped, continues with other batches
- lockNotAcquired when another worker holds (no Voyage call)
- lock released on success
- lock released even on auth error
- batches packed to maxBatchTokens (greedy bin-pack)
- countPendingDocs accurate
Tests: 1041 → 1054 (+13).
Resolves: foundation for v4.1 §13 backfill, the first-run embedding of existing summaries on Eva's live DB. Group B.05 (next) wires async leaf-time embed for new leaves so the cron only handles backfill of the 4187-row corpus, not new ongoing leaves.
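The "batches packed to maxBatchTokens" behaviour can be sketched as a sequential greedy packer. This is an assumption about the packing order (a next-fit pass over rows in SELECT order), not the module's actual implementation; `PendingDoc` and `packBatches` are hypothetical names.

```typescript
type PendingDoc = { summaryId: string; tokenCount: number };

// Sequential greedy packing: keep adding docs to the current batch until the
// next one would exceed maxBatchTokens, then start a new batch. Docs above
// maxDocTokens are set aside rather than sent (the "over_cap" case).
function packBatches(
  docs: PendingDoc[],
  maxBatchTokens: number,
  maxDocTokens: number,
): { batches: PendingDoc[][]; overCap: PendingDoc[] } {
  const batches: PendingDoc[][] = [];
  const overCap: PendingDoc[] = [];
  let current: PendingDoc[] = [];
  let currentTokens = 0;
  for (const doc of docs) {
    if (doc.tokenCount > maxDocTokens) {
      overCap.push(doc);
      continue;
    }
    if (current.length > 0 && currentTokens + doc.tokenCount > maxBatchTokens) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(doc);
    currentTokens += doc.tokenCount;
  }
  if (current.length > 0) batches.push(current);
  return { batches, overCap };
}
```

With the defaults quoted above (80K per batch, 30K per document), three 30K-token leaves land in two batches because the third would push the first batch to 90K.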

….05)
Two pieces, both foundation for Group F's `/lcm worker` operator surface (later) and to close Group A adversarial-review Gap 8.
## 1. Worker loop (src/concurrency/worker-loop.ts)
Generic single-process worker loop. One Node process running multiple background jobs cooperatively, single-threaded, each with its own cadence. Cross-process safety via lcm_worker_lock from B.04a.
API:
- new WorkerLoop(db, {jobs: WorkerJob[], onJobComplete?})
- loop.start() → idempotent, schedules setInterval per job
- loop.stop({gracefulTimeoutMs?: 30000}) → waits for in-flight ticks
- loop.runOnce(kind) → outside-schedule manual tick (used by leaf-write hooks to nudge backfill, and by `/lcm worker tick` operator command)
- loop.isRunning() / loop.inFlightCount(): for /lcm health
Design choices:
- setInterval (not setTimeout chain): predictable cadence; the dispatcher skips overlapping ticks rather than queuing them, so extra ticks are dropped instead of piling up.
- Errors in jobs captured via onJobComplete, never propagate to loop, so one bad tick doesn't crash the worker.
- generationId guard: stop()-then-start() doesn't run leftover ticks from the old loop.
- validateJobs() at construction: duplicate kinds + invalid intervalMs rejected up-front (programmer error).
NOT yet wired into plugin lifecycle. Group F's /lcm worker [start|stop] operator command will instantiate it with the actual job list. Until then, the loop is a library; the embedding store + backfill modules are usable standalone.
NOT using worker_threads. v4.1.1 A9 foresees true heartbeat-isolation via worker_threads, but that's a future commit. setInterval-driven dispatch is fine for our cadences (5-60s).
## 2. Leaf-write session_key fix (Gap 8 from Group A adversarial review)
src/store/summary-store.ts:411: INSERT INTO summaries now atomically populates session_key from a sub-SELECT of conversations.session_key. Closes the gap where new summaries inserted between gateway boots had session_key='' until next boot's JOIN-backfill ran. The COALESCE defends against (theoretically impossible) NULL conversations.session_key.
This means every newly-written summary IMMEDIATELY participates in session_key-filtered partial indexes (summaries_session_key_kind_latest_idx from A.08), without waiting for migration boot.
All 1054 existing tests still pass; the change is additive (default still '' if conversation has no session_key, but the migration ensures every conv has one).
Coverage: 13 new worker-loop tests
- start/stop idempotency
- schedules at cadence (timing-based)
- two jobs with different intervals
- overlapping ticks skipped (not queued)
- errors in jobs captured + loop continues
- graceful stop waits for in-flight
- graceful stop returns false on timeout
- runOnce returns result, throws on unknown kind, throws on in-flight
- validates duplicate kinds + bad intervalMs
Tests: 1054 → 1067 (+13).
Resolves: foundation for v4.1 §0 worker scheduling + Group A Gap 8.
Group B is now complete (B.01 Voyage client, B.02 vec0, B.03 cascade triggers, B.fix polish, B.04a worker-lock, B.04b backfill cron, B.05 worker loop + session_key fix).
Next: Group B adversarial pass, then Group C retrieval (hybrid lcm_grep, lcm_semantic_recall).
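The "skip overlapping ticks, capture errors" design choice can be sketched with a small dispatcher that a `setInterval` callback would call. This is a simplified illustration, not the `WorkerLoop` class itself: `OverlapSkippingDispatcher`, `Job`, and `TickOutcome` are hypothetical names, and the generationId guard and graceful stop are left out.

```typescript
type Job = () => Promise<void>;
type TickOutcome = "started" | "skipped";

class OverlapSkippingDispatcher {
  private readonly jobs: Map<string, Job>;
  private readonly onJobComplete: (kind: string, error?: unknown) => void;
  private readonly inFlight = new Set<string>();

  constructor(
    jobs: Map<string, Job>,
    onJobComplete: (kind: string, error?: unknown) => void = () => {},
  ) {
    this.jobs = jobs;
    this.onJobComplete = onJobComplete;
  }

  // Intended to be invoked from each job's setInterval callback.
  dispatch(kind: string): TickOutcome {
    const job = this.jobs.get(kind);
    if (!job) throw new Error(`unknown job kind: ${kind}`);
    // A previous tick is still running: drop this one instead of queuing it.
    if (this.inFlight.has(kind)) return "skipped";
    this.inFlight.add(kind);
    const done = (error?: unknown): void => {
      this.inFlight.delete(kind);
      try {
        this.onJobComplete(kind, error);
      } catch (_err) {
        // A throwing callback must not take the loop down.
      }
    };
    // Job errors are routed to onJobComplete and never propagate to the caller.
    Promise.resolve()
      .then(() => job())
      .then(() => done(), (error: unknown) => done(error));
    return "started";
  }

  inFlightCount(): number {
    return this.inFlight.size;
  }
}
```

A loop built on this would register one `setInterval(() => dispatcher.dispatch(kind), intervalMs)` per job and ignore the `"skipped"` outcome.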

… join (C.01)
Wraps the embed-query → vec0 KNN → JOIN-back-to-summaries flow used by both `lcm_semantic_recall` (Group C) AND the hybrid mode of `lcm_grep` (C.02). Centralizing here so the two callers can't drift on suppression semantics, kind filtering, or session-key scope.
API:
- getActiveEmbeddingModel(db) → {modelName, dim} | null. Picks active=1 + archive_after IS NULL row, most-recent registered_at on ties (handles model-cutover gracefully).
- runSemanticSearch(db, opts) → Promise<SemanticSearchResult>. Throws SemanticSearchUnavailableError if vec0 not loaded OR no active profile OR vec0 table missing; caller decides whether to degrade (FTS-only) or surface error.
SemanticSearchOptions covers: query (text) OR queryVector (precomputed), session_keys / conversation_ids / since / before / summary_kinds filters, embedded_kinds default ['summary'], excludeSuppressed default true, all Voyage knobs (apiKey/fetch/maxRetries/inputType, default 'query' for asymmetric retrieval).
Suppression filtered at TWO layers (defense in depth: a race between trigger fire and KNN call could leak a stale row through metadata):
1. vec0 metadata `suppressed = 0` pre-filter inside MATCH
2. Final JOIN to summaries WHERE `suppressed_at IS NULL`
session_key scope uses the column populated atomically at write time per Group A Gap 8 fix (in B.05). conversation_id, time, and kind filters all bind via parameterized SQL, so there are no injection vectors.
Coverage: 15 tests
- getActiveEmbeddingModel: null when no profile, picks active + most-recent, excludes archived
- SemanticSearchUnavailableError when vec0 not loaded / no profile
- input validation: requires query OR queryVector; dim mismatch
- happy path: ranked hits, joined content + metadata
- suppression filter (default + opt-in to include)
- session_keys filter restricts to matching sessions
- conversation_ids filter restricts to matching conversations
- since/before time filter
- Voyage call with input_type='query' verified, voyageTokensConsumed tracked
- summary_kinds filter (leaf vs condensed)
Tests: 1067 → 1082 (+15).
Resolves: foundation for v4.1 §13 retrieval pipeline.
Next (C.02): new lcm_semantic_recall tool + hybrid mode for lcm_grep that calls this service alongside FTS and merges with Voyage rerank-2.5.

…rank (C.02a)
Combines FTS5 candidates with vec0 KNN candidates, deduplicates by summary_id, then either:
- Reranks via Voyage rerank-2.5 (default), which produces final relevance scoring across the union and takes advantage of the spike-validated +52.5pp lift on paraphrastic queries
- OR uses reciprocal-rank-fusion (RRF) when rerank=false OR when Voyage rerank fails (transient 5xx; auth re-thrown for operator surfacing)
API:
- runHybridSearch(db, opts) → Promise<HybridSearchResult>
opts: query, kFts (default 50), kSemantic (default 50), topN (default 20), filters (sessionKeys/conversationIds/since/before/summaryKinds), excludeSuppressed default true, rerank default true, voyage HTTP knobs. Caller injects ftsSearch() so this module doesn't take ownership of FTS5 sanitization or hybrid-recency sort logic; that lives in the existing SummaryStore/RetrievalEngine path.
HybridHit returned with:
- {summaryId, conversationId, sessionKey, kind, content, tokenCount, createdAt}
- score (rerank score OR RRF score)
- fromFts / fromSemantic provenance flags
- semanticDistance (cosine), ftsRank: for diagnostics + caller display
Graceful degrade:
- vec0 not loaded → degradedToFtsOnly=true, FTS-only result
- rerank 5xx → degradedSkippedRerank=true, RRF fallback
- rerank 401 (auth) → re-thrown; operator must fix API key
- empty query → throws (programmer error)
Suppression: both FTS-side and semantic-side default to excludeSuppressed. Rerank input is post-suppression union, so no post-rerank filter needed.
NOT YET WIRED into lcm_grep tool. Next commit (C.02b) extends the tool with mode='hybrid' that calls runHybridSearch with summaryStore.searchSummaries adapted to FtsHit shape.
Coverage: 8 tests (vec0-gated, mock fetch, NO live API):
- merges FTS + semantic, rerank produces top-N
- dedupe overlap (FTS + semantic both find same doc)
- vec0 unavailable → FTS-only with degraded flag
- rerank 500 → RRF fallback with degraded flag
- rerank 401 → re-thrown
- rerank=false explicit → RRF mode, no Voyage rerank call
- empty query rejected
- no candidates → empty hits
Tests: 1082 → 1090 (+8).
Resolves: foundation for hybrid retrieval. Used by C.02b (lcm_grep mode='hybrid') AND C.04 (lcm_synthesize_around window_kind='semantic').
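The RRF fallback named above can be sketched as a pure function over ranked id lists. This is the textbook reciprocal-rank-fusion formula, not necessarily the module's exact code; the constant `k = 60` is the conventional default and an assumption here, as is the `reciprocalRankFusion` name.

```typescript
// Reciprocal-rank fusion: each list contributes 1 / (k + rank) per document,
// with rank starting at 1. Documents found by several lists accumulate score.
// Assumes each input list contains an id at most once.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Example: FTS order and semantic order over summary ids.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // FTS candidates, best first
  ["b", "d"], // semantic candidates, best first
]);
```

In this example `b` ranks first because it appears in both lists, which is the same overlap the fromFts / fromSemantic flags report.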

…paths (C.03)
v4.1 §10 invariant: every agent-facing retrieval surface defaults to exclude-suppressed.
Adds `WHERE suppressed_at IS NULL` to four search code paths in SummaryStore:
1. searchFullText (FTS5 path): alias `s.suppressed_at IS NULL`
2. searchLike (LIKE-fallback path): `suppressed_at IS NULL`
3. searchCjkTrigram (CJK FTS path): alias `s.suppressed_at IS NULL`
4. searchRegex: `suppressed_at IS NULL`
These four functions back the existing `lcm_grep` tool's regex / full_text modes (and the new C.02b hybrid mode via the ftsSearch callback). Suppressed leaves now never surface to agents through any search-side path.
The vec0 retrieval surfaces (semantic-search, hybrid-search) already filter via metadata pre-filter (vec0 `suppressed=0`) AND defense-in-depth JOIN to summaries.suppressed_at IS NULL. Both layers are independently tested.
What this DOESN'T change:
- getSummary(id), getSummaryParents/Children/Subtree, getSummaryMessages, context-item reads: these are structural lookups used by lineage / expansion / assembler. The architecture's "7 read paths" cascade handles them by suppressing-at-source (assembler builds context from latest non-suppressed leaves; expansion respects contains_suppressed_leaves flag for condensed). A per-method excludeSuppressed default param refactor was considered but deferred.
- lcm-doctor / lcm-command operator paths: operator tooling intentionally sees ALL rows including suppressed (for cleanup, audit, doctor checks).
Coverage: 4 new tests (LIKE/full_text path, regex path, restore-on-unsuppress, multiple-suppression).
Tests: 1090 → 1094 (+4).
Resolves: v4.1 §10 invariant for SummaryStore search paths.

Wires the semantic-search service from src/embeddings/ into a new agent-callable tool. lcm_semantic_recall is the purely-semantic counterpart to lcm_grep; agents use it for paraphrastic queries that exact-match FTS would miss. Hybrid (keyword + semantic) is reserved for lcm_grep mode='hybrid' (Group C.02b). The tool resolves conversation scope via the existing resolveLcmConversationScope helper, parses since/before like lcm_grep, and gracefully degrades when sqlite-vec is missing or when VOYAGE_API_KEY is not set — both surfaces return jsonResult errors that direct the agent back to lcm_grep instead of throwing. A small public getDb() accessor is added to LcmContextEngine so tools can call runSemanticSearch(db, opts) directly without plumbing a new dependency through the LcmDependencies surface. Mirrors the existing getRetrieval() / getConversationStore() / getSummaryStore() pattern. Manifest contracts.tools updated to match the new register call site (guarded by manifest.test.ts). Tests cover input validation (empty query, bad timestamps, missing scope), graceful degradation (vec0 unavailable, missing API key), happy path with mocked Voyage fetch, conversationId scope filter, and since/before passthrough — vec0-dependent tests skip cleanly when the extension isn't installed. Refs: architecture v4.1 §13.

… collision (B.fix2)
Resolves Group B adversarial-pass HIGH/BLOCKER findings:
## Gap 1 (BLOCKER): backfill heartbeat vs Voyage retry budget
src/embeddings/backfill.ts was using the Voyage client's default retry + timeout (3 retries × 60s = ~4 min worst-case per batch). With WORKER_LOCK_TTL_MS=90s, a stuck batch can let another worker GC the lock and start backfilling the same docs → Voyage double-bill + duplicate vec0 rows (auxiliary cols have no UNIQUE constraint to catch this).
Fix: introduce `voyageMaxRetries` default = 1 + `voyageTimeoutMs` default = 30s in BackfillOptions. Worst-case per batch now: 2 attempts × 30s + ~0.5s backoff ≈ 60.5s. Comfortably under the 90s lock TTL → another worker can't preempt mid-batch.
Caller can override either knob (e.g. for first-run backfill where contention is low and longer Voyage tolerance is acceptable). Tests that need to surface 5xx immediately use voyageMaxRetries: 0.
## Gap 2 (HIGH): slug collision silently corrupts KNN
src/embeddings/store.ts: registerEmbeddingProfile() didn't check whether the new model_name's sluggified form was already in use. Two profiles like `voyage-4-large` and `voyage_4_large` both sluggify to `voyage4large` → same vec0 table → inserts from both profiles route to one table → KNN cross-contaminates.
Fix: scan existing profiles for slug equality BEFORE INSERT OR IGNORE. Throws with an explanatory message identifying the existing model_name that already owns the slug.
The existing `MODEL_NAME_PATTERN = /^[A-Za-z0-9._-]{1,64}$/` allows `-`, `_`, `.`, all of which are stripped by sluggification, so false-collision risk is real, not hypothetical.
## Gap 8 (LOW, folded in): dim upper bound consistency
ensureEmbeddingsTable rejects dim > 4096; registerEmbeddingProfile had no upper bound, leaving an orphaned profile if the caller did register-then-ensure. Aligned both functions so registerEmbeddingProfile rejects dim > 4096 too.
## Coverage: 8 new tests in v41-group-b-fix2.test.ts
- Slug collision rejected: dash↔underscore↔dot↔case variants
- Genuinely-different slug allowed
- Re-registering same model still idempotent
- Collision detection order-independent
- Dim > 4096 rejected (matching ensureEmbeddingsTable)
- Dim = 4096 accepted (boundary)
- Backfill default voyageMaxRetries=1 (proven by call count = 2)
- Backfill caller can override voyageMaxRetries: 0
Tests: 1094 → 1112 (+18; also includes 10 from C.01b subagent).
Group B adversarial Gaps 3-7 (3 MED + 1 LOW remaining) are doc/comment polish; deferred to cycle-2 review.
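Both fixes reduce to small checks that can be sketched in isolation. The slug rule below (lowercase, drop every non-alphanumeric character) is an assumption inferred from the `voyage4large` example and the case-variant test; `slugify`, `findSlugCollision`, and `worstCaseBatchMs` are hypothetical names, not the store's functions.

```typescript
// Assumed slug rule: lowercase, then strip everything outside [a-z0-9].
function slugify(modelName: string): string {
  return modelName.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Returns the existing model name that already owns the candidate's slug,
// or null when registration is safe. Re-registering the exact same name is
// not treated as a collision, which keeps registration idempotent.
function findSlugCollision(existing: string[], candidate: string): string | null {
  const slug = slugify(candidate);
  for (const name of existing) {
    if (name !== candidate && slugify(name) === slug) return name;
  }
  return null;
}

// Worst-case wall time for one batch: every attempt times out, plus backoff.
function worstCaseBatchMs(maxRetries: number, timeoutMs: number, backoffMs: number): number {
  return (maxRetries + 1) * timeoutMs + backoffMs;
}

const LOCK_TTL_MS = 90_000; // WORKER_LOCK_TTL_MS from the commit message
const newBudgetMs = worstCaseBatchMs(1, 30_000, 500); // 2 attempts x 30s + 0.5s
```

With the new defaults the budget stays below `LOCK_TTL_MS`, whereas three retries at 60s each (four attempts) would exceed it.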

Extends lcm_grep with a third mode='hybrid' that blends FTS + semantic vector search via Voyage rerank. The schema enum picks up the new value, and the tool description points agents at lcm_semantic_recall for purely-semantic exploration so the two surfaces stay distinguishable. The hybrid path delegates to runHybridSearch (src/embeddings/), passing a small adapter that wraps summaryStore.searchSummaries(mode:'full_text' sort:'relevance') and hydrates the snippets back to full FtsHit shape via a single batched SELECT against summaries by summary_id. We could have piped each hit through getSummary, but the IN(...) batch is one round-trip and the values we need (session_key, content, token_count, created_at, conversation_id) are already on the row. Output format mirrors the regex/full_text branch — same '## LCM Grep Results' header, '**Mode:** hybrid' line, conversation scope + time filter — but with hybrid-specific extras: - per-hit provenance flag: [from FTS+semantic] / [from FTS only] / [from semantic only] - rerank/RRF score - degraded warnings: '*(semantic search unavailable; degraded to FTS-only)*' when vec0 is missing, '*(rerank failed; using RRF fusion fallback)*' when rerank network errors and we fall back to reciprocal-rank-fusion Auth errors from Voyage surface as a jsonResult error message that points the agent at mode='full_text' as the keyword-only fallback. Tests cover schema enum + description metadata, the degraded-vec0-missing path (FTS-only mode with the warning + FTS-only provenance flag), happy path with mocked Voyage embed + rerank (mixed provenance flags + score-ordered hits), and the rerank-failed RRF fallback path. Refs: architecture v4.1 §13.

Versioned prompt templates per (memory_type, tier_label, pass_kind). Append-only: old versions stay archived (active=0); new versions are inserted with active=1, and the previous-active row is deactivated atomically.
Backed by lcm_prompt_registry (created in A.04, NULL-tier UNIQUE patched in B.fix Gap 2). Schema: (prompt_id PK, memory_type, tier_label NULLABLE, pass_kind, version, template, model_recommendation, active, bundle_version, notes)
API:
- getActivePrompt(db, {memoryType, tierLabel, passKind}) → PromptRecord | null
- getPromptById(db, promptId) → PromptRecord | null (used by synthesis-cache to verify the prompt_id is still current or look up the archived version that was used)
- registerPrompt(db, opts) → string (the new prompt_id). Atomic: deactivates previous + inserts new in BEGIN IMMEDIATE. Auto-versions (max(version) + 1 within triple).
- listActivePrompts(db) → for /lcm health
- bumpBundleVersion(db) → for voice-consistency rebuilds
NULL tierLabel handling: matched literally (not coerced to "") in both lookup and update. Aligns with B.fix Gap 2's NULL-safe UNIQUE index on (memory_type, COALESCE(tier_label, ''), pass_kind, version): the registry treats NULL and '' as DIFFERENT for purposes of routing, even though the UNIQUE index treats them as the same for collision detection.
Why versioning matters for cache invalidation: lcm_synthesis_cache (D.02 next commit) will FK on prompt_id. When a prompt is updated:
- Old cache entries reference the now-archived prompt_id → stale
- New synthesis calls write rows with the new prompt_id → fresh
- Cache invalidation can be SELECTIVE (only entries with archived prompt_id need rebuild) and never touches durable summaries.content
Coverage: 11 tests
- register + getActivePrompt happy path
- re-register same triple deactivates previous + bumps version
- per-triple version isolation (different triples independent)
- NULL tierLabel matched literally
- getActivePrompt returns null when none registered
- promptIdOverride respected
- modelRecommendation/bundleVersion/notes round-trip
- listActivePrompts excludes archived
- bumpBundleVersion increments active prompts only
- atomic transaction rolls back on PK collision
Tests: 1112 → 1123 (+11).
Resolves: foundation for v4.1 §3 synthesis.
Next (D.02): synthesis dispatch that uses this registry for prompt selection.
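The append-only versioning and the literal NULL routing can be sketched with an in-memory registry. This is a simplified model, not the SQLite-backed module: `InMemoryPromptRegistry` is hypothetical, the `prompt-N` id format is invented, and the BEGIN IMMEDIATE transaction is replaced by synchronous array updates.

```typescript
type Triple = { memoryType: string; tierLabel: string | null; passKind: string };
type PromptRecord = Triple & {
  promptId: string;
  version: number;
  template: string;
  active: boolean;
};

class InMemoryPromptRegistry {
  private readonly rows: PromptRecord[] = [];

  // tierLabel is compared literally: null and "" are different routes.
  private matches(row: PromptRecord, t: Triple): boolean {
    return (
      row.memoryType === t.memoryType &&
      row.tierLabel === t.tierLabel &&
      row.passKind === t.passKind
    );
  }

  registerPrompt(opts: Triple & { template: string }): string {
    const siblings = this.rows.filter((row) => this.matches(row, opts));
    // Auto-version: max(version) + 1 within the triple.
    const version = siblings.reduce((max, row) => Math.max(max, row.version), 0) + 1;
    // Archive the previous active row; nothing is ever deleted.
    for (const row of siblings) row.active = false;
    const promptId = `prompt-${this.rows.length + 1}`; // invented id format
    this.rows.push({
      promptId,
      memoryType: opts.memoryType,
      tierLabel: opts.tierLabel,
      passKind: opts.passKind,
      version,
      template: opts.template,
      active: true,
    });
    return promptId;
  }

  getActivePrompt(t: Triple): PromptRecord | null {
    return this.rows.find((row) => row.active && this.matches(row, t)) ?? null;
  }

  getPromptById(promptId: string): PromptRecord | null {
    return this.rows.find((row) => row.promptId === promptId) ?? null;
  }
}
```

Because archived rows remain reachable through `getPromptById`, a cache entry that stored an old prompt_id can still be traced to the template that produced it.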

Extends the lcm_describe summary payload with two fields agents need when reasoning across session families: - sessionKey: pulled from the parent conversations row (which holds the same value as summaries.session_key per the Gap 8 / B.05 atomic-write invariant). The SummaryRecord public store API doesn't carry session_key through, so retrieval.describeSummary() fans out a parallel conversationStore.getConversation(conversationId) alongside the existing parents/children/messages/subtree fetches. Empty string when the parent conversation has no session_key. - timeRange: a normalized {earliestAt, latestAt, createdAt} struct that mirrors the three time fields already present on the summary. Convenience for callers that prefer one bracket over three siblings. Both fields are also surfaced in the text rendering — the meta line now carries 'sessionKey=...' and 'created=...' alongside the existing 'range=earliest..latest', so agents inspecting summaries get the session affiliation and creation time visible without parsing the JSON details. Tests cover both the populated path (sessionKey appears verbatim, timeRange struct round-trips through details) and the empty path (sessionKey rendered as '-' for missing values). Refs: architecture v4.1 §13.

…D.02)
Per-tier dispatch on top of D.01's prompt registry. Picks model + pass strategy per tier label, runs the LLM call(s), records every pass to lcm_synthesis_audit, returns final synthesized text.
Per-tier strategies (per architecture-v4.1 §3 + literature consensus that critique-revise underperforms single-pass for summarization):
- daily → single-pass (mini model)
- weekly → single-pass (mid model)
- monthly → single + verify_fidelity (premium model). The verify_fidelity prompt asks "are there claims in the summary that aren't in the source?"; it is a separate model call and returns 'OK' or 'HALLUCINATION: <details>'
- yearly → best-of-N (N=3) + judge (premium-thinking). N candidates run in parallel; the judge prompt picks the best by index (0..N-1)
- custom → single-pass (mid model)
- filtered → single-pass (mid model)
Default models: claude-haiku-4-5 (daily), claude-sonnet-4-5 (weekly, custom, filtered), claude-opus-4-7 (monthly), claude-opus-4-7-thinking (yearly). Override per-prompt via lcm_prompt_registry.model_recommendation or per-call via SynthesizeRequest.{modelOverride, forceModel}.
API:
- dispatchSynthesis(db, llmCall, req: SynthesizeRequest) → Promise<SynthesizeResult>
- LlmCall is INJECTED: production wires to existing pi-ai infrastructure (Group F integration); tests inject deterministic mocks. Keeps dispatch decoupled from the existing summarize.ts (which is geared to per-leaf compaction in the gateway hot path, a different concern).
SynthesizeRequest covers: tier, memoryType, sourceText, target (summary_id OR cache_id), passSessionId (groups multi-pass audit rows), bestOfN override (yearly), model overrides.
SynthesizeResult: output, primaryPromptId, audit IDs, total latency, total cost cents, hallucinationFlagged (monthly), bestOfN detail (yearly: n + selectedIndex + all candidates).
Audit trail: every pass writes a 'started' row up-front (forensic record even if LLM crashes mid-call), then UPDATEs to 'completed' or 'failed' with output + latency + cost + last_error.
Error handling:
- missing_prompt: thrown if the (memoryType, tier, single|judge) triple has no active prompt registered. Operator must register via /lcm command (Group F) or seed in deployment.
- llm_failure: re-thrown after writing audit row with status='failed' and last_error set. Caller (synthesis worker) decides whether to retry or surface to operator.
- judge_failure: yearly tier judge returned malformed output (no digit, or out-of-range). Indicates a bad judge prompt; the candidate outputs are intact in audit rows for manual recovery.
Template rendering: simple {{source_text}}, {{tier}}, {{memory_type}} substitutions for the primary template; {{candidate_summary}} for verify; {{candidates}} (rendered as numbered list) for judge.
Coverage: 16 tests
- DEFAULT_MODEL_BY_TIER + PASS_STRATEGY_BY_TIER constants
- daily / weekly: single-pass, audit row, default model
- monthly: single + verify; hallucinationFlagged true vs false vs skipped (no verify prompt)
- yearly: 3 candidates + judge picks 1; bestOfN=5 override; judge output without digit → judge_failure; missing judge prompt → missing_prompt
- missing primary prompt → missing_prompt
- LLM call exception → llm_failure + audit row.status='failed' + last_error captured
- prompt model_recommendation overrides tier default
- forceModel + modelOverride wins
- template substitution
Tests: 1130 → 1146 (+16; subagent's C.05 already merged).
Resolves: foundation for v4.1 §3 synthesis.
Next (D.03): eval harness for measuring retrieval recall + synthesis quality on Eva's stratified N=100 query corpus.
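The template substitution and the judge-output check can be sketched as pure helpers. These are illustrative only: `renderTemplate`, `renderCandidates`, and `parseJudgeIndex` are hypothetical names, and numbering candidates from 0 is an assumption made so the list lines up with the 0..N-1 index the judge is asked for.

```typescript
// Replaces {{name}} placeholders; unknown placeholders are left untouched.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
  );
}

// Renders best-of-N candidates as a numbered list for the judge prompt.
function renderCandidates(candidates: string[]): string {
  return candidates.map((text, index) => `${index}. ${text}`).join("\n");
}

// Extracts the judge's pick. Throws the judge_failure condition described
// above when the output has no digit or the index is out of range.
function parseJudgeIndex(judgeOutput: string, n: number): number {
  const match = judgeOutput.match(/\d+/);
  if (!match) throw new Error("judge_failure: no digit in judge output");
  const index = Number(match[0]);
  if (index < 0 || index >= n) {
    throw new Error(`judge_failure: index ${index} outside 0..${n - 1}`);
  }
  return index;
}

const judgePrompt = renderTemplate("Pick the best {{tier}} summary:\n{{candidates}}", {
  tier: "yearly",
  candidates: renderCandidates(["first draft", "second draft", "third draft"]),
});
```

Taking the first run of digits keeps the parser tolerant of replies like "Best: 2", while still failing loudly on prose-only output.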

Heuristic gate before procedure clustering. Most leaves are conversational; only a small fraction look like procedures. We pre-filter by the SHAPE of the content (not by FTS verb regex, which 3 adversarial agents flagged as too noisy + many false negatives).
Three structural signals (compose with OR):
- numbered-steps: 3+ lines starting with "1.", "Step 1:", "1)", "(1)", etc. Strict counting (no "1. ... only 2 ..."). Score weight: 0.4
- command-block: 2+ shell-command-shaped lines. Score weight: 0.4
  - $-prompt, ❯-prompt, %-prompt, > -prompt
  - lines inside ```bash/sh/zsh/shell``` fences
  - lines starting with recognized tools (git/npm/pnpm/yarn/docker/kubectl/terraform/aws/gcloud/az/gh/cargo/python/node/psql/mysql/redis-cli)
- how-to-marker: 2+ unambiguous markers like "how to ", "the procedure for ", "steps to ", "in order to ", "first/then/finally,". Conservative: a single marker is too noisy (lots of conversational uses). Score weight: 0.3
A leaf is a clustering CANDIDATE if any one signal fires. The score (sum of fired weights, capped at 1) is exposed for downstream ranking; Group E's clustering call may threshold on it.
API:
- prefilterContent(content) → {isCandidate, signals[], score}
- prefilterLeaves<T>(leaves[]) → only the candidate rows, with {signals, score} attached
Pure module: no DB, no LLM, no async. Safe to call inline.
Coverage: 18 tests
- numbered-steps: markdown, "Step N:", "N)", insufficient count, prose with embedded numbers
- command-block: $ prompt, fenced bash, line-start tool names, single-command rejection
- how-to-marker: 2+ markers fire, single marker doesn't
- composite: multi-signal stack, score cap at 1, plain conversation
- input edges: empty, undefined, null
- prefilterLeaves batch helper
Tests: 1146 → 1164 (+18).
Resolves: foundation for v4.1 §6.2 procedure clustering.
Next (E.02): clustering pass that runs ml-hclust over candidate leaves' embeddings.
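A reduced version of the three-signal gate can be sketched as below. It follows the thresholds and weights quoted above, but the regular expressions are simplified assumptions (only the `$` prompt and a handful of tool names, no fenced-block detection), so it should be read as an outline of the scoring rule rather than the module's matcher.

```typescript
type Signal = "numbered-steps" | "command-block" | "how-to-marker";

const WEIGHTS: Record<Signal, number> = {
  "numbered-steps": 0.4,
  "command-block": 0.4,
  "how-to-marker": 0.3,
};

// Simplified patterns (assumptions): "1.", "1)", "(1)", "Step 1:" at line start;
// "$ " prompts or a few known tool names at line start.
const STEP_LINE = /^\s*(?:\(?\d+[.)]|step\s+\d+:)\s/i;
const COMMAND_LINE = /^\s*(?:\$\s|(?:git|npm|pnpm|yarn|docker|kubectl)\s)/;
const HOW_TO_MARKERS = ["how to ", "the procedure for ", "steps to ", "in order to "];

function prefilterContent(content: string | null | undefined): {
  isCandidate: boolean;
  signals: Signal[];
  score: number;
} {
  const text = content ?? "";
  const lines = text.split("\n");
  const lower = text.toLowerCase();
  const signals: Signal[] = [];

  if (lines.filter((l) => STEP_LINE.test(l)).length >= 3) signals.push("numbered-steps");
  if (lines.filter((l) => COMMAND_LINE.test(l)).length >= 2) signals.push("command-block");
  // Count marker occurrences across the whole text; a single hit is too noisy.
  const markerHits = HOW_TO_MARKERS.reduce(
    (count, marker) => count + (lower.split(marker).length - 1),
    0,
  );
  if (markerHits >= 2) signals.push("how-to-marker");

  // Sum of fired weights, capped at 1 (all three fire => 1.1 before the cap).
  const score = Math.min(1, signals.reduce((sum, s) => sum + WEIGHTS[s], 0));
  return { isCandidate: signals.length > 0, signals, score };
}
```

The OR composition means any single signal makes a leaf a candidate; the score only matters if a downstream step chooses to rank or threshold.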

…tim per-hit cap
Two related changes in lcm-grep-tool.ts. Methodology: Research → Run → Debate → Decide. Both flipped after adversarial review caught my mistakes.
# F5: wrapper migration
Adversarial review counted 12 untapped return paths total (across grep + describe), not the 4 I claimed. In grep alone:
- Line 392: regex/full_text success
- Lines 590, 598, 604: hybrid error returns (in runHybridLcmGrep)
- Line 661: hybrid success
- Lines 761, 774, 779: semantic error returns (in runSemanticLcmGrep)
- Line 854: semantic success
- Line 1063: verbatim success
Spot-tap was whack-a-mole. Wave-9 → Wave-12 has hit the same antipattern twice already. The structural fix is the wrapper migration.
Removed: inline `evaluateNeedsCompactGate` + 4 `tapResultForTokenAccounting` calls in execute body (early-error paths).
Added: single `runWithTokenGate` wrapper around the entire body. All return paths, including helper functions' internal error returns, now flow through the wrapper's auto-tap exit. Single return funnel, can't skip a tap.
# F6: verbatim per-hit content cap (5K chars)
Live-DB validation showed 5/5 plausible verbatim queries leak 6-12× the markdown disclosure via `details.hits[].content`: markdown caps at 25-33K chars while details carries 200-385K chars per call. Empirical single hits up to 200K chars exist (5× the entire markdown budget).
Adversarial review caught my original "metadata-only details" (Option D) recommendation as factually wrong: I had claimed "verified zero callers" but actual grep found 20+ active callers including:
- test/lcm-grep-verbatim-mode.test.ts (canonical contract test)
- test/v41-five-questions.test.ts (entire Type-C citation suite)
- test/v41-adversarial-scenarios.test.ts (defense-in-depth regressions)
- scripts/v41-qa-runner.mjs (live-DB harness, "critical" severity)
Decision flipped to Option A: keep `content` field but cap each hit at 5K chars, slice `details.hits` to `renderedRowCount` (rows actually emitted into markdown). 5K is the 96th percentile of message lengths in the observed corpus: typical messages fit fine, the long-tail tool-output dumps get capped with `contentTruncated: true` + `fullContentLength` flag pointing at lcm_describe(messageId, expandMessages=true) for the full body.
New fields in details:
- truncated: bool (markdown loop broke early)
- hits[i].contentTruncated: bool (this hit's content was capped)
- hits[i].fullContentLength: number (so caller can decide if follow-up via lcm_describe is worth it)
# Tests
10 verbatim tests pass (was 8): 2 new invariants pin the cap behavior + the renderedRowCount slicing.
- "INVARIANT: per-hit content cap at 5K chars + truncation flags"
- "INVARIANT: details.hits sliced to renderedRowCount when markdown truncates"
The 20+ existing callers all still pass (verified): they assert against substrings + messageIds, not full-content equality.
LOC: ~50 (F5 wrapper migration) + ~30 (F6 cap + flags) + ~50 (new tests).
Documents:
- /tmp/adversarial-f5.md
- /tmp/adversarial-f6.md
- /tmp/decision-phase2-final.md
- /tmp/research-f2-f6-data.md (F6 message-length distributions)
- /tmp/validation-f2-f5-f6.md (F6 dual-channel leak measurements)
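The F6 cap and the renderedRowCount slicing can be sketched as two pure functions. This is a minimal illustration: `capHit` and `buildDetails` are hypothetical names, and deriving `truncated` from `renderedRowCount < hits.length` is an assumption about how "markdown loop broke early" is detected.

```typescript
const MAX_HIT_CONTENT_CHARS = 5000; // the 5K per-hit cap from F6

type VerbatimHit = { messageId: string; content: string };
type CappedHit = VerbatimHit & {
  contentTruncated: boolean;
  fullContentLength: number;
};

// Caps one hit's content and records enough metadata for the caller to
// decide whether fetching the full body via lcm_describe is worthwhile.
function capHit(hit: VerbatimHit, maxChars: number = MAX_HIT_CONTENT_CHARS): CappedHit {
  const fullContentLength = hit.content.length;
  const contentTruncated = fullContentLength > maxChars;
  return {
    messageId: hit.messageId,
    content: contentTruncated ? hit.content.slice(0, maxChars) : hit.content,
    contentTruncated,
    fullContentLength,
  };
}

// Keeps details.hits aligned with the rows actually rendered into markdown.
function buildDetails(
  hits: VerbatimHit[],
  renderedRowCount: number,
): { truncated: boolean; hits: CappedHit[] } {
  return {
    truncated: renderedRowCount < hits.length, // assumed definition
    hits: hits.slice(0, renderedRowCount).map((hit) => capHit(hit)),
  };
}
```

Slicing before capping means the details payload can never carry more rows than the markdown shows, which addresses the dual-channel leak measured above.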

Wave-12 reviewer F4 landed the suppression-aware aggregate CTE in lcm_get_entity AND lcm_search_entities via parallel edits: byte-identical SQL maintained in two places, a parallel-edit drift hazard.

The first-principles-architectural-decision methodology run (research + adversarial debate + reach-for analysis) chose Option B (extract shared helper) over Option A (merge into lcm_entity { mode }) for the entity axis:

- Both adversarial agents independently recommended B (helper) over A
- Reach-for v1 (25 scenarios) found search_entities orphaned (0 reaches) but reach-for v2 (30 scenarios incl. browse/fuzzy F1-F5) found it REACHABLE when scenarios target its niche (3 first-reaches on F1, F2, F4)
- The original "consolidate" verdict was a scenario-coverage artifact, not tool orphaning. Both tools have earned their keep.

Helper at src/tools/lcm-entity-shared.ts exports:

- VISIBLE_MENTIONS_CTE: the WITH visible_mentions AS (...) clause
- entityAggCte({ includeFirstIn }): the , entity_agg AS (...) clause, with the get-entity-only first_in column toggleable

Both tools now build their query as:

    ${VISIBLE_MENTIONS_CTE}${entityAggCte({ includeFirstIn: true|false })}
    SELECT ... FROM lcm_entities e JOIN entity_agg ea ON ... WHERE ...

Surface unchanged. Tests unchanged (20/20 pass).

Documents:

- /tmp/research-entity-consolidation.md (Step 1)
- /tmp/step2-entity-consolidation-options.md (Step 2)
- /tmp/adversarial-entity-A.md, /tmp/adversarial-entity-C.md (Step 3)
- /tmp/reach-for-analysis.md (Step 1.7 v1)
- /tmp/reach-for-analysis-v2.md (Step 1.7 v2)
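The shared-helper shape described above might look something like the sketch below. The export names `VISIBLE_MENTIONS_CTE` and `entityAggCte` come from the commit message; everything inside the SQL (the mentions table, `suppressed_at`, `mention_time`, `mention_count`, and what `first_in` is computed from) is a placeholder, since the real schema is not shown here. `buildEntityQuery` is a hypothetical wrapper added for illustration.

```typescript
// Placeholder SQL: table and column names are illustrative, not the real schema.
const VISIBLE_MENTIONS_CTE = `WITH visible_mentions AS (
  SELECT entity_id, mention_time
  FROM lcm_entity_mentions
  WHERE suppressed_at IS NULL
)`;

// Second CTE; first_in is only emitted for the get-entity caller.
function entityAggCte(opts: { includeFirstIn: boolean }): string {
  const firstIn = opts.includeFirstIn ? ",\n    MIN(mention_time) AS first_in" : "";
  return `, entity_agg AS (
  SELECT entity_id,
    COUNT(*) AS mention_count${firstIn}
  FROM visible_mentions
  GROUP BY entity_id
)`;
}

// Hypothetical wrapper: both tools share the CTE prefix and differ only in the
// first_in toggle and the WHERE clause (values are bound via ? placeholders).
function buildEntityQuery(includeFirstIn: boolean, whereClause: string): string {
  return `${VISIBLE_MENTIONS_CTE}${entityAggCte({ includeFirstIn })}
SELECT e.*, ea.*
FROM lcm_entities e JOIN entity_agg ea ON ea.entity_id = e.id
WHERE ${whereClause}`;
}

const getEntitySql = buildEntityQuery(true, "e.canonical_name = ?");
const searchEntitiesSql = buildEntityQuery(false, "e.entity_type = ?");
```

Because both queries are built from the same constants, a change to the suppression filter lands in both tools at once, which is the drift hazard the helper is meant to remove.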

…ic' (9→8 tools) # Wave-12 consolidation SA — final ship The first-principles-architectural-decision methodology run produced a nuanced verdict for tool consolidation. The semantic axis got consolidated; the entity axis did not. ## Decision: drop lcm_semantic_recall, fold capabilities into lcm_grep Reach-for analysis (Step 1.7) showed: - v1 (25 scenarios): 0 first-reaches for lcm_semantic_recall - v2 (30 scenarios incl. F1-F5 browse/fuzzy/cost-cheap): 1 narrow first-reach - Even with its tailor-made F5 scenario, it only barely beat lcm_grep mode='semantic'. No durable niche. Code archeology (Step 1.5) found the introducing commit `1e09df9` itself admitted "lcm_semantic_recall kept distinct (**same cost** as mode='semantic'; both exposed for clarity per challenger C2 verdict)." The "for clarity" justification was invalidated by circular descriptions that defer to each other ("for purely-semantic exploration prefer lcm_semantic_recall" inside lcm_grep, vs "reserve lcm_semantic_recall for purely semantic exploration" inside recall). Changes: 1. **Schema**: added `summaryKinds` filter to lcm_grep (was the only recall-only differentiator). Honored only by mode='semantic' / 'hybrid'; ignored elsewhere. 2. **Implementation**: deleted src/tools/lcm-semantic-recall-tool.ts. Plumbing through runSemanticLcmGrep already shared underlying `runSemanticSearch` + confidence-band logic. 3. **Manifest**: removed from openclaw.plugin.json. 9 → 8 tools. 4. **Plugin index**: removed import + registerTool call. 5. **needs-compact-gate.ts**: removed lcm_semantic_recall case in estimateResultTokens (folded into lcm_grep semantic estimator). 6. **Tests**: removed lcm-semantic-recall-tool.test.ts; updated 4 tests that referenced recall (parity-invariants, adversarial-scenarios, five-questions, tool-budget-guardrail) to use lcm_grep mode='semantic'. 7. 
**Description fix**: lcm_grep description no longer cross-defers to recall; tells the agent semantic mode is the standalone pure-vector path with optional summaryKinds filter. ## Decision: KEEP lcm_search_entities (axis-different from earlier plan) Reach-for v1 had also flagged lcm_search_entities as orphaned (0 first-reaches in 25 scenarios). v2 with F1-F5 added flipped this: - F1 (browse all entities of a type): reached for lcm_search_entities - F2 (fuzzy-name lookup): reached for lcm_search_entities - F4 (filter by entity_type): reached for lcm_search_entities - 3 first-reaches across F-scenarios where the description fits The original v1 zero was a SCENARIO COVERAGE artifact — THE_FIVE_QUESTIONS was biased toward expert queries that already named the canonical entity. Adding browse/fuzzy/type-filter scenarios revealed the tool serves a real niche. Eva's intuition that the v1 reach-for picture was incomplete was correct. Description rewrite leads with the browse-first niche so the gravity matches the just-validated reach-for. ## Tests - 1587 tests pass (was 1599; net -12 from deleted recall test file and consolidated parity tests) - 0 new TS errors (671 vs pre-fix baseline 679 — actually -8 from deleting recall tool's compile errors) - Live DB harness: all substantive checks pass (semantic, hybrid, suppression cascade, extraction). The 3 reported "fails" are the pre-existing "corpus already fully embedded" no-op messages. 
## Ancillary changes - Added F1-F5 scenarios to THE_FIVE_QUESTIONS.md (browse / fuzzy-name / vague-summary / type-filter / paraphrastic-cheap) - Baked F1-F5 into scripts/v41-qa-runner.mjs as permanent test coverage - Updated lcm_search_entities to allow empty `query` when `entityType` is provided (browse-by-type use case the new description promises) - Updated operator-facing log messages in lcm-command.ts and semantic-infra-init.ts to drop stale lcm_semantic_recall references ## Methodology lesson (encoded into the skill) Step 1.7 (reach-for validation) MUST be paired with scenario-coverage audit. Tool absence in reach-for ≠ tool orphaning. Could be scenario gap. Verify by adding scenarios that exercise the tool's claimed niche before declaring it dead. Documents: - /tmp/research-entity-consolidation.md, /tmp/research-semantic-consolidation.md (Step 1) - /tmp/step2-entity-consolidation-options.md, /tmp/step2-semantic-consolidation-options.md (Step 2) - /tmp/adversarial-{entity-A,entity-C,semantic-SA,semantic-SB}.md (Step 3, 4 of 5) - /tmp/ripple-id-prefix-consolidation.md (Step 3 ripple analysis) - /tmp/reach-for-analysis.md (Step 1.7 v1) - /tmp/reach-for-analysis-v2.md (Step 1.7 v2 — verdict C)

…, stale refs) Wave-1 audit (8 parallel agents over today's 25 commits + 5200 LOC delta) surfaced 2 P0 + 6 P1 + several P2/P3 findings. This commit batches the P0 + P1 fixes; P2/P3 to follow. # P0 — QA runner crashes on startup (W1A5 + W1A6 converged) `scripts/v41-qa-runner.mjs` still imported the deleted `lcm-semantic-recall-tool.js` and had 4 `tool: "lcm_semantic_recall"` case strings. Runner exited with ERR_MODULE_NOT_FOUND before parsing any args. Bug introduced when F1-F5 added without re-running qa-runner. Fixes: - Drop the deleted-tool import (line 227-229) - Migrate 4 cases (smoke-semantic-cosine-band, smoke-filtered-knn-windowed, adv-low-confidence-warning, adv-cosine-on-entity-only) to `lcm_grep` with `mode: "semantic"` + `pattern` arg # P0/P1 — F5 + F3 QA predicates (W1A6 NEW) F5 predicate had inverted logic: `if (r.error) return "errored:"` short-circuited BEFORE the graceful-degradation regex check. Since qa-runner flattens `r.error = r.details.error` (line 1132), the Voyage-unavailable allowance was unreachable — F5 always failed in offline mode. F3 had no LLM-unavailable allowance like A-cases do, so synthesize_around without summarizer creds always failed F3. Fixes: - F5 checks regex BEFORE bare error; matches v41-tool-parity-invariants pattern - F3 allows `summarization model|summaryModel|summaryProvider|LCM_SUMMARY_MODEL` errors as graceful degradation # P1 — inferTokenBudget bypass for unknown models (W1A1) `inferTokenBudget` recognized only ~7 model families (opus-4-5/6/7, gpt-5.4/5.5, sonnet-4-5/6, haiku). For every other model — gpt-4 / gpt-4o / claude-3.x / o1 / Gemini / Mistral / Ollama — it returned `undefined`, which `evaluateNeedsCompactGate` treats as a bypass signal. needsCompact gate was silently disabled for the majority of operators outside the recognized list. Fix: conservative 200K default for unknown models. Per-call MAX_RESULT_CHARS still bounds worst case at 10K tokens. 
Tests expanded to cover gpt-4 / gpt-4o / claude-3-5-sonnet / o1-preview. # P1 — summaryKinds plumbing on hybrid mode (W1A5) Schema description claimed "Honored only by mode='semantic' / 'hybrid'" but the dispatch only passed `summaryKinds` to `runSemanticLcmGrep`. The hybrid branch silently ignored it — documented-but-broken contract. Fix: resolve `summaryKindsParam` once at the dispatch, pass to both helpers. `runHybridLcmGrep` now accepts `summaryKinds` in its options and threads through to `runHybridSearch` (which already supports it). The FTS-arm closure already post-filtered on summaryKinds (line 576); the semantic-arm via runHybridSearch was the only gap. # P1 — Stale lcm_semantic_recall refs (W1A5 + W1A8 converged, ~12 places) Sweep of agent-facing prose, operator scripts, and changeset: - `.changeset/lcm-v41-omnibus.md` — corrected tool list (was missing expand/expand_query/compact; still listed deleted recall) - `docs/agent-tools.md` — Type-B routing table, decision tree, removed recall section + cost-table row - `docs/v4.1/THE_FIVE_QUESTIONS.md` — Type-B header, F3/F5 references, the false "F-scenarios not yet baked into qa-runner" note - `docs/v4.1/PR_DESCRIPTION.md` — Mermaid routing diagram, recall section header (now points to mode='semantic'), cost table - `docs/v4.1/KNOWLEDGE_DUMP.md` — shipped-tools list, debugging playbook header - `scripts/v41-vs-rollup-comparison.mjs` — operator output prose - `scripts/lcm-tool-call.mjs` — dropped recall dispatch case + JSDoc - `scripts/v41-live-db-harness.mjs` — log line + section header - `scripts/v41-agent-harness-preflight.mjs` — JSDoc Historical references in audit reports (HARNESS_REPORT_2026-05-06.md, TEST_ANTIPATTERNS.md) kept as-is — they document the state at that time, accurate for the audit trail. 
# Verification - `npm test` → 1587/1587 pass (no regressions) - QA runner now starts and runs 34/35 cases successfully - 1 remaining failure: `smoke-filtered-knn-windowed` — pre-existing consolidation regression where mode='semantic' returns 0 hits when queried with a seed leaf's own content + tight ±1h window. Same code path as the other 5 passing semantic cases. Investigation deferred to Wave 2. - F1-F5 ALL PASS # Deferred - P2 fixes: post-compact cache reset, LCM_TOOL_RESULT_TOKEN_BUDGET to other 7 tools, estimator HARD_CAP env honor, comment drift - P3 cosmetic batch - `smoke-filtered-knn-windowed` regression investigation
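The F5 predicate-ordering bug from this commit can be illustrated with a small sketch. The `QaResult` shape, the degradation regex, and the convention of returning `null` on pass are all assumptions made for the example; the real predicates live in scripts/v41-qa-runner.mjs and may differ.

```typescript
interface QaResult {
  error?: string;
  text?: string;
}

// Hypothetical pattern standing in for the runner's "embedding provider unavailable" allowance.
const GRACEFUL_DEGRADATION = /voyage.*(unavailable|not configured)/i;

// Inverted ordering: the bare error check returns first, so the allowance is dead code.
function invertedPredicate(r: QaResult): string | null {
  if (r.error) return `errored: ${r.error}`;
  if (GRACEFUL_DEGRADATION.test(r.error ?? "")) return null; // never reached for real errors
  return r.text ? null : "empty result";
}

// Fixed ordering: the graceful-degradation check runs before the bare error check.
function fixedPredicate(r: QaResult): string | null {
  if (r.error && GRACEFUL_DEGRADATION.test(r.error)) return null;
  if (r.error) return `errored: ${r.error}`;
  return r.text ? null : "empty result";
}
```

With an offline result such as `{ error: "Voyage API unavailable" }`, the inverted form reports a failure while the fixed form passes, which matches the "always failed in offline mode" symptom described above.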

…k leaves Wave-12 audit traced the lone smoke failure to a brittle test, not a consolidation regression. The test picked the latest-mtime embedded leaf as a query seed; today the latest leaf was an `[LCM fallback summary — model unavailable]` (summarizer was down when it was written). Fallback text has no specific semantic neighbors in a tight ±1h window, so the test always returned 0 hits regardless of the semantic pipeline being healthy. Fix: pick a seed with real content (`NOT LIKE '[LCM fallback summary%'` + length > 200). Stable across snapshots regardless of recent summarizer health. Verification: smoke 8/8 pass. Underlying runSemanticSearch contract unchanged — the prior `lcm_semantic_recall` would have hit the same brittle-test issue if the snapshot contained a fallback as latest leaf.
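As a rough in-memory equivalent of the seed filter described above (the actual fix is a SQL predicate), the selection could be sketched like this. The `Leaf` shape and `pickSeed` name are hypothetical, and picking the most recent qualifying leaf is an assumption carried over from the original "latest-mtime" behavior.

```typescript
interface Leaf {
  id: string;
  content: string;
  mtime: number; // epoch millis
}

const FALLBACK_PREFIX = "[LCM fallback summary";
const MIN_SEED_LENGTH = 200; // "length > 200" per the commit message

// Choose the newest leaf that has real content, skipping summarizer-fallback placeholders.
function pickSeed(leaves: Leaf[]): Leaf | undefined {
  return leaves
    .filter((l) => !l.content.startsWith(FALLBACK_PREFIX) && l.content.length > MIN_SEED_LENGTH)
    .sort((a, b) => b.mtime - a.mtime)[0];
}
```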

Wave-2 cross-cutting audit (4 parallel agents: token-state-integration, schema/suppression, test/manifest/harness, fresh-eyes) caught 2 P0s + 1 P1 the per-file Wave-1 sweep missed.

P0: token-state cache + accounting bus

- Post-compact stale cache: noteSuccessfulCompact() clears the entry on successful lcm_compact so the very next wrapped call re-bootstraps from the post-compact ground truth instead of refusing on the stale pre-compact snapshot. Without this, the agent could loop compact→refuse→compact until the 2/5min cap blocks further attempts.
- lcm_synthesize_around was OFF the runWithTokenGate accounting bus: the prior "self-protecting via 50K source cap" comment covered SOURCE input bounds, not OUTPUT (4K-8K markdown rollup flowed past the cache silently and drifted gate decisions low). Wrapped it; wired getRuntimeContext through registration in src/plugin/index.ts.

P1: runWithTokenGate error path

- Tool throws (e.g. "LCM engine is unavailable", present in 6+ tools + 13 throw sites in lcm_expand_query) skipped tapResultForTokenAccounting entirely. The runtime-serialized error message DOES cost tokens, so the cache drifted low by exactly the size of the error message every time. Added try/catch tap-then-rethrow.

Manifest drift fix

- registerTool comment placement: moved the W2A1 P0 #2 comment from between `=>` and `createLcmSynthesizeAroundTool` (where the manifest test's regex /=>\s*\{?\s*(?:return\s+)?(create...)/ couldn't match) to ABOVE the api.registerTool block. Re-runs 8/8 against the manifest.

Cosmetic

- README tool inventory: removed lcm_semantic_recall line, added lcm_compact + Wave-12 SA consolidation note (was: 9 listed minus 1 removed but +1 missing = count cancels out, hidden bug).
- THE_FIVE_QUESTIONS.md: coverage 22/25 → 27/30 (post F1-F5 addition).
- 7 stale lcm_semantic_recall comment refs in src/embeddings/semantic-search.ts, src/engine.ts, src/store/summary-store.ts, src/tools/lcm-synthesize-around-tool.ts, test/v41-stress-fixture.test.ts, test/v41-tool-budget-guardrail.test.ts.

Verified

- 1587/1587 vitest passing (Wave-2 batch added regressions for the new noteSuccessfulCompact + try/catch tap behaviors).
- 35/35 QA harness against live-DB snapshot at $0.11; F1/F4 args swap fix confirmed (F1 catalog browse, F4 PR filter).
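The tap-then-rethrow behavior and the post-compact cache reset can be sketched as below. This is a simplified, synchronous stand-in: the real `runWithTokenGate` presumably wraps async tool bodies and also performs the needsCompact refusal check, both of which are omitted. The in-memory map, the `tap` helper, and the 4-chars-per-token estimate are assumptions for illustration.

```typescript
type ToolResult = { text: string };

// Hypothetical per-session accounting; the real cache lives in token-state.ts.
const tokensBySession = new Map<string, number>();

function tap(sessionId: string, text: string): void {
  const estimate = Math.ceil(text.length / 4); // rough chars-to-tokens ratio (assumed)
  tokensBySession.set(sessionId, (tokensBySession.get(sessionId) ?? 0) + estimate);
}

// Single funnel: every exit from the tool body, success or throw, is accounted for.
function runWithTokenGate(sessionId: string, body: () => ToolResult): ToolResult {
  try {
    const result = body();
    tap(sessionId, result.text);
    return result;
  } catch (err) {
    // The serialized error message also lands in the agent's context, so count it too.
    tap(sessionId, err instanceof Error ? err.message : String(err));
    throw err;
  }
}

// After a successful compact the old estimate is stale; drop it so the next call re-bootstraps.
function noteSuccessfulCompact(sessionId: string): void {
  tokensBySession.delete(sessionId);
}
```

Routing both exits through one wrapper means a new return path inside a tool cannot silently skip accounting, which is the same argument the earlier F5 wrapper migration makes.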

…describe cap

W1A1 #2: estimator HARD_CAP was hard-coded at 10_000 but the per-tool char cap (LCM_TOOL_RESULT_TOKEN_BUDGET) is operator-tunable. With env raised to 30K, tools could emit 30K but the gate's projection still capped at 10K, so needsCompact decisions drifted low (refusals missed when they should fire) by up to 3×.

W1A8 #3: lcm_describe was truly unbounded. Worst case (Wave-12 estimator already noted this in a code comment): a single describe(condensed_id, expandChildren=true) on a wide condensed could emit ~210K tokens (10K base + 20×10K children). Sub-agent grant ledger (consumeTokenBudget, Wave-9 P1) protected delegated sessions; main-agent calls had no per-tool char cap.

Single source of truth

- New src/plugin/result-budget.ts owns the env knob resolution. Exports:
  - MAX_RESULT_TOKENS: used by needs-compact-gate as HARD_CAP_TOKENS
  - MAX_RESULT_CHARS: used by tools for truncation
  - truncationNotice(reasonHint): standard message format
- needs-compact-gate.ts pulls HARD_CAP from MAX_RESULT_TOKENS so the estimator and per-tool cap stay in lockstep.
- lcm-grep-tool.ts drops its local resolveMaxResultChars (now imports from result-budget). Behavior identical at the default; no change to truncation messages. (Existing per-grep messages preserved.)

lcm_describe truncation

- truncateLinesToCap helper at top of file. Mirrors lcm_grep's pattern: walk lines, accumulate char count (incl. join newlines), append the truncation notice and stop when over cap.
- Applied at both return sites (summary describe + file describe).
- details.manifest.truncated boolean flag exposed for programmatic callers; details.truncated on the file branch.

Tests (6 new, total 15 in suite)

- env=30000 → MAX_RESULT_TOKENS=30K, MAX_RESULT_CHARS=120K, estimator projection rises above 10_000 for verbatim mode (proves no longer pinned at the old hard-coded ceiling)
- env unset → 10_000 default
- env=100 → clamped UP to 2_000 floor (anti-misconfig)
- env=garbage → falls back to 10_000 default
- describe with 30K-char content + env=2000 → bounded under 10K + emits truncation marker
- describe with small content → emits full content, no truncation marker

Verified

- 1593/1593 vitest passing (was 1587, added 6 regression tests)
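A minimal sketch of the budget resolution and line-walking truncation described in this commit follows. The default (10_000), floor (2_000), and 4-chars-per-token ratio are taken from the commit message and the later cross-module invariant; the function signatures, the return shape of `truncateLinesToCap`, and passing the raw env value as a parameter are assumptions made so the example is self-contained.

```typescript
const DEFAULT_RESULT_TOKENS = 10_000;
const MIN_RESULT_TOKENS = 2_000;
const CHARS_PER_TOKEN = 4;

// Resolve the operator knob: unset or unparsable falls back to the default,
// anything else is clamped up to the floor.
function resolveResultTokens(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_RESULT_TOKENS;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return DEFAULT_RESULT_TOKENS;
  return Math.max(MIN_RESULT_TOKENS, Math.floor(parsed));
}

// Walk lines, counting join newlines, and stop with a notice once the cap would be exceeded.
function truncateLinesToCap(
  lines: string[],
  maxChars: number,
  notice: string,
): { text: string; truncated: boolean } {
  const kept: string[] = [];
  let used = 0;
  for (const line of lines) {
    const cost = line.length + (kept.length > 0 ? 1 : 0); // +1 for the "\n" joiner
    if (used + cost > maxChars) {
      kept.push(notice);
      return { text: kept.join("\n"), truncated: true };
    }
    kept.push(line);
    used += cost;
  }
  return { text: kept.join("\n"), truncated: false };
}

const maxTokens = resolveResultTokens("30000");
const maxChars = maxTokens * CHARS_PER_TOKEN;
```

Deriving both the estimator ceiling and the truncation cap from the same resolved number is what keeps the two from drifting apart when the operator raises the knob.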

Wave-12 found 9 of 10 bugs that escaped 1593 tests. Each bug was hidden by a distinct antipattern. This commit adds 4 new test layers that pin the antipatterns so each bug class fails LOUDLY on regression. A. Wiring/registration smoke (14 tests) - test/v41-tool-wiring-smoke.test.ts - For each tool documented as wrapped in needs-compact-gate.ts: assert the factory file calls runWithTokenGate(. For each documented-exempt tool: assert it does NOT call runWithTokenGate(. Catches the W2A1 P0 bug class (synthesize_around silently dropped off the bus). - For each registered tool in plugin/index.ts: assert getRuntimeContext is wired. Catches the half of the bug where the wrapper is present but not given runtime context. B. Adversarial output bounds (3 tests) - test/v41-adversarial-output-bounds.test.ts - lcm_get_entity with 200 mentions × 1000-char surface_forms: bound check - lcm_search_entities with 500 entities × 200-char canonical: bound check - lcm_search_entities respects schema-bounded limit even with caller=500 - Catches W1A8 #3 sister cases (any tool that emits content without per-tool char cap). C. Cross-module invariants (6 tests) - test/v41-cross-module-invariants.test.ts - estimateResultTokens projection ceiling === MAX_RESULT_TOKENS (caller-tunable env knob). Catches the W1A1 #2 bug class where two modules pin the same constant in isolation and drift apart. - MAX_RESULT_CHARS = MAX_RESULT_TOKENS × 4 ratio - REFUSAL_THRESHOLD calibration sanity vs MAX_RESULT_TOKENS - Every src/tools/lcm-*-tool.ts factory referenced in plugin/index.ts - summaryKinds reaches BOTH semantic and hybrid dispatch (W1A5 #1 schema-vs-implementation drift) - Sub-agent expansion-auth gate consistency (lcm_expand + lcm_describe both consult same manager) D. QA-runner antipattern static scan (26 tests) - test/v41-qa-runner-antipatterns.test.ts - Extracts each `expect: (r) => {...}` closure from qa-runner.mjs. 
For tools with external deps (Voyage / LLM), assert the graceful- degradation regex check appears BEFORE bare `if (r.error) return`. Catches the W1 F5 bug class (inverted predicate making graceful branch dead code). - Pins F1 has no entityType filter (catalog browse) AND F4 has entityType: pr_number (W1 F1/F4 args swap regression). Verified - 1642/1642 vitest passing (was 1593, +49 new tests; 0 bugs surfaced by the new layers — the patterns pin the existing post-Wave-12 fixes rather than uncovering new issues).
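The wiring smoke layer (A) amounts to a static check over factory source text. A rough sketch, assuming the test reads each factory file into a string and compares it against a documented wrapped/exempt list; the `checkWiring` helper and its message format are hypothetical and not taken from test/v41-tool-wiring-smoke.test.ts.

```typescript
type Expectation = "wrapped" | "exempt";

// Returns null when the factory source matches its documented status, else a failure message.
function checkWiring(toolName: string, expectation: Expectation, factorySource: string): string | null {
  const callsWrapper = factorySource.includes("runWithTokenGate(");
  if (expectation === "wrapped" && !callsWrapper) {
    return `${toolName}: documented as wrapped but never calls runWithTokenGate(`;
  }
  if (expectation === "exempt" && callsWrapper) {
    return `${toolName}: documented as exempt but calls runWithTokenGate(`;
  }
  return null;
}

// Illustrative sources; in a real test these would be read from the factory files on disk.
const wrappedSource = "execute: (args) => runWithTokenGate(ctx, () => doWork(args))";
const unwrappedSource = "execute: (args) => doWork(args)";
```

A check like this fails as soon as a tool documented as wrapped loses its wrapper call, which is the bug class the commit attributes to lcm_synthesize_around.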

… notes Retro review of today's 4 fix commits (f9a15d9, ae55691, a9f10cf, 37cdabb) ran 4 parallel agents through the first-principles methodology and surfaced 4 issues + 3 unknown-unknowns from my self-audit. This commit closes the in-scope fixes; A1 (LcmConfig promotion) deferred to a focused follow-up PR. L1 — estimator self-contradiction (already-shipped bug) - needs-compact-gate.ts:162 returned 3_000 tokens for synthesize_around - but the file's docstring at line 26-29 documents synthesize OUTPUT as "4K-8K tokens of LLM-generated rollup" - 3000 was a ~50% under-estimate vs documented behavior; the estimator was a self-contradiction in the same file - Fixed: returns 6_000 (midpoint of the 4K-8K range). Added regression test that pins estimate ∈ [4K, 8K] with a comment tying the two sides together so a future docstring drift breaks both. N2 — agent-facing contract drift on details.truncated - lcm_describe shipped truncated flag at details.manifest.truncated for the summary branch but details.truncated for the file branch; asymmetric placement - lcm_grep didn't expose details.truncated at all (regex/full_text + hybrid + semantic paths) despite emitting the same truncation prose - Generic "did this tool truncate?" callers had no consistent field to read across the surface - Fixed: standardize top-level details.truncated as the canonical agent-facing contract field; mirrored across describe (top-level + manifest dup for back-compat) and all 3 lcm_grep paths. M1 — inferTokenBudget per-provider defaults - 200K uniform default for unknown models was too generous for sub-200K models (gpt-4 8K-32K, ollama 8K-128K). Gate stayed silent until projected context was 6× over real budget on these. 
- Adversarial agent argued: "if real budget is 8K and assumed is 200K, the gate fires at projected 184K but real engine OOMs at 8K" - Fixed per-family defaults: - claude-3.x family → 200K (was already) - gpt-4o / o1 family → 128K (was 200K) - Gemini 1.5/2.x → 1M (was 200K) - Legacy gpt-4 → 32K (was 200K — this was the worst gap) - Ollama / Mistral / OpenRouter / unknown → 32K floor (was 200K) - Added LCM_DEFAULT_TOKEN_BUDGET env var as escape hatch for operators on larger unrecognized models (clamped to [8K, 2M] sanity range). - 7 new tests cover the per-family branches + env override behavior. N1 + N3 — documentation of contract invariants - token-state.ts: declare "tools may import named lifecycle hooks; do NOT reach into tokensBySession directly. lcm_compact's import of noteSuccessfulCompact is the precedent — future cache-aware tools follow the same pattern, do not add new ones to underlying map." - result-budget.ts: declare truncationNotice() prose is now agent-facing contract (test regex pinned + tool descriptions reference it); cosmetic edits will silently break tests AND surprise agents. A1 — follow-up note (deferred to separate PR) - result-budget.ts: declare the architectural inconsistency in a comment (this module bypasses the resolveLcmConfigWithDiagnostics pattern that every other LCM env knob uses). Follow-up PR should promote to LcmConfig.toolResultTokenBudget. Inline note flags it for visibility until then. Verified - 1649/1649 vitest passing (was 1642, +7 new tests for M1 + L1 regression)
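The M1 per-family fallback can be sketched as a small lookup. The budgets (200K, 128K, 1M, 32K) and the [8K, 2M] clamp come from the commit message; the regexes used to recognize each family, treating K as 1,000, and applying the `LCM_DEFAULT_TOKEN_BUDGET` override only when no family matches are assumptions. Models the real `inferTokenBudget` already recognizes are outside this sketch.

```typescript
const K = 1_000; // assumed: the commit does not say whether K means 1000 or 1024

// Order matters: gpt-4o must be tested before the broader legacy gpt-4 pattern.
const FAMILY_DEFAULTS: Array<[RegExp, number]> = [
  [/claude-3/i, 200 * K],
  [/gpt-4o|^o1/i, 128 * K],
  [/gemini-(1\.5|2)/i, 1_000 * K],
  [/gpt-4/i, 32 * K],
];

const UNKNOWN_FLOOR = 32 * K;
const OVERRIDE_MIN = 8 * K;
const OVERRIDE_MAX = 2_000 * K;

// Fallback budget for models outside the recognized list.
function fallbackTokenBudget(model: string, envOverride?: string): number {
  for (const [pattern, budget] of FAMILY_DEFAULTS) {
    if (pattern.test(model)) return budget;
  }
  const parsed = Number(envOverride);
  if (envOverride !== undefined && Number.isFinite(parsed)) {
    return Math.min(OVERRIDE_MAX, Math.max(OVERRIDE_MIN, parsed));
  }
  return UNKNOWN_FLOOR;
}
```

Erring low for unknown models makes the gate fire earlier than necessary on large-context models, which is the trade-off the env override exists to relieve.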

…ig field

Wave-12 retro flagged the architectural inconsistency: every LCM env knob flows through `resolveLcmConfigWithDiagnostics` (env→pluginConfig→default) with diagnostics + plugin.json schema + docs row. `LCM_TOOL_RESULT_TOKEN_BUDGET` was the only knob bypassing this pattern. Closes the inconsistency in the same PR that introduced result-budget.ts.

LcmConfig integration

- src/db/config.ts: new optional field `toolResultTokenBudget?: number` on LcmConfig; resolution `env.LCM_TOOL_RESULT_TOKEN_BUDGET` → `pluginConfig.toolResultTokenBudget` → undefined (default applied downstream). Standard precedence pattern, mirrors `maxAssemblyTokenBudget`.

result-budget.ts converted to live bindings

- Module-level `MAX_RESULT_TOKENS` and `MAX_RESULT_CHARS` are now `let` exports (was `const`). ESM live-binding semantics: consumers with `import { MAX_RESULT_CHARS }` see updates from inside the module.
- New `applyResultBudgetConfig(toolResultTokenBudgetFromConfig)` setter. Called from plugin init AFTER `resolveLcmConfigWithDiagnostics` runs. No-op when env was set at module load (env wins, same as every other LcmConfig field).
- Module load resolves env-only (no config available yet); the setter raises the cap if env wasn't set but config is.
- Two test helpers: `__resolveResultTokenBudgetFromEnvForTesting()` for re-resolving env without config; `__resetResultBudgetForTesting()` for `afterEach` resets when a test calls applyResultBudgetConfig.

Plugin init wiring

- src/plugin/index.ts: import + call `applyResultBudgetConfig(config.toolResultTokenBudget)` immediately after `resolveLcmConfigWithDiagnostics`. Idempotent.

Operator-facing surface

- openclaw.plugin.json: uiHint + JSON schema entry (minimum: 2000)
- docs/configuration.md: new row in the assembly/budgets table
- truncationNotice() prose updated to mention BOTH the env knob and the LcmConfig field as cap-raising paths

Tests (4 new in cross-module-invariants)

- env wins over config when both are set
- config honored when env unset
- both unset → undefined (default applied downstream)
- applyResultBudgetConfig updates live bindings when env wasn't set

Verified

- 1653/1653 vitest passing (was 1649, +4 new precedence tests)
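The env → pluginConfig → default precedence and the "env wins" setter can be sketched in one file. Note the swap: the commit uses ESM `let` exports with live bindings, while this sketch uses a closure with getters so it stays self-contained. `createResultBudget`, `parsePositiveInt`, and treating an unparsable env value as unset are assumptions; the knob names, the 10_000 default, and the 2_000 floor come from the commits above.

```typescript
type Env = Record<string, string | undefined>;

function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

// Config-layer resolution: env first, then plugin config, else undefined (default applied downstream).
function resolveToolResultTokenBudget(
  env: Env,
  pluginConfig: { toolResultTokenBudget?: number },
): number | undefined {
  return parsePositiveInt(env.LCM_TOOL_RESULT_TOKEN_BUDGET) ?? pluginConfig.toolResultTokenBudget;
}

// Stand-in for the result-budget module: resolves from env at "load", then accepts config later.
function createResultBudget(env: Env) {
  const fromEnv = parsePositiveInt(env.LCM_TOOL_RESULT_TOKEN_BUDGET);
  let maxTokens = Math.max(2_000, fromEnv ?? 10_000);
  return {
    get maxResultTokens(): number {
      return maxTokens;
    },
    get maxResultChars(): number {
      return maxTokens * 4;
    },
    applyResultBudgetConfig(fromConfig: number | undefined): void {
      if (fromEnv !== undefined || fromConfig === undefined) return; // env wins; nothing to apply
      maxTokens = Math.max(2_000, fromConfig);
    },
  };
}
```

Calling `applyResultBudgetConfig` twice with the same value leaves the state unchanged, which is consistent with the "Idempotent" note on the plugin init wiring.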

100yenadmin · 2026-05-08T17:56:13Z

@jalehman this is tested, soaked, and working. I recommend trying it on your end for a few days. Use the free Voyage embeddings: the background embed takes about 1 hour to complete and doesn't affect usage.

…agent drills down via lcm_describe(file_xxx) Squashed v4.2 patch applied directly onto main (independent of PR Martian-Engineering#613). Same feature, same tests, same Opus-validated behavior — just rebased onto the v3.x main baseline so maintainers can review/test v4.2 without needing Martian-Engineering#613 to land first. Architecture: per-row sidecar `messages.large_content` stores the externalized `file_xxx` id pointing to a payload file in `large_files` (existing v4.1 storage table). Assembler replaces evictable tool-result rows with the v4.1 `[LCM Tool Output: file_xxx | tool=… | N bytes]` reference + `Tool: <name> | Command: <input>` disambiguator (via `exploration_summary`). Drilldown via existing `lcm_describe(id="file_xxx")`. Empirical bench (live-DB snapshot, conv 0cb8928b, 258K budget): baseline: 333 items / 252,288 tokens / 0 stubs v4.2: 689 items / 257,849 tokens / 86 stubs → ~2× wall-clock context coverage (74min → 130min) at same budget. → tool_result count identical (101 in both); v4.2 doesn't displace tool outputs, it stubs heavy ones and reuses budget for older history. Drilldown validation (Claude Opus 4.1 subagent A/B): - Conversational summary ("what did we work on?"): substantive answer, zero tool calls needed, no confabulation. - Specific elided-content probe (with tool_input disambiguator): found correct fileId, wrote correct lcm_describe(id="file_xxx"), refused to fabricate. Quote: "the command string contained sed -n '1,260p' scripts/evaos-support/selfheal.sh literally — that's an unambiguous keyword match. The mapping was one grep away." What's NOT stubbed: - Fresh tail (last ~64 turns / 24K tokens) — agent's working memory - Assistant turns — narrative of what was done is always intact - Tool messages without large_content — legacy/unmigrated rows - Tool messages whose runtime role degraded to assistant — phantom drilldown risk avoided Default OFF (config.stubLargeToolPayloads=false). 
Architecturally additive (new column + new on-disk file path), reversible (UPDATE messages SET large_content = NULL + rm -rf storage-dir + flag off). Mitigations evaluated through first-principles-architectural-decision skill (research / run-the-system / where-it-lives / adversarial debate at ≥95% confidence): REJECT all four (recency cue, semantic stub wrapping, empty-assistant collapsing, resolution markers). Decision record in audit/v42-bench/DECISION-mitigations.md. Tests: 868/868 pass on main (added 5 new v4.2 unit tests including end-to-end drilldown round-trip). Files: src/db/migration.ts — ensureMessageLargeContentColumn (idempotent ALTER) + busy_timeout src/store/conversation-store.ts — MessageRecord.largeContent + projection src/assembler.ts — buildToolPayloadStub + applyStubSubstitution + ResolvedItem.fileId src/engine.ts — config.stubLargeToolPayloads forwarded src/tools/lcm-describe-tool.ts — strengthened description for [LCM Tool Output:] pattern scripts/lcm-blob-migrate.mjs — idempotent, chunked, busy_timeout-protected migration scripts/v42-assemble-bench.mjs — token/item bench scripts/v42-drilldown-harness.mjs — real-LLM drilldown harness (OpenRouter) test/v42-stub-tier.test.ts — 5 unit tests (boundary, pairing, legacy, multi-block, drilldown round-trip) Companion PR: stacked-on-Martian-Engineering#613 version at Martian-Engineering#626.
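The "what's NOT stubbed" rules above can be read as an eligibility predicate. The sketch below is an interpretation, not the logic in src/assembler.ts: the `AssembledRow` shape, the `stubTargetFor` name, and measuring the fresh tail purely by turn count (the commit gives "~64 turns / 24K tokens") are assumptions. It returns the file id to reference rather than rendering the stub text.

```typescript
interface AssembledRow {
  role: "user" | "assistant" | "tool";
  content: string;
  largeContent: string | null; // externalized file id when the payload was migrated, else null
}

interface StubOptions {
  stubLargeToolPayloads: boolean; // default OFF per the commit
  freshTailTurns: number; // assumed turn-count approximation of the fresh tail
}

// Returns the file id to stub with, or null when the row must be kept verbatim.
function stubTargetFor(row: AssembledRow, indexFromEnd: number, opts: StubOptions): string | null {
  if (!opts.stubLargeToolPayloads) return null; // feature flag off
  if (indexFromEnd < opts.freshTailTurns) return null; // fresh tail stays intact
  if (row.role !== "tool") return null; // assistant/user turns, incl. tool rows degraded to assistant
  return row.largeContent; // null for legacy/unmigrated rows
}
```

Keeping the predicate conservative means a row is stubbed only when a drilldown target is known to exist, in line with the "phantom drilldown risk avoided" note.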

…rebased on main, independent of #613) (#628) Squash merge PR #628 after local verification and required checks passed.

100yenadmin · 2026-05-31T21:05:55Z

Maintainer triage correction: park this as v4.1 feature-stack work, not as a P1 bug-fix candidate.

The previous comment over-promoted this because the PR is large and can hide important work. Per maintainer direction, the v4.1 stack is a separate feature beast and should be skipped during the current P0/P1/P2 bug triage unless a standalone core bugfix is extracted from it.

Current state still matters if the stack is resumed: dirty branch with failing/skipped checks. For the current cleanup lane, the next step is not deep review of this PR; it is to identify any non-v4.1 core bugfix hidden inside it and split that into a focused issue/PR.

Eva added 30 commits May 6, 2026 00:47

chore: remove stray Group B adversarial-review sanity scripts

a9a3e40

100yenadmin mentioned this pull request May 7, 2026

Epic: LCM-native Continuity Capsules / Saved Workstates electricsheephq/lossless-claw-test#68

Open

Eva added 3 commits May 8, 2026 01:33

100yenadmin mentioned this pull request May 7, 2026

(ignore) feat(v4.2): stub-tier stratification — externalize old tool results, agent drills down via lcm_describe(file_xxx) #626

Draft

5 tasks

Eva added 2 commits May 8, 2026 02:43

100yenadmin mentioned this pull request May 7, 2026

feat(v4.2): stub-tier stratification — externalize old tool results (rebased on main, independent of #613) #628

Merged

3 tasks

Eva added 2 commits May 8, 2026 03:12

100yenadmin mentioned this pull request May 7, 2026

Epic: LCM-native Continuity Capsules / Saved Workstates #629

Open

Eva added 2 commits May 8, 2026 13:06

100yenadmin changed the title ~~feat(lcm): v4.1 — agent memory that actually works (replaces #516; companion #616 deferred)~~ feat(lcm): v4.1 —LCM V2 (replaces #516; companion #616 deferred) May 8, 2026

100yenadmin changed the title ~~feat(lcm): v4.1 —LCM V2 (replaces #516; companion #616 deferred)~~ feat(lcm): v4.1 —LCM V2(replaces #516; companion #616 deferred) May 8, 2026

100yenadmin changed the title ~~feat(lcm): v4.1 —LCM V2(replaces #516; companion #616 deferred)~~ feat(lcm): v4.1 —LCM V2 (replaces #516; companion #616 deferred) May 8, 2026

jalehman pushed a commit that referenced this pull request May 11, 2026

feat(v4.2): stub-tier stratification — externalize old tool results (…

13780e9

…rebased on main, independent of #613) (#628) Squash merge PR #628 after local verification and required checks passed.

This was referenced May 13, 2026

feat: Codex OAuth profile + interceptCompaction — accordion cadence for autonomous loops #665

Open

feat: v4.1 foundation — schema + concurrency primitives #719

Open

100yenadmin mentioned this pull request May 20, 2026

feat: v4.1 — agent compaction + remaining v4.1 infrastructure (last in stack) #732

Open

3 tasks

100yenadmin added enhancement New feature or request priority:P1 High-impact bug/security/stability issue labels May 31, 2026

100yenadmin mentioned this pull request May 31, 2026

Lossless Claw Issue and PR Triage Report - 2026-05-30 #771

Open

100yenadmin added priority:P4 Low-priority enhancement, docs, or cleanup stale-check Stale issue/PR being checked with the original reporter and removed priority:P1 High-impact bug/security/stability issue labels May 31, 2026

feat(lcm): v4.1 —LCM V2 (replaces #516; companion #616 deferred)#613

100yenadmin wants to merge 110 commits into Martian-Engineering:main from electricsheephq:feat/lcm-v4.1-omnibus


LCM v2 (iteration experiments v4.1) — Lossless Agent Memory

Table of contents

TL;DR — what merges and why

The problem this solves

1. Compression of compression

2. No way to ask sideways questions

3. Stale-output trap

The decision

Architecture

Storage pyramid (the lossless bedrock)

Agent tool routing — 5 question types → 8 tools

Suppression cascade (the "soft purge" mechanism)

Synthesis dispatch (per-tier model selection)

Concurrency model (gateway vs worker)

The 8 agent tools

lcm_grep — multi-modal search

lcm_semantic_recall — pure semantic search

lcm_synthesize_around — time-anchored synthesis (the lcm_recent replacement)

lcm_describe — drilldown by ID

lcm_expand — sub-agent-only deep expansion

lcm_expand_query — main-agent wrapper for delegated expansion

lcm_get_entity — entity catalog lookup by canonical name

lcm_search_entities — entity catalog browse by query

Why Voyage embeddings

Phase A spike data (real eval, not gut feel)

Why voyage-4-large + rerank-2.5

Why not OpenAI / Cohere / local

Cost reality

Worker auto-ticks

Backfill autostart

Entity coreference autostart

Lock semantics

Operator commands

Cost discipline

What was CUT and why

Test infrastructure

8 of 9 antipattern classes have automated detection

Test layer cost profile

THE_FIVE_QUESTIONS as executable tests

Synthesis quality — closed via mock LLM

Audit history

Verification

Test counts (final)

Live-DB verification (real corpus)

QA runner against real DB

Sample mutation testing

Migration safety

Operator setup walkthrough

Non-goals

Related PRs

Reviewer checklist
