feat(lcm): v4.1 — LCM V2 (replaces #516; companion #616 deferred) #613
100yenadmin wants to merge 110 commits into
First commit of the v4.1 omnibus implementation. Smallest possible slice: introduces the cross-process concurrency model module and the `lcm_worker_lock` table that enables a sidecar worker process for cold maintenance work (condensation, extraction, embedding backfill, theme consolidation, eval, profile rebuild).
Resolves v4.1.1 amendment A9 (`last_heartbeat_at` column required by the §0.5 fallback rule: the gateway can take over only when BOTH `expires_at < now` AND `last_heartbeat_at < now - 300s`).
Changes:
- src/concurrency/model.ts (NEW): single source of truth for §0 invariants, busy_timeout constants, worker job-kind catalogue, and defensive assertion helpers (assertForeignKeysEnabled, assertBusyTimeoutForRole). Documents the no-LLM-in-write-tx invariant and the worker_threads heartbeat requirement (v4.1.1 A9).
- src/db/migration.ts (+25 lines): new `ensureLcmWorkerLockTable` migration step. Idempotent CREATE TABLE IF NOT EXISTS, runs after FTS setup, before the BEGIN EXCLUSIVE COMMIT.
- test/concurrency-model.test.ts (NEW, 10 tests): verifies invariant ordering (worker timeout < gateway, TTL ≥ 3× heartbeat, fallback soak > TTL), job-kind catalogue, and assertion helpers.
- test/lcm-worker-lock.test.ts (NEW, 4 tests): verifies that the migration creates the table with the right columns (including A9's last_heartbeat_at), is idempotent, supports basic acquire/heartbeat, and supports stale-lock GC.
Verification:
- npm run build: passes
- npm test --run: 48 files / 872 tests passing (up from 858 baseline, +14 new tests, zero regressions)
- Live DB ground-truth check: ran the new DDL against a copy of /Users/lume/.openclaw/lcm.db (2.5GB, 762 conversations, 3771 leaf summaries). Migration succeeds; existing data untouched; acquire pattern works; PK conflict throws as expected.
Notes:
- Code-as-ground-truth pivot: per the v4.1.1 plan, each commit cites the amendment(s) it resolves and is verified against live data.
- v4.1.1 A6 finding (PRAGMA foreign_keys = OFF on Eva's CLI test) partially superseded: src/db/connection.ts:configureConnection() already sets it ON for every connection that goes through the standard path. The new assertForeignKeysEnabled() is a defensive guardrail for future code paths that bypass configureConnection.
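The A9 fallback rule and two of the ordering invariants from this first commit can be sketched as plain predicates. This is a minimal illustration, not the contents of src/concurrency/model.ts: the heartbeat interval and TTL values are assumptions chosen only to satisfy the stated ordering, the row shape uses epoch milliseconds for simplicity, and `checkInvariantOrdering` and `gatewayMayTakeOver` are hypothetical names. Only the 300s soak comes from the commit text.

```typescript
// Hypothetical values: only the 300 s soak comes from the commit message.
const HEARTBEAT_INTERVAL_MS = 30_000; // assumed
const LOCK_TTL_MS = 90_000; // assumed, chosen so that TTL >= 3x heartbeat
const FALLBACK_SOAK_MS = 300_000; // 300 s, from the A9 fallback rule

// Simplified view of a lock row (assumed shape, epoch milliseconds).
interface WorkerLockRow {
  expiresAtMs: number;
  lastHeartbeatAtMs: number;
}

// Returns a description of every violated ordering invariant (empty = OK).
function checkInvariantOrdering(): string[] {
  const violations: string[] = [];
  if (LOCK_TTL_MS < 3 * HEARTBEAT_INTERVAL_MS) {
    violations.push("TTL must be >= 3x heartbeat interval");
  }
  if (FALLBACK_SOAK_MS <= LOCK_TTL_MS) {
    violations.push("fallback soak must exceed TTL");
  }
  return violations;
}

// Gateway takeover requires BOTH conditions: the TTL has lapsed AND the
// last heartbeat is older than the fallback soak window.
function gatewayMayTakeOver(lock: WorkerLockRow, nowMs: number): boolean {
  const ttlExpired = lock.expiresAtMs < nowMs;
  const heartbeatStale = lock.lastHeartbeatAtMs < nowMs - FALLBACK_SOAK_MS;
  return ttlExpired && heartbeatStale;
}
```

Requiring both conditions means a worker whose TTL lapsed briefly but which heartbeated recently is not preempted by the gateway.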
…_feature_flags (A.02)
Resolves v4.1.1 amendments A2 (suppress_reason + superseded_by columns)
and A8 (feature-flag storage). Adds the v3.1 columns the v4.1 spec
depends on (session_key, suppressed_at, entity_index,
contains_suppressed_leaves) since v3.1 never shipped to upstream.
Changes:
- src/db/migration.ts (+104 LOC):
- ensureSummaryV41Columns(db) — adds 7 columns to summaries via the
existing PRAGMA table_info / ADD COLUMN pattern (matches
ensureSummaryDepthColumn / ensureSummaryMetadataColumns / etc.):
session_key TEXT NOT NULL DEFAULT '' (v3.1 A1)
suppressed_at TEXT (v3.1 A3)
entity_index TEXT (v3.1 §7.2)
contains_suppressed_leaves INTEGER NOT NULL DEFAULT 0 (v3.1 A3)
suppress_reason TEXT (v4.1.1 A2)
superseded_by TEXT REFERENCES summaries (v4.1.1 A2/A4)
ON DELETE SET NULL
leaf_summarizer_cap_was INTEGER (v4.1)
- ensureMessageSuppressedAtColumn(db) — adds messages.suppressed_at
(v3.1 A3 cascade target for lcm_quote / lcm_factcheck filtering)
- ensureLcmFeatureFlagsTable(db) — clean new table
`lcm_feature_flags(flag PK, value NOT NULL, updated_at NOT NULL)`
- lcm_worker_lock TEXT PK explicitly NOT NULL (SQLite legacy quirk
allows NULL in TEXT PK columns without it).
- test/v41-summaries-columns.test.ts (NEW, 12 tests):
- Per-column verifications (NOT NULL, default value, FK target/action)
- lcm_feature_flags schema + basic set/read pattern
- Legacy `lcm_migration_flags` coexistence verified
Verification:
- npm run build: passes
- npm test --run: 49 files / 884 tests passing (+12 from A.01's 872, 0 regressions)
- Live DB ground-truth check on copy of /Users/lume/.openclaw/lcm.db:
summaries 14 → 21 columns; 7 v4.1 cols added.
messages gains suppressed_at; 3774 leaves preserved.
lcm_worker_lock + lcm_feature_flags created.
Eva's legacy lcm_rollups* + lcm_migration_flags untouched.
4187 summaries now have session_key='' (A.08 backfill target).
Code-as-ground-truth findings (revising v4.1.1 spec):
1. v4.1.1 A8 originally said "extend lcm_migration_flags with value column."
That table doesn't exist in upstream src/ — it only exists on Eva's
live DB from old fork-side code. Replaced with a clean new
`lcm_feature_flags` table. Eva's legacy table stays alongside, untouched.
2. v4.1.1 A6 (PRAGMA foreign_keys = OFF) is partly misleading: the
codebase's src/db/connection.ts:configureConnection() already sets
foreign_keys = ON for every connection through the standard path.
Eva's earlier sqlite3 CLI test was using a different connection, not
the production path. The new src/concurrency/model.ts already provides
assertForeignKeysEnabled() as a defensive guardrail.
3. SQLite TEXT PRIMARY KEY columns do NOT auto-enforce NOT NULL (legacy
behavior). Both new tables (lcm_worker_lock, lcm_feature_flags) now
have explicit NOT NULL on their PK column. Caught by tests.
4. SQLite ADD COLUMN with REFERENCES requires NULL default — verified
`superseded_by TEXT REFERENCES summaries(summary_id) ON DELETE SET NULL`
works as ALTER TABLE ADD COLUMN (no NOT NULL allowed). Documented in
ensureSummaryV41Columns docstring.
… + audit (A.03)
Adds the four "support tables" the worker process and operator surface
need before the heavy schema (synthesis cache, embeddings, entities,
themes) lands. Each is a clean idempotent CREATE TABLE IF NOT EXISTS.
Resolves v4.1.1:
- A3 — `lcm_extraction_queue`: gateway atomically inserts a queue row
with every leaf write; worker drains it for entity coreference and
procedure-recheck. CHECK constraint on `kind` ('entity' |
'procedure-recheck'). Indexes on pending (queued_at WHERE picked_at
IS NULL) and dead-letter (attempts >= 5).
- B2 (partial) — `lcm_purge_rebuild_queue`: persistent rebuild queue
for `lcm_purge --immediate`. T1 fires suppression cascade + enqueues;
worker drains using A4 forwarder pattern. Indexes on pending +
purge_session_id.
- B3 (partial) — `lcm_voyage_rate_state`: cross-process rate-limit
budget for Voyage embed + rerank. SQLite serializes BEGIN IMMEDIATE
naturally so gateway + worker coordinate via this shared row. CHECK
constraint on bucket ('embed' | 'rerank'). Seeded with both rows
idempotently (`INSERT OR IGNORE`). Spec note: HTTP call MUST happen
AFTER the COMMIT — wrapping HTTP in BEGIN IMMEDIATE would serialize
every gateway query embed and add 200-2000ms latency.
- §C item — `lcm_session_key_audit`: reversibility log for §2.1 step 1
re-key of 5 legacy convs. Allows operator `/lcm
undo-session-key-rekey <conv_id>` if the spike's identification was
wrong for any of those convs.
Changes:
- src/db/migration.ts (+90 LOC): four `runMigrationStep` blocks added
inline after the v3.1+v4.1 column work from A.02
- test/v41-support-tables.test.ts (NEW, 9 tests): per-table schema
verification (columns, FKs, indexes, CHECK constraints), CHECK
rejection paths, idempotent re-run verification, brief-tx update
pattern verification for rate state
Verification:
- npm test --run: 50 files / 893 tests passing (+9 from A.02's 884,
zero regressions)
- Live DB ground-truth check on copy of /Users/lume/.openclaw/lcm.db:
PRE lcm_ tables: 5 (legacy lcm_migration_flags + lcm_migration_state
+ 3 lcm_rollups* from Eva's fork)
POST lcm_ tables: 9 (5 legacy preserved + 4 new)
voyage rate state seeded with embed + rerank rows
3774 leaves preserved, 762 conversations preserved
Eva's lcm_rollups* untouched (out-of-scope for v4.1; v4.1 replaces
its functionality via lcm_synthesis_cache landing in A.04)
Notes:
- All four FKs use the production summaries / conversations tables;
CASCADE on DELETE is the right semantics (queue/audit rows are
derived; if their parent is genuinely deleted, they should follow).
- Per v4.1.1 A6 (now confirmed code-side): connection.ts already
enforces foreign_keys = ON, so these CASCADEs work in production.
… cache_leaf_refs + synthesis_audit (A.04)
Adds the four-table synthesis layer per v4.1 §3 + §1.3 + v4.1.1 B1/B4.
Tables created in dependency order so FKs work on first run:
prompt_registry → synthesis_cache (FK on prompt_id) → cache_leaf_refs
(FK on cache_id) → synthesis_audit (FK on prompt_id + either summary_id
or cache_id).
Resolves v4.1.1:
- B1 — `lcm_synthesis_audit` schema: pass_output is NULLable (insert
with NULL before LLM call, UPDATE on return). Adds `status` column
('started' | 'completed' | 'failed') for orphan-row tracking. Started-
GC index supports the 1-hour orphan cleanup query.
- B4 — UNIQUE lookup index on `lcm_synthesis_cache` enables cross-
process single-flight via INSERT OR IGNORE pattern (loser of race
reads back in-flight row, polls for status='ready').
- §3 + §1.3 — prompt registry with versioning per (memory_type,
tier_label, pass_kind, version) tuple. Append-only; bundle_version
groups prompt sets for synchronized voice-consistency rebuild.
- §3 — synthesis cache with status='building' single-flight, prompt_id
FK enables prompt-selective invalidation (NEVER touches durable
summaries.content rows — closes v3 design principle 4 violation that
v4 had introduced).
- v3.1 A3 extension — cache_leaf_refs inverse index for proactive purge
on lcm_suppress (cascades both directions: ref deleted when either
cache_id OR leaf_summary_id parent is deleted).
Changes:
- src/db/migration.ts (+150 LOC): four runMigrationStep blocks, all
idempotent, all in dependency order.
- test/v41-synthesis-tables.test.ts (NEW, 14 tests):
- prompt_registry: CHECK constraint enforcement (memory_type, pass_kind),
UNIQUE constraint on (memory_type, tier_label, pass_kind, version)
- synthesis_cache: status + tier_label CHECK enforcement,
INSERT OR IGNORE single-flight pattern (ON CONFLICT DO NOTHING)
- cache_leaf_refs: bidirectional CASCADE behavior verified
- synthesis_audit: pass_output NULLable, started→completed pattern,
CHECK requiring at least one target column, started-GC index exists
Verification:
- npm test --run: 51 files / 907 tests passing (+14 from A.03's 893,
zero regressions)
- Live DB ground-truth check on copy of /Users/lume/.openclaw/lcm.db:
PRE: 5 lcm_ tables (legacy)
POST A.01-A.04 cumulative: 15 lcm_ tables
= 5 legacy preserved + 10 new
(worker_lock, feature_flags, extraction_queue, purge_rebuild_queue,
voyage_rate_state, session_key_audit, prompt_registry,
synthesis_cache, cache_leaf_refs, synthesis_audit)
3774 leaves preserved, 762 conversations preserved.
PRAGMA foreign_keys=1.
Notes:
- DB copies for end-to-end verification moved to /Volumes/LEXAR/lcm-tmp
(the live DB is 2.5GB; /tmp filled up after a few iterations).
- B4 UNIQUE index uses COALESCE(grep_filter, '') so SQLite can index the
expression deterministically (NULL-grep_filter rows would otherwise
not be uniquely-indexed since NULL ≠ NULL in SQL semantics).
… (A.05)
Per v4.1 §11 + v4.1.1 (revising v4 design):
- N≥100 stratified queries (50% fts-easy, 25% fts-medium, 25% paraphrastic).
- 2× empirical SD threshold (calibrate by 5x repeated baseline runs).
- Ensemble judge (3 different model families).
- Mixed absolute+pairwise scoring per dimension.
- Drift index for cumulative regression.
- Measures BOTH retrieval_recall AND synthesis_quality (separate metrics per v4.1.1; closes the v4 gap where eval collapsed them).
Tables (dependency order):
- lcm_eval_query_set: query set registry (e.g. 'eva-baseline-v2')
- lcm_eval_query: per-query rows with stratum CHECK constraint, optional reference_summary for gold-standard comparison, must_not_regress flag for critical Eva queries
- lcm_eval_run: per-run rows with separate retrieval_recall_score AND synthesis_quality_score, ensemble judge_models JSON, noise_floor_sd for drift calibration, trigger CHECK constraint
- lcm_eval_drift: cumulative-delta drift index per query_set
All cascade via FK on query_set_id deletion.
Verified:
- 52 files / 915 tests passing (+8 from A.04, zero regressions)
- Live DB copy: 15 → 19 lcm_ tables. 3774 leaves preserved.
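The "2× empirical SD threshold" can be illustrated with a small calculation: estimate the noise floor from repeated baseline runs, then flag a candidate run only when it falls below the baseline mean by more than twice that noise floor. This is a sketch under assumptions: the commit does not say whether the sample or population standard deviation is used (the sample form is assumed here), and `sampleSd` and `isRegression` are hypothetical names.

```typescript
// Arithmetic mean of a non-empty list of scores.
function mean(xs: number[]): number {
  let total = 0;
  for (const x of xs) total += x;
  return total / xs.length;
}

// Sample standard deviation (n - 1 denominator); an assumption, since the
// commit only says "empirical SD".
function sampleSd(xs: number[]): number {
  if (xs.length < 2) throw new Error("need at least two baseline runs");
  const m = mean(xs);
  let sumSq = 0;
  for (const x of xs) sumSq += (x - m) * (x - m);
  return Math.sqrt(sumSq / (xs.length - 1));
}

// A candidate score counts as a regression only if it drops below the
// baseline mean by more than k standard deviations (k = 2 per the commit).
function isRegression(baselineRuns: number[], candidate: number, k = 2): boolean {
  const drop = mean(baselineRuns) - candidate;
  return drop > k * sampleSd(baselineRuns);
}

// Five repeated baseline runs, as in the calibration step described above.
const baseline = [0.8, 0.82, 0.78, 0.81, 0.79];
const smallDrop = isRegression(baseline, 0.78); // within the noise floor
const largeDrop = isRegression(baseline, 0.75); // outside the noise floor
```

With these illustrative scores the standard deviation is roughly 0.016, so a drop of 0.02 stays inside the threshold while a drop of 0.05 does not.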
…ions + procedures + intentions (A.06)
Per v4.1 §7 + v4.1.1 B5/B6/B7/B8/B11. Five tables for the extraction
layer (entity coreference + procedures + intentions tracking).
Tables (all idempotent, dependency-ordered):
- lcm_entity_type_registry: freeform entity_type catalogue (Eva domain
has session_key, config_flag, R-XXX agent IDs, error_code, etc. —
no closed CHECK enum, per v4.1.1 §C).
- lcm_entities: simplified schema (no separate aliases table per
v4.1.1 B5; alternate surface forms denormalized into JSON column).
UNIQUE index (session_key, canonical_text COLLATE NOCASE) enables
case-insensitive cross-process single-flight (B4 pattern). FK to
summaries(first_seen_in_summary_id) ON DELETE SET NULL.
- lcm_entity_mentions: tracks each mention site. CASCADE on both
entity_id and summary_id deletion (basis for v4.1.1 §C suppression
cascade — when leaf gets suppressed, mentions cascade-delete).
- lcm_procedures: status lifecycle ('draft'|'active'|'stale'|
'archived'|'deprecated'); extraction_source distinguishes auto
(clustering pipeline) from 'manual' (lcm_remember_procedure tool,
v4.1.1 B8 fix for one-shot procedures).
- lcm_intentions: 3 statuses ('pending'|'fulfilled'|'cancelled' per
B11); resolution_text + resolved_at columns for capture context.
source_leaf_id is NULL-allowed since ON DELETE SET NULL requires it.
Verified:
- 53 files / 929 tests passing (+14 from A.05, zero regressions)
- All 5 tables created, FK + CHECK constraints enforced.
….07)
Per v4.1 §1 + v4.1.1 A5/A7. The MANAGED tables only — vec0 virtual
table itself defers to Group B (requires sqlite-vec extension load,
best-effort per A7's two-transaction pattern).
- lcm_embedding_profile: model registry (model_name PK, dim, active flag,
archive_after for graceful retirement). Group B startup seeds
voyage-4-large after successful sqlite-vec load.
- lcm_embedding_meta: sidecar with composite PK
(embedded_id, embedded_kind, embedding_model) enabling parallel rows
during model-bump cutover. CHECK on embedded_kind ('summary' | 'entity'
| 'theme'). FK to lcm_embedding_profile prevents orphan model refs.
No FK on embedded_id — polymorphic per v4.1.1 §C item; orphan cleanup
via idle pass in Group B.
Verified:
- 54 files / 934 tests passing (+5 from A.06, zero regressions)
…4.1 read patterns (A.08)
Per v4.1: adds 5 partial/composite indexes that the new retrieval + suppression + idle-rebuild paths need. All CREATE INDEX IF NOT EXISTS, all idempotent, all conditional on the v4.1 columns added by A.02.
Indexes:
- summaries_session_key_kind_latest_idx: cross-conv assemble + retrieval scope filter. Partial WHERE session_key != '' (skips pre-A.09 backfill rows so the index stays compact during the cleanup window).
- summaries_suppressed_idx: WHERE suppressed_at IS NOT NULL. Small-footprint partial index for the suppression filter on every retrieval.
- summaries_contains_suppressed_idx: WHERE contains_suppressed_leaves = 1 AND superseded_by IS NULL. §8.1 idle-rebuild candidate scan.
- messages_suppressed_idx: WHERE suppressed_at IS NOT NULL. For lcm_quote / lcm_factcheck filtering.
- conversations_session_key_v41_idx: WHERE session_key IS NOT NULL. Boosts the cross-conv JOIN path that legacy:conv_<id> session_keys use (existing conversations_session_key_active_created_idx is on the active flag too, which legacy convs don't satisfy).
Verified:
- 55 files / 942 tests passing (+7 from A.07, zero regressions)
…lowup)
The optimizer picks a full table scan for tiny test datasets (3 rows), not the new index. That is the right query plan for that data size, just not what the test asserted. Index PRESENCE verification (the other 6 tests in this file) covers what unit tests can; index USE on production data shape is verified by A.09's live-DB run-script.
…JOIN backfill (A.09)
Per v4.1 §2.1 (universal cleanup; per-user re-keying like Eva's 5-legacy-convs → agent:main:main is OPERATOR-DRIVEN via Group F's `/lcm reconcile-session-keys`, NOT hardcoded into upstream migration).
Three idempotent migration steps:
1. backfillConversationSessionKeys: every NULL conversations.session_key gets backfilled to 'legacy:conv_<id>'. Each re-key writes a row to lcm_session_key_audit (a deterministic audit_id derived from conv_id ensures idempotent re-runs don't duplicate audit rows). Closes v4.1.1 A5 (NULL collapse to empty bucket would destroy cross-conv identity for legacy data).
2. backfillSummarySessionKeys: every summary still at the A.02 default session_key='' gets backfilled from the parent conversation via JOIN. After step 1 has run, conversations.session_key is non-NULL for all rows. Idempotent: the condition is WHERE session_key = '', so already-set rows are preserved.
3. backfillForkRollupsSessionKeys: forward-compat for Eva's fork-side lcm_rollups table (created by PR Martian-Engineering#516, not in upstream src). Only touches the table if it exists AND has a session_key column. No-op on fresh upstream installs.
Verified on copy of Eva's live DB (/Volumes/LEXAR/lcm-tmp/lcm-test.db):
PRE: 762 convs, 522 NULL session_keys, 4 agent:main:main, 0 legacy:
POST: 762 convs, 0 NULL, 4 agent:main:main preserved, 522 legacy:conv_*
4187 summary session_key backfills (all summaries now keyed)
522 audit rows recorded
5 legacy convs identified as having leaves (target for Eva's future `/lcm reconcile-session-keys` to merge into agent:main:main)
- 56 files / 947 tests passing (+6 from A.08, zero regressions)
… (A.10)
Per v4.1 §2.2: fixes the leaf-summarizer cap bug. The empirical-spike-agent found 543 leaves on Eva's live DB pegged at exactly 2,415 tokens (the LLM hitting the old 2400 default and producing artificially truncated summaries). This commit raises the default in two places that share the constant:
- src/summarize.ts:50 DEFAULT_LEAF_TARGET_TOKENS: 2400 → 4000
- src/db/config.ts:464 fallback default for pc.leafTargetTokens: 2400 → 4000
Comment added to both locations citing the empirical finding so future readers see the rationale.
Voyage embedding (Group B) supports 32K input context, so 4000-token leaves are well within budget. Average leaf on Eva's corpus is 1,167 tokens (most leaves don't approach the cap); the change only affects leaves where the source content is dense enough to need it.
Existing 543 capped leaves on Eva's DB stay as-is: regenerating them from source messages is expensive (LLM calls) and is operator-driven, not a migration step. Leaves are immutable per v3 design principle 4.
Tests:
- test/v41-leaf-cap.test.ts (NEW, 3 tests): verifies new constant + rationale comment present
- test/config.test.ts: updated existing assertion 2400 → 4000
950/950 tests passing.
Raw fetch wrapper for Voyage AI. We do NOT use the voyageai npm SDK:
v0.2.1 has an ESM resolution bug confirmed during Phase A spike (see
docs/projects/lcm-rollup-overhaul/voyage-spike-results.md).
Two entry points: embedTexts() and rerankCandidates(). Both:
- Send `truncation: false` so over-cap docs are surfaced as 400 errors
rather than silently clipped (lossless invariant — a truncated
embedding produces a vector that doesn't reflect the source, with
no signal in the vector itself that anything was dropped).
- Throw typed VoyageError on every failure mode (auth/bad_request/
rate_limit/server_error/network/unexpected) so callers can react
appropriately. Backfill cron will use `kind` to decide whether to
park, requeue, or surface to operator.
- Retry on 5xx + network errors with exponential backoff (capped 30s).
NOT on 4xx (caller bug — retrying just spends quota).
- Honor Retry-After header on 429 (seconds OR HTTP-date).
- Support mock fetch injection for tests — no module-level state,
no globals, no live API calls in CI.
Token budget constants exported for callers:
- MAX_TOKENS_PER_EMBED_BATCH = 80K (Voyage caps at 120K, tokenizer
counts ~9.5% higher than our token_count, so 80K leaves margin).
- MAX_TOKENS_PER_EMBED_DOC = 30K (voyage-4-large per-doc cap is 32K).
- MAX_TOKENS_PER_RERANK_CALL = 600K (rerank-2.5 per-call total).
Privacy: error messages strip Voyage-echoed input from 400 responses
(some Voyage 400s include the input verbatim — could leak PII to logs
that aren't supposed to see it). Raw responseBody preserved on the
VoyageError for callers that need it.
Coverage: 22 tests, all mock fetch:
- embed happy path (input_type, ordering, empty input, truncation flag)
- rerank happy path (top_k, sorting, id join)
- all 6 error kinds + retry behavior
- VOYAGE_API_KEY env var resolution
Resolves: foundation for v4.1 §13 (embedding generation + reranking).
Next (B.02): per-model vec0 table creation.
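The retry timing described for the Voyage client (exponential backoff capped at 30s, Retry-After given as seconds OR an HTTP-date) can be sketched as two pure helpers. This is an illustration only: the 500 ms base delay and the names `backoffDelayMs` and `parseRetryAfterMs` are assumptions, not the module's actual API, and jitter is omitted.

```typescript
// Exponential backoff: base * 2^attempt, never exceeding the cap.
// The 30 s cap is from the commit; the 500 ms base is an assumption.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Retry-After carries either delta-seconds ("120") or an HTTP-date
// ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns the wait in milliseconds,
// or null when the header is absent or unparseable.
function parseRetryAfterMs(header: string | null, nowMs: number): number | null {
  if (header === null) return null;
  const value = header.trim();
  if (/^\d+$/.test(value)) {
    return parseInt(value, 10) * 1000;
  }
  const dateMs = Date.parse(value);
  if (isNaN(dateMs)) return null;
  // A date in the past means "retry now", not a negative wait.
  return Math.max(0, dateMs - nowMs);
}
```

Passing `nowMs` in explicitly keeps both helpers free of module-level state, in line with the mock-injection approach the commit describes for tests.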
…(B.02)
Centralizes all sqlite-vec interaction in src/embeddings/store.ts. Callers
never touch vec0 SQL directly. Reasons documented in module header, but
short version:
1. sqlite-vec is best-effort. tryLoadSqliteVec() searches candidate
paths (env, plugin node_modules, ~/.openclaw/extensions) and returns
boolean. If false, the rest of LCM still works (FTS-only retrieval).
Aligned with v4.1.1 A7 graceful-degrade amendment.
2. vec0 has class-of-column quirks that bite: INTEGER metadata cols
reject JS number literals (need BigInt at the binding site), and
auxiliary cols throw "illegal WHERE constraint" if filtered inside
MATCH queries. Schema choice:
embedding float[<dim>] -- the vector
+embedded_id text -- AUX (never WHERE-filtered)
embedded_kind text -- METADATA (filterable in MATCH)
suppressed integer -- METADATA (filterable in MATCH)
Empirically verified: WHERE on +embedded_kind crashes vec0; WHERE
on plain `embedded_kind text` (metadata) works. Centralizing this
here so future code can't accidentally pick wrong column class.
3. Profile dim is immutable. registerEmbeddingProfile() throws on
mismatch. To switch dim, bump the model name (e.g. add a suffix)
and run cutover — never silently change dim of an existing profile.
API surface:
- tryLoadSqliteVec(db, opts) → boolean
- vec0Version(db) → "v0.1.9" | null
- candidateVec0Paths() → string[] (for diagnostics)
- embeddingsTableName(modelName) → "lcm_embeddings_<slug>"
- embeddingsTableExists(db, modelName) → boolean
- registerEmbeddingProfile(db, modelName, dim)
- ensureEmbeddingsTable(db, modelName, dim)
- recordEmbedding(db, {modelName, embeddedId, embeddedKind, vector,
suppressed?, sourceTokenCount}) — vec0 INSERT + meta UPSERT
- replaceEmbedding(...) — DELETE-then-INSERT (for re-embed)
- deleteEmbedding(...) — for purge cascade
- markEmbeddingSuppressed(...) — UPDATE metadata (works on metadata
cols; would corrupt if used on PARTITION KEY per v4.1.1 finding)
- searchSimilar(db, {modelName, queryVector, k, embeddedKinds,
excludeSuppressed}) — KNN with default exclude-suppressed
- isEmbedded(db, {embeddedId, embeddedKind, modelName}) → boolean
Coverage: 28 tests
- 15 always-on: name validation, candidate paths, graceful degrade,
profile registration with dim mismatch / bad-input rejection
- 13 vec0-gated: load extension, ensure table, record/replace/delete
embedding, KNN with kind filter, KNN with suppression, mark
suppressed flips visibility, two independent models per DB
The vec0-gated suite uses LCM_TEST_VEC0_PATH env var override (or
defaults to /Users/lume/.openclaw/... on dev). vitest.config.ts
overrides $HOME so homedir() inside tests doesn't see the dev install
— this gate accommodates that.
Build: dist/index.js = 708.4kb (was 708.4kb pre-B.02 — empty plugin
import boundary, store module is tree-shaken from index.ts which doesn't
import it yet; gateway picks up via Group B.05 leaf-time embed wire-up).
Tests: 1000 passing (was 972 before B.02; +28 new).
Resolves: foundation for v4.1 §13 (vec0 storage layer).
Next (B.03): AFTER DELETE TRIGGER on summaries → cascades suppression
+ deletion into vec0 (since FK from vec0 → summaries corrupts vec0).
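Two of the B.02 rules, per-model table naming and immutable profile dimensions, can be sketched without touching SQLite. The slug rule below (lowercase, non-alphanumeric runs collapsed to `_`) and the allowed-character check are assumptions made for illustration; the commit only states the `lcm_embeddings_<slug>` shape and that a dim mismatch throws. `tableNameFor` and `registerProfile` are hypothetical stand-ins that use an in-memory record in place of lcm_embedding_profile.

```typescript
// Derives a table name of the form lcm_embeddings_<slug>.
// Assumed slug rule: lowercase, runs of non-alphanumerics become "_".
function tableNameFor(modelName: string): string {
  // Reject anything that is not a plain model identifier, because the
  // result is interpolated into SQL as a table name.
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(modelName)) {
    throw new Error("invalid model name: " + JSON.stringify(modelName));
  }
  const slug = modelName.toLowerCase().replace(/[^a-z0-9]+/g, "_");
  return "lcm_embeddings_" + slug;
}

// In-memory stand-in for the profile table: model name -> dimension.
type ProfileRegistry = { [modelName: string]: number };

// Registering the same model with the same dim is a no-op; a different
// dim throws instead of silently changing an existing profile.
function registerProfile(registry: ProfileRegistry, modelName: string, dim: number): void {
  if (dim <= 0 || Math.floor(dim) !== dim) {
    throw new Error("dim must be a positive integer");
  }
  const existing = registry[modelName];
  if (existing !== undefined && existing !== dim) {
    throw new Error("dim mismatch for " + modelName + ": " + existing + " vs " + dim);
  }
  registry[modelName] = dim;
}
```

Under this assumed rule, `tableNameFor("voyage-4-large")` yields `lcm_embeddings_voyage_4_large`.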
…B.03)
Three new SQLite triggers, each with a specific job:
1. Per-model `lcm_embed_suppress_<slug>` (in src/embeddings/store.ts):
AFTER UPDATE OF suppressed_at ON summaries
WHEN (NEW.suppressed_at IS NULL) != (OLD.suppressed_at IS NULL)
→ mirrors the NULL-vs-not transition into vec0.suppressed metadata
column for the corresponding embedded_id (kind='summary').
Why a trigger: suppression can be set from any path — operator's
/lcm purge, agent tool, manual SQL, future migration cleanup. A
trigger guarantees the cascade by-DB rather than by-convention.
Why metadata col + WHEN clause: the trigger fires only on actual
transitions, not on every other UPDATE; vec0 metadata column is
pre-filterable in KNN MATCH queries (auxiliary cols throw "illegal
WHERE constraint" — verified empirically).
2. Per-model `lcm_embed_delete_<slug>` (in src/embeddings/store.ts):
AFTER DELETE ON summaries
→ DELETE matching vec0 row.
Why a trigger and not FK CASCADE: vec0 corrupts under FK
(v4.1.1 finding from upstream review). Trigger is the only safe
path to keep vec0 + summaries in sync on hard-delete.
3. Shared `lcm_embedding_meta_cleanup_summary` (in src/db/migration.ts):
AFTER DELETE ON summaries
→ DELETE matching lcm_embedding_meta row WHERE kind='summary'.
Why this is in migration not store: lcm_embedding_meta exists once
regardless of how many vec0 model tables exist (it's a cross-model
sidecar). The kind='summary' filter prevents accidental cleanup of
polymorphic entity/theme rows. Entity/theme cleanup triggers will
land in Groups E/G when those embeddings ship.
Per-model triggers are created idempotently when ensureEmbeddingsTable
is called for a model. dropEmbeddingsTriggers() is exported for the
model-archival cutover path (Group F operator surface).
Coverage: 9 new tests (3 always-on, 6 vec0-gated):
- meta-table cleanup trigger only deletes kind='summary' (entity row
untouched)
- meta cleanup trigger is idempotent across re-migration
- suppression cascade NULL → not-NULL hides row from KNN
- un-suppression cascade not-NULL → NULL restores visibility
- WHEN clause skips no-op transitions (NULL → NULL, or content updates)
- delete cascade removes vec0 row + meta row
- two-model setup: cleanup hits both vec0 tables
- dropEmbeddingsTriggers stops cascade firing
- re-creating triggers is idempotent
Live-DB verification: copied Eva's lcm.db (4187 summaries, 762
conversations) to /Volumes/LEXAR; migration completes in 3.9s; meta
cleanup trigger created cleanly.
Tests: 1009 passing (was 1000 before B.03; +9 new).
Resolves: v4.1 §10 suppression cascade for vec0 retrieval surfaces.
Next (B.fix): fold Group A adversarial-pass fixes (Gap 2 NULL UNIQUE
on lcm_prompt_registry; Gap 7 wire concurrency assertions; Gap 9 add
live-DB regression test).
Resolves Gaps 2, 7, 9 from the Group A adversarial code review:
Gap 2 (MED): lcm_prompt_registry NULL tier_label deduplication. SQLite treats multiple NULL values as distinct in UNIQUE constraints, so the original UNIQUE(memory_type, tier_label, pass_kind, version) admits duplicate rows when tier_label IS NULL. The synthesis spec requires singletons-per-version, so add a follow-up migration step (ensureLcmPromptRegistryNullSafeUniqueIdx) that creates a COALESCE-based UNIQUE INDEX. The same pattern is already used for lcm_synthesis_cache_lookup_uniq. The original UNIQUE constraint stays (catches non-NULL collisions); the new index catches NULL collisions.
Gap 7 (LOW): wire assertForeignKeysEnabled into configureConnection. src/concurrency/model.ts already exports assertForeignKeysEnabled(db) but nothing in production calls it. Add a call after the existing PRAGMA foreign_keys = ON in src/db/connection.ts:configureConnection so any future regression that opens a connection without FK enforcement (which would silently degrade every ON DELETE CASCADE in the schema) fails fast. assertBusyTimeoutForRole wiring is intentionally deferred to Group B.05 (worker startup) per the Group A reviewer's recommendation.
Gap 9 (MED): live-DB-shape regression test. All other v41-*.test.ts files start from a fresh :memory: and run the full migration on an empty DB. None tested the migration against a partially pre-existing schema (where conversations / summaries / messages already exist with rows but lcm_* tables don't yet). The Eva-live-DB verification was one-off and not in CI. New test v41-pre-existing-schema-migration.test.ts seeds the upstream pre-v4.1 baseline shape, inserts conversations + summaries + messages, runs runLcmMigrations, and verifies: NULL session_keys are backfilled, audit rows exist, summaries.session_key is JOIN-backfilled, all 21 v4.1 tables exist, the new lcm_prompt_registry_uniq_lookup index exists, and re-runs are idempotent.
Helper module on top of A.01's lcm_worker_lock table. Acquisition is
atomic via PRIMARY KEY uniqueness on (job_kind) — INSERT OR IGNORE
returns 1 if we got it, 0 if someone else holds it.
API:
- acquireLock(db, jobKind, {workerId, ttlMs?, jobSessionKey?, jobMetadata?})
→ boolean. GCs expired locks BEFORE acquiring (expires_at ≤ datetime('now'),
so ttl=0 is immediately reclaimable; race-safe via INSERT OR IGNORE).
- releaseLock(db, jobKind, workerId) → boolean. Only frees if the
workerId matches (prevents accidental cross-worker release).
- heartbeatLock(db, jobKind, workerId, ttlMs?) → boolean. Updates
expires_at + last_heartbeat_at. Returns false if the lock was
preempted (caller MUST abort to avoid double-processing).
- lockInfo(db, jobKind) → LockInfo | null. Used by /lcm health.
- generateWorkerId(role) → string. Format `<role>-<pid>-<ms>-<6hex>`.
Used by Group B.04 backfill cron (next commit) and Groups E (extraction)
+ G (themes consolidation) + worker scaffolding (B.05).
Coverage: 13 tests (single-process acquire/release, TTL+GC behavior,
heartbeat semantics including preemption-detection, metadata round-trip,
multi-kind isolation, generateWorkerId uniqueness).
Tests: 1017 → 1030 (+13).
Resolves: §0 cross-process lock primitive used by all worker jobs.
Next (B.04b): backfill cron module that uses these primitives.
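The acquire / heartbeat / release contract described above can be modelled in memory to make the semantics concrete. This is a sketch of the contract only, not the SQLite-backed helper: a plain object keyed by job kind stands in for the lcm_worker_lock table, time is passed in explicitly, the 90s default TTL is an assumption, and the function names are hypothetical.

```typescript
interface HeldLock {
  workerId: string;
  expiresAtMs: number;
  lastHeartbeatAtMs: number;
}

// One entry per job kind, mirroring PRIMARY KEY uniqueness on job_kind.
type LockTable = { [jobKind: string]: HeldLock };

const ASSUMED_TTL_MS = 90_000; // hypothetical default

// GC an expired holder first (<= now, so ttl=0 is reclaimable at once),
// then take the lock only if the slot is free.
function acquireInMemory(t: LockTable, jobKind: string, workerId: string, nowMs: number, ttlMs = ASSUMED_TTL_MS): boolean {
  const held = t[jobKind];
  if (held !== undefined && held.expiresAtMs <= nowMs) delete t[jobKind];
  if (t[jobKind] !== undefined) return false;
  t[jobKind] = { workerId: workerId, expiresAtMs: nowMs + ttlMs, lastHeartbeatAtMs: nowMs };
  return true;
}

// Extends the lease. false means the lock is gone or owned by someone
// else, so the caller must abort to avoid double-processing.
function heartbeatInMemory(t: LockTable, jobKind: string, workerId: string, nowMs: number, ttlMs = ASSUMED_TTL_MS): boolean {
  const held = t[jobKind];
  if (held === undefined || held.workerId !== workerId) return false;
  held.expiresAtMs = nowMs + ttlMs;
  held.lastHeartbeatAtMs = nowMs;
  return true;
}

// Frees the lock only when the caller is the current holder.
function releaseInMemory(t: LockTable, jobKind: string, workerId: string): boolean {
  const held = t[jobKind];
  if (held === undefined || held.workerId !== workerId) return false;
  delete t[jobKind];
  return true;
}
```

A worker that stalls past its TTL and is preempted sees `false` from its next heartbeat, which is the signal to stop work.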
…(E.spike)
Wraps ml-hclust (mljs ecosystem) for use by Group E procedure clustering.
Library choice rationale (full notes in module header):
- ESM-native (this plugin ships ESM only)
- MIT licensed, actively maintained (v4.0.0 published 2025-11-26)
- Small footprint (~48KB unpacked); esbuild tree-shakes most transitive
deps. Bundle delta: 708.7kb → 709.4kb (+0.7KB; index.ts doesn't import
yet — Group E will pull it in)
- Accepts precomputed distance matrix (we pass cosine distance), so we
can do Ward+cosine without hacking the lib's internal euclidean
- Cluster.cut(height) AND Cluster.group(K) both supported, satisfying
both "let dendrogram decide" and "force K" use cases
Architecture choice notes:
- Ward + cosine on precomputed matrix: the same approximation scipy gives
you when a cosine distance matrix from pdist(X, metric="cosine") is
passed to linkage(method="ward") (scipy rejects metric="cosine" with
Ward on raw observations). Mathematically loose (Ward assumes squared
Euclidean) but conventional for text embeddings.
Fallback method: "average" (UPGMA), which makes no Euclidean assumption,
if empirical eval shows wonky merges.
- Pre-normalize each vector once → cosine distance becomes (1 - dot).
Halves the inner-loop cost and centralizes float-drift clamping.
- O(N^2 D) distance build + O(N^3) agnes. For N=2000 D=1024 that's
~few seconds in JS — comfortably within the worker-process budget.
Alternatives considered + rejected:
- hierarchical-clustering-js: 404 on npm
- density-clustering: wrong algorithm family (DBSCAN/k-means only)
- clusterfck: deprecated
- clustering-js: abandoned
API:
- clusterHierarchical({vectors, cutHeight?, numClusters?}) → ClusterResult
Coverage: 11 tests
- empty input, single vector, identical vectors, separable groups
- force-K mode, mixed-dim rejection, non-Float32Array rejection,
cutHeight validation, internal coverage check
- 100-vector perf sanity (<2s)
Built (subagent: a1e8a944580405a69) — research + library survey done in
parallel with Group B.04 work; spec checked + tests verified before
committing.
Tests: 1030 → 1041 (+11).
Resolves: foundation for Group E procedure clustering. Group E will:
(1) pre-filter leaves (structural — numbered steps / commands /
explicit "how to" markers, NOT FTS verb regex)
(2) call clusterHierarchical() over voyage-4-large embeddings
(3) filter to clusters with ≥8 members + LLM-judge confidence > 0.9
(4) write to lcm_procedures with status='active'
…idempotent (B.04b)
Walks unembedded leaves, batches by token budget, calls Voyage, writes
vec0 + meta. Designed as a single-tick API: caller (worker scheduler)
invokes once per tick; the function acquires lcm_worker_lock, processes
up to perTickLimit documents, releases lock, returns BackfillResult.
API:
- runBackfillTick(db, opts) → Promise<BackfillResult>
- countPendingDocs(db, args) → number (for /lcm health and tick-scheduling)
BackfillOptions covers: model + Voyage model dispatch, input_type
(MUST be 'document' for backfill), API key + mock fetch, RPS pacing
(default 0.5 = one call per 2s), batch token cap (default 80K),
per-tick doc cap (default 200), token-count min/max (default 1 .. 30K),
worker_id override (for stable IDs across ticks), onBatchComplete hook
for telemetry, skipLock for tests.
BackfillResult tracks: embeddedCount, skippedOverCap (rows above the
30K cap, requiring operator attention), skipped[] (per-row failures
with kind='voyage_400'/'voyage_other'/'over_cap'), perTickLimitReached
(scheduler reschedules if true), lockNotAcquired (scheduler skips this
tick), voyageTokensConsumed (API usage telemetry), durationMs.
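The batch token cap can be illustrated with a small packing helper. This is a sketch under assumptions: `PendingDoc` and `packBatches` are hypothetical names, and the strategy shown (fill the current batch in arrival order, start a new one when the next doc would overflow) is one simple greedy variant, not necessarily the one the module uses.

```typescript
interface PendingDoc {
  summaryId: string;
  tokenCount: number;
}

/** Greedily pack docs, in arrival order, into batches of at most maxBatchTokens. */
function packBatches(docs: PendingDoc[], maxBatchTokens: number): PendingDoc[][] {
  const batches: PendingDoc[][] = [];
  let current: PendingDoc[] = [];
  let currentTokens = 0;
  for (const doc of docs) {
    // Assumption: over-cap docs are filtered upstream; skip defensively here.
    if (doc.tokenCount > maxBatchTokens) continue;
    if (current.length > 0 && currentTokens + doc.tokenCount > maxBatchTokens) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(doc);
    currentTokens += doc.tokenCount;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

With the default 80K cap, docs of 50K, 40K, 30K and 10K tokens would land in two batches: the 50K doc alone, then the remaining three together.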
Invariants:
1. NO LLM/network in any DB write tx. Each Voyage HTTP call lives
OUTSIDE the per-batch transaction; rate-state UPDATE (when added
in B.04c follow-up) will be a brief BEGIN IMMEDIATE that COMMITs
before the HTTP call (never holds a write lock through HTTP latency).
2. Single-flight via worker lock — gateway-fallback safe.
3. Resumable — each batch's writes commit independently. Crash
mid-tick loses one in-flight batch worth of Voyage spend at most.
Next tick picks up still-unembedded rows.
4. Idempotent on per-row basis. SELECT pre-filters rows that already
have a non-archived `lcm_embedding_meta` entry; a duplicate-write
would just be a no-op via INSERT OR REPLACE.
5. Suppression-aware: rows where `summaries.suppressed_at IS NOT NULL`
are excluded.
6. Per-tick failure blocklist — failed_summary_ids set excludes them
from subsequent SELECTs within the same tick. Next tick re-attempts
(Voyage may have recovered). Without this, a persistent 400 would
spin the loop until perTickLimit.
7. Auth errors are FATAL: re-thrown so the failure is surfaced to the
operator. The lock is still released via try/finally.
Heartbeat: lock heartbeat fires every batch. If preempted (heartbeat
returns false), tick aborts cleanly without partial state.
Coverage: 13 tests (all vec0-gated, mock fetch — NO live API):
- basic embed-all, isEmbedded reflects state
- skip suppressed leaves (no Voyage call for them)
- idempotent on second tick (zero new Voyage calls)
- over-cap leaves filtered at SELECT (countPendingDocs verifies)
- perTickLimit caps work + perTickLimitReached flag
- 400 records skipped doc, no abort
- 401 (auth) re-thrown, lock released via finally
- 500 records skipped, continues with other batches
- lockNotAcquired when another worker holds (no Voyage call)
- lock released on success
- lock released even on auth error
- batches packed to maxBatchTokens (greedy bin-pack)
- countPendingDocs accurate
Tests: 1041 → 1054 (+13).
Resolves: foundation for v4.1 §13 backfill — first-run embedding of
existing summaries on Eva's live DB. Group B.05 (next) wires async
leaf-time embed for new leaves so the cron only handles backfill of
the 4187-row corpus, not new ongoing leaves.
….05)
Two pieces, both foundation for Group F's `/lcm worker` operator surface
(later) and to close Group A adversarial-review Gap 8.
## 1. Worker loop (src/concurrency/worker-loop.ts)
Generic single-process worker loop. One Node process running multiple
background jobs cooperatively, single-threaded, each with its own
cadence. Cross-process safety via lcm_worker_lock from B.04a.
API:
- new WorkerLoop(db, {jobs: WorkerJob[], onJobComplete?})
- loop.start() → idempotent, schedules setInterval per job
- loop.stop({gracefulTimeoutMs?: 30000}) → waits for in-flight ticks
- loop.runOnce(kind) → outside-schedule manual tick (used by leaf-write
hooks to nudge backfill, and by `/lcm worker tick` operator command)
- loop.isRunning() / loop.inFlightCount() — for /lcm health
Design choices:
- setInterval (not setTimeout chain): predictable cadence; the dispatcher
skips overlapping ticks rather than queuing them, so extra ticks are
dropped instead of piling up.
- Errors in jobs captured via onJobComplete, never propagate to loop —
one bad tick doesn't crash the worker.
- generationId guard: stop()-then-start() doesn't run leftover ticks
from the old loop.
- validateJobs() at construction: duplicate kinds + invalid intervalMs
rejected up-front (programmer error).
NOT yet wired into plugin lifecycle. Group F's /lcm worker [start|stop]
operator command will instantiate it with the actual job list. Until
then, the loop is a library — the embedding store + backfill modules
are usable standalone.
NOT using worker_threads. v4.1.1 A9 foresees true heartbeat-isolation
via worker_threads, but that's a future commit. setInterval-driven
dispatch is fine for our cadences (5-60s).
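The skip-overlapping-ticks and generation-guard choices can be sketched as a tiny dispatcher. This is an illustration under assumptions: `TickDispatcher`, `restart` and `dispatch` are hypothetical names, and the real WorkerLoop wires the equivalent logic to setInterval and onJobComplete.

```typescript
type JobFn = () => Promise<void>;

class TickDispatcher {
  private readonly inFlight = new Set<string>();
  private generation = 0;
  readonly errors: string[] = [];

  /** Invalidate ticks scheduled before this call; returns the new generation. */
  restart(): number {
    this.generation += 1;
    return this.generation;
  }

  /** Returns true if the tick was started, false if it was skipped. */
  dispatch(kind: string, job: JobFn, scheduledGeneration: number): boolean {
    if (scheduledGeneration !== this.generation) return false; // leftover tick from an old loop
    if (this.inFlight.has(kind)) return false; // overlapping tick: dropped, never queued
    this.inFlight.add(kind);
    const done = (): void => {
      this.inFlight.delete(kind);
    };
    // Errors are recorded, never propagated, so one bad tick cannot crash the loop.
    Promise.resolve()
      .then(job)
      .catch((err: unknown) => {
        this.errors.push(String(err));
      })
      .then(done);
    return true;
  }

  inFlightCount(): number {
    return this.inFlight.size;
  }
}
```

A setInterval callback would capture the generation returned by `restart()` and pass it to every `dispatch` call, so timers left over from a previous start become no-ops.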
## 2. Leaf-write session_key fix (Gap 8 from Group A adversarial review)
src/store/summary-store.ts:411 — INSERT INTO summaries now atomically
populates session_key from a sub-SELECT of conversations.session_key.
Closes the gap where new summaries inserted between gateway boots had
session_key='' until next boot's JOIN-backfill ran. The COALESCE
defends against (theoretically impossible) NULL conversations.session_key.
This means every newly-written summary IMMEDIATELY participates in
session_key-filtered partial indexes (summaries_session_key_kind_latest_idx
from A.08), without waiting for migration boot.
All 1054 existing tests still pass — change is additive (default still
'' if conversation has no session_key, but the migration ensures every
conv has one).
Coverage: 13 new worker-loop tests
- start/stop idempotency
- schedules at cadence (timing-based)
- two jobs with different intervals
- overlapping ticks skipped (not queued)
- errors in jobs captured + loop continues
- graceful stop waits for in-flight
- graceful stop returns false on timeout
- runOnce returns result, throws on unknown kind, throws on in-flight
- validates duplicate kinds + bad intervalMs
Tests: 1054 → 1067 (+13).
Resolves: foundation for v4.1 §0 worker scheduling + Group A Gap 8.
Group B is now complete (B.01 Voyage client, B.02 vec0, B.03 cascade
triggers, B.fix polish, B.04a worker-lock, B.04b backfill cron, B.05
worker loop + session_key fix). Next: Group B adversarial pass, then
Group C retrieval (hybrid lcm_grep, lcm_semantic_recall).
… join (C.01)
Wraps the embed-query → vec0 KNN → JOIN-back-to-summaries flow used by
both `lcm_semantic_recall` (Group C) AND the hybrid mode of `lcm_grep`
(C.02). Centralizing here so the two callers can't drift on suppression
semantics, kind filtering, or session-key scope.
API:
- getActiveEmbeddingModel(db) → {modelName, dim} | null
Picks active=1 + archive_after IS NULL row, most-recent registered_at
on ties (handles model-cutover gracefully).
- runSemanticSearch(db, opts) → Promise<SemanticSearchResult>
Throws SemanticSearchUnavailableError if vec0 not loaded OR no
active profile OR vec0 table missing — caller decides whether to
degrade (FTS-only) or surface error.
SemanticSearchOptions covers: query (text) OR queryVector (precomputed),
session_keys / conversation_ids / since / before / summary_kinds filters,
embedded_kinds default ['summary'], excludeSuppressed default true,
all Voyage knobs (apiKey/fetch/maxRetries/inputType — default 'query'
for asymmetric retrieval).
Suppression filtered at TWO layers (defense in depth — race between
trigger fire and KNN call could leak a stale row through metadata):
1. vec0 metadata `suppressed = 0` pre-filter inside MATCH
2. Final JOIN to summaries WHERE `suppressed_at IS NULL`
session_key scope uses the column populated atomically at write time
per Group A Gap 8 fix (in B.05). conversation_id, time, and kind
filters all bind via parameterized SQL — no injection vectors.
Coverage: 15 tests
- getActiveEmbeddingModel: null when no profile, picks active+
most-recent, excludes archived
- SemanticSearchUnavailableError when vec0 not loaded / no profile
- input validation: requires query OR queryVector; dim mismatch
- happy path: ranked hits, joined content + metadata
- suppression filter (default + opt-in to include)
- session_keys filter restricts to matching sessions
- conversation_ids filter restricts to matching conversations
- since/before time filter
- Voyage call with input_type='query' verified, voyageTokensConsumed
tracked
- summary_kinds filter (leaf vs condensed)
Tests: 1067 → 1082 (+15).
Resolves: foundation for v4.1 §13 retrieval pipeline. Next (C.02):
new lcm_semantic_recall tool + hybrid mode for lcm_grep that calls
this service alongside FTS and merges with Voyage rerank-2.5.
…rank (C.02a)
Combines FTS5 candidates with vec0 KNN candidates, deduplicates by
summary_id, then either:
- Reranks via Voyage rerank-2.5 (default) — produces final relevance
scoring across the union, taking advantage of the spike-validated
+52.5pp lift on paraphrastic queries
- OR reciprocal-rank-fusion (RRF) when rerank=false OR when Voyage
rerank fails (transient 5xx; auth re-thrown for operator surfacing)
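For reference, reciprocal-rank fusion over the two candidate lists can be sketched as follows. The function name is hypothetical, the constant k = 60 is the value conventionally used for RRF and is an assumption here, and each input list is assumed to be already deduplicated.

```typescript
interface FusedHit {
  summaryId: string;
  score: number;
}

/** Fuse ranked lists of summary IDs: score(id) = sum over lists of 1 / (k + rank). */
function reciprocalRankFusion(rankings: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((summaryId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(summaryId, (scores.get(summaryId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([summaryId, score]) => ({ summaryId, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

A document found by both FTS and the KNN search accumulates two terms, so it tends to outrank documents found by only one side.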
API:
- runHybridSearch(db, opts) → Promise<HybridSearchResult>
opts: query, kFts (default 50), kSemantic (default 50), topN (default
20), filters (sessionKeys/conversationIds/since/before/summaryKinds),
excludeSuppressed default true, rerank default true, voyage HTTP knobs.
Caller injects ftsSearch() so this module doesn't take ownership of FTS5
sanitization or hybrid-recency sort logic — that lives in the existing
SummaryStore/RetrievalEngine path.
HybridHit returned with:
- {summaryId, conversationId, sessionKey, kind, content, tokenCount, createdAt}
- score (rerank score OR RRF score)
- fromFts / fromSemantic provenance flags
- semanticDistance (cosine), ftsRank — for diagnostics + caller display
Graceful degrade:
- vec0 not loaded → degradedToFtsOnly=true, FTS-only result
- rerank 5xx → degradedSkippedRerank=true, RRF fallback
- rerank 401 (auth) → re-thrown; operator must fix API key
- empty query → throws (programmer error)
Suppression: both FTS-side and semantic-side default to excludeSuppressed.
Rerank input is post-suppression union, so no post-rerank filter needed.
NOT YET WIRED into lcm_grep tool. Next commit (C.02b) extends the tool
with mode='hybrid' that calls runHybridSearch with summaryStore.searchSummaries
adapted to FtsHit shape.
Coverage: 8 tests (vec0-gated, mock fetch — NO live API):
- merges FTS + semantic, rerank produces top-N
- dedupe overlap (FTS + semantic both find same doc)
- vec0 unavailable → FTS-only with degraded flag
- rerank 500 → RRF fallback with degraded flag
- rerank 401 → re-thrown
- rerank=false explicit → RRF mode, no Voyage rerank call
- empty query rejected
- no candidates → empty hits
Tests: 1082 → 1090 (+8).
Resolves: foundation for hybrid retrieval. Used by C.02b (lcm_grep
mode='hybrid') AND C.04 (lcm_synthesize_around window_kind='semantic').
…paths (C.03)
v4.1 §10 invariant: every agent-facing retrieval surface defaults to
exclude-suppressed. Adds `WHERE suppressed_at IS NULL` to four search
code paths in SummaryStore:
1. searchFullText (FTS5 path) — alias `s.suppressed_at IS NULL`
2. searchLike (LIKE-fallback path) — `suppressed_at IS NULL`
3. searchCjkTrigram (CJK FTS path) — alias `s.suppressed_at IS NULL`
4. searchRegex — `suppressed_at IS NULL`
These four functions back the existing `lcm_grep` tool's regex /
full_text modes (and the new C.02b hybrid mode via the ftsSearch
callback). Suppressed leaves now never surface to agents through any
search-side path.
The vec0 retrieval surfaces (semantic-search, hybrid-search) already
filter via metadata pre-filter (vec0 `suppressed=0`) AND defense-in-
depth JOIN to summaries.suppressed_at IS NULL. Both layers are
independently tested.
What this DOESN'T change:
- getSummary(id), getSummaryParents/Children/Subtree, getSummaryMessages,
context-item reads — these are structural lookups used by lineage /
expansion / assembler. The architecture's "7 read paths" cascade
handles them by suppressing-at-source (assembler builds context
from latest non-suppressed leaves; expansion respects
contains_suppressed_leaves flag for condensed). A per-method
excludeSuppressed default param refactor was considered but deferred.
- lcm-doctor / lcm-command operator paths — operator tooling
intentionally sees ALL rows including suppressed (for cleanup,
audit, doctor checks).
Coverage: 4 new tests (LIKE/full_text path, regex path, restore-on-
unsuppress, multiple-suppression).
Tests: 1090 → 1094 (+4).
Resolves: v4.1 §10 invariant for SummaryStore search paths.
Wires the semantic-search service from src/embeddings/ into a new
agent-callable tool. lcm_semantic_recall is the purely-semantic
counterpart to lcm_grep; agents use it for paraphrastic queries that
exact-match FTS would miss. Hybrid (keyword + semantic) is reserved for
lcm_grep mode='hybrid' (Group C.02b).
The tool resolves conversation scope via the existing
resolveLcmConversationScope helper, parses since/before like lcm_grep,
and gracefully degrades when sqlite-vec is missing or when
VOYAGE_API_KEY is not set; both surfaces return jsonResult errors that
direct the agent back to lcm_grep instead of throwing.
A small public getDb() accessor is added to LcmContextEngine so tools
can call runSemanticSearch(db, opts) directly without plumbing a new
dependency through the LcmDependencies surface. Mirrors the existing
getRetrieval() / getConversationStore() / getSummaryStore() pattern.
Manifest contracts.tools updated to match the new register call site
(guarded by manifest.test.ts).
Tests cover input validation (empty query, bad timestamps, missing
scope), graceful degradation (vec0 unavailable, missing API key), happy
path with mocked Voyage fetch, conversationId scope filter, and
since/before passthrough; vec0-dependent tests skip cleanly when the
extension isn't installed.
Refs: architecture v4.1 §13.
… collision (B.fix2)
Resolves Group B adversarial-pass HIGH/BLOCKER findings:
## Gap 1 (BLOCKER) — backfill heartbeat vs Voyage retry budget
src/embeddings/backfill.ts: was using the Voyage client's default retry +
timeout (3 retries = 4 attempts × 60s ≈ 4 min worst-case per batch). With
WORKER_LOCK_TTL_MS=90s, a stuck batch can let another worker GC the
lock and start backfilling the same docs → Voyage double-bill +
duplicate vec0 rows (auxiliary cols have no UNIQUE constraint to
catch this).
Fix: introduce `voyageMaxRetries` default = 1 + `voyageTimeoutMs`
default = 30s in BackfillOptions. Worst-case per batch now:
2 attempts × 30s + ~0.5s backoff ≈ 60.5s
Comfortably under 90s lock TTL → another worker can't preempt mid-batch.
Caller can override either knob (e.g. for first-run backfill where
contention is low and longer Voyage tolerance is acceptable). Tests
that need to surface 5xx immediately use voyageMaxRetries: 0.
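The retry-budget arithmetic can be checked with a few lines. This is a sketch: `worstCaseBatchMs` is a hypothetical helper, and a flat backoff pause between attempts is an assumption made to match the ~0.5s figure above.

```typescript
const WORKER_LOCK_TTL_MS = 90_000; // lock TTL quoted in this commit message

/** Worst case: every attempt runs to its timeout, with one backoff pause per retry. */
function worstCaseBatchMs(maxRetries: number, timeoutMs: number, backoffMs: number): number {
  const attempts = maxRetries + 1;
  return attempts * timeoutMs + maxRetries * backoffMs;
}

/** True when a stuck batch still finishes before another worker could GC the lock. */
function fitsInsideLockTtl(maxRetries: number, timeoutMs: number, backoffMs: number): boolean {
  return worstCaseBatchMs(maxRetries, timeoutMs, backoffMs) < WORKER_LOCK_TTL_MS;
}
```

With the new defaults (1 retry, 30s timeout, 0.5s backoff) the worst case is 60,500 ms, which fits under the TTL. The previous behaviour (3 retries, 60s timeout) does not fit even with zero backoff.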
## Gap 2 (HIGH) — slug collision silently corrupts KNN
src/embeddings/store.ts: registerEmbeddingProfile() didn't check whether
the new model_name's sluggified form was already in use. Two profiles
like `voyage-4-large` and `voyage_4_large` both sluggify to
`voyage4large` → same vec0 table → inserts from both profiles route
to one table → KNN cross-contaminates.
Fix: scan existing profiles for slug equality BEFORE INSERT OR IGNORE.
Throws with explanatory message identifying the existing model_name
that already owns the slug.
The existing `MODEL_NAME_PATTERN = /^[A-Za-z0-9._-]{1,64}$/` allows
`-`, `_`, `.`, all of which are stripped by sluggification, so the
collision risk between distinct model names is real, not hypothetical.
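A minimal sketch of the collision check, assuming sluggification lowercases the name and strips every non-alphanumeric character. The lowercasing is inferred from the "case variants" test listed below, and `sluggify` and `findSlugCollision` are hypothetical names rather than the store's actual functions.

```typescript
const MODEL_NAME_PATTERN = /^[A-Za-z0-9._-]{1,64}$/;

/** Assumed slug rule: lowercase, then drop everything except [a-z0-9]. */
function sluggify(modelName: string): string {
  return modelName.toLowerCase().replace(/[^a-z0-9]/g, "");
}

/** Return the existing model_name that already owns the candidate's slug, or null. */
function findSlugCollision(existingNames: string[], candidate: string): string | null {
  if (!MODEL_NAME_PATTERN.test(candidate)) {
    throw new Error(`invalid model_name: ${candidate}`);
  }
  const slug = sluggify(candidate);
  for (const name of existingNames) {
    // Re-registering the exact same name stays idempotent, so it is not a collision.
    if (name !== candidate && sluggify(name) === slug) return name;
  }
  return null;
}
```

A caller would run this scan before the INSERT and throw with the returned name in the message when the result is not null.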
## Gap 8 (LOW, folded in) — dim upper bound consistency
ensureEmbeddingsTable rejects dim > 4096; registerEmbeddingProfile
had no upper bound, leaving an orphaned profile if caller did
register-then-ensure. Aligned both functions to reject dim > 4096
in registerEmbeddingProfile too.
## Coverage: 8 new tests in v41-group-b-fix2.test.ts
- Slug collision rejected: dash↔underscore↔dot↔case variants
- Genuinely-different slug allowed
- Re-registering same model still idempotent
- Collision detection order-independent
- Dim > 4096 rejected (matching ensureEmbeddingsTable)
- Dim = 4096 accepted (boundary)
- Backfill default voyageMaxRetries=1 (proven by call count = 2)
- Backfill caller can override voyageMaxRetries: 0
Tests: 1094 → 1112 (+18 — also includes 10 from C.01b subagent).
Group B adversarial Gaps 3-7 (3 MED + 1 LOW remaining) are doc/comment
polish; deferred to cycle-2 review.
Extends lcm_grep with a third mode='hybrid' that blends FTS + semantic
vector search via Voyage rerank. The schema enum picks up the new
value, and the tool description points agents at lcm_semantic_recall
for purely-semantic exploration so the two surfaces stay
distinguishable.
The hybrid path delegates to runHybridSearch (src/embeddings/), passing
a small adapter that wraps summaryStore.searchSummaries(mode:'full_text',
sort:'relevance') and hydrates the snippets back to full FtsHit shape
via a single batched SELECT against summaries by summary_id. We could
have piped each hit through getSummary, but the IN(...) batch is one
round-trip and the values we need (session_key, content, token_count,
created_at, conversation_id) are already on the row.
Output format mirrors the regex/full_text branch — same '## LCM Grep
Results' header, '**Mode:** hybrid' line, conversation scope + time
filter — but with hybrid-specific extras:
- per-hit provenance flag: [from FTS+semantic] / [from FTS only] /
[from semantic only]
- rerank/RRF score
- degraded warnings: '*(semantic search unavailable; degraded to
FTS-only)*' when vec0 is missing, '*(rerank failed; using RRF
fusion fallback)*' when the rerank call fails with a network error
and we fall back to reciprocal-rank-fusion
Auth errors from Voyage surface as a jsonResult error message that
points the agent at mode='full_text' as the keyword-only fallback.
Tests cover schema enum + description metadata, the
degraded-vec0-missing path (FTS-only mode with the warning + FTS-only
provenance flag), happy path with mocked Voyage embed + rerank (mixed
provenance flags + score-ordered hits), and the rerank-failed RRF
fallback path.
Refs: architecture v4.1 §13.
Versioned prompt templates per (memory_type, tier_label, pass_kind).
Append-only — old versions stay archived (active=0); new versions
inserted with active=1, previous-active row deactivated atomically.
Backed by lcm_prompt_registry (created in A.04, NULL-tier UNIQUE
patched in B.fix Gap 2). Schema:
(prompt_id PK, memory_type, tier_label NULLABLE, pass_kind, version,
template, model_recommendation, active, bundle_version, notes)
API:
- getActivePrompt(db, {memoryType, tierLabel, passKind}) → PromptRecord | null
- getPromptById(db, promptId) → PromptRecord | null
(used by synthesis-cache to verify the prompt_id is still current
or look up the archived version that was used)
- registerPrompt(db, opts) → string (the new prompt_id)
Atomic: deactivates previous + inserts new in BEGIN IMMEDIATE.
Auto-versions (max(version) + 1 within triple).
- listActivePrompts(db) → for /lcm health
- bumpBundleVersion(db) → for voice-consistency rebuilds
NULL tierLabel handling: matched literally (not coerced to "") in
both lookup and update. Aligns with B.fix Gap 2's NULL-safe UNIQUE
index on (memory_type, COALESCE(tier_label, ''), pass_kind, version) —
the registry treats NULL and '' as DIFFERENT for purposes of routing,
even though the UNIQUE index treats them as the same for collision
detection.
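The versioning and NULL-routing rules can be modelled in memory. This is a sketch under assumptions: the class, its method signatures and the prompt_id format are hypothetical, and it deliberately omits the SQLite transaction and the UNIQUE index, modelling only routing and auto-versioning.

```typescript
interface PromptRecord {
  promptId: string;
  memoryType: string;
  tierLabel: string | null;
  passKind: string;
  version: number;
  template: string;
  active: boolean;
}

class InMemoryPromptRegistry {
  private readonly rows: PromptRecord[] = [];

  private sameTriple(memoryType: string, tierLabel: string | null, passKind: string): PromptRecord[] {
    // Strict equality keeps NULL and '' apart for routing: null === "" is false.
    return this.rows.filter(
      (r) => r.memoryType === memoryType && r.tierLabel === tierLabel && r.passKind === passKind,
    );
  }

  registerPrompt(memoryType: string, tierLabel: string | null, passKind: string, template: string): string {
    const previous = this.sameTriple(memoryType, tierLabel, passKind);
    const version = previous.reduce((max, r) => Math.max(max, r.version), 0) + 1;
    for (const row of previous) row.active = false; // archive, never delete
    // Hypothetical ID format; JSON.stringify distinguishes null from "".
    const promptId = `${memoryType}/${JSON.stringify(tierLabel)}/${passKind}/v${version}`;
    this.rows.push({ promptId, memoryType, tierLabel, passKind, version, template, active: true });
    return promptId;
  }

  getActivePrompt(memoryType: string, tierLabel: string | null, passKind: string): PromptRecord | null {
    return this.sameTriple(memoryType, tierLabel, passKind).find((r) => r.active) ?? null;
  }
}
```

In the real registry the deactivate-and-insert pair runs inside one BEGIN IMMEDIATE transaction, so readers never observe a triple with zero or two active rows.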
Why versioning matters for cache invalidation: lcm_synthesis_cache
(D.02 next commit) will FK on prompt_id. When a prompt is updated:
- Old cache entries reference the now-archived prompt_id → stale
- New synthesis calls write rows with the new prompt_id → fresh
- Cache invalidation can be SELECTIVE (only entries with archived
prompt_id need rebuild) — never touches durable summaries.content
Coverage: 11 tests
- register + getActivePrompt happy path
- re-register same triple deactivates previous + bumps version
- per-triple version isolation (different triples independent)
- NULL tierLabel matched literally
- getActivePrompt returns null when none registered
- promptIdOverride respected
- modelRecommendation/bundleVersion/notes round-trip
- listActivePrompts excludes archived
- bumpBundleVersion increments active prompts only
- atomic transaction rolls back on PK collision
Tests: 1112 → 1123 (+11).
Resolves: foundation for v4.1 §3 synthesis. Next (D.02): synthesis
dispatch that uses this registry for prompt selection.
Extends the lcm_describe summary payload with two fields agents need
when reasoning across session families:
- sessionKey: pulled from the parent conversations row (which holds
the same value as summaries.session_key per the Gap 8 / B.05
atomic-write invariant). The SummaryRecord public store API
doesn't carry session_key through, so retrieval.describeSummary()
fans out a parallel conversationStore.getConversation(conversationId)
alongside the existing parents/children/messages/subtree fetches.
Empty string when the parent conversation has no session_key.
- timeRange: a normalized {earliestAt, latestAt, createdAt} struct
that mirrors the three time fields already present on the summary.
Convenience for callers that prefer one bracket over three siblings.
Both fields are also surfaced in the text rendering — the meta line
now carries 'sessionKey=...' and 'created=...' alongside the existing
'range=earliest..latest', so agents inspecting summaries get the
session affiliation and creation time visible without parsing the
JSON details.
Tests cover both the populated path (sessionKey appears verbatim,
timeRange struct round-trips through details) and the empty path
(sessionKey rendered as '-' for missing values).
Refs: architecture v4.1 §13.
…D.02)
Per-tier dispatch on top of D.01's prompt registry. Picks model + pass
strategy per tier label, runs the LLM call(s), records every pass to
lcm_synthesis_audit, returns final synthesized text.
Per-tier strategies (per architecture-v4.1 §3 + literature consensus
that critique-revise underperforms single-pass for summarization):
daily → single-pass (mini model)
weekly → single-pass (mid model)
monthly → single + verify_fidelity (premium model)
— verify_fidelity prompt asks "are there claims in the
summary that aren't in the source?" — separate model
call, returns 'OK' or 'HALLUCINATION: <details>'
yearly → best-of-N (N=3) + judge (premium-thinking)
— N candidates run in parallel; judge prompt picks
the best by index (0..N-1)
custom → single-pass (mid model)
filtered → single-pass (mid model)
Default models: claude-haiku-4-5 (daily), claude-sonnet-4-5 (weekly,
custom, filtered), claude-opus-4-7 (monthly), claude-opus-4-7-thinking
(yearly). Override per-prompt via lcm_prompt_registry.model_recommendation
or per-call via SynthesizeRequest.{modelOverride, forceModel}.
API:
- dispatchSynthesis(db, llmCall, req: SynthesizeRequest)
→ Promise<SynthesizeResult>
- LlmCall is INJECTED — production wires to existing pi-ai
infrastructure (Group F integration); tests inject deterministic
mocks. Keeps dispatch decoupled from the existing summarize.ts
(which is geared to per-leaf compaction in the gateway hot path
— different concerns).
SynthesizeRequest covers: tier, memoryType, sourceText, target
(summary_id OR cache_id), passSessionId (groups multi-pass audit
rows), bestOfN override (yearly), model overrides.
SynthesizeResult: output, primaryPromptId, audit IDs, total latency,
total cost cents, hallucinationFlagged (monthly), bestOfN detail
(yearly: n + selectedIndex + all candidates).
Audit trail: every pass writes a 'started' row up-front (forensic
record even if LLM crashes mid-call), then UPDATEs to 'completed'
or 'failed' with output + latency + cost + last_error.
Error handling:
- missing_prompt: thrown if the (memoryType, tier, single|judge)
triple has no active prompt registered. Operator must register
via /lcm command (Group F) or seed in deployment.
- llm_failure: re-thrown after writing audit row with status='failed'
and last_error set. Caller (synthesis worker) decides whether to
retry or surface to operator.
- judge_failure: yearly tier judge returned malformed output
(no digit, or out-of-range). Indicates a bad judge prompt — the
candidate outputs are intact in audit rows for manual recovery.
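The judge_failure condition can be sketched as a small parser. This is an assumption-laden illustration: `JudgeFailure` and `parseJudgeChoice` are hypothetical names, and taking the first run of digits in the judge output as the chosen index is an assumed parsing rule.

```typescript
class JudgeFailure extends Error {
  readonly code = "judge_failure";
}

/** Extract the chosen candidate index (0..candidateCount-1) from the judge's reply. */
function parseJudgeChoice(judgeOutput: string, candidateCount: number): number {
  const match = judgeOutput.match(/\d+/); // assumed rule: first run of digits wins
  if (match === null) {
    throw new JudgeFailure("judge output contains no digit");
  }
  const index = Number.parseInt(match[0], 10);
  if (index >= candidateCount) {
    throw new JudgeFailure(`index ${index} out of range 0..${candidateCount - 1}`);
  }
  return index;
}
```

Because the candidates are already persisted in audit rows before the judge runs, a thrown failure here loses no generated text.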
Template rendering: simple {{source_text}}, {{tier}}, {{memory_type}}
substitutions for the primary template; {{candidate_summary}} for
verify; {{candidates}} (rendered as numbered list) for judge.
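A minimal sketch of the placeholder substitution, assuming unknown placeholders are left untouched and that judge candidates are numbered from 0 to match the 0..N-1 index the judge returns. Both behaviours, and the helper names, are assumptions rather than confirmed details of the dispatcher.

```typescript
/** Replace {{name}} placeholders with values from vars; unknown names are left as-is. */
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (placeholder: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(vars, name)) return placeholder;
    const value = vars[name];
    return value === undefined ? placeholder : value;
  });
}

/** Render judge candidates as a numbered list, one per line, starting at 0. */
function renderCandidates(candidates: string[]): string {
  return candidates.map((text, index) => `${index}. ${text}`).join("\n");
}
```

A judge prompt would then be rendered with something like `renderTemplate(judgeTemplate, { candidates: renderCandidates(outputs) })`, where `judgeTemplate` and `outputs` are placeholders for the caller's own values.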
Coverage: 16 tests
- DEFAULT_MODEL_BY_TIER + PASS_STRATEGY_BY_TIER constants
- daily / weekly: single-pass, audit row, default model
- monthly: single + verify; hallucinationFlagged true vs false vs
skipped (no verify prompt)
- yearly: 3 candidates + judge picks 1; bestOfN=5 override; judge
output without digit → judge_failure; missing judge prompt →
missing_prompt
- missing primary prompt → missing_prompt
- LLM call exception → llm_failure + audit row.status='failed' +
last_error captured
- prompt model_recommendation overrides tier default
- forceModel + modelOverride take precedence
- template substitution
Tests: 1130 → 1146 (+16; subagent's C.05 already merged).
Resolves: foundation for v4.1 §3 synthesis. Next (D.03): eval harness
for measuring retrieval recall + synthesis quality on Eva's stratified
N=100 query corpus.
Heuristic gate before procedure clustering. Most leaves are
conversational; only a small fraction look like procedures. We
pre-filter by the SHAPE of the content (not by FTS verb regex, which
3 adversarial agents flagged as too noisy + many false negatives).
Three structural signals (compose with OR):
numbered-steps — 3+ lines starting with "1.", "Step 1:", "1)",
"(1)", etc. Strict counting (no "1. ... only 2 ...")
Score weight: 0.4
command-block — 2+ shell-command-shaped lines:
- $-prompt, ❯-prompt, %-prompt, > -prompt
- lines inside ```bash/sh/zsh/shell``` fences
- lines starting with recognized tools
(git/npm/pnpm/yarn/docker/kubectl/terraform/aws/
gcloud/az/gh/cargo/python/node/psql/mysql/redis-cli)
Score weight: 0.4
how-to-marker — 2+ unambiguous markers like "how to ", "the procedure
for ", "steps to ", "in order to ", "first/then/finally,".
Conservative — single marker is too noisy (lots of
conversational uses).
Score weight: 0.3
A leaf is a clustering CANDIDATE if any one signal fires. The score
(sum of fired weights, capped at 1) is exposed for downstream
ranking — Group E's clustering call may threshold on it.
API:
- prefilterContent(content) → {isCandidate, signals[], score}
- prefilterLeaves<T>(leaves[]) → only the candidate rows, with
{signals, score} attached
Pure module: no DB, no LLM, no async. Safe to call inline.
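The three structural signals can be sketched with line-level regexes. This is a simplified illustration, not the module's implementation: the regexes are assumptions, fenced-block detection is omitted, and the function is named `prefilterContentSketch` to keep it distinct from the real prefilterContent. The weights (0.4 / 0.4 / 0.3), thresholds and the cap at 1 come from the description above.

```typescript
type Signal = "numbered-steps" | "command-block" | "how-to-marker";

interface PrefilterResult {
  isCandidate: boolean;
  signals: Signal[];
  score: number;
}

// Lines like "1. foo", "1) foo", "(1) foo", "Step 1: foo".
const NUMBERED_LINE = /^\s*(?:step\s+\d+\s*[:.]|\(\d+\)|\d+[.)])\s+\S/i;
// Lines starting with a shell prompt or a recognized tool name.
const COMMAND_LINE =
  /^\s*(?:[$❯%>]|(?:git|npm|pnpm|yarn|docker|kubectl|terraform|aws|gcloud|az|gh|cargo|python|node|psql|mysql|redis-cli))\s+\S/;
const HOW_TO_MARKER = /\b(?:how to |the procedure for |steps to |in order to |(?:first|then|finally),)/gi;

function prefilterContentSketch(content: string | null | undefined): PrefilterResult {
  const text = content ?? "";
  const lines = text.split("\n");
  const signals: Signal[] = [];
  let score = 0;
  if (lines.filter((line) => NUMBERED_LINE.test(line)).length >= 3) {
    signals.push("numbered-steps");
    score += 0.4;
  }
  if (lines.filter((line) => COMMAND_LINE.test(line)).length >= 2) {
    signals.push("command-block");
    score += 0.4;
  }
  if ((text.match(HOW_TO_MARKER) ?? []).length >= 2) {
    signals.push("how-to-marker");
    score += 0.3;
  }
  // Any single signal makes the leaf a candidate; the score is capped at 1.
  return { isCandidate: signals.length > 0, signals, score: Math.min(1, score) };
}
```

Because the function is pure and synchronous, it can be mapped over a batch of leaves before any embedding or clustering work is scheduled.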
Coverage: 18 tests
- numbered-steps: markdown, "Step N:", "N)", insufficient count, prose
with embedded numbers
- command-block: $ prompt, fenced bash, line-start tool names,
single-command rejection
- how-to-marker: 2+ markers fire, single marker doesn't
- composite: multi-signal stack, score cap at 1, plain conversation
- input edges: empty, undefined, null
- prefilterLeaves batch helper
Tests: 1146 → 1164 (+18).
Resolves: foundation for v4.1 §6.2 procedure clustering. Next (E.02):
clustering pass that runs ml-hclust over candidate leaves' embeddings.
…tim per-hit cap
Two related changes in lcm-grep-tool.ts. Methodology: Research → Run →
Debate → Decide. Both flipped after adversarial review caught my
mistakes.
# F5 — wrapper migration
Adversarial review counted 12 untapped return paths total (across grep
+ describe), not the 4 I claimed. In grep alone:
- Line 392: regex/full_text success
- Lines 590, 598, 604: hybrid error returns (in runHybridLcmGrep)
- Line 661: hybrid success
- Lines 761, 774, 779: semantic error returns (in runSemanticLcmGrep)
- Line 854: semantic success
- Line 1063: verbatim success
Spot-tap was whack-a-mole. Wave-9 → Wave-12 has hit the same antipattern
twice already. The structural fix is the wrapper migration.
Removed: inline `evaluateNeedsCompactGate` + 4 `tapResultForTokenAccounting`
calls in execute body (early-error paths). Added: single
`runWithTokenGate` wrapper around the entire body. All return paths —
including helper functions' internal error returns — now flow through
the wrapper's auto-tap exit. Single return funnel, can't skip a tap.
# F6 — verbatim per-hit content cap (5K chars)
Live-DB validation showed 5/5 plausible verbatim queries leak 6-12× the
markdown disclosure via `details.hits[].content`: markdown caps at
25-33K chars while details carries 200-385K chars per call. Empirical
single hits up to 200K chars exist (5× the entire markdown budget).
Adversarial review caught my original "metadata-only details" (Option D)
recommendation as factually wrong: I had claimed "verified zero
callers" but actual grep found 20+ active callers including:
- test/lcm-grep-verbatim-mode.test.ts (canonical contract test)
- test/v41-five-questions.test.ts (entire Type-C citation suite)
- test/v41-adversarial-scenarios.test.ts (defense-in-depth regressions)
- scripts/v41-qa-runner.mjs (live-DB harness, "critical" severity)
Decision flipped to Option A: keep `content` field but cap each hit at
5K chars, slice `details.hits` to `renderedRowCount` (rows actually
emitted into markdown). 5K is the 96th percentile of message lengths
in the observed corpus — typical messages fit fine, the long-tail
tool-output dumps get capped with `contentTruncated: true` +
`fullContentLength` flag pointing at lcm_describe(messageId,
expandMessages=true) for the full body.
New fields in details:
- truncated: bool (markdown loop broke early)
- hits[i].contentTruncated: bool (this hit's content was capped)
- hits[i].fullContentLength: number (so caller can decide if follow-up
via lcm_describe is worth it)
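The per-hit cap and renderedRowCount slicing can be sketched as follows. The interface and function names are hypothetical, `messageId` is assumed to be a string, and length is measured in UTF-16 code units as `String.prototype.slice` does; only the 5K cap and the two per-hit flag names come from the description above.

```typescript
const PER_HIT_CONTENT_CAP = 5_000;

interface VerbatimHit {
  messageId: string;
  content: string;
}

interface CappedHit extends VerbatimHit {
  contentTruncated: boolean;
  fullContentLength: number;
}

/** Keep only the rows that were rendered into markdown and cap each hit's content. */
function capHits(hits: VerbatimHit[], renderedRowCount: number): CappedHit[] {
  return hits.slice(0, renderedRowCount).map((hit) => ({
    messageId: hit.messageId,
    content: hit.content.slice(0, PER_HIT_CONTENT_CAP),
    contentTruncated: hit.content.length > PER_HIT_CONTENT_CAP,
    fullContentLength: hit.content.length,
  }));
}
```

A caller that sees `contentTruncated: true` can compare `fullContentLength` against its own budget before deciding whether a follow-up lookup is worth the tokens.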
# Tests
10 verbatim tests pass (was 8): 2 new invariants pin the cap behavior +
the renderedRowCount slicing.
- "INVARIANT: per-hit content cap at 5K chars + truncation flags"
- "INVARIANT: details.hits sliced to renderedRowCount when markdown
truncates"
The 20+ existing callers all still pass (verified): they assert against
substrings + messageIds, not full-content equality.
LOC: ~50 (F5 wrapper migration) + ~30 (F6 cap + flags) + ~50 (new tests).
Documents:
- /tmp/adversarial-f5.md
- /tmp/adversarial-f6.md
- /tmp/decision-phase2-final.md
- /tmp/research-f2-f6-data.md (F6 message-length distributions)
- /tmp/validation-f2-f5-f6.md (F6 dual-channel leak measurements)
Wave-12 reviewer F4 landed the suppression-aware aggregate CTE in
lcm_get_entity AND lcm_search_entities via parallel edits — byte-identical
SQL maintained in two places, a parallel-edit drift hazard.
The first-principles-architectural-decision methodology run (research +
adversarial debate + reach-for analysis) chose Option B (extract shared
helper) over Option A (merge into lcm_entity { mode }) for the entity
axis:
- Both adversarial agents independently recommended B (helper) over A
- Reach-for v1 (25 scenarios) found search_entities orphaned (0 reaches)
but reach-for v2 (30 scenarios incl. browse/fuzzy F1-F5) found it
REACHABLE when scenarios target its niche (3 first-reaches on F1, F2, F4)
- The original "consolidate" verdict was a scenario-coverage artifact,
not tool orphaning. Both tools have earned their keep.
Helper at src/tools/lcm-entity-shared.ts exports:
- VISIBLE_MENTIONS_CTE — the WITH visible_mentions AS (...) clause
- entityAggCte({ includeFirstIn }) — the , entity_agg AS (...) clause,
with the get-entity-only first_in column toggleable
Both tools now build their query as:
${VISIBLE_MENTIONS_CTE}${entityAggCte({ includeFirstIn: true|false })}
SELECT ... FROM lcm_entities e JOIN entity_agg ea ON ... WHERE ...
Surface unchanged. Tests unchanged (20/20 pass).
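A rough sketch of the shared-helper shape described above. The export names `VISIBLE_MENTIONS_CTE` and `entityAggCte` and the tables `lcm_entities` / `lcm_entity_mentions` come from this PR; every column name, the SQL bodies, and `buildEntityQuery` are placeholders assumed for illustration only.

```typescript
// Hypothetical sketch: the SQL bodies and column names are placeholders,
// not the real queries in src/tools/lcm-entity-shared.ts.
const VISIBLE_MENTIONS_CTE = `
WITH visible_mentions AS (
  SELECT m.entity_id, m.summary_id
  FROM lcm_entity_mentions m
  JOIN summaries s ON s.summary_id = m.summary_id
  WHERE s.suppressed_at IS NULL
)`;

function entityAggCte(opts: { includeFirstIn: boolean }): string {
  // first_in is only needed by the get-entity caller, so it is toggleable.
  const firstIn = opts.includeFirstIn
    ? ",\n    MIN(summary_id) AS first_in"
    : "";
  return `
, entity_agg AS (
  SELECT entity_id,
    COUNT(*) AS mention_count${firstIn}
  FROM visible_mentions
  GROUP BY entity_id
)`;
}

// Both tools compose the same two fragments; only the toggle differs.
function buildEntityQuery(includeFirstIn: boolean): string {
  return `${VISIBLE_MENTIONS_CTE}${entityAggCte({ includeFirstIn })}
SELECT e.*, ea.*
FROM lcm_entities e
JOIN entity_agg ea ON ea.entity_id = e.entity_id
WHERE e.entity_id = ?`;
}
```

Because both callers import the same fragments, a change to the suppression filter lands in one place instead of two byte-identical copies.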
Documents:
- /tmp/research-entity-consolidation.md (Step 1)
- /tmp/step2-entity-consolidation-options.md (Step 2)
- /tmp/adversarial-entity-A.md, /tmp/adversarial-entity-C.md (Step 3)
- /tmp/reach-for-analysis.md (Step 1.7 v1)
- /tmp/reach-for-analysis-v2.md (Step 1.7 v2)
…ic' (9→8 tools)
# Wave-12 consolidation SA — final ship
The first-principles-architectural-decision methodology run produced a
nuanced verdict for tool consolidation. The semantic axis got
consolidated; the entity axis did not.
## Decision: drop lcm_semantic_recall, fold capabilities into lcm_grep
Reach-for analysis (Step 1.7) showed:
- v1 (25 scenarios): 0 first-reaches for lcm_semantic_recall
- v2 (30 scenarios incl. F1-F5 browse/fuzzy/cost-cheap): 1 narrow first-reach
- Even with its tailor-made F5 scenario, it only barely beat lcm_grep
mode='semantic'. No durable niche.
Code archeology (Step 1.5) found the introducing commit `1e09df9`
itself admitted "lcm_semantic_recall kept distinct (**same cost** as
mode='semantic'; both exposed for clarity per challenger C2 verdict)."
The "for clarity" justification was invalidated by circular descriptions
that defer to each other ("for purely-semantic exploration prefer
lcm_semantic_recall" inside lcm_grep, vs "reserve lcm_semantic_recall
for purely semantic exploration" inside recall).
Changes:
1. **Schema**: added `summaryKinds` filter to lcm_grep (was the only
recall-only differentiator). Honored only by mode='semantic' /
'hybrid'; ignored elsewhere.
2. **Implementation**: deleted src/tools/lcm-semantic-recall-tool.ts.
Plumbing through runSemanticLcmGrep already shared underlying
`runSemanticSearch` + confidence-band logic.
3. **Manifest**: removed from openclaw.plugin.json. 9 → 8 tools.
4. **Plugin index**: removed import + registerTool call.
5. **needs-compact-gate.ts**: removed lcm_semantic_recall case in
estimateResultTokens (folded into lcm_grep semantic estimator).
6. **Tests**: removed lcm-semantic-recall-tool.test.ts; updated 4 tests
that referenced recall (parity-invariants, adversarial-scenarios,
five-questions, tool-budget-guardrail) to use lcm_grep mode='semantic'.
7. **Description fix**: lcm_grep description no longer cross-defers to
recall; tells the agent semantic mode is the standalone pure-vector
path with optional summaryKinds filter.
## Decision: KEEP lcm_search_entities (axis-different from earlier plan)
Reach-for v1 had also flagged lcm_search_entities as orphaned (0
first-reaches in 25 scenarios). v2 with F1-F5 added flipped this:
- F1 (browse all entities of a type): reached for lcm_search_entities
- F2 (fuzzy-name lookup): reached for lcm_search_entities
- F4 (filter by entity_type): reached for lcm_search_entities
- 3 first-reaches across F-scenarios where the description fits
The original v1 zero was a SCENARIO COVERAGE artifact — THE_FIVE_QUESTIONS
was biased toward expert queries that already named the canonical entity.
Adding browse/fuzzy/type-filter scenarios revealed the tool serves a real
niche. Eva's intuition that the v1 reach-for picture was incomplete was
correct.
Description rewrite leads with the browse-first niche so the gravity
matches the just-validated reach-for.
## Tests
- 1587 tests pass (was 1599; net -12 from deleted recall test file
and consolidated parity tests)
- 0 new TS errors (671 vs pre-fix baseline 679 — actually -8 from
deleting recall tool's compile errors)
- Live DB harness: all substantive checks pass (semantic, hybrid,
suppression cascade, extraction). The 3 reported "fails" are the
pre-existing "corpus already fully embedded" no-op messages.
## Ancillary changes
- Added F1-F5 scenarios to THE_FIVE_QUESTIONS.md (browse / fuzzy-name /
vague-summary / type-filter / paraphrastic-cheap)
- Baked F1-F5 into scripts/v41-qa-runner.mjs as permanent test coverage
- Updated lcm_search_entities to allow empty `query` when `entityType`
is provided (browse-by-type use case the new description promises)
- Updated operator-facing log messages in lcm-command.ts and
semantic-infra-init.ts to drop stale lcm_semantic_recall references
## Methodology lesson (encoded into the skill)
Step 1.7 (reach-for validation) MUST be paired with a scenario-coverage
audit. Tool absence in reach-for ≠ tool orphaning; it could be a scenario
gap. Verify by adding scenarios that exercise the tool's claimed niche
before declaring it dead.
Documents:
- /tmp/research-entity-consolidation.md, /tmp/research-semantic-consolidation.md (Step 1)
- /tmp/step2-entity-consolidation-options.md, /tmp/step2-semantic-consolidation-options.md (Step 2)
- /tmp/adversarial-{entity-A,entity-C,semantic-SA,semantic-SB}.md (Step 3, 4 of 5)
- /tmp/ripple-id-prefix-consolidation.md (Step 3 ripple analysis)
- /tmp/reach-for-analysis.md (Step 1.7 v1)
- /tmp/reach-for-analysis-v2.md (Step 1.7 v2 — verdict C)
…, stale refs)
Wave-1 audit (8 parallel agents over today's 25 commits + 5200 LOC delta)
surfaced 2 P0 + 6 P1 + several P2/P3 findings. This commit batches the
P0 + P1 fixes; P2/P3 to follow.
# P0 — QA runner crashes on startup (W1A5 + W1A6 converged)
`scripts/v41-qa-runner.mjs` still imported the deleted
`lcm-semantic-recall-tool.js` and had 4 `tool: "lcm_semantic_recall"`
case strings. The runner exited with ERR_MODULE_NOT_FOUND before parsing
any args. The bug was introduced when F1-F5 were added without re-running qa-runner.
Fixes:
- Drop the deleted-tool import (lines 227-229)
- Migrate 4 cases (smoke-semantic-cosine-band, smoke-filtered-knn-windowed,
adv-low-confidence-warning, adv-cosine-on-entity-only) to
`lcm_grep` with `mode: "semantic"` + `pattern` arg
# P0/P1 — F5 + F3 QA predicates (W1A6 NEW)
F5 predicate had inverted logic: `if (r.error) return "errored:"`
short-circuited BEFORE the graceful-degradation regex check. Since
qa-runner flattens `r.error = r.details.error` (line 1132), the
Voyage-unavailable allowance was unreachable — F5 always failed in
offline mode. F3 had no LLM-unavailable allowance like A-cases do, so
synthesize_around without summarizer creds always failed F3.
Fixes:
- F5 checks regex BEFORE bare error; matches v41-tool-parity-invariants
pattern
- F3 allows `summarization model|summaryModel|summaryProvider|LCM_SUMMARY_MODEL`
errors as graceful degradation
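The F5 ordering bug is easy to reproduce in isolation. The sketch below assumes a simplified result shape and a made-up `VOYAGE_UNAVAILABLE` pattern (the runner's real regex is not shown in this commit message); it only illustrates why checking the bare error first turns the graceful-degradation branch into dead code.

```typescript
// Hypothetical result shape and regex; only the ordering issue is from the notes above.
interface QaResult {
  error?: string;
}

const VOYAGE_UNAVAILABLE = /voyage.*(unavailable|not configured)/i;

// Inverted order: the bare error check returns before the allowance is consulted.
function f5PredicateInverted(r: QaResult): true | string {
  if (r.error) return `errored: ${r.error}`;
  if (VOYAGE_UNAVAILABLE.test(r.error ?? "")) return true; // can never pass for a real error
  return true;
}

// Fixed order: graceful degradation is recognized first, then the bare error.
function f5Predicate(r: QaResult): true | string {
  if (r.error && VOYAGE_UNAVAILABLE.test(r.error)) return true;
  if (r.error) return `errored: ${r.error}`;
  return true;
}
```

With the fixed order, an offline run that reports a Voyage outage passes as degraded, while any other error still fails the case.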
# P1 — inferTokenBudget bypass for unknown models (W1A1)
`inferTokenBudget` recognized only ~7 model families (opus-4-5/6/7,
gpt-5.4/5.5, sonnet-4-5/6, haiku). For every other model — gpt-4 /
gpt-4o / claude-3.x / o1 / Gemini / Mistral / Ollama — it returned
`undefined`, which `evaluateNeedsCompactGate` treats as a bypass
signal. needsCompact gate was silently disabled for the majority of
operators outside the recognized list.
Fix: conservative 200K default for unknown models. Per-call
MAX_RESULT_CHARS still bounds worst case at 10K tokens. Tests expanded
to cover gpt-4 / gpt-4o / claude-3-5-sonnet / o1-preview.
# P1 — summaryKinds plumbing on hybrid mode (W1A5)
Schema description claimed "Honored only by mode='semantic' / 'hybrid'"
but the dispatch only passed `summaryKinds` to `runSemanticLcmGrep`.
The hybrid branch silently ignored it — documented-but-broken contract.
Fix: resolve `summaryKindsParam` once at the dispatch, pass to both
helpers. `runHybridLcmGrep` now accepts `summaryKinds` in its options
and threads through to `runHybridSearch` (which already supports it).
The FTS-arm closure already post-filtered on summaryKinds (line 576);
the semantic-arm via runHybridSearch was the only gap.
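The "resolve once, pass to both helpers" fix can be sketched as follows. The helper names `runSemanticLcmGrep` and `runHybridLcmGrep` and the `summaryKindsParam` variable come from the notes above; their signatures here, the `calls` recorder, and `runLexicalGrep` are stand-ins assumed for illustration.

```typescript
// Hypothetical dispatch sketch; helper signatures are stand-ins, not the real ones.
type GrepMode = "regex" | "full_text" | "verbatim" | "semantic" | "hybrid";

interface GrepParams {
  mode: GrepMode;
  pattern: string;
  summaryKinds?: string[];
}

interface SearchOptions {
  pattern: string;
  summaryKinds?: string[] | undefined;
}

// Recorder so the sketch can show which helper received which options.
const calls: Array<{ helper: string; options: SearchOptions }> = [];
const runSemanticLcmGrep = (options: SearchOptions): void => {
  calls.push({ helper: "semantic", options });
};
const runHybridLcmGrep = (options: SearchOptions): void => {
  calls.push({ helper: "hybrid", options });
};
const runLexicalGrep = (pattern: string): void => {
  calls.push({ helper: "lexical", options: { pattern } });
};

function dispatchGrep(params: GrepParams): void {
  // Resolve the filter once so the two vector-backed branches cannot drift apart.
  const summaryKindsParam =
    params.summaryKinds && params.summaryKinds.length > 0
      ? params.summaryKinds
      : undefined;
  const options: SearchOptions = {
    pattern: params.pattern,
    summaryKinds: summaryKindsParam,
  };
  switch (params.mode) {
    case "semantic":
      runSemanticLcmGrep(options);
      break;
    case "hybrid":
      runHybridLcmGrep(options);
      break;
    default:
      // Other modes ignore summaryKinds, matching the schema description.
      runLexicalGrep(params.pattern);
  }
}
```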
# P1 — Stale lcm_semantic_recall refs (W1A5 + W1A8 converged, ~12 places)
Sweep of agent-facing prose, operator scripts, and changeset:
- `.changeset/lcm-v41-omnibus.md` — corrected tool list (was missing
expand/expand_query/compact; still listed deleted recall)
- `docs/agent-tools.md` — Type-B routing table, decision tree, removed
recall section + cost-table row
- `docs/v4.1/THE_FIVE_QUESTIONS.md` — Type-B header, F3/F5 references,
the false "F-scenarios not yet baked into qa-runner" note
- `docs/v4.1/PR_DESCRIPTION.md` — Mermaid routing diagram, recall
section header (now points to mode='semantic'), cost table
- `docs/v4.1/KNOWLEDGE_DUMP.md` — shipped-tools list, debugging
playbook header
- `scripts/v41-vs-rollup-comparison.mjs` — operator output prose
- `scripts/lcm-tool-call.mjs` — dropped recall dispatch case + JSDoc
- `scripts/v41-live-db-harness.mjs` — log line + section header
- `scripts/v41-agent-harness-preflight.mjs` — JSDoc
Historical references in audit reports (HARNESS_REPORT_2026-05-06.md,
TEST_ANTIPATTERNS.md) kept as-is — they document the state at
that time, accurate for the audit trail.
# Verification
- `npm test` → 1587/1587 pass (no regressions)
- QA runner now starts and runs 34/35 cases successfully
- 1 remaining failure: `smoke-filtered-knn-windowed` — pre-existing
consolidation regression where mode='semantic' returns 0 hits when
queried with a seed leaf's own content + tight ±1h window. Same
code path as the other 5 passing semantic cases. Investigation
deferred to Wave 2.
- F1-F5 ALL PASS
# Deferred
- P2 fixes: post-compact cache reset, LCM_TOOL_RESULT_TOKEN_BUDGET
to other 7 tools, estimator HARD_CAP env honor, comment drift
- P3 cosmetic batch
- `smoke-filtered-knn-windowed` regression investigation
…k leaves
Wave-12 audit traced the lone smoke failure to a brittle test, not a
consolidation regression. The test picked the latest-mtime embedded leaf
as a query seed; today the latest leaf was an
`[LCM fallback summary — model unavailable]` (summarizer was down when it
was written). Fallback text has no specific semantic neighbors in a tight
±1h window, so the test always returned 0 hits regardless of the semantic
pipeline being healthy.
Fix: pick a seed with real content (`NOT LIKE '[LCM fallback summary%'` +
length > 200). Stable across snapshots regardless of recent summarizer
health.
Verification: smoke 8/8 pass. Underlying runSemanticSearch contract
unchanged; the prior `lcm_semantic_recall` would have hit the same
brittle-test issue if the snapshot contained a fallback as latest leaf.
Wave-2 cross-cutting audit (4 parallel agents: token-state-integration,
schema/suppression, test/manifest/harness, fresh-eyes) caught 2 P0s + 1 P1
the per-file Wave-1 sweep missed.
P0: token-state cache + accounting bus
- Post-compact stale cache: noteSuccessfulCompact() clears the entry on
successful lcm_compact so the very next wrapped call re-bootstraps from
the post-compact ground truth instead of refusing on the stale
pre-compact snapshot. Without this, the agent could loop
compact→refuse→compact until the 2/5min cap blocks further attempts.
- lcm_synthesize_around was OFF the runWithTokenGate accounting bus: the
prior "self-protecting via 50K source cap" comment covered SOURCE input
bounds, not OUTPUT (4K-8K markdown rollup flowed past the cache silently
and drifted gate decisions low). Wrapped it; wired getRuntimeContext
through registration in src/plugin/index.ts.
P1: runWithTokenGate error path
- Tool throws (e.g. "LCM engine is unavailable", present in 6+ tools + 13
throw sites in lcm_expand_query) skipped tapResultForTokenAccounting
entirely. The runtime-serialized error message DOES cost tokens, so the
cache drifted low by exactly the size of the error message every time.
Added try/catch tap-then-rethrow.
Manifest drift fix
- registerTool comment placement: moved the W2A1 P0 #2 comment from
between `=>` and `createLcmSynthesizeAroundTool` (where the manifest
test's regex /=>\s*\{?\s*(?:return\s+)?(create...)/ couldn't match) to
ABOVE the api.registerTool block. Re-runs 8/8 against the manifest.
Cosmetic
- README tool inventory: removed lcm_semantic_recall line, added
lcm_compact + Wave-12 SA consolidation note (was: 9 listed minus 1
removed but +1 missing = count cancels out, hidden bug).
- THE_FIVE_QUESTIONS.md: coverage 22/25 → 27/30 (post F1-F5 addition).
- 7 stale lcm_semantic_recall comment refs in
src/embeddings/semantic-search.ts, src/engine.ts,
src/store/summary-store.ts, src/tools/lcm-synthesize-around-tool.ts,
test/v41-stress-fixture.test.ts, test/v41-tool-budget-guardrail.test.ts.
Verified
- 1587/1587 vitest passing (Wave-2 batch added regressions for the new
noteSuccessfulCompact + try/catch tap behaviors).
- 35/35 QA harness against live-DB snapshot at $0.11; F1/F4 args swap fix
confirmed (F1 catalog browse, F4 PR filter).
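The two token-state fixes (clear on successful compact, tap-then-rethrow on errors) can be sketched together. The names `runWithTokenGate`, `tapResultForTokenAccounting`, `noteSuccessfulCompact`, and `tokensBySession` appear in this PR; the signatures, the synchronous shape (chosen for brevity), and the 4-chars-per-token estimate used here are assumptions for illustration.

```typescript
// Hypothetical, synchronous sketch of the accounting bus; real signatures may differ.
const tokensBySession = new Map<string, number>();

// Assumed estimator: roughly 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function tapResultForTokenAccounting(sessionId: string, text: string): void {
  const prior = tokensBySession.get(sessionId) ?? 0;
  tokensBySession.set(sessionId, prior + estimateTokens(text));
}

// After a successful compact the cached count is stale, so drop it and let
// the next wrapped call re-bootstrap from post-compact ground truth.
function noteSuccessfulCompact(sessionId: string): void {
  tokensBySession.delete(sessionId);
}

function runWithTokenGate(sessionId: string, run: () => string): string {
  try {
    const result = run();
    tapResultForTokenAccounting(sessionId, result);
    return result;
  } catch (err) {
    // The serialized error still reaches the model, so account for it, then rethrow.
    const message = err instanceof Error ? err.message : String(err);
    tapResultForTokenAccounting(sessionId, message);
    throw err;
  }
}
```

Without the catch branch, every thrown error would leave the cached count lower than what the model actually received.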
…describe cap
W1A1 #2: estimator HARD_CAP was hard-coded at 10_000 but the per-tool
char cap (LCM_TOOL_RESULT_TOKEN_BUDGET) is operator-tunable. With env
raised to 30K, tools could emit 30K but the gate's projection still
capped at 10K, so needsCompact decisions drifted low (refusals missed
when they should fire) by up to 3×.
W1A8 #3: lcm_describe was truly unbounded. Worst case (Wave-12 estimator
already noted this in a code comment): a single
describe(condensed_id, expandChildren=true) on a wide condensed could
emit ~210K tokens (10K base + 20×10K children). Sub-agent grant ledger
(consumeTokenBudget, Wave-9 P1) protected delegated sessions; main-agent
calls had no per-tool char cap.
Single source of truth
- New src/plugin/result-budget.ts owns the env knob resolution. Exports:
  - MAX_RESULT_TOKENS: used by needs-compact-gate as HARD_CAP_TOKENS
  - MAX_RESULT_CHARS: used by tools for truncation
  - truncationNotice(reasonHint): standard message format
- needs-compact-gate.ts pulls HARD_CAP from MAX_RESULT_TOKENS so the
estimator and per-tool cap stay in lockstep.
- lcm-grep-tool.ts drops its local resolveMaxResultChars (now imports
from result-budget). Behavior identical at the default; no change to
truncation messages. (Existing per-grep messages preserved.)
lcm_describe truncation
- truncateLinesToCap helper at top of file. Mirrors lcm_grep's pattern:
walk lines, accumulate char count (incl. join newlines), append the
truncation notice and stop when over cap.
- Applied at both return sites (summary describe + file describe).
- details.manifest.truncated boolean flag exposed for programmatic
callers; details.truncated on the file branch.
Tests (6 new, total 15 in suite)
- env=30000 → MAX_RESULT_TOKENS=30K, MAX_RESULT_CHARS=120K, estimator
projection rises above 10_000 for verbatim mode (proves no longer pinned
at the old hard-coded ceiling)
- env unset → 10_000 default
- env=100 → clamped UP to 2_000 floor (anti-misconfig)
- env=garbage → falls back to 10_000 default
- describe with 30K-char content + env=2000 → bounded under 10K + emits
truncation marker
- describe with small content → emits full content, no truncation marker
Verified
- 1593/1593 vitest passing (was 1587, added 6 regression tests)
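The env resolution rules exercised by those tests (10_000 default, 2_000 floor, garbage falls back, chars = tokens × 4) and the line-walking truncation can be sketched like this. `truncateLinesToCap` is named in the commit; `resolveMaxResultTokens`, the constant names, and the return shape are assumptions, and the env value is passed in as a plain string so the sketch stays self-contained.

```typescript
// Hypothetical sketch of the budget resolution; constant and function names are assumed.
const DEFAULT_RESULT_TOKENS = 10_000;
const MIN_RESULT_TOKENS = 2_000;
const CHARS_PER_TOKEN = 4;

function resolveMaxResultTokens(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_RESULT_TOKENS;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return DEFAULT_RESULT_TOKENS; // garbage falls back
  return Math.max(MIN_RESULT_TOKENS, Math.floor(parsed));     // clamp up to the floor
}

function truncateLinesToCap(
  lines: string[],
  maxChars: number,
  notice: string,
): { text: string; truncated: boolean } {
  const kept: string[] = [];
  let used = 0;
  for (const line of lines) {
    // Every line after the first also costs one joining newline.
    const cost = line.length + (kept.length > 0 ? 1 : 0);
    if (used + cost > maxChars) {
      kept.push(notice); // the notice itself may push the text slightly past the cap
      return { text: kept.join("\n"), truncated: true };
    }
    kept.push(line);
    used += cost;
  }
  return { text: kept.join("\n"), truncated: false };
}
```

Deriving the char cap as `resolveMaxResultTokens(raw) * CHARS_PER_TOKEN` in one place is what keeps the estimator and the per-tool truncation from drifting apart.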
Wave-12 found 9 of 10 bugs that escaped 1593 tests. Each bug was hidden by
a distinct antipattern. This commit adds 4 new test layers that pin the
antipatterns so each bug class fails LOUDLY on regression.
A. Wiring/registration smoke (14 tests)
- test/v41-tool-wiring-smoke.test.ts
- For each tool documented as wrapped in needs-compact-gate.ts: assert
the factory file calls runWithTokenGate(. For each documented-exempt
tool: assert it does NOT call runWithTokenGate(. Catches the W2A1 P0 bug
class (synthesize_around silently dropped off the bus).
- For each registered tool in plugin/index.ts: assert getRuntimeContext
is wired. Catches the half of the bug where the wrapper is present but
not given runtime context.
B. Adversarial output bounds (3 tests)
- test/v41-adversarial-output-bounds.test.ts
- lcm_get_entity with 200 mentions × 1000-char surface_forms: bound check
- lcm_search_entities with 500 entities × 200-char canonical: bound check
- lcm_search_entities respects schema-bounded limit even with caller=500
- Catches W1A8 #3 sister cases (any tool that emits content without
per-tool char cap).
C. Cross-module invariants (6 tests)
- test/v41-cross-module-invariants.test.ts
- estimateResultTokens projection ceiling === MAX_RESULT_TOKENS
(caller-tunable env knob). Catches the W1A1 #2 bug class where two
modules pin the same constant in isolation and drift apart.
- MAX_RESULT_CHARS = MAX_RESULT_TOKENS × 4 ratio
- REFUSAL_THRESHOLD calibration sanity vs MAX_RESULT_TOKENS
- Every src/tools/lcm-*-tool.ts factory referenced in plugin/index.ts
- summaryKinds reaches BOTH semantic and hybrid dispatch (W1A5 #1
schema-vs-implementation drift)
- Sub-agent expansion-auth gate consistency (lcm_expand + lcm_describe
both consult same manager)
D. QA-runner antipattern static scan (26 tests)
- test/v41-qa-runner-antipatterns.test.ts
- Extracts each `expect: (r) => {...}` closure from qa-runner.mjs. For
tools with external deps (Voyage / LLM), assert the graceful-degradation
regex check appears BEFORE bare `if (r.error) return`. Catches the W1 F5
bug class (inverted predicate making graceful branch dead code).
- Pins F1 has no entityType filter (catalog browse) AND F4 has
entityType: pr_number (W1 F1/F4 args swap regression).
Verified
- 1642/1642 vitest passing (was 1593, +49 new tests; 0 bugs surfaced by
the new layers; the patterns pin the existing post-Wave-12 fixes rather
than uncovering new issues).
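Layer A is essentially a static scan over factory source text. A minimal sketch of that idea follows, assuming the sources are already loaded into a map; `findGateWiringViolations` and the tool lists passed to it are hypothetical, and only the `runWithTokenGate(` substring check is taken from the description above.

```typescript
// Hypothetical static wiring check; the real test reads factory files from disk.
function findGateWiringViolations(
  sources: Record<string, string>,
  wrappedTools: string[],
  exemptTools: string[],
): string[] {
  const callsGate = (tool: string): boolean =>
    (sources[tool] ?? "").includes("runWithTokenGate(");
  const violations: string[] = [];
  for (const tool of wrappedTools) {
    if (!callsGate(tool)) {
      violations.push(`${tool}: documented as wrapped but never calls runWithTokenGate(`);
    }
  }
  for (const tool of exemptTools) {
    if (callsGate(tool)) {
      violations.push(`${tool}: documented as exempt but calls runWithTokenGate(`);
    }
  }
  return violations;
}
```

A test can then assert that the returned list is empty, which makes a tool silently dropping off the accounting bus fail loudly.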
… notes
Retro review of today's 4 fix commits (f9a15d9, ae55691, a9f10cf,
37cdabb) ran 4 parallel agents through the first-principles methodology
and surfaced 4 issues + 3 unknown-unknowns from my self-audit. This
commit closes the in-scope fixes; A1 (LcmConfig promotion) deferred to a
focused follow-up PR.
L1: estimator self-contradiction (already-shipped bug)
- needs-compact-gate.ts:162 returned 3_000 tokens for synthesize_around
- but the file's docstring at line 26-29 documents synthesize OUTPUT as
"4K-8K tokens of LLM-generated rollup"
- 3000 was a ~50% under-estimate vs documented behavior; the estimator
was a self-contradiction in the same file
- Fixed: returns 6_000 (midpoint of the 4K-8K range). Added regression
test that pins estimate ∈ [4K, 8K] with a comment tying the two sides
together so a future docstring drift breaks both.
N2: agent-facing contract drift on details.truncated
- lcm_describe shipped truncated flag at details.manifest.truncated for
the summary branch but details.truncated for the file branch; asymmetric
placement
- lcm_grep didn't expose details.truncated at all (regex/full_text +
hybrid + semantic paths) despite emitting the same truncation prose
- Generic "did this tool truncate?" callers had no consistent field to
read across the surface
- Fixed: standardize top-level details.truncated as the canonical
agent-facing contract field; mirrored across describe (top-level +
manifest dup for back-compat) and all 3 lcm_grep paths.
M1: inferTokenBudget per-provider defaults
- 200K uniform default for unknown models was too generous for sub-200K
models (gpt-4 8K-32K, ollama 8K-128K). Gate stayed silent until projected
context was 6× over real budget on these.
- Adversarial agent argued: "if real budget is 8K and assumed is 200K,
the gate fires at projected 184K but real engine OOMs at 8K"
- Fixed per-family defaults:
  - claude-3.x family → 200K (was already)
  - gpt-4o / o1 family → 128K (was 200K)
  - Gemini 1.5/2.x → 1M (was 200K)
  - Legacy gpt-4 → 32K (was 200K; this was the worst gap)
  - Ollama / Mistral / OpenRouter / unknown → 32K floor (was 200K)
- Added LCM_DEFAULT_TOKEN_BUDGET env var as escape hatch for operators on
larger unrecognized models (clamped to [8K, 2M] sanity range).
- 7 new tests cover the per-family branches + env override behavior.
N1 + N3: documentation of contract invariants
- token-state.ts: declare "tools may import named lifecycle hooks; do NOT
reach into tokensBySession directly. lcm_compact's import of
noteSuccessfulCompact is the precedent; future cache-aware tools follow
the same pattern, do not add new ones to underlying map."
- result-budget.ts: declare truncationNotice() prose is now agent-facing
contract (test regex pinned + tool descriptions reference it); cosmetic
edits will silently break tests AND surprise agents.
A1: follow-up note (deferred to separate PR)
- result-budget.ts: declare the architectural inconsistency in a comment
(this module bypasses the resolveLcmConfigWithDiagnostics pattern that
every other LCM env knob uses). Follow-up PR should promote to
LcmConfig.toolResultTokenBudget. Inline note flags it for visibility
until then.
Verified
- 1649/1649 vitest passing (was 1642, +7 new tests for M1 + L1 regression)
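The M1 per-family fallback table can be sketched as a small lookup. The budgets and the [8K, 2M] clamp are taken from the notes above; the function name `inferFallbackTokenBudget`, the model-id regexes, and the choice to apply the env override only on the unknown-model branch are assumptions, and already-recognized families (opus, sonnet, haiku, gpt-5.x) are out of scope here.

```typescript
// Hypothetical sketch of the per-family fallbacks; the regexes are illustrative guesses.
const MIN_OVERRIDE_TOKENS = 8_000;
const MAX_OVERRIDE_TOKENS = 2_000_000;
const UNKNOWN_MODEL_FLOOR = 32_000;

function parseBudgetOverride(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) return undefined;
  // Clamp to the [8K, 2M] sanity range.
  return Math.min(MAX_OVERRIDE_TOKENS, Math.max(MIN_OVERRIDE_TOKENS, Math.floor(parsed)));
}

function inferFallbackTokenBudget(model: string, envOverride?: string): number {
  const id = model.toLowerCase();
  if (/claude-3/.test(id)) return 200_000;
  if (/\bgpt-4o\b|\bo1\b/.test(id)) return 128_000; // checked before legacy gpt-4
  if (/gemini-(1\.5|2)/.test(id)) return 1_000_000;
  if (/\bgpt-4\b/.test(id)) return 32_000;
  // Unknown family: operator override if present (assumed scope), else the 32K floor.
  return parseBudgetOverride(envOverride) ?? UNKNOWN_MODEL_FLOOR;
}
```

For example, an operator running a large unrecognized model could set LCM_DEFAULT_TOKEN_BUDGET to 500000 and the sketch would return that value instead of the 32K floor.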
…ig field
Wave-12 retro flagged the architectural inconsistency: every LCM env
knob flows through `resolveLcmConfigWithDiagnostics` (env→pluginConfig→default)
with diagnostics + plugin.json schema + docs row. `LCM_TOOL_RESULT_TOKEN_BUDGET`
was the only knob bypassing this pattern. Closes the inconsistency in
the same PR that introduced result-budget.ts.
LcmConfig integration
- src/db/config.ts: new optional field `toolResultTokenBudget?: number`
on LcmConfig; resolution `env.LCM_TOOL_RESULT_TOKEN_BUDGET` →
`pluginConfig.toolResultTokenBudget` → undefined (default applied
downstream). Standard precedence pattern, mirrors `maxAssemblyTokenBudget`.
result-budget.ts converted to live bindings
- Module-level `MAX_RESULT_TOKENS` and `MAX_RESULT_CHARS` are now `let`
exports (was `const`). ESM live-binding semantics: consumers with
`import { MAX_RESULT_CHARS }` see updates from inside the module.
- New `applyResultBudgetConfig(toolResultTokenBudgetFromConfig)` setter.
Called from plugin init AFTER `resolveLcmConfigWithDiagnostics` runs.
No-op when env was set at module load (env wins, same as every other
LcmConfig field).
- Module load resolves env-only (no config available yet); the setter
raises the cap if env wasn't set but config is.
- Two test helpers: `__resolveResultTokenBudgetFromEnvForTesting()` for
re-resolving env without config; `__resetResultBudgetForTesting()`
for `afterEach` resets when a test calls applyResultBudgetConfig.
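The precedence rule (env wins, config applies only when env is unset, default otherwise) can be modeled in one file. A real ESM module would use `let` exports plus a setter; this sketch stands in for that with a factory and getters, so `createResultBudget` and its internals are hypothetical while `applyResultBudgetConfig`, `MAX_RESULT_TOKENS`, and `MAX_RESULT_CHARS` are the names from the commit.

```typescript
// Hypothetical single-file model of result-budget.ts; getters stand in for ESM live bindings.
const DEFAULT_TOKENS = 10_000;
const FLOOR_TOKENS = 2_000;
const CHARS_PER_TOKEN_RATIO = 4;

function parseTokens(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? Math.max(FLOOR_TOKENS, Math.floor(parsed)) : undefined;
}

function createResultBudget(envValue: string | undefined) {
  // "Module load": only the env is available at this point.
  const fromEnv = parseTokens(envValue);
  let maxResultTokens = fromEnv ?? DEFAULT_TOKENS;
  return {
    get MAX_RESULT_TOKENS(): number {
      return maxResultTokens;
    },
    get MAX_RESULT_CHARS(): number {
      return maxResultTokens * CHARS_PER_TOKEN_RATIO;
    },
    // Called from plugin init once config has been resolved.
    applyResultBudgetConfig(fromConfig: number | undefined): void {
      if (fromEnv !== undefined) return; // env wins: no-op
      if (fromConfig === undefined || !Number.isFinite(fromConfig)) return;
      maxResultTokens = Math.max(FLOOR_TOKENS, Math.floor(fromConfig));
    },
  };
}
```

Consumers that read the value through the getter on every use see the post-init cap, which is the behavior the `let`-export conversion is meant to provide.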
Plugin init wiring
- src/plugin/index.ts: import + call `applyResultBudgetConfig(config.toolResultTokenBudget)`
immediately after `resolveLcmConfigWithDiagnostics`. Idempotent.
Operator-facing surface
- openclaw.plugin.json: uiHint + JSON schema entry (minimum: 2000)
- docs/configuration.md: new row in the assembly/budgets table
- truncationNotice() prose updated to mention BOTH the env knob and
the LcmConfig field as cap-raising paths
Tests (4 new in cross-module-invariants)
- env wins over config when both are set
- config honored when env unset
- both unset → undefined (default applied downstream)
- applyResultBudgetConfig updates live bindings when env wasn't set
Verified
- 1653/1653 vitest passing (was 1649, +4 new precedence tests)
@jalehman this is tested, soaked, and works. I recommend trying it on your end for a few days (use Voyage's free embeddings; a full embed takes about 1 hour in the background and doesn't affect usage).
…agent drills down via lcm_describe(file_xxx)
Squashed v4.2 patch applied directly onto main (independent of PR
Martian-Engineering#613). Same feature, same tests, same Opus-validated
behavior; just rebased onto the v3.x main baseline so maintainers can
review/test v4.2 without needing Martian-Engineering#613 to land first.
Architecture: per-row sidecar `messages.large_content` stores the
externalized `file_xxx` id pointing to a payload file in `large_files`
(existing v4.1 storage table). Assembler replaces evictable tool-result
rows with the v4.1 `[LCM Tool Output: file_xxx | tool=… | N bytes]`
reference + `Tool: <name> | Command: <input>` disambiguator (via
`exploration_summary`). Drilldown via existing
`lcm_describe(id="file_xxx")`.
Empirical bench (live-DB snapshot, conv 0cb8928b, 258K budget):
baseline: 333 items / 252,288 tokens / 0 stubs
v4.2: 689 items / 257,849 tokens / 86 stubs
→ ~2× wall-clock context coverage (74min → 130min) at same budget.
→ tool_result count identical (101 in both); v4.2 doesn't displace tool
outputs, it stubs heavy ones and reuses budget for older history.
Drilldown validation (Claude Opus 4.1 subagent A/B):
- Conversational summary ("what did we work on?"): substantive answer,
zero tool calls needed, no confabulation.
- Specific elided-content probe (with tool_input disambiguator): found
correct fileId, wrote correct lcm_describe(id="file_xxx"), refused to
fabricate. Quote: "the command string contained sed -n '1,260p'
scripts/evaos-support/selfheal.sh literally; that's an unambiguous
keyword match. The mapping was one grep away."
What's NOT stubbed:
- Fresh tail (last ~64 turns / 24K tokens): agent's working memory
- Assistant turns: narrative of what was done is always intact
- Tool messages without large_content: legacy/unmigrated rows
- Tool messages whose runtime role degraded to assistant: phantom
drilldown risk avoided
Default OFF (config.stubLargeToolPayloads=false). Architecturally
additive (new column + new on-disk file path), reversible (UPDATE
messages SET large_content = NULL + rm -rf storage-dir + flag off).
Mitigations evaluated through first-principles-architectural-decision
skill (research / run-the-system / where-it-lives / adversarial debate at
≥95% confidence): REJECT all four (recency cue, semantic stub wrapping,
empty-assistant collapsing, resolution markers). Decision record in
audit/v42-bench/DECISION-mitigations.md.
Tests: 868/868 pass on main (added 5 new v4.2 unit tests including
end-to-end drilldown round-trip).
Files:
- src/db/migration.ts: ensureMessageLargeContentColumn (idempotent ALTER) + busy_timeout
- src/store/conversation-store.ts: MessageRecord.largeContent + projection
- src/assembler.ts: buildToolPayloadStub + applyStubSubstitution + ResolvedItem.fileId
- src/engine.ts: config.stubLargeToolPayloads forwarded
- src/tools/lcm-describe-tool.ts: strengthened description for [LCM Tool Output:] pattern
- scripts/lcm-blob-migrate.mjs: idempotent, chunked, busy_timeout-protected migration
- scripts/v42-assemble-bench.mjs: token/item bench
- scripts/v42-drilldown-harness.mjs: real-LLM drilldown harness (OpenRouter)
- test/v42-stub-tier.test.ts: 5 unit tests (boundary, pairing, legacy, multi-block, drilldown round-trip)
Companion PR: stacked-on-Martian-Engineering#613 version at Martian-Engineering#626.
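The stub substitution rules above (only tool rows, only outside the fresh tail, only rows that carry a `large_content` file id, and only when the flag is on) can be sketched as follows. `buildToolPayloadStub` and `applyStubSubstitution` are named in the file list; their signatures here and the `ToolRow` shape, including `payloadBytes` and `inFreshTail`, are assumptions made for illustration.

```typescript
// Hypothetical row shape and signatures; the reference and disambiguator formats follow the commit message.
interface ToolRow {
  role: "user" | "assistant" | "tool";
  content: string;
  toolName?: string;
  toolInput?: string;
  largeContent?: string | null; // externalized file id, e.g. "file_abc"
  payloadBytes: number;         // assumed to be recorded when the payload was externalized
  inFreshTail: boolean;         // assumed to be computed by the assembler
}

function buildToolPayloadStub(
  fileId: string,
  toolName: string,
  payloadBytes: number,
  toolInput?: string,
): string {
  const reference = `[LCM Tool Output: ${fileId} | tool=${toolName} | ${payloadBytes} bytes]`;
  // The command disambiguator lets the agent map a question back to the right file id.
  return toolInput
    ? `${reference}\nTool: ${toolName} | Command: ${toolInput}`
    : reference;
}

function applyStubSubstitution(row: ToolRow, stubLargeToolPayloads: boolean): string {
  if (!stubLargeToolPayloads) return row.content;  // default OFF
  if (row.role !== "tool") return row.content;     // assistant/user turns stay intact
  if (row.inFreshTail) return row.content;         // working memory is never stubbed
  if (!row.largeContent) return row.content;       // legacy/unmigrated rows
  return buildToolPayloadStub(
    row.largeContent,
    row.toolName ?? "unknown",
    row.payloadBytes,
    row.toolInput,
  );
}
```

The agent can then pass the file id from the stub to `lcm_describe` when it needs the full payload.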
Maintainer triage correction: park this as v4.1 feature-stack work, not as a P1 bug-fix candidate. The previous comment over-promoted this because the PR is large and can hide important work. Per maintainer direction, the v4.1 stack is a separate feature beast and should be skipped during the current P0/P1/P2 bug triage unless a standalone core bugfix is extracted from it. Current state still matters if the stack is resumed: dirty branch with failing/skipped checks. For the current cleanup lane, the next step is not deep review of this PR; it is to identify any non-v4.1 core bugfix hidden inside it and split that into a focused issue/PR.
LCM v2 (iteration experiments v4.1) — Lossless Agent Memory
77 commits · 1502 tests passing · 10 audit waves closed · live-DB verified against the user's 2.6 GB / 4187-leaf corpus
This PR rebuilds LCM the way a person actually remembers: keep the raw conversation forever, embed it for similarity search, and synthesize fresh views on demand. The v3 `lcm_recent` rollup approach (summaries-of-summaries-of-summaries) is removed because it produced repetitive, lossy output that got worse the further back you looked.
After merge, the agent can answer every one of these without operator intervention:
- `lcm_synthesize_around` with `period: 'yesterday'` (timezone-aware)
- `lcm_grep --mode hybrid` (FTS + Voyage rerank, +52.5pp recall on paraphrases)
- `lcm_grep --mode verbatim` (full message rows, no summary paraphrase)
- `lcm_get_entity('operator-VM customer')`
- `lcm_describe(id, expandChildren: true)` then `lcm_expand_query`
The operator gets `/lcm health`, `/lcm purge` (soft-suppression cascading through 10+ read paths), `/lcm reconcile-session-keys`, `/lcm eval` (recall + drift), `/lcm worker` lifecycle, and a real lossless raw bedrock.
TL;DR — what merges and why
Headline numbers:
`~/.openclaw/lcm.db` (2.6 GB, 4187 leaves)
What you should care about as a maintainer:
The problem this solves
We shipped `lcm_recent` (PR #516) in v3: the rollup tool. Plan: every period (day/week/month) gets summarized into one rollup; the rollup is what the agent reads back. Cheap, deterministic.
In production it broke in three ways:
1. Compression of compression
By the time a query reached the monthly view, the same fact had been summarized three times. The model started saying "as discussed earlier" referencing a discussion that wasn't in the rollup at all — it was 3 layers down, paraphrased away.
2. No way to ask sideways questions
`lcm_recent` only told you about a time window. If you wanted to ask "have we ever discussed X?", you couldn't. The rollups were time-indexed, not topic-indexed.
3. Stale-output trap
If the rollup was generated yesterday and a leaf got suppressed today, the rollup still reflected the suppressed content. Every rollup needed to be invalidated and regenerated on every leaf change — which negated the precomputed-cost savings.
The decision
Stop building rollups entirely. Build a system where:
- […] (`lcm_synthesize_around`) when an agent asks for a window, working from the original leaves rather than re-summarizing summaries.
- […] `suppressed_at` flip makes content invisible everywhere.
That's v4.1.
Architecture
Storage pyramid (the lossless bedrock)
flowchart BT
  subgraph Bedrock["Lossless bedrock: never compressed away"]
    M[(messages<br/>raw user/assistant/tool turns<br/>+ suppressed_at flag)]
  end
  subgraph Layer1["Layer 1: deterministic chunking"]
    L[(summaries kind=leaf<br/>~600 tokens each<br/>+ suppressed_at flag)]
    M -->|leaf-summarizer| L
  end
  subgraph Layer2["Layer 2: condensed views"]
    C[(summaries kind=condensed<br/>depth 1-2<br/>+ contains_suppressed_leaves flag)]
    L -->|condensation worker| C
  end
  subgraph OnDemand["Layer 3+: synthesized on demand"]
    SC[(lcm_synthesis_cache<br/>tier-tagged, prompt-tagged<br/>rebuildable)]
    L -->|dispatchSynthesis<br/>per-tier model| SC
    C -->|dispatchSynthesis| SC
  end
  subgraph Sidecars["Sidecars built async"]
    E[(lcm_entities<br/>+ lcm_entity_mentions)]
    VEC[(vec0:<br/>lcm_embeddings_voyage4large)]
    L -.->|entity coreference worker| E
    L -.->|backfill worker → Voyage| VEC
  end
Key invariants:
- `lcm_synthesis_cache` can be wiped without data loss; everything regenerates from leaves.
Agent tool routing: 5 question types → 8 tools
flowchart LR
  Q[Agent's question]
  Q --> A{What kind?}
  A -->|Time-anchored<br/>'yesterday' / 'last week'| TA[lcm_synthesize_around<br/>period mode]
  A -->|Topic-anchored<br/>'discussed X?'| TB[lcm_grep --mode hybrid<br/>OR lcm_semantic_recall]
  A -->|Verbatim<br/>'exact wording'| TC[lcm_grep --mode verbatim]
  A -->|Pattern entity<br/>'history of X'| TD[lcm_get_entity<br/>+ lcm_search_entities]
  A -->|Drilldown<br/>'where did this come from?'| TE[lcm_describe<br/>+ lcm_expand_query]
  TA -->|tier dispatch<br/>haiku/sonnet/opus/thinking| LLM[LLM]
  TB -->|FTS + Voyage rerank<br/>+52.5pp paraphrase recall| Hits[Ranked hits]
  TC -->|Full message rows<br/>cap 20| Quotes[Verbatim quotes]
  TD -->|Read-only DB query| Entity[Entity record + mentions]
  TE -->|Sub-agent expansion<br/>via grant ledger| Sub[Synthesized answer]
Suppression cascade (the "soft purge" mechanism)
flowchart LR
  OP[Operator: /lcm purge --reason X --apply]
  OP -->|sets suppressed_at| S[(summaries.suppressed_at)]
  OP -->|cascades to messages| MS[(messages.suppressed_at)]
  S -->|trigger| V[(vec0 metadata<br/>suppressed=1)]
  S -->|cascade DELETE| CI[(context_items)]
  S -->|cascade DELETE| CR[(lcm_synthesis_cache rows<br/>referencing suppressed leaves)]
  S -->|flag| CS[(parent condensed:<br/>contains_suppressed_leaves=1)]
  subgraph ReadPaths["10+ read paths filter suppressed_at IS NULL"]
    R1[FTS5 searches]
    R2[LIKE fallback]
    R3[CJK trigram]
    R4[Regex post-filter]
    R5[vec0 KNN pre-filter]
    R6[Semantic search JOIN]
    R7[Hybrid rerank input]
    R8[summaryStore.getById]
    R9[conversationStore.getMessageById]
    R10[Entity tools, EXISTS guard]
  end
  S -.->|filter applied| R1 & R2 & R3 & R4 & R5 & R6 & R7 & R8 & R9 & R10
  MS -.->|filter applied| R9
  V -.->|pre-filter| R5
Wave-10 reviewer fix: lcm_get_entity / lcm_search_entities now require an `EXISTS (... unsuppressed mention)` guard. If every mention of an entity gets purged, the entity row stops returning to the agent, which closes the leak Wave-10 reviewer P2 found.
Synthesis dispatch (per-tier model selection)
    flowchart TB
      REQ[SynthesizeRequest:<br/>tier + memoryType + sourceText + targetSummaryId]
      REQ --> ROUTE{tier?}
      ROUTE -->|daily| D[Single pass<br/>haiku-4-5]
      ROUTE -->|weekly| W[Single pass<br/>sonnet-4-5]
      ROUTE -->|monthly| M[Single pass<br/>opus-4-7<br/>+ verify_fidelity check]
      ROUTE -->|yearly| Y[Best-of-3 candidates<br/>opus-4-7-thinking<br/>+ judge picks winner]
      ROUTE -->|custom/filtered| C[Single pass<br/>sonnet-4-5]
      D & W & M & Y & C --> AUDIT[(lcm_synthesis_audit<br/>per-pass row)]
      M -->|verify pass output| VR{OK marker?}
      VR -->|missing OR HALLUCINATION/UNSUPPORTED| FLAG[hallucinationFlagged=true<br/>conservative default<br/>Wave-4 P0]
      VR -->|present| OK[hallucinationFlagged=false]
      Y -->|all 3 in parallel<br/>Promise.allSettled<br/>tolerates 1 failure| CAND[Candidates]
      CAND --> JUDGE[Judge prompt picks winner]

Concurrency model (gateway vs worker)
    flowchart LR
      subgraph Gateway["Gateway process: hot path, sync per turn"]
        T[Turn] --> ASM[assemble pyramid]
        ASM --> LW[leaf write<br/>BEGIN IMMEDIATE]
        LW -->|enqueue async| EQ[(extraction queue)]
        LW -->|atomic| FTS[(FTS5)]
        LW -->|sync index| SUM[(summaries)]
      end
      subgraph Worker["Worker process: background"]
        BAC[backfill autostart<br/>5min interval<br/>gated on VOYAGE_API_KEY]
        EXC[entity-coref autostart<br/>60s interval<br/>gated on LCM_EXTRACTION_LLM_ENABLED]
        BAC -->|drains| EQ
        EXC -->|drains| EQ
        BAC -->|HTTP POST| VOY[Voyage API<br/>raw fetch, not SDK]
        EXC -->|LLM call| LLM[Worker LLM]
      end
      subgraph Locks["lcm_worker_lock: TTL + heartbeat"]
        L1[acquire INSERT OR IGNORE]
        L2[heartbeat every 30s<br/>WHERE expires_at > now]
        L3[release on shutdown]
      end
      BAC --> L1 & L2 & L3
      EXC --> L1 & L2 & L3
      Note["§0 invariant: NO LLM/network call inside any SQLite write tx"]

Critical invariant (verified by `test/v41-concurrency-invariants.test.ts` with worker_threads parallel writers): no `await` on an LLM/network call ever appears inside a `BEGIN IMMEDIATE`. Voyage HTTP calls happen OUTSIDE the transaction; results are committed in a separate write tx.

The 8 agent tools
Each tool maps to one or more of the 5 question types. Cost profile is per-call against the user's typical corpus.
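As a compact restatement of that mapping, the routing shown in the question-routing flowchart can be written as a lookup table. This is an illustrative sketch only: the `QuestionKind` labels and the `toolsFor` helper are hypothetical names, not identifiers from the PR; the tool names are the ones listed in this description. `lcm_expand` is left out because it is sub-agent-only.

```typescript
// Hypothetical labels for the 5 question types in the routing flowchart.
type QuestionKind =
  | "time-anchored"
  | "topic-anchored"
  | "verbatim"
  | "pattern-entity"
  | "drilldown";

// Question type -> tools the flowchart routes to (tool names from this PR).
const ROUTES: Record<QuestionKind, string[]> = {
  "time-anchored": ["lcm_synthesize_around"],
  "topic-anchored": ["lcm_grep --mode hybrid", "lcm_semantic_recall"],
  "verbatim": ["lcm_grep --mode verbatim"],
  "pattern-entity": ["lcm_get_entity", "lcm_search_entities"],
  "drilldown": ["lcm_describe", "lcm_expand_query"],
};

// Hypothetical helper: which tools answer a given kind of question.
function toolsFor(kind: QuestionKind): string[] {
  return ROUTES[kind];
}
```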
`lcm_grep`: multi-modal search

The Swiss-army search tool. 5 modes for 5 different jobs:

- `regex`
- `full_text`
- `hybrid`
- `verbatim`
- `semantic`

Wave-9 P1.4 fix: verbatim mode handles CJK queries (Chinese/Japanese/Korean) via LIKE fallback. The FTS5 unicode61 tokenizer can't segment ideographs, so `messages_fts MATCH '机器学习'` returned 0 rows silently. Now `containsCjk()` detection routes CJK queries directly to LIKE substring match.
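The CJK routing can be sketched as follows. This is an illustrative reconstruction, not the PR's code: the Unicode ranges inside `containsCjk`, the `planVerbatimSearch` helper, and the `VerbatimPlan` shape are assumptions. Only the idea (detect CJK, skip FTS5 `MATCH`, fall back to a `LIKE` substring match) comes from the description above.

```typescript
// Assumed ranges: Hiragana/Katakana, CJK Extension A, CJK Unified Ideographs,
// Hangul Syllables. The PR's real containsCjk() may cover more or fewer blocks.
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

function containsCjk(query: string): boolean {
  return CJK_PATTERN.test(query);
}

// Hypothetical plan shape: either an FTS5 MATCH phrase or a LIKE pattern.
type VerbatimPlan =
  | { strategy: "fts5"; match: string }
  | { strategy: "like"; pattern: string };

// Escape LIKE wildcards so the query is matched literally.
// The SQL using this pattern needs an `ESCAPE '\'` clause.
function escapeLike(text: string): string {
  return text.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

function planVerbatimSearch(query: string): VerbatimPlan {
  if (containsCjk(query)) {
    // unicode61 cannot segment ideographs, so go straight to substring match.
    return { strategy: "like", pattern: `%${escapeLike(query)}%` };
  }
  // Non-CJK: quote as an FTS5 phrase, doubling embedded double quotes.
  return { strategy: "fts5", match: `"${query.replace(/"/g, '""')}"` };
}
```

With this routing, a query such as `机器学习` never reaches `MATCH`, so it cannot silently return zero rows because of tokenization.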
`lcm_semantic_recall`: pure semantic search

Same cost profile as `lcm_grep --mode semantic` (~$0.0001/query); kept as a distinct tool for clarity. Returns ranked snippets with `cosineSimilarity` + `confidenceBand`.

`lcm_synthesize_around`: time-anchored synthesis (the `lcm_recent` replacement)

Three modes:
- `period` mode: `period: 'yesterday' | 'today' | 'this-week' | 'last-week' | 'this-month' | 'last-month' | 'last-Nh' | 'last-Nd'`. Target is OPTIONAL. Wave-10 P1 fix: day-boundary periods (today/yesterday/etc.) are computed in the operator's local timezone (`lcm.timezone`), not UTC. A Bangkok operator (UTC+7) at 02:00 local asking "yesterday" gets local-yesterday, not UTC-yesterday (which would be ~17 hours off).
- `time` mode: `target: 'sum_xxx'` + `windowHours: N`. Anchors on a known leaf's timestamp.
- `semantic` mode: `target: 'sum_xxx'` OR free-text query. Top-K most-similar leaves.

Output is a fresh markdown synthesis using the per-tier model (haiku for `custom` and `filtered` tiers; full per-tier dispatch in Synthesis dispatch above).

Backed by `lcm_synthesis_cache` with Wave-10 P1 fix: cache UNIQUE index now keys on `(session_key, range_start, range_end, leaf_fingerprint, grep_filter, tier_label, prompt_id)`. Previously ignored `tier_label` and `prompt_id`, so two correctness bugs:

- `registerPrompt()` changing the active prompt left cache serving stale text

`lcm_describe`: drilldown by ID

Look up a specific summary or file by its ID. Returns metadata, lineage manifest (parent chain + descendant counts), source messages (for leaves), and on-demand expansion via `expandChildren` / `expandMessages` flags.

Wave-9 + Wave-10 fixes: when called from a sub-agent session, expansion now consumes the grant's token budget; previously sub-agents could drain context for free. The base summary's `s.content` token count is also charged (Wave-10 P1).

`lcm_expand`: sub-agent-only deep expansion

Gated behind `runtimeExpansionAuthManager.grant()`. Issues a token-budgeted expansion request that traverses the summary lineage and returns content under a hard cap. Main agents cannot call this directly; they go through `lcm_expand_query`, which delegates to a sub-agent that holds the grant.

`lcm_expand_query`: main-agent wrapper for delegated expansion

Takes a free-text query + token cap. Spawns a sub-agent holding a runtime expansion grant; sub-agent runs `lcm_expand` repeatedly to materialize source. Returns synthesized answer with citation IDs.

Wave-9 P1.1 fix: citation-fabrication count (`citedIdsRejectedAsFabricated` + `citedIdsExceededValidationCap`) now surfaces through the `ExpandQueryReply`; previously it was computed internally but dropped at the API boundary. Agents can now distinguish "the LLM produced no citations" from "all citations were hallucinated and rejected."

`lcm_get_entity`: entity catalog lookup by canonical name

Returns entity record + mention list with summary IDs. Mentions are filtered through `summaries.suppressed_at IS NULL`.

Wave-10 reviewer P2 fix: requires `EXISTS (... unsuppressed mention)` for the entity itself. When every mention of an entity is suppressed via purge, the entity row no longer leaks `canonical_text` / `alternate_surfaces` / `metadata`. The "not found" branch is intentionally indistinguishable between "no such entity" and "all mentions suppressed" so an attacker can't infer existence by querying.

`lcm_search_entities`: entity catalog browse by query

Substring/prefix/exact match across `canonical_text`. Same suppression guard.

Why Voyage embeddings
Phase A spike data (real eval, not gut feel)
A 31-query stratified eval against the user's snapshot DB measured per-stratum lift:
The +52.5pp paraphrastic lift was the threshold (decision gate was ≥30pp). Hybrid mode is the answer to "have we ever discussed X?" — paraphrase coverage is the differentiator.
Spike cost: $0.58 total (one-time eval).
Why voyage-4-large + rerank-2.5
Why not OpenAI / Cohere / local
Cost reality
A year of heavy use is ~$5-10 in Voyage costs. Not a budget concern.
Worker auto-ticks
Backfill autostart
- `VOYAGE_API_KEY` present + sqlite-vec loaded + active embedding profile registered.
- `/lcm worker tick embedding-backfill`.

Entity coreference autostart

- `LCM_EXTRACTION_LLM_ENABLED=false`.
- `lcm_extraction_queue`.

Lock semantics
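A minimal in-memory model of the single-flight lock behaviour this section describes. The SQL strings show the statements the model mirrors. `job_kind` and `owner_id` are hypothetical column names (the PR only names `expires_at` and `last_heartbeat_at`), the 90 s TTL is an assumed value that satisfies the "TTL ≥ 3× heartbeat" invariant, and the 300 s soak follows the fallback rule from the first commit message.

```typescript
// Statements the model below mirrors (column names partly hypothetical).
const ACQUIRE_SQL =
  "INSERT OR IGNORE INTO lcm_worker_lock (job_kind, owner_id, expires_at, last_heartbeat_at) VALUES (?, ?, ?, ?)";
const HEARTBEAT_SQL =
  "UPDATE lcm_worker_lock SET expires_at = ?, last_heartbeat_at = ? WHERE job_kind = ? AND owner_id = ? AND expires_at > ?";

const TTL_S = 90;   // assumed: 3 x the 30 s heartbeat interval
const SOAK_S = 300; // fallback soak from the commit message

interface LockRow {
  owner: string;
  expiresAt: number;       // seconds
  lastHeartbeatAt: number; // seconds
}

class LockTable {
  private rows = new Map<string, LockRow>();

  // INSERT OR IGNORE: the first writer wins, a PK conflict is a no-op.
  acquire(jobKind: string, owner: string, now: number): boolean {
    if (this.rows.has(jobKind)) return false;
    this.rows.set(jobKind, { owner, expiresAt: now + TTL_S, lastHeartbeatAt: now });
    return true;
  }

  // Guarded UPDATE: only the owner of a still-live lock can extend it.
  heartbeat(jobKind: string, owner: string, now: number): boolean {
    const row = this.rows.get(jobKind);
    if (!row || row.owner !== owner || row.expiresAt <= now) return false;
    row.expiresAt = now + TTL_S;
    row.lastHeartbeatAt = now;
    return true;
  }

  // Takeover only when BOTH the TTL has lapsed AND the heartbeat is older than the soak.
  takeOverIfStale(jobKind: string, newOwner: string, now: number): boolean {
    const row = this.rows.get(jobKind);
    if (row) {
      const stale = row.expiresAt < now && row.lastHeartbeatAt < now - SOAK_S;
      if (!stale) return false;
      this.rows.delete(jobKind);
    }
    return this.acquire(jobKind, newOwner, now);
  }
}
```

The `expires_at > now` guard is what makes a late heartbeat fail instead of silently reviving a lock that another process may already be entitled to take over.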
`lcm_worker_lock` table: `INSERT OR IGNORE` on PK gives single-flight; heartbeat every 30s with `WHERE expires_at > now` prevents stealing live locks. TTL + heartbeat verified against parallel writers via `worker_threads` (`test/v41-concurrency-invariants.test.ts`).

Operator commands
- `/lcm status`
- `/lcm health`
- `/lcm worker [status|tick embedding-backfill]`: `tick` runs one backfill cycle
- `/lcm reconcile-session-keys [--list-candidates|--apply ...]`
- `/lcm eval [--baseline|--mode hybrid|...]`: `lcm_eval_run`
- `/lcm purge [--reason ... --apply]`
- `/lcm backup`
- `/lcm rotate`
- `/lcm doctor [clean [apply ...]|apply]`

Wave-9 P0 fix + Wave-10 reviewer P1 fix: every destructive command requires `senderIsOwner=true`. Previously only `/lcm purge` had the gate; Wave-9 added it to `/lcm reconcile-session-keys --apply` (cross-session data theft vector). Wave-10 added it to `/lcm eval` (the reviewer correctly challenged Wave-9's READ_ONLY classification: eval mutates `lcm_eval_run` and may use Voyage in hybrid mode). The authorization-invariant test (`test/v41-authorization-invariants.test.ts`) statically scans `lcm-command.ts` for new cases and FAILS at test time if a destructive case is added without the gate.

Cost discipline
- `lcm_grep --mode hybrid` (per query)
- `lcm_grep --mode semantic` / `lcm_semantic_recall`
- `lcm_grep --mode verbatim` / `regex` / `full_text`

Per-tier model dispatch is the cost lever: we don't pay opus-thinking prices for yesterday's summary, and we don't ask haiku to do yearly synthesis.
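The dispatch behind that cost lever can be sketched as a lookup table, following the tier-to-model assignments in the Synthesis dispatch flowchart. The `TierPlan` shape and the `llmCallsFor` helper are hypothetical, and counting the fidelity check and the judge as one extra call each is an assumption made for illustration.

```typescript
type Tier = "daily" | "weekly" | "monthly" | "yearly" | "custom" | "filtered";

// Hypothetical plan shape; model names are the ones in the dispatch flowchart.
interface TierPlan {
  model: string;
  candidates: number;      // parallel candidates generated
  verifyFidelity: boolean; // extra verify pass on the output
  judge: boolean;          // extra judge pass picks a winner
}

const TIER_PLANS: Record<Tier, TierPlan> = {
  daily: { model: "haiku-4-5", candidates: 1, verifyFidelity: false, judge: false },
  weekly: { model: "sonnet-4-5", candidates: 1, verifyFidelity: false, judge: false },
  monthly: { model: "opus-4-7", candidates: 1, verifyFidelity: true, judge: false },
  yearly: { model: "opus-4-7-thinking", candidates: 3, verifyFidelity: false, judge: true },
  custom: { model: "sonnet-4-5", candidates: 1, verifyFidelity: false, judge: false },
  filtered: { model: "sonnet-4-5", candidates: 1, verifyFidelity: false, judge: false },
};

// Assumption: verify and judge each cost one additional LLM call.
function llmCallsFor(tier: Tier): number {
  const plan = TIER_PLANS[tier];
  return plan.candidates + (plan.verifyFidelity ? 1 : 0) + (plan.judge ? 1 : 0);
}
```

Under these assumptions a daily synthesis is a single cheap call, while a yearly synthesis is four calls on the most expensive model, which is why the tier choice dominates cost.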
What was CUT and why
Per first-principles pass + 8 challenger agents (2026-05-06):
- `/lcm worker tick themes-consolidation`. Tool error message itself admitted "auto-tick is cycle-3".
- `runPurge --immediate` mode
- `lcm_voyage_rate_state` schema
- `lcm_purge_rebuild_queue` schema (`--immediate` cut)
- `lcm_describe` consolidation (entity_id / theme_id polymorphism)

Net diff: ~2935 LOC removed from PR. Net change after capability adds (verbatim mode, semantic mode, expandChildren flags, doc updates): ~−2605 LOC.
The companion draft PR #616 preserves each cut with full context for focused future-cycle pickup. Each cut was assessed against THE_FIVE_QUESTIONS coverage; no question type lost coverage (procedure/theme sub-cases have adequate-fallback coverage via `lcm_grep --mode hybrid`).

Test infrastructure
This is what makes the PR shippable. Wave-10 pivoted from "more audits" to "build automated tests that catch every known antipattern class."
8 of 9 antipattern classes have automated detection
- `v41-adversarial-scenarios.test.ts`
- `v41-{authorization,suppression,tool-parity,schema-drift,concurrency}-invariants.test.ts`
- `fixtures/v41-mock-llm.ts` + `v41-synthesis-quality.test.ts`
- `fixtures/v41-{test,stress}-corpus.ts`
- `v41-adversarial-scenarios.test.ts` + `v41-synthesis-quality.test.ts`
- `v41-five-questions.test.ts`
- `stryker.config.json`
- `v41-concurrency-invariants.test.ts`
- `v41-schema-drift-invariants.test.ts`

A7 (mutation testing) is partial: the stryker-mutator config is checked in but not run in CI (too slow for per-PR; ~5min per file). On-demand only.
Test layer cost profile
THE_FIVE_QUESTIONS as executable tests
The 25 scenarios in `docs/v4.1/THE_FIVE_QUESTIONS.md` are now executable against a deterministic synthetic fixture (`test/fixtures/v41-test-corpus.ts`, ~80 leaves with known content). Wave-10 sub-agent #3's audit found that the original 26 tests were 16 strong / 9 weak / 1 sentinel; the weak ones were strengthened to assert specific summary IDs in results.

Synthesis quality: closed via mock LLM
The single un-tested gap after Wave-9 was synthesis quality (real LLM tests are non-deterministic + cost money + need network). Wave-10 added a deterministic LlmCall mock with 10 response shapes including adversarial (fabricated citations, malformed JSON, hallucinated content). 12 synthesis-quality tests verify:
- `Promise.allSettled`
- `hallucinationFlagged=true` on garbled output (Wave-4 P0 conservatism)

Audit history
10 audit waves over ~6 weeks, with ~140 unique bugs found and closed. The progression validates the test-infrastructure investment:
Wave-10 reviewer findings: 12 separate reviewer findings, verified one-by-one before fixing (the user explicitly said "wasn't sure if verified"). Result: 12-for-12 real bugs, including the timezone P1 (Bangkok-yesterday-is-not-UTC-yesterday) and the cache-staleness P1 (`prompt_id` and `tier_label` missing from UNIQUE index).

Wave-10 sub-agent discoveries: 4 parallel sub-agents building test infrastructure ALSO found additional real bugs:

- `summaries_fts` insert was using `rowid` but schema declares `summary_id UNINDEXED`. Original B1-B5 tests passed by accident matching at the messages layer.

Total Wave-10 closed: 12 (reviewer) + 4 (sub-agent discoveries) = 16 source bugs, plus 89 new tests.
Verification
Test counts (final)
- `test/v41-*.test.ts`
- `main`; type-tightening fixes cascaded from source changes

Live-DB verification (real corpus)
Run twice in Wave-10 against a copy of `~/.openclaw/lcm.db` (2.6 GB, 4187 leaves):

QA runner against real DB
(After fixing 3 test bugs in Wave-10: they were test-data-naming issues, not source bugs. Tools correctly rejected harness leaves with non-`sum_` prefix; we updated harness + QA runner to use production naming.)

Sample mutation testing
Two sample files measured (full mutation run is too slow for CI):
- `src/store/fts5-sanitize.ts`: 82.35% mutation kill rate (well-tested utility)
- `src/operator/purge.ts`: 67.97% mutation kill rate, 17 uncovered mutants; a recently-grown workflow file with measurable gaps

The gap is the diagnostic the user predicted ("tests show green while we keep finding real bugs"). Future work uses these sample numbers to prioritize where to add tests.
Migration safety
All schema changes are additive. Re-running `runLcmMigrations` is idempotent (verified in tests + live-DB twice). No column drops, no type changes. Cut tables are simply not created on fresh installs; existing operator DBs that already have them keep them as no-op residue (no FK breakage, no data loss).

Wave-10 schema-drift invariant (`test/v41-schema-drift-invariants.test.ts`) statically validates:

- Every `{{placeholder}}` in seeded prompts has a corresponding `renderPrompt` substitution
- `tier_label` CHECK constraint accepts every value in the `TierLabel` TS union
- `ON DELETE` clause (Wave-10 sub-agent #2 finding)

These run statically at test time, no DB needed. Future drift breaks the test before it breaks production.
Operator setup walkthrough
If `VOYAGE_API_KEY` is missing, the plugin still works: `lcm_grep --mode hybrid` returns an error pointing to use `mode='full_text'` instead. Operator opts in by setting the key. (`sqlite-vec` is now an `optionalDependency` per Wave-10 reviewer P2 fix; install it explicitly to enable semantic.)

Non-goals
What v4.1 is intentionally not:
- `assemble()` pyramid is structural (fresh tail → recent leaves → last-week condensed → last-month condensed → last-year synthesis). It does NOT do per-turn semantic retrieval into the prompt. Semantic retrieval is an agent tool the model can call when the user asks for it.
- `lcm_synthesize_around`, not a precomputed nightly job.
- `runPurge` does soft-suppression only. The DB rows remain; only `suppressed_at` is set. For GDPR/erasure that requires byte-level deletion, the operator must run raw SQL DELETE + VACUUM out-of-band until #616 lands the drainer worker.

Related PRs
- `lcm_describe` consolidation with full context for focused future-cycle pickup.

Reviewer checklist
If you're reviewing for merge, focus on these in order:
- `suppressed_at IS NULL`. The invariant test (`v41-suppression-invariants.test.ts`) loops over every read path on the storage stores. If you find a method that returns content without consulting the filter, that's a P0.
- `senderIsOwner` gated. The invariant test (`v41-authorization-invariants.test.ts`) statically scans for new cases and FAILS at test time if classification is missing.
- `Promise.allSettled` tolerance. All covered by `v41-synthesis-quality.test.ts` (12 tests, mock LLM).

If those 6 sections check out, the rest of the PR is implementation detail that the test layer will catch regressions on.