feat: lcm_recent — temporal rollup layer for recency awareness #3

Closed
100yenadmin wants to merge 75 commits into main from feat/lcm-recent

Conversation

@100yenadmin 100yenadmin commented Apr 13, 2026

Summary

Adds a daily/weekly/monthly rollup system on top of the existing LCM summary DAG. Pre-built daily rollups give fast answers to temporal questions without requiring keyword search or LLM calls.

New tool: lcm_recent

lcm_recent(period="today")       → structured daily recap
lcm_recent(period="yesterday")   → prior day rollup
lcm_recent(period="7d")          → last 7 days
lcm_recent(period="date:2026-04-11") → specific date
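
As a rough illustration of how the four period forms above could map onto day ranges, here is a minimal sketch. The names `DayRange`, `toUtcDay`, and `parsePeriod` are hypothetical and not taken from the PR. The sketch works in UTC only, whereas the builder described below is timezone-aware, and it assumes "7d" means today plus the six preceding days.

```typescript
// Hypothetical type: an inclusive range of calendar days (YYYY-MM-DD).
type DayRange = { startDay: string; endDay: string };

const DAY_MS = 24 * 60 * 60 * 1000;

// Format a Date as a UTC calendar day.
function toUtcDay(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Resolve a period string into an inclusive range of UTC days.
// Assumption: UTC only; the real tool is described as timezone-aware.
function parsePeriod(period: string, now: Date): DayRange {
  const today = toUtcDay(now);
  if (period === "today") return { startDay: today, endDay: today };
  if (period === "yesterday") {
    const y = toUtcDay(new Date(now.getTime() - DAY_MS));
    return { startDay: y, endDay: y };
  }
  const days = /^(\d+)d$/.exec(period);
  if (days) {
    const n = Number(days[1]);
    if (n < 1) throw new Error(`Invalid period: ${period}`);
    // Assumption: "7d" covers today plus the six days before it.
    const start = toUtcDay(new Date(now.getTime() - (n - 1) * DAY_MS));
    return { startDay: start, endDay: today };
  }
  const date = /^date:(\d{4}-\d{2}-\d{2})$/.exec(period);
  if (date) return { startDay: date[1], endDay: date[1] };
  throw new Error(`Unsupported period: ${period}`);
}
```

Passing `now` explicitly keeps the function deterministic, which makes the period boundaries easy to test.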

Architecture

  • Schema: lcm_rollups, lcm_rollup_sources, lcm_rollup_state tables (additive, no existing table modifications)
  • RollupStore: Data access layer with proper UPSERT semantics (ON CONFLICT DO UPDATE, not INSERT OR REPLACE)
  • RollupBuilder: Timezone-aware daily builder with fingerprinted idempotent rebuilds, keyword-based outcome extraction
  • lcm_recent tool: Period-based temporal recall with direct timestamp-bounded fallback when no rollup exists
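
One way to read "fingerprinted idempotent rebuilds" is that the builder hashes the inputs of a rollup and skips the rebuild when the hash is unchanged. The sketch below illustrates that idea only. `SourceSummary`, `fnv1a`, `fingerprintSources`, and `needsRebuild` are hypothetical names, and FNV-1a is a stand-in for whatever digest the PR actually uses.

```typescript
// Hypothetical shape for a leaf summary that feeds a daily rollup.
type SourceSummary = { id: string; updatedAt: string };

// FNV-1a 32-bit hash; a stand-in for the real digest, chosen to avoid imports.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Order-independent fingerprint of the inputs to one rollup.
function fingerprintSources(sources: SourceSummary[]): string {
  const canonical = sources
    .map((s) => `${s.id}@${s.updatedAt}`)
    .sort()
    .join("\n");
  return fnv1a(canonical);
}

// A rebuild is needed only when the inputs differ from the stored fingerprint.
function needsRebuild(stored: string | null, sources: SourceSummary[]): boolean {
  return stored !== fingerprintSources(sources);
}
```

Sorting before hashing means the fingerprint does not depend on the order in which source summaries are read.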

Safety

  • All writes use ON CONFLICT DO UPDATE (not INSERT OR REPLACE which does delete+insert in SQLite)
  • Rebuilds wrapped in BEGIN IMMEDIATE transactions for atomicity
  • Rollup tables created before FTS5 guard (FTS5-independent)
  • New partial index on summaries(conversation_id, kind) WHERE kind='leaf'
  • Adversarial review completed (R-463: 10 findings, all addressed)
  • Tested against 1.9GB production LCM database — schema safety verified
  • Scenario validation: 15 use cases evaluated (R-465)
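
To make the UPSERT point concrete, the sketch below builds an `INSERT ... ON CONFLICT (...) DO UPDATE` statement, which updates the conflicting row in place rather than deleting and re-inserting it. `buildUpsert` is a hypothetical helper, and the column names in the usage line are assumptions, not the PR's schema. The conflict target must be backed by a unique index or primary key.

```typescript
// Build an UPSERT statement that updates the conflicting row in place.
// Hypothetical helper; identifiers are interpolated as-is, so callers must
// pass trusted table and column names only.
function buildUpsert(table: string, keyCols: string[], valueCols: string[]): string {
  const cols = [...keyCols, ...valueCols];
  const placeholders = cols.map(() => "?").join(", ");
  // `excluded` refers to the row that the INSERT attempted to write.
  const updates = valueCols.map((c) => `${c} = excluded.${c}`).join(", ");
  return (
    `INSERT INTO ${table} (${cols.join(", ")}) VALUES (${placeholders}) ` +
    `ON CONFLICT (${keyCols.join(", ")}) DO UPDATE SET ${updates}`
  );
}

// Column names here are illustrative assumptions.
const upsertSql = buildUpsert(
  "lcm_rollups",
  ["conversation_id", "period_key"],
  ["content", "fingerprint"],
);
```

Because the existing row survives, its rowid and any rows that reference it are left alone, which is the behavior the safety notes above rely on.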

Files

| File | Purpose | Lines |
|---|---|---|
| src/store/rollup-store.ts | CRUD, provenance, state management | 371 |
| src/rollup-builder.ts | Daily rollup synthesis engine | 553 |
| src/tools/lcm-recent-tool.ts | Tool implementation + fallback | 631 |
| src/db/migration.ts | Additive only | +68 |
| src/plugin/index.ts | Import + registration | +4 |
| src/store/index.ts | Re-export | +9 |

Not included (Phase 2)

  • Weekly/monthly rollup building (schema ready, builder not yet implemented)
  • Topic clustering within daily rollups
  • Sub-day time ranges
  • Episode tracking across days
  • LLM-based synthesis (current: deterministic concatenation)

Summary by CodeRabbit

Release Notes

  • New Features

    • Added bundled lossless-claw skill with comprehensive guides and /lcm, /lossless diagnostic commands
    • Introduced /lcm doctor and /lcm doctor clean for broken summary detection and repair
    • Added sorting options (relevance, hybrid) to recall tools for improved search results
    • Externalized large tool results to a configurable storage directory
    • Enabled transcript garbage collection with configurable opt-in
  • Documentation

    • Added Chinese README with deployment guidance
    • Comprehensive skill documentation, configuration reference, and diagnostics guide
    • Updated README with new commands and environment variables
  • Configuration

    • Default model changed to OpenAI GPT
    • New settings for summary timeout, transcript GC, and large file storage
    • Support for fallback model providers

tmchow and others added 30 commits April 3, 2026 14:49
…t content (Martian-Engineering#235)

* fix: preserve text block structure when externalizing large toolResult content

When a toolResult message contains a plain-text content block
({type: "text", text: "..."}) that exceeds the externalization
threshold, interceptLargeToolResults now keeps {type: "text", text: ref}
instead of rewriting to {type: "tool_result", output: ref}.

This prevents the amazon-bedrock provider from crashing on
sanitizeSurrogates(c.text) when c.text is undefined.

The assembler path also reads rawType from stored metadata so
reassembled blocks reconstruct the correct part type.
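
The shape-preserving behavior described above can be sketched as follows. The block types and `externalizeBlock` are hypothetical and only model the two shapes named in this commit message. They are not the signature of `interceptLargeToolResults`.

```typescript
// Hypothetical block shapes, modeled on the two forms named in the commit message.
type TextBlock = { type: "text"; text: string };
type ToolResultBlock = { type: "tool_result"; output: string };
type ContentBlock = TextBlock | ToolResultBlock;

// Replace oversized content with a reference while keeping the block's type,
// so a consumer that reads `text` on a text block never sees undefined.
function externalizeBlock(block: ContentBlock, ref: string, maxChars: number): ContentBlock {
  if (block.type === "text") {
    return block.text.length > maxChars ? { type: "text", text: ref } : block;
  }
  return block.output.length > maxChars ? { type: "tool_result", output: ref } : block;
}
```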

Fixes Martian-Engineering#196

* fix: restore text blocks for externalized tool results

Make the assembler reconstruct externalized plain-text tool results as
`{ type: "text", text: ... }` instead of forcing them back through the
`tool_result`/`output` shape. Tighten the regression tests so they assert
the exact assembled block shape, and add assembler coverage for the
externalized-text path.

Regeneration-Prompt: |
  Review feedback on PR 235 showed the previous change only altered how
  large plain-text tool results were stored, not how they were assembled
  back into runtime messages. The bug report was that Bedrock reads
  `c.text` for plain text tool-result content, and the PR still rebuilt
  those externalized blocks as `tool_result` objects with `output`, so the
  provider would still see `undefined`.

  Fix the round-trip at the assembler layer with the smallest additive
  change. Preserve existing behavior for structured tool results and
  function_call_output blocks. Add regression tests that fail unless the
  assembled block is actually `type: "text"` with a `text` field, and add
  focused assembler coverage for the externalized plain-text case.

---------

Co-authored-by: Josh Lehman <josh@martian.engineering>
…-Engineering#248)

When tool-use-only assistant turns are stored with content='' and zero
message_parts, or when filterNonFreshAssistantToolCalls strips all
tool_use blocks from a non-fresh assistant message, the resulting
content array is empty ([]) or the content string is falsy.

Anthropic (and other providers) reject messages with empty content:
  'The content field in the Message object at messages.0 is empty'

Add an explicit filter in assemble() to remove these empty assistant
messages before passing to sanitizeToolUseResultPairing and the API.
The filter only targets assistant messages — user messages with empty
content are left untouched (provider may handle differently).
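
A minimal sketch of such a filter, assuming a simplified `Message` shape. The type and the name `dropEmptyAssistantMessages` are illustrative and not taken from the codebase.

```typescript
// Simplified, hypothetical message shape: content is a string or a block array.
type Message = { role: "user" | "assistant"; content: string | unknown[] };

// Drop assistant messages whose content is "" or []; leave user messages alone.
function dropEmptyAssistantMessages(messages: Message[]): Message[] {
  return messages.filter((m) => !(m.role === "assistant" && m.content.length === 0));
}
```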

Closes Martian-Engineering#238

Co-authored-by: wujiaming88 <wujiaming88@example.com>
Martian-Engineering#258)

* fix: harden bootstrap budget against oversized messages and NaN config

Two bugs in the bootstrap budget cap introduced in Martian-Engineering#255:

1. A single oversized tail message bypasses the budget entirely.
   The trim loop condition 'if (kept.length > 0 && ...)' means the
   first message (newest) is always kept regardless of size. A 50K-token
   tool result as the last message will bypass a 6K budget. Fix: after
   the loop, check if the single kept message exceeds budget and return
   empty instead of silently bypassing.

2. NaN propagates through all numeric env config parsing.
   parseInt('oops', 10) returns NaN, which is not nullish, so
   ?? fallback never fires. Invalid env like LCM_LEAF_CHUNK_TOKENS=oops
   propagates NaN through leafChunkTokens, bootstrapMaxTokens, and every
   derived config value — effectively disabling all token budgets.

   Fix: add parseFiniteInt/parseFiniteNumber helpers that return undefined
   for non-finite results. Replace all 16 raw parseInt/parseFloat calls
   in resolveLcmConfig() with the safe helpers.
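
The second bug is easy to reproduce in isolation. The sketch below contrasts the raw `parseInt` path with a finite-checking helper. The helper body and the default value are assumptions, and only the name `parseFiniteInt` comes from the commit message.

```typescript
// Return undefined for missing or non-numeric input so that `??` can fall back.
// Sketch only: the real helper's signature may differ.
function parseFiniteInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const n = parseInt(raw, 10);
  return Number.isFinite(n) ? n : undefined;
}

const DEFAULT_LEAF_CHUNK_TOKENS = 20000; // hypothetical default

// NaN is not nullish, so the fallback never fires and NaN leaks through.
const unsafeValue = parseInt("oops", 10) ?? DEFAULT_LEAF_CHUNK_TOKENS;

// The safe helper maps NaN to undefined, so the default applies.
const safeValue = parseFiniteInt("oops") ?? DEFAULT_LEAF_CHUNK_TOKENS;
```

Here `unsafeValue` is `NaN` while `safeValue` is the default, which is the difference between a disabled budget and a working one.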

Both bugs were found and reproduced with minimal scripts during
adversarial review of a production incident.

* test: cover bootstrap and env fallback regressions

Add focused regression tests for the oversized singleton bootstrap tail case and invalid numeric env parsing fallback behavior. Add a patch changeset because this PR changes runtime behavior and should be reflected in release notes.
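
The oversized-singleton case can be sketched as below. The loop condition mirrors the `kept.length > 0 && ...` pattern quoted earlier, followed by the post-loop guard. `BudgetMsg` and `trimToBudget` are hypothetical names, and the real bootstrap code is likely structured differently.

```typescript
// Hypothetical message shape with a precomputed token count.
type BudgetMsg = { id: string; tokens: number };

// Keep the newest messages that fit within maxTokens; result is oldest-first.
function trimToBudget(messages: BudgetMsg[], maxTokens: number): BudgetMsg[] {
  const kept: BudgetMsg[] = [];
  let total = 0;
  // Walk from newest to oldest.
  for (const m of [...messages].reverse()) {
    // The newest message is always admitted here, whatever its size.
    if (kept.length > 0 && total + m.tokens > maxTokens) break;
    kept.unshift(m);
    total += m.tokens;
  }
  // Post-loop guard: a lone oversized message must not bypass the budget.
  if (kept.length === 1 && total > maxTokens) return [];
  return kept;
}
```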

Regeneration-Prompt: |
  The open PR fixed two production regressions but still lacked the release and test follow-through needed to merge. Add targeted regression coverage instead of broad refactors: one config test that proves invalid numeric env values like LCM_LEAF_CHUNK_TOKENS=oops fall back through plugin/default resolution, and one bootstrap test that proves a single oversized tail message is dropped instead of bypassing bootstrapMaxTokens. Also add a patch changeset because the PR changes runtime behavior visible to users and maintainers expect release notes coverage for that.

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>
…artian-Engineering#222)

* Initial plan

* fix: block concurrent expand-query delegation per origin session

Agent-Logs-Url: https://github.com/Martian-Engineering/lossless-claw/sessions/46499c08-a52b-4640-9235-d4505936b758

Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>

* test: simplify concurrent expand-query gate fixture

Agent-Logs-Url: https://github.com/Martian-Engineering/lossless-claw/sessions/46499c08-a52b-4640-9235-d4505936b758

Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>

* docs: add changeset for expand-query concurrency fix

Agent-Logs-Url: https://github.com/Martian-Engineering/lossless-claw/sessions/46499c08-a52b-4640-9235-d4505936b758

Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>

* fix: narrow expand-query concurrency gating

Delay origin-session concurrency slot acquisition until lcm_expand_query has
resolved scope and found summary IDs to delegate. This preserves the
concurrency block for real delegated sub-agent work without blocking
overlapping no-op or no-match requests that never touch the shared lane.

Add a regression test covering concurrent query calls that return no matches
so harmless probes remain unblocked.

Regeneration-Prompt: |
  Address the PR review finding that the new lcm_expand_query concurrency slot
  was acquired too early. Preserve the intended deadlock prevention for real
  delegated sub-agent runs, but do not serialize requests that exit before any
  delegation happens, such as missing-scope or no-match query paths. Keep the
  existing concurrency-block behavior for actual delegated expansions and add a
  regression test proving concurrent no-match requests both complete normally
  without any gateway agent calls.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Co-authored-by: Josh Lehman <josh@martian.engineering>
…artian-Engineering#180)

* feat: prompt-aware context assembly with BM25-lite relevance scoring

When the token budget is exceeded during context assembly, evictable items
are now scored by relevance to the current user prompt (BM25-lite TF keyword
scoring) rather than dropped in strict chronological order. This means
summaries matching the user's active query are preserved over irrelevant
but more recent content.

- Add `prompt?: string` to AssembleContextInput and LcmContextEngine.assemble()
- Add `text: string` to ResolvedItem for pre-extracted scoring content
- Implement scoreRelevance() using TF-based keyword overlap (no deps, no LLM)
- Fall back to existing chronological eviction when prompt is absent or empty
- Add 6 integration tests covering prompt-aware eviction, fallback, and edge cases
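
For readers unfamiliar with TF-style keyword scoring, here is a minimal sketch of the idea. The names `tokenize` and `termFrequencyScore` are deliberately different from the PR's `tokenizeText` and `scoreRelevance`, because the bodies below are assumptions rather than the actual implementation.

```typescript
// Lowercase, split on non-alphanumerics, and drop single-character tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 1);
}

// Sum, over unique prompt terms, how often each term occurs in the item text.
// The score is unbounded above; 0 means no overlap or an unsearchable prompt.
function termFrequencyScore(prompt: string, itemText: string): number {
  const terms = new Set(tokenize(prompt));
  if (terms.size === 0) return 0;
  const counts = new Map<string, number>();
  tokenize(itemText).forEach((tok) => counts.set(tok, (counts.get(tok) ?? 0) + 1));
  let score = 0;
  terms.forEach((term) => {
    score += counts.get(term) ?? 0;
  });
  return score;
}
```

Items would then be evicted lowest score first, with the chronological order used when the prompt yields no terms.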

Refs OpenClaw PR #50848. Zero cost increase, fully backwards compatible.

* chore: gitignore CE plan artifacts and TASK.md

* test: add unit tests for BM25-lite scoreRelevance and tokenizeText

Export scoreRelevance and tokenizeText (with @internal JSDoc) for direct
unit testing. Add 13 new tests covering edge cases: empty inputs, no
overlap, case insensitivity, prompt term deduplication, single-char
filtering, and relative scoring. Fix inaccurate docstring that claimed
[0,1] bounded range.

* fix: fall back on unsearchable assembly prompts

Treat prompt-aware assembly as opt-in only when the prompt contains at least one searchable term. Blank or whitespace-only prompts now follow the existing chronological eviction path, and the integration suite covers that regression. Add a patch changeset because this fixes user-visible assembly behavior in the plugin.

Regeneration-Prompt: |
  Review found that prompt-aware context eviction switched behavior on any non-empty prompt string, even when the string had no searchable terms after tokenization. Preserve the new relevance feature, but make blank, whitespace-only, or otherwise unsearchable prompts fall back to the existing chronological eviction path so behavior matches the docs and tests. Keep the change minimal in the assembler, add an integration test that proves whitespace-only prompts keep the chronological result, update public comments to reflect the actual contract, and add a patch changeset because this affects user-visible context assembly behavior.

---------

Co-authored-by: Josh Lehman <josh@martian.engineering>
…an-Engineering#257)

* fix: harden afterTurn dedup guard against false-positive drops

Improves the replay dedup introduced in Martian-Engineering#246 with two fixes:

1. Replace hasMessage() fast-path with aligned-tail boundary check.
   The old approach checks if batch[0] exists *anywhere* in the DB,
   which false-positives on legitimate repeated first messages (e.g.
   user sends 'hello' again). The new check verifies the DB's last
   message aligns with the exact replay boundary position in the
   incoming batch.

2. Run dedup on newMessages before prepending autoCompactionSummary.
   The merged Martian-Engineering#246 deduplicates the full ingestBatch including the
   synthetic summary, which can interfere with replay detection when
   the summary content matches historical messages.

Both changes are conservative: any mismatch falls through to the
existing full ordered-prefix proof, and mismatches always preserve
the batch unchanged (no data loss on false negatives).

* fix: repair afterTurn dedup ingest batch

Fix the follow-up replay dedup change so afterTurn passes the constructed ingest batch into ingestBatch instead of referencing a removed variable. Add a regression test covering restart replay when auto-compaction summary text is prepended, and include a patch changeset for release notes.

Regeneration-Prompt: |
  Review PR 257 in lossless-claw and fix the blocking typo left in the
  afterTurn replay-dedup follow-up. Preserve the aligned-tail replay
  detection approach, keep the fix additive, and avoid changing unrelated
  behavior. Add targeted regression coverage for the summary-prepend edge
  case that the PR description calls out, then add a patch changeset so the
  data-loss hardening lands in release notes. Validate with the repo's
  existing vitest binary from the main checkout because the PR worktree does
  not have its own node_modules.

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>
…neering#229)

* fix: parse SQLite UTC timestamps with explicit Z suffix

SQLite datetime('now') stores UTC timestamps without a Z suffix.
JavaScript's Date constructor parses bare datetime strings as local time
per ECMA-262, causing timestamps to shift by the local timezone offset.

This adds a parseUtcTimestamp() helper that appends 'Z' before parsing,
and applies it to all new Date(row.*) calls in conversation-store,
summary-store, and migration.

Fixes Martian-Engineering#216

* fix: preserve explicit timestamp offsets

Keep explicit timezone offsets intact in the shared timestamp parser while still normalizing bare SQLite datetime('now') values to UTC. Add focused parser coverage for bare, Z-suffixed, and offset-bearing timestamps, and include a patch changeset for the behavior fix.
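
A minimal sketch of a parser with this behavior follows. The name `parseUtcTimestampSketch` and the regular expression are illustrative assumptions, not the shared parser's actual code.

```typescript
// Matches a trailing "Z" or an explicit offset such as "+02:00" or "-0530".
const HAS_ZONE = /(Z|[+-]\d{2}:?\d{2})$/i;

// Treat bare SQLite datetimes as UTC; leave explicit zones and offsets intact.
function parseUtcTimestampSketch(value: string): Date {
  // SQLite writes "YYYY-MM-DD HH:MM:SS"; normalize the space to ISO's "T".
  const iso = value.trim().replace(" ", "T");
  return new Date(HAS_ZONE.test(iso) ? iso : iso + "Z");
}
```

All three input forms then resolve to the intended instant regardless of the host's local timezone.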

Regeneration-Prompt: |
  Address the PR review finding on the shared SQLite timestamp parser introduced for issue Martian-Engineering#216. Preserve the intended fix for bare datetime('now') strings that lack a timezone suffix, but do not break timestamps that already include Z or an explicit offset like +02:00. Add narrow tests that prove all three cases still parse correctly, and include a patch changeset because this affects user-visible timestamp handling.

---------

Co-authored-by: Nemo (docs-sync) <nemo@caeli.ai>
Co-authored-by: Josh Lehman <josh@martian.engineering>
* docs: add Chinese README (README_zh.md)

* docs: update related repository links (new naming)

* feat: CJK trigram FTS search with OR semantics

FTS5 unicode61 tokenizer cannot segment CJK ideographs (Chinese, Japanese,
Korean), so CJK queries fall back to a LIKE path with AND logic. When the
user's phrasing doesn't exactly match the summary text (e.g. querying
"端到端测试结果" when the summary contains "端到端测试"), ALL terms
must match and the query returns zero candidates.

This commit adds:

1. A new FTS5 trigram-tokenized virtual table (summaries_fts_cjk) that
   indexes every 3-character substring, enabling native CJK substring
   matching.

2. searchCjkTrigram() — splits CJK segments into overlapping 4-char
   chunks and combines them with OR semantics via FTS5 MATCH. Non-CJK
   tokens (English, version numbers) are searched in the existing porter
   FTS table. Results are unioned and sorted by recency.

3. searchLikeCjk() — a fallback when the trigram table is unavailable.
   Splits CJK text into bigrams (2-char sliding window) and uses LIKE
   with OR instead of AND, so partial matches return results.

4. Auto-migration: creates summaries_fts_cjk and backfills from existing
   summaries on first run. New summaries are indexed on save.
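
The chunking and OR-combination steps above can be sketched as follows. `slidingWindows` and `buildOrMatch` are hypothetical helpers, not the bodies of `searchCjkTrigram()` or `searchLikeCjk()`.

```typescript
// All overlapping windows of `size` characters; shorter input yields one window.
function slidingWindows(text: string, size: number): string[] {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length === 0) return [];
  if (chars.length <= size) return [chars.join("")];
  const out: string[] = [];
  for (let i = 0; i + size <= chars.length; i++) {
    out.push(chars.slice(i, i + size).join(""));
  }
  return out;
}

// Combine chunks into an FTS5 MATCH expression with OR semantics.
// Each chunk is a quoted string; embedded double quotes are doubled.
function buildOrMatch(chunks: string[]): string {
  return chunks.map((c) => `"${c.replace(/"/g, '""')}"`).join(" OR ");
}

// 4-character chunks for the trigram table, bigrams for the LIKE fallback.
const trigramChunks = slidingWindows("端到端测试结果", 4);
const likeBigrams = slidingWindows("端到端测试结果", 2);
```

With OR semantics a single matching chunk such as "端到端测" is enough to surface a summary that contains "端到端测试".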

Tested on 4 machines with Chinese query workloads:
- Before: "端到端测试结果" → 0 candidates
- After:  "端到端测试结果" → correct matches via trigram OR

Fixes CJK zero-result bug affecting all Chinese/Japanese/Korean users.
Related: Martian-Engineering#208 (search path for lcm_expand_query candidate resolution)

* fix: tighten CJK summary search semantics

Keep mixed CJK and Latin summary queries on full-intent matching while
preserving the new CJK-specific recall improvements. Route short CJK
segments through the LIKE fallback so one- and two-character queries do
not regress, and update fallback coverage plus a release note.

Regeneration-Prompt: |
  Address review feedback on the PR that added trigram-backed CJK summary
  search. Preserve the additive migration and the improved recall for CJK
  phrasing differences, but fix the cases where mixed-language queries were
  broadened from implicit AND to OR and where very short CJK queries could
  return no results. Keep the work localized to summary search behavior,
  add regression tests for mixed CJK plus Latin queries and single-character
  CJK queries, and include a changeset because this is user-facing search
  behavior.

---------

Co-authored-by: scott <scott@Scott4.local>
Co-authored-by: Scott Lin <catgodtw@users.noreply.github.com>
Co-authored-by: Josh Lehman <josh@martian.engineering>
…ian-Engineering#148)

* lossless-claw-3ea: add transcript GC maintenance for externalized tool results

Add a summarized-tool candidate query in SummaryStore and implement LcmContextEngine.maintain() for the conservative first transcript-GC pass. This pass only rewrites tool-result transcript entries that were already externalized into large_files during ingest, are linked through summary_messages, and are no longer present as raw context items. Rebuild replacement toolResult messages from stored message_parts, align them to transcript entries by stable toolCallId, and request runtime-owned rewrites in small batches. Also export the minimal assembler helpers needed for replacement reconstruction and add focused engine tests for candidate selection and maintain()-driven rewrite requests.

Regeneration-Prompt: |
  Implement Phase 2 of the tool-result externalization spec now that upstream OpenClaw has merged the transcript maintenance hook and rewrite helper. Keep this first pass conservative and additive: do not redesign compaction or add new schema unless required. Select transcript-GC candidates from LCM state only when a tool-result message was already externalized into large_files, is covered by summaries, and is no longer present as a raw context item. Rebuild the compact replacement message from stored message_parts so the placeholder content stays canonical, then align candidates to active transcript entries by stable toolCallId and ask the runtime to rewrite them in bounded batches. Skip anything ambiguous instead of trying to be clever. Add focused tests that prove candidate discovery works and that maintain() requests the expected rewrite payload for a summarized externalized tool result.

* docs: add transcript GC spec and changeset

Document the current state of tool-result externalization,
incremental bootstrap, and transcript GC in the repo spec.
Add a changeset for the new runtime-assisted transcript GC
behavior so release notes capture the user-visible impact.

Regeneration-Prompt: |
  OpenClaw upstream landed the transcript rewrite maintenance API,
  and this branch already implements the first pass of transcript GC
  for summarized externalized tool results. Add the missing repo-side
  documentation so the PR is self-contained: a spec in specs/ that
  explains what is already implemented, why it matters operationally,
  and what still remains to finish the design. Also add a changeset,
  because this changes user-visible runtime behavior by shrinking
  active transcripts after safe condensation. Do not pretend the
  implementation is complete; call out the remaining work explicitly,
  including legacy inline tool results, stronger transcript alignment,
  tighter eligibility/fresh-tail rules, and end-to-end integration
  coverage.
…g#243)

* feat: add bundled lossless-claw skill and /lcm diagnostics

Add the approved MVP operator surface for lossless-claw. This ships a bundled lossless-claw skill with focused references, registers a native /lcm command with /lossless as the alias, and exposes scan-only summary health diagnostics through /lcm doctor. It also updates package metadata so the skill is bundled and adds a changeset for the new user-facing surface.

Regeneration-Prompt: |
  Implement the approved lossless-claw MVP operator surface inside the plugin package without depending on the Go TUI binary. Add a concrete plan doc first, then ship a bundled skill named lossless-claw with references covering configuration, architecture, diagnostics, and recall-tool usage. Register native plugin commands centered on /lcm with /lossless as the alias. Keep the command surface narrow: /lcm should report version, enabled and selected state, DB path and file size, summary counts, a defensible summarized-context metric, and whether broken or truncated summaries are present. /lcm doctor should be the only user-facing summary-health diagnostic entrypoint in MVP and should stay scan-only instead of exposing advanced repair or rewrite operations. Keep changes scoped, add tests for manifest metadata, registration, and command behavior, and update README plus release metadata for the new bundled skill and command surface.

* Polish lossless command status output

Keep /lossless as the surfaced native command while documenting /lcm as the hidden alias. Rework status and doctor output into compact section cards, split GLOBAL vs CURRENT CONVERSATION reporting, and fall back cleanly when the host does not expose session identity. Add focused tests for the fallback path and the forward-compatible session-key path.

Regeneration-Prompt: |
  Refine the lossless-claw command polish only. Keep `/lossless` as the visible native command and `/lcm` as an accepted hidden alias. Add built-in command docs that point users to `/lossless help`, reformat status and doctor output into compact emoji section cards, and split GLOBAL stats from CURRENT CONVERSATION stats. Investigate whether the plugin command handler can resolve the active LCM conversation from host-provided session identity; support hidden `sessionKey` or `sessionId` fields if they appear, but when the current OpenClaw command API does not expose them, show the nicest possible fallback explaining that only GLOBAL stats are available. Update targeted tests for the new help text, status layout, host-gap fallback, and forward-compatible session-key resolution.

* Use session-key resolution in /lossless status

Resolve the current LCM conversation from ctx.sessionKey first, with ctx.sessionId as a compatibility fallback when the active key is not stored yet. Keep mismatched session-id fallbacks unavailable so the status card does not show the wrong conversation, and add focused command tests for direct resolution, fallback, and mismatch handling.
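
The resolution order described here can be sketched as a small pure function. The `Conversation` shape and `resolveCurrentConversation` are hypothetical. The sketch also assumes the session-id fallback is accepted only for rows with no stored session key, which is a conservative reading of the mismatch rule.

```typescript
// Hypothetical row shape for the conversations table.
type Conversation = { id: number; sessionKey: string | null; sessionId: string | null };

// Prefer the session key; use the session id only as a compatibility fallback.
function resolveCurrentConversation(
  conversations: Conversation[],
  ctx: { sessionKey?: string; sessionId?: string },
): Conversation | undefined {
  if (ctx.sessionKey) {
    const byKey = conversations.find((c) => c.sessionKey === ctx.sessionKey);
    if (byKey) return byKey;
  }
  if (ctx.sessionId) {
    const byId = conversations.find((c) => c.sessionId === ctx.sessionId);
    // Assumption: refuse the fallback when the row is already bound to a key,
    // since that key was not matched above and may belong to another session.
    if (byId && byId.sessionKey === null) return byId;
  }
  return undefined;
}
```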

Regeneration-Prompt: |
  Update the /lossless slash command status output so the CURRENT CONVERSATION section reflects the active LCM conversation for the OpenClaw plugin-command session. The host now passes PluginCommandContext.sessionKey and sessionId. Treat the active session key as authoritative, keep /lossless as the visible command and /lcm as the hidden alias, preserve the existing emoji/status-card formatting and lightweight help text, and fall back gracefully with explicit messaging when the current conversation cannot be resolved.

  If the active session key is not stored in the conversations table yet, use the active session id only as a compatibility fallback so older rows without session_key can still show current-conversation stats. Refuse that fallback when it points at a conversation already bound to a different stored session key, because that would show the wrong conversation. Add focused tests that cover direct session-key resolution, the session-id compatibility fallback, and the mismatch case, then verify the command tests and full suite still pass.

* Polish /lossless status card formatting

Tighten the /lossless status presentation without changing current-conversation resolution. Switch the card to compact label:value lines, rename the header alias copy, move section titles to title case, and remove session id from the visible current-conversation block while keeping session-key resolution and session-id fallback behavior intact.

Regeneration-Prompt: |
  Polish the /lossless status output on top of the existing session-key resolution work. Keep /lossless as the visible slash command and /lcm as the alias, preserve the active-session-key current-conversation behavior, and do not reintroduce the old binding-based resolution path.

  Adjust the card so it reads well in chat screenshots: avoid all-caps section headers, tighten spacing so it feels like a compact status card instead of debug output, change the header copy from Hidden alias to Alias, and remove current conversation session id from the displayed fields while keeping session key. Update the focused command tests to match the new formatting and verify both the command test file and the full test suite still pass.

* Tighten /lossless status card formatting

* fix: scope /lossless doctor to current conversation

Make /lossless doctor resolve the active LCM conversation using the same session-key/session-id logic as status and refuse to run a global scan when the current conversation cannot be resolved. Keep /lossless visible, preserve /lcm as the alias, and add focused tests for scoped issue, scoped clean, and unavailable behavior.

Regeneration-Prompt: |
  Josh changed the MVP requirement for `/lossless doctor`: it must only diagnose the current LCM conversation from the plugin command context, using the same session-key/session-id resolution path already used by status. If the current conversation cannot be resolved, return an explicit unavailable message and say that no global scan ran. Keep `/lossless` as the visible command, preserve `/lcm` as alias, retain the compact text format, and add focused tests covering a resolved conversation with local issues, a resolved clean conversation, and unresolved context with no global fallback.

* feat: add scoped lossless doctor apply

Implement a native TypeScript repair path for /lossless doctor apply.

Keep doctor scoped to the resolved current conversation only. Leave /lossless doctor as a read-only scan, and add /lossless doctor apply to rewrite detected broken summaries in place using the plugin's existing summarization runtime instead of the Go TUI bridge. Preserve the compact status-card output, return an explicit unavailable message when the current conversation cannot be resolved, and cover clean no-op, successful scoped repair, and unresolved no-global-fallback behavior in focused command tests.

Regeneration-Prompt: |
  Add a native TypeScript implementation for `/lossless doctor apply` inside the lossless-claw plugin. Keep `/lossless doctor` as a read-only scan and never broaden either command beyond the current conversation exposed by the host session identity. Reuse the existing broken-summary marker detection, order repairs bottom-up so condensed nodes can consume freshly repaired child summaries, and rewrite repaired summaries in place in SQLite. Use the plugin's own summarization/runtime facilities instead of calling into the Go TUI. Preserve the compact status-card command output, and if the active conversation cannot be resolved, return an explicit unavailable response without attempting any global scan or repair. Add focused tests for a clean no-op apply, a scoped repair that actually mutates summaries, and the unresolved case proving there is no global fallback.

* fix: improve doctor apply guidance and model fallback

* fix: refine lossless status metrics

* fix: simplify lossless compression ratio

* docs: polish bundled lossless-claw skill

* docs: complete bundled lossless-claw skill
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…#263)

* fix: prune heartbeat turns before compaction

* fix: use sessionKey continuity in afterTurn replay dedup

Resolve the conv642 replay-regression in the afterTurn dedup guard by looking up the stored conversation through the same stable session identity used elsewhere in the engine. The dedup path now prefers sessionKey continuity and only falls back to sessionId through the existing store helper, which prevents restart replays from being treated as fresh history when OpenClaw rotates the runtime sessionId for the same top-level session. Add a focused regression covering restart replay under agent:main:main with a changed runtime sessionId.

Regeneration-Prompt: |
  Fix the conv642 / 0.6.0 replay-regression in lossless-claw without broad refactoring. The likely bug is that afterTurn replay dedup looks up prior history by sessionId too loosely, while the rest of the engine already treats stable sessionKey continuity as the canonical identity for a live conversation. Make the smallest code change that brings replay dedup into line with the existing getConversationForSession behavior, preserving current fallback behavior when no sessionKey exists. Add focused regression coverage for the real failure mode: a restart or runtime recycle changes the sessionId but keeps the same stable sessionKey, and the replayed historical prefix must still be deduplicated instead of re-ingested. Keep the scope limited to the conv642 replay issue.

* test: update compaction telemetry integration expectations

Refresh the lcm integration tests to match the intended compaction-telemetry cleanup. The compaction engine still reports meaningful result metadata and persists summaries, but it no longer writes synthetic compaction message parts into canonical transcript state. Replace the stale compaction-part assertions with checks that no compaction parts are persisted while leaf and condensed compaction still reduce tokens and create the expected summaries/context transitions.

Regeneration-Prompt: |
  CI started failing in test/lcm-integration.test.ts after the compaction-telemetry cleanup because two integration tests still expected synthetic compaction parts to be persisted into canonical transcript output. Update those tests only. Keep the new assertions meaningful: verify that canonical transcript state stays free of compaction parts, while compaction still returns useful result metadata, reduces token counts, and creates leaf/condensed summaries and summary context items as appropriate. Rerun the relevant integration file, then a slightly broader pass including engine tests to confirm the branch remains green.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…Engineering#270)

Regeneration-Prompt: |
  Phase 1 for lossless-claw issue Martian-Engineering#268. Timeout-recovery compaction was
  forcing budget-targeted recovery through compactFullSweep(), which only
  reasons over persisted context tokens. In the incident shape, live context
  was 277,403 tokens while stored context was already much smaller, so the
  forced sweep path could no-op on the wrong signal instead of using the
  capped compactUntilUnder() loop.

  Change only the routing needed for forced budget recovery. Preserve the
  existing full-sweep behavior for manual compaction requests and proactive
  threshold sweeps. Add focused regression coverage that proves the forced
  recovery path now calls compactUntilUnder() with the budget target and live
  token count, while threshold-target sweeps still stay on compactFullSweep().
  Include a patch changeset because this is a user-visible bug fix.
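The routing rule described in this prompt can be sketched as a small decision function. This is only an illustration: the `CompactionRequest` shape and the `routeCompaction` name are hypothetical, and the sketch assumes that "forced budget recovery" means a forced request with a budget target, which the text implies but does not spell out.

```typescript
// Hypothetical sketch of the routing decision; field names are assumptions.
type CompactionRoute = "compactUntilUnder" | "compactFullSweep";

interface CompactionRequest {
  force: boolean;                 // true for timeout-recovery style requests (assumed)
  target: "budget" | "threshold"; // which limit the caller wants to get under
}

function routeCompaction(req: CompactionRequest): CompactionRoute {
  // Only forced, budget-targeted recovery leaves the full-sweep path.
  return req.force && req.target === "budget"
    ? "compactUntilUnder"
    : "compactFullSweep";
}
```

Under these assumptions, manual requests and proactive threshold sweeps keep returning `compactFullSweep`, matching the "change only the routing" constraint.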
…Anthropic no longer supporting usage plans) (Martian-Engineering#273)

* fix: support runtime-managed oauth summarizer providers

* docs: add summary timeout config and preserve default

* fix: restore oauth summarizer behavior support

* fix: preserve codex oauth resolution and skip direct retry

* test: cover openai-codex expansion override happy path

* test: cover codex large-file summarization path

* test: clarify runtime-managed auth retry contract

* fix: use existing codex api predicate helper

* fix: note oauth summarizer support and timeout config

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>
Martian-Engineering#261)

* fix: add per-DB async transaction mutex to prevent cross-session nested-transaction failures

Fixes Martian-Engineering#260

Root cause: Multiple async sessions share one synchronous DatabaseSync handle.
SQLite's transaction state is per-connection, so concurrent async code paths
that both issue BEGIN while the other is mid-transaction (awaiting async work)
cause 'cannot start a transaction within a transaction' errors.

Fix: Introduce acquireTransactionLock() — a per-database async mutex using
a WeakMap<DatabaseSync, promise-chain>. Applied to all three explicit
transaction entry points:

- ConversationStore.withTransaction() — BEGIN IMMEDIATE
- SummaryStore.replaceContextRangeWithSummary() — BEGIN
- lcm-doctor-apply.ts applyScopedDoctorRepair() — BEGIN IMMEDIATE

The mutex serializes transaction acquisition per DB instance while allowing
different databases to proceed independently.

Includes regression tests covering:
- Concurrent withTransaction from multiple sessions on one DB
- Concurrent replaceContextRangeWithSummary calls
- Cross-store (ConversationStore + SummaryStore) concurrent transactions
- Error propagation without mutex deadlock
- 10-session stress test
- Independent database isolation
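A minimal sketch of the promise-chain mutex idea described above, keyed by handle in a `WeakMap`. It is not the code from this commit: `DbHandle` is a stand-in for `DatabaseSync`, `withTransactionLock` is a hypothetical helper, and the body of `acquireTransactionLock` is one plausible way to build such a lock.

```typescript
// Hypothetical sketch: `DbHandle` stands in for DatabaseSync.
type DbHandle = object;
type Release = () => void;

// One promise "tail" per database handle; entries vanish when the handle is collected.
const lockTails = new WeakMap<DbHandle, Promise<void>>();

async function acquireTransactionLock(db: DbHandle): Promise<Release> {
  const previous = lockTails.get(db) ?? Promise.resolve();
  let release!: Release;
  const current = new Promise<void>((resolve) => {
    release = resolve;
  });
  // Later callers queue behind this holder by chaining onto the new tail.
  lockTails.set(db, previous.then(() => current));
  await previous; // wait until every earlier holder has released
  return release;
}

// Hypothetical helper: run `work` while holding the lock for `db`.
async function withTransactionLock<T>(
  db: DbHandle,
  work: () => Promise<T>,
): Promise<T> {
  const release = await acquireTransactionLock(db);
  try {
    return await work(); // BEGIN ... COMMIT would happen in here
  } finally {
    release(); // released even when `work` throws, so the queue cannot deadlock
  }
}
```

Because the map is keyed by handle, two different database handles never wait on each other, while callers sharing one handle run their transactions strictly one after another.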

* [subagent] fix: address PR Martian-Engineering#261 review nits

* fix: widen shared SQLite transaction coordination

* fix: add release notes for sqlite transaction hotfix

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…g#283)

* fix: redirect LCM diagnostic log output to stderr

Route all deps.log calls through console.error() instead of api.logger.*
so that [lcm] diagnostic lines never contaminate stdout JSON output.

Fixes Martian-Engineering#165

Co-authored-by: RJ Johnston <293686+rjdjohnston@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: keep LCM diagnostics on stderr

---------

Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: RJ Johnston <293686+rjdjohnston@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Josh Lehman <josh@martian.engineering>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix: resolve TUI topic session lookups

Resolve TUI session metadata and count lookups against the selected conversation row instead of grouping by bare session_id. Topic-suffixed session filenames now prefer an exact session_key match and only then fall back to the normalized bare session_id, which restores conv_id, session key, summary count, and file count for Telegram topic sessions while preserving non-topic behavior. Reuse the same resolution path for single-session conversation lookups so summaries/files/context drill-downs follow the same normalization.

Regeneration-Prompt: |
  Fix the lossless-claw TUI bug where Telegram topic session files on disk are named like <session-id>-topic-<n> but LCM stores the bare runtime session_id and the topic identity separately in session_key. Keep the patch tight in tui/data.go and related tests. Preserve existing behavior for non-topic sessions. Resolve each visible session entry to a concrete conversation row first, preferring an exact session_key match for topic-suffixed filenames and otherwise falling back to the normalized bare session_id, then load summary/file counts by conversation_id so multiple topic rows sharing one bare session_id do not collapse together. Add regression coverage showing a topic session file now gets the right session key, conv_id, summary count, file count, and single-session lookup behavior.

* fix: note TUI topic session lookup correction
Martian-Engineering#288)

* fix: defer DB init to gateway_start hook to prevent database lock race

On macOS with launchd KeepAlive, gateway restarts can spawn two
processes simultaneously. Both call register() and open lcm.db,
causing "database is locked" errors that loop indefinitely.

Defer createLcmDatabaseConnection() and LcmContextEngine construction
from register() to the gateway_start plugin hook, which fires after
the HTTP server binds its port and stale PIDs are killed. Uses
module-level shared state so deferred plugin reloads reuse the
already-initialized connection.

Fixes Martian-Engineering#287

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings — FD leak, unhandled rejection, config staleness

Addresses Copilot review comments and adversarial audit findings:

1. Share only the DB handle at module scope; rebuild LcmContextEngine
   per-register() with fresh deps so hot-reloaded config takes effect.

2. Prevent unhandled promise rejection crash by attaching a no-op
   .catch() to the ready promise immediately after creation.

3. Close old DB connection when databasePath changes (prevents FD leak
   and stale locks — the exact problem this PR fixes).

4. Add gateway_stop handler to close DB cleanly on shutdown.

5. Fix half-initialized stuck state: if DB opens but engine fails in
   the else-if branch, properly set initError and reject the promise
   instead of silently swallowing.

6. Export __resetSharedInitForTests() for test isolation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use closeLcmConnection for tracking, accept db callback in command

Addresses second round of Copilot review:

1. Use closeLcmConnection(db) instead of db.close() in the eager-init
   failure path to keep the connection tracking maps consistent.

2. Change createLcmCommand to accept db as DatabaseSync | (() => DatabaseSync)
   so the deferred getter can be passed without a type assertion cast.
   Backward-compatible: existing callers passing a plain DatabaseSync
   still work via the typeof check.
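The "value or getter" parameter from item 2 can be illustrated with a short sketch. `FakeDatabase` replaces `DatabaseSync` so the example runs without SQLite, and `resolveDb` is a hypothetical name; only the `typeof` check mirrors what the commit message describes.

```typescript
// Stand-in for DatabaseSync so the sketch has no native dependency.
class FakeDatabase {
  constructor(readonly path: string) {}
}

// Callers may pass a ready handle or a deferred getter.
type DbSource = FakeDatabase | (() => FakeDatabase);

// Hypothetical helper: normalize either form to a concrete handle.
function resolveDb(source: DbSource): FakeDatabase {
  // A class instance has typeof "object", so "function" identifies the getter.
  return typeof source === "function" ? source() : source;
}

// Existing call style: pass the handle directly.
const direct = resolveDb(new FakeDatabase("lcm.db"));

// Deferred call style: the getter runs only when the handle is needed.
let getterCalls = 0;
const lazy: DbSource = () => {
  getterCalls += 1;
  return new FakeDatabase("lcm.db");
};
const viaGetter = resolveDb(lazy);
```

The union type keeps both call styles type-safe, so no cast is needed at either call site.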

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: simplify to eager-first init with deferred fallback on lock only

Major simplification addressing test failures and review concerns:

The previous approach (defer everything to gateway_start, share DB at
module scope) broke tests that never fire gateway_start and introduced
complexity around shared state, promise lifecycle, and config staleness.

New approach: try eager DB init immediately in register() (preserving
original behavior for tests and normal startup). Only defer to
gateway_start if the eager open fails with "database is locked" — the
specific error from the macOS launchd orphan-process race.

This eliminates:
- Module-level shared state (no more sharedDb, no test pollution)
- Promise lifecycle complexity (no unhandled rejection risk in normal path)
- Config staleness (engine built with fresh deps every register())
- The need for __resetSharedInitForTests()

Each register() call gets its own DB handle and engine, matching the
original code's behavior. The only difference: lock errors are caught
and retried via gateway_start instead of looping forever.
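The eager-first flow described above might look roughly like the following. All names here (`initDatabase`, `InitResult`, `isLockError`, the injected `open` function) are hypothetical, and matching the lock error by message text is an assumption based on the quoted "database is locked" string.

```typescript
// Hypothetical sketch of eager-first init with a deferred retry on lock only.
type Db = { path: string };
type OpenDb = (path: string) => Db;

interface InitResult {
  db: Db | null;                          // set when the eager open succeeded
  retryOnGatewayStart: (() => Db) | null; // set only when the open hit a lock
}

function isLockError(err: unknown): boolean {
  return err instanceof Error && /database is locked/i.test(err.message);
}

function initDatabase(path: string, open: OpenDb): InitResult {
  try {
    // Normal path: open immediately, as the original register() did.
    return { db: open(path), retryOnGatewayStart: null };
  } catch (err) {
    // Any other failure still surfaces right away.
    if (!isLockError(err)) throw err;
    // Lock contention: hand back a retry to run from the gateway_start hook.
    return { db: null, retryOnGatewayStart: () => open(path) };
  }
}
```

In this shape, tests that never fire `gateway_start` still get a working handle, because the deferred branch is only reachable after a lock error.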

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings — lazy DB in command, handle leak, use-after-close

- Move getDb() into status/doctor branches so /lossless help never
  resolves the database (review comment lcm-command.ts:733)
- Close raw DatabaseSync handle when PRAGMA setup fails in
  createLcmDatabaseConnection to prevent FD leaks (review comment
  index.ts:1586)
- Clear deferredEngine on gateway_stop and guard getEngine() against
  closed database to prevent use-after-close (review comment
  index.ts:1642)
- Add tests covering the db: () => DatabaseSync lazy path: help
  must not invoke the resolver, status must (review comment
  lcm-command.ts:720)

* fix: disambiguate error messages for null database states

getDatabase() now distinguishes "closed after gateway_stop" from
"not yet initialized" with a stopped flag. getEngine() delegates
to getDatabase() instead of duplicating the null check with its
own misleading message.

* fix: guard getEngine against use-after-close, fix misleading comment

- Call getDatabase() before returning eagerly-constructed lcm so
  post-gateway_stop calls fail fast instead of returning an engine
  backed by a closed DB handle
- Update rethrow comment to accurately describe error propagation
  (framework handles it, not the engine constructor)

* fix: await deferred LCM init across runtime entrypoints

When eager DB open hits a lock during gateway restart, share one deferred
initialization promise across context-engine resolution, tools, commands,
and lifecycle hooks so the first request waits for gateway_start instead of
failing. Persist deferred retry failures so later callers see the real
error, and add a patch changeset for the user-visible startup fix.

Regeneration-Prompt: |
  Follow up on PR 288's deferred SQLite startup path for lossless-claw.
  The lock-contention fallback must not move the failure from plugin load to
  the first request: context engine resolution, plugin tools, commands, and
  lifecycle hooks should all await the same deferred initialization when the
  initial open fails with "database is locked" during macOS launchd
  restarts. If the deferred retry also fails, retain and rethrow that real
  error instead of misleading callers with a perpetual "waiting for
  gateway_start" message. Keep the eager-success path intact, add focused
  regression coverage for deferred success and deferred failure, and include
  the missing patch changeset because this changes user-visible runtime
  behavior.

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Josh Lehman <josh@martian.engineering>
…ering#294)

* perf: optimize SQLite PRAGMAs and add missing indexes

Zero-logic-change performance improvements for multi-GB databases
with concurrent agent sessions.

PRAGMAs added to configureConnection():
- cache_size = -65536 (64MB page cache, up from 2MB default)
  Demand-allocated, released on close. 5 connections = 320MB max.
- synchronous = NORMAL (officially recommended for WAL mode)
  Crash-safe for app crashes; only risks power-failure data loss.
  Bootstrap re-ingests any lost transactions from session files.
- temp_store = MEMORY (keeps temp B-trees in RAM)

Added PRAGMA optimize on connection close to update query planner
statistics for tables that changed during the session.

Missing indexes (cause full table scans on large databases):
- summary_messages(message_id) — needed for cascade delete lookups
- summaries(conversation_id, kind, depth) — needed for condensation
  depth filtering queries

Fixes Martian-Engineering#291 (partial — PRAGMA + index portion)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move depth-dependent index after ensureSummaryDepthColumn migration

The summaries(conversation_id, kind, depth) index references the
depth column which is added by ensureSummaryDepthColumn(). The index
was in the initial schema creation (too early). Moved it to run
right after the depth column migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR Martian-Engineering#294 review — optimize error handling, index order, comments

1. PRAGMA optimize in separate try block so SQLITE_BUSY doesn't skip
   db.close() (handle leak prevention).

2. Index column order: (conversation_id, depth, kind) instead of
   (conversation_id, kind, depth) — matches getDistinctDepthsInContext
   query pattern which filters by conversation_id + depth.

3. Fixed misleading comment on summary_messages index.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move depth index after backfillSummaryDepths to avoid migration overhead

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: assert perf indexes exist after migration (Martian-Engineering#291)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add changeset for sqlite tuning PR

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Josh Lehman <josh@martian.engineering>
…ian-Engineering#298)

engine.ts called compaction.compactFullSweep() directly for manual
and overflow compaction paths, bypassing the compact() method. Once
PR Martian-Engineering#295 adds the withContextCache wrapper to compact(), this direct
call would miss the per-phase context cache optimization.

Change: compactFullSweep → compact (same signature, same behavior,
but goes through the wrapper that future PRs will enhance).

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ineering#285)

* feat: add conversation prune function for data retention

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden prune cutoff and delete flow

Use SQLite date math for prune candidate selection so mixed timestamp formats compare chronologically instead of lexically. Wrap confirm-mode candidate selection and deletion in one IMMEDIATE transaction to avoid deleting conversations that become fresh during the prune run.

Add a regression test covering SQLite-formatted timestamps on the cutoff boundary.

Regeneration-Prompt: |
  The prune helper added in PR 285 had two review findings to address before it is safe to use against a live LCM database. First, the candidate query compared message timestamps as raw TEXT against an ISO cutoff string. This repo stores some timestamps via SQLite datetime('now') and others via JavaScript toISOString(), so lexical comparison can prune same-day rows that are actually newer than the cutoff. Change the filter to use SQLite julianday(...) and add a regression test that seeds a SQLite-format timestamp newer than the cutoff but lexically smaller than the ISO string.

  Second, confirm-mode pruning selected candidates and then deleted them row by row outside a transaction. Tighten that by running candidate selection and deletion inside BEGIN IMMEDIATE so the prune sees one consistent snapshot and does not remove conversations that received a fresh message mid-run. Keep dry-run behavior unchanged and preserve the existing optional VACUUM behavior.
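The lexical-comparison hazard behind the first finding can be reproduced without a database. The timestamps below are invented for illustration, and `toEpochMs` is a hypothetical helper; the commit itself fixes the query with SQLite `julianday(...)`, which this TypeScript sketch does not call.

```typescript
// Two timestamp shapes that the text says coexist in the same column.
const sqliteStyle = "2026-04-10 18:00:00";    // datetime('now') shape (UTC)
const isoCutoff = "2026-04-10T12:00:00.000Z"; // toISOString() shape

// Lexical comparison: ' ' (0x20) sorts before 'T' (0x54), so the row from
// 18:00 compares as "less than" the 12:00 cutoff and looks prunable.
const lexicallyOlder = sqliteStyle < isoCutoff;

// Hypothetical helper: normalize both shapes to epoch milliseconds.
function toEpochMs(ts: string): number {
  const iso = ts.includes("T") ? ts : ts.replace(" ", "T") + "Z";
  return Date.parse(iso);
}

// Chronological comparison gives the intended answer.
const chronologicallyOlder = toEpochMs(sqliteStyle) < toEpochMs(isoCutoff);

console.log({ lexicallyOlder, chronologicallyOlder });
```

Here `lexicallyOlder` is `true` while `chronologicallyOlder` is `false`, which is exactly the same-day case the regression test is meant to pin down.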

* fix: prune dependent records before deleting conversations

Delete summary lineage, context items, and FTS rows ahead of conversation deletion so prune works against the current schema's RESTRICT edges. Add a regression test that prunes a conversation containing summary_messages and context_items.

Regeneration-Prompt: |
  Running the prune helper against the live LCM database exposed a schema-level failure that the existing tests missed. Deleting a conversation directly did not work because several child tables mix CASCADE links from conversations with RESTRICT links back to messages and summaries. Reproduce that case with a test conversation that has a message, a linked summary, summary_messages lineage, and a context_items row. Then change prune so confirm-mode deletes the dependent rows in a safe order before deleting the conversation, and also clear any optional FTS rows tied to the pruned messages and summaries so search indexes do not retain orphaned entries.

* fix: batch prune live databases safely

Chunk confirmed pruning into bounded transactions so large live databases can be cleaned incrementally without one giant write lock. Delete cross-conversation context rows that reference pruned summaries or messages, and add supporting indexes plus regression coverage for batch mode and retained-context cleanup.

Regeneration-Prompt: |
  The prune helper already handled mixed timestamp formats and dependent summary/message cleanup, but it still did not work reliably on a large live LCM database. Update it so confirm-mode pruning runs in small committed batches instead of one giant transaction. Add options to control batch size and an optional max batch count for bounded runs. Preserve dry-run behavior.

  While testing against a large live database, pruning exposed an additional FK case: retained conversations can keep context_items rows that reference summaries being pruned from another conversation. Extend the delete path to remove context_items rows by referenced candidate message_id and summary_id, not just by candidate conversation_id. Keep the existing summary_messages and summary_parents cleanup.

  Add regression tests for multi-batch pruning, bounded batch runs, and the cross-conversation context_items case. Also add the missing indexes needed for live-scale deletes on summary_messages(message_id) and summary_parents(parent_summary_id).

* fix: checkpoint wal after prune vacuum

Follow VACUUM with wal_checkpoint(TRUNCATE) so operator-triggered prune runs reclaim disk space immediately in WAL mode instead of leaving the rewritten pages stranded in lcm.db-wal. Add a regression test that verifies the WAL is drained after a vacuumed prune.

Regeneration-Prompt: |
  The prune helper already supports an optional vacuum pass after confirmed deletion, but in WAL mode that still leaves reclaimed pages sitting in the WAL file until a checkpoint happens. Update the vacuum path so a prune with vacuum enabled also runs PRAGMA wal_checkpoint(TRUNCATE) immediately afterward. Keep the existing API shape.

  Add a focused regression test in prune.test.ts that proves the WAL is drained after a vacuumed prune, for example by checking PRAGMA wal_checkpoint(PASSIVE) returns zero log frames after the prune completes.

---------

Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Josh Lehman <josh@martian.engineering>
…-Engineering#302)

* fix: singleton DB init per dbPath + fallback provider config

## Problem

OpenClaw v2026.4.5+ calls plugin register() per-agent-context (main,
subagents, cron lanes) — not once at startup. Each call opens a new
DB connection and runs migrations, causing "Migration failed: database
is locked" storms on large databases. PR Martian-Engineering#288's deferred-init fix
was merged but does not address this per-context re-registration.

## Solution

### Singleton DB + engine (critical fix)

Uses globalThis + Symbol.for() singleton (same pattern as
startup-banner-log.ts) keyed on normalized dbPath. When register()
is called again with the same DB path, it skips init entirely and
wires handlers to the existing waitForEngine/waitForDatabase closures
via wirePluginHandlers(). gateway_stop clears the singleton so a
fresh init occurs on restart.

The shared state stores only the closures (not mutable copies of
database/lcm locals), avoiding stale-reference bugs.
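A minimal sketch of a `globalThis` + `Symbol.for()` registry keyed on database path, in the spirit of the paragraph above. The symbol name, the `SharedInit` shape, and the trailing-slash normalization are all assumptions; the commit's actual shared state holds the `waitForEngine`/`waitForDatabase` closures.

```typescript
// Hypothetical shape of the shared state: closures only, no mutable copies.
interface SharedInit {
  waitForDatabase: () => Promise<string>;
}

// Symbol.for() returns the same symbol even if this module is loaded twice.
const REGISTRY_KEY = Symbol.for("example.lcm.sharedInit"); // assumed key name

type Registry = Map<string, SharedInit>;

function getRegistry(): Registry {
  const g = globalThis as unknown as Record<symbol, Registry | undefined>;
  let registry = g[REGISTRY_KEY];
  if (!registry) {
    registry = new Map();
    g[REGISTRY_KEY] = registry;
  }
  return registry;
}

// Toy normalization (assumption): strip trailing slashes.
function normalizeDbPath(dbPath: string): string {
  return dbPath.replace(/\/+$/, "");
}

// First call per path runs `init`; later calls reuse the stored closures.
function getOrInitShared(dbPath: string, init: () => SharedInit): SharedInit {
  const key = normalizeDbPath(dbPath);
  const registry = getRegistry();
  let shared = registry.get(key);
  if (!shared) {
    shared = init();
    registry.set(key, shared);
  }
  return shared;
}

// Called from a gateway_stop-style hook so the next start re-initializes.
function clearShared(dbPath: string): void {
  getRegistry().delete(normalizeDbPath(dbPath));
}
```

Repeated `register()` calls for the same path would then share one initialization, and clearing on stop allows a fresh init after a restart.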

### Fallback provider config (additive)

- Add fallbackProviders config field (env: LCM_FALLBACK_PROVIDERS,
  format: provider/model,provider/model) for explicit compaction
  summarization fallbacks
- Append to existing 5-level candidate chain with dedup
- Exponential backoff (500ms→8s) between candidate retries
- PROVIDER FALLBACK / ALL PROVIDERS EXHAUSTED messages on stderr
- Half-threshold early warning and CIRCUIT BREAKER OPEN/CLOSED
  messages with cooldown time
- Startup banner for configured fallback providers
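Two pieces of the list above lend themselves to a short sketch: parsing the `provider/model,provider/model` format with dedup, and the 500ms→8s backoff. Function names are hypothetical, and the doubling factor is an assumption; the source only gives the two endpoints.

```typescript
interface Candidate {
  provider: string;
  model: string;
}

// Hypothetical parser for a LCM_FALLBACK_PROVIDERS-style value.
function parseFallbackProviders(raw: string | undefined): Candidate[] {
  const seen = new Set<string>();
  const candidates: Candidate[] = [];
  for (const entry of (raw ?? "").split(",")) {
    const trimmed = entry.trim();
    const slash = trimmed.indexOf("/");
    // Skip entries without both a provider and a model.
    if (slash <= 0 || slash === trimmed.length - 1) continue;
    if (seen.has(trimmed)) continue; // dedup repeated candidates
    seen.add(trimmed);
    candidates.push({
      provider: trimmed.slice(0, slash),
      model: trimmed.slice(slash + 1),
    });
  }
  return candidates;
}

// Assumed schedule: start at 500 ms, double per retry, cap at 8000 ms.
function backoffDelayMs(retryIndex: number): number {
  return Math.min(500 * 2 ** retryIndex, 8000);
}
```

With doubling, the cap is reached on the fifth retry (500, 1000, 2000, 4000, 8000), after which the delay stays at 8 seconds.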

* fix: handle terminal summarizer exhaustion fallback

Route terminal non-auth provider failures through the shared exhaustion handler so deterministic truncation actually runs, add regression coverage, and include a changeset for the runtime behavior fix.

Regeneration-Prompt: |
  Address the PR review finding in the multi-provider summarizer fallback path. The existing code added an ALL PROVIDERS EXHAUSTED log after the candidate loop, but the loop always returned, continued, or threw before that block could execute. Preserve existing auth-failure behavior because LcmProviderAuthError is used intentionally by compaction and the circuit breaker, but make terminal non-auth failures fall through to one shared exhaustion path that logs clearly and returns buildDeterministicFallbackSummary instead of an empty string. Add a focused regression test that exhausts all resolved non-auth candidates and proves both the terminal log and deterministic fallback behavior. Add a patch changeset because this changes runtime behavior and logging for plugin summarization fallback.

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>
jalehman and others added 11 commits April 11, 2026 07:20
…g#396)

* fix: replay full transcript after session rotation

Session-file rotation was purging the existing conversation and then reseeding it through the normal first-bootstrap cap. That left only a small suffix of the rotated transcript in LCM, which in turn made assembly fall back to live context while incremental compaction and manual compaction still evaluated the truncated persisted state.

Treat rotation reseeds as full transcript replacements instead of first-time bootstraps so LCM keeps coverage parity with the live session. Add a regression test that uses a tiny bootstrapMaxTokens value and verifies a rotated transcript is replayed in full.

Regeneration-Prompt: |
  User reported that a live lossless-claw session showed enormous actual context usage, but incremental compaction and /compact both believed the conversation was comfortably under target. Logs showed the session repeatedly falling back to live context while compaction decisions reported rawTokensOutsideTail=0.

  Trace the mismatch through bootstrap, assemble, and compaction. The key production clue is a session-file rotation event followed by an initial import that only reloaded a tiny suffix of a much larger rotated transcript. Preserve the intended first-bootstrap cap for genuinely new conversations, but do not reapply that cap when a known conversation is being reseeded after session-file rotation. In that case, replay the full rotated transcript so persisted LCM coverage matches the live session again. Add a focused regression test that would fail if rotation reseeds were trimmed by bootstrapMaxTokens.

* fix: stop purging history on session rotation

Remove the automatic bootstrap purge path that deleted persisted conversation data whenever the session file path changed. Rotation now only invalidates the stored bootstrap checkpoint and reconciles forward from the existing conversation state, preserving messages, summaries, lineage, and context items.

Update the rotation regression tests to assert that existing conversation state survives file changes and that only the missing tail messages are imported after rotation.

Regeneration-Prompt: |
  Lossless Claw should never delete persisted conversation history just because
  the backing session JSONL file path changes. We already traced a production
  data-loss incident to the rotation handling added in PR Martian-Engineering#366: on file-path
  mismatch, bootstrap hard-deleted messages, summaries, context items, lineage,
  and telemetry, then rebuilt from the new file. The earlier fix kept the full
  rotated transcript during reseed, but that still left an automatic destructive
  purge in the code, which violates the product contract that lossless is
  lossless unless the user explicitly invokes a destructive command.

  Remove the purge-on-rotation behavior entirely while keeping rotation
  detection/logging. Treat a rotated file as another transcript source for the
  same conversation: invalidate the old bootstrap checkpoint and reconcile from
  the persisted conversation state so only genuinely missing tail messages are
  imported. Preserve existing summaries and context items. Update regression
  tests to prove that file rotation does not wipe conversation history and does
  not reapply the first-bootstrap budget cap to an existing conversation.
…#380)

Add versioned startup backfill state so the expensive summary and tool-call
repairs only run once per algorithm version. Keep retry safety by wrapping each
versioned backfill and its completion marker in a savepoint so a failed startup
rolls back partial backfill writes and reruns cleanly on the next launch.

Regeneration-Prompt: |
  Implement the startup backfill gating change in lossless-claw without using
  PRAGMA user_version or column-existence guesses as the completion signal.
  Add an additive SQLite table keyed by backfill step name and algorithm
  version, and only skip a backfill after that exact version completes.
  Preserve partial-upgrade safety by making the backfill work and state write
  succeed or roll back together, then cover first-run state creation, repeat
  startup skipping, and retry-after-failure behavior in migration tests. This
  runtime change affects package behavior, so include a patch changeset.
…artian-Engineering#387)

* fix: refresh bootstrap checkpoint after afterTurn message ingestion

The append-only fast path introduced in v0.7.0 uses a DB message hash
to verify the bootstrap checkpoint. refreshBootstrapState() is called
after heartbeat pruning and after maintain(), but never after regular
message ingestion in afterTurn().

This means every real conversation turn advances the DB frontier past
the checkpoint hash, causing the next bootstrap to fall back to a full
JSONL transcript read. On large sessions this adds 20+ seconds per turn.

The fix adds a refreshBootstrapState() call after successful ingest,
before compaction, keeping the checkpoint aligned with the DB frontier.

Fixes Martian-Engineering#386

* test: cover PR 387 bootstrap checkpoint fix

Add a regression test for the normal afterTurn-to-bootstrap append-only fast path and include a patch changeset for the user-visible performance fix in PR Martian-Engineering#387.

Regeneration-Prompt: |
  Follow up on lossless-claw PR Martian-Engineering#387 by addressing review findings only. Keep the code change narrow: add one direct regression test that proves a normal real-turn afterTurn refreshes the bootstrap checkpoint so the next bootstrap stays on the append-only fast path without reconcileSessionTail, and add a patch changeset because the fix changes user-visible runtime performance. Run focused engine tests for the new normal-turn case and the existing heartbeat checkpoint case before pushing back to the contributor branch if maintainer edits are allowed.

---------

Co-authored-by: root <root@vega.arpa>
Co-authored-by: Josh Lehman <josh@martian.engineering>
* docs: add lossless data handling principles

* fix: dedupe topic transcript sessions in tui

The TUI was listing raw JSONL filenames as separate sessions, so when OpenClaw wrote both a bare session file and a topic-qualified file for the same canonical session id, the list showed duplicate rows even though both mapped to one logical LCM conversation. Collapse discovered session files by the JSONL session header id and prefer the topic-qualified transcript when both variants exist. Add a regression test for that duplicate-file case.

Regeneration-Prompt: |
  Fix the lossless-claw TUI so it does not show duplicate session rows when the sessions directory contains both <session-id>.jsonl and <session-id>-topic-<n>.jsonl for the same logical session. The canonical identity should come from the JSONL header's session id, not just the filename stem. Keep existing DB lookup behavior for topic sessions, but collapse duplicate on-disk files into one visible row and prefer the topic-qualified transcript when choosing which file to represent that session. Add a focused test that creates both files with the same header id and verifies the topic-qualified transcript wins.
…Martian-Engineering#397)

The OpenClaw core slot resolver in `plugins/slots.ts` reads a plugin's
top-level `kind` field, maps it via `SLOT_BY_KIND` (`context-engine ->
contextEngine`), and calls `applyExclusiveSlotSelection`. When `kind` is
missing, `slotKeysForPluginKind` returns an empty array and the
exclusive-slot selection path early-returns without assigning any slot.
As a result, `openclaw plugins install @martian-engineering/lossless-claw`
silently leaves `plugins.slots.contextEngine` unset, and OpenClaw falls
back to the built-in `legacy` context engine at runtime. The plugin loads
and registers its context engine with `api.registerContextEngine("lossless-claw", ...)`,
but nothing ever routes traffic to it.

The only user-visible symptom is that `lcm.db` stays at the initial
~4 KB / 0 tables forever, even on an install that reports success. The
README states that installation auto-sets the slot, which is the intended
behavior — it only fails because the manifest is missing this one field.

The fix is a single new top-level field in `openclaw.plugin.json`. A
matching manifest-shape test is added to `test/config.test.ts` so any
future edit that drops or retypes the field fails fast.

Verification:
- All 40 test files / 694 tests pass locally on `npm test`
- New manifest test explicitly checks `manifest.kind === "context-engine"`
- Traced against OpenClaw core `dist/slots-CFrDTeTR.js` (`SLOT_BY_KIND`,
  `slotKeysForPluginKind`, `applyExclusiveSlotSelection`): with `kind`
  present, `slotKeysForPluginKind` now returns `["contextEngine"]` and
  `applyExclusiveSlotSelection` writes `plugins.slots.contextEngine =
  "lossless-claw"`, which is the exact config state that activates LCM
  end-to-end.

Fixes Martian-Engineering#384.

Co-authored-by: Molt <molt@openclaw.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…false positive (Martian-Engineering#400)

The OpenClaw security scanner flags `process.env` combined with
`/\bfetch\b/i` as credential harvesting. The word 'Fetch' in a JSDoc
comment ('Fetch all context items') was triggering the network-send half
of the heuristic, blocking installation for users.

Adding --minify-whitespace to the esbuild command strips all comments
(including JSDoc) while keeping identifiers readable. Bundle shrinks
from 712KB to 552KB.
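
A rough sketch of why a comment alone could trip the check. The scanner's real implementation is not shown in this PR; only the `process.env` test and the `/\bfetch\b/i` pattern are taken from the text above. The function name and the `LCM_DB_PATH` variable are hypothetical.

```typescript
// Hypothetical stand-in for the scanner rule: flag a bundle that both reads
// the environment and mentions "fetch" as a whole word, ignoring case.
function looksLikeCredentialHarvesting(source: string): boolean {
  const readsEnvironment = source.includes("process.env");
  const mentionsFetch = /\bfetch\b/i.test(source);
  return readsEnvironment && mentionsFetch;
}

// A bundle that keeps its JSDoc comment. No network call exists anywhere.
const withComment = `
/** Fetch all context items */
function loadItems() { return process.env.LCM_DB_PATH; }
`;

// The same code after comments and whitespace are stripped.
const withoutComment = `function loadItems(){return process.env.LCM_DB_PATH;}`;

console.log(looksLikeCredentialHarvesting(withComment)); // true
console.log(looksLikeCredentialHarvesting(withoutComment)); // false
```

The word-boundary pattern matches prose as readily as code, so removing comments from the bundle removes the match without touching behavior.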
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…neering#403)

readFileSegment, readLastJsonlEntryBeforeOffset, and the statSync calls
in bootstrap() / refreshBootstrapState() all used synchronous Node.js fs
APIs. On multi-MB session JSONL files the backward scan in
readLastJsonlEntryBeforeOffset could block the event loop for minutes,
freezing every gateway session, the Control UI, CLI, and WebSocket
connections.

Convert those functions to fs/promises (open / FileHandle.read /
FileHandle.stat / stat). readAppendedLeafPathMessages becomes async
transitively. The backward scan now also only reads a new chunk when the
current carry has no more newlines to peel, instead of re-reading on
every iteration (which both wasted I/O and amplified the implicit O(n^2)
prepended-carry pattern).
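
The "peel from the carry before reading again" idea can be sketched as follows. This is an illustration under stated assumptions, not the repository's `readLastJsonlEntryBeforeOffset`: the function names, the `ReadAt` callback type, and the chunk size are all hypothetical, and the sketch assumes `offset` does not exceed the file size.

```typescript
import { open } from "node:fs/promises";

// Reads `length` bytes starting at `position`.
type ReadAt = (position: number, length: number) => Promise<Buffer>;

const NEWLINE = 0x0a;

// Returns the last non-empty line that lies entirely before `offset`, or null.
async function lastLineBefore(
  readAt: ReadAt,
  offset: number,
  chunkSize = 64 * 1024,
): Promise<string | null> {
  let position = offset;
  let carry: Buffer = Buffer.alloc(0);
  for (;;) {
    const newline = carry.lastIndexOf(NEWLINE);
    if (newline !== -1) {
      // Peel one line off the end of the carry without touching the file.
      const line = carry.subarray(newline + 1).toString("utf8").trim();
      carry = carry.subarray(0, newline);
      if (line) return line;
      continue;
    }
    if (position === 0) {
      // Start of input reached: whatever remains is the first line.
      const line = carry.toString("utf8").trim();
      return line || null;
    }
    // Only now, with no newline left in the carry, read one more chunk.
    const length = Math.min(chunkSize, position);
    position -= length;
    carry = Buffer.concat([await readAt(position, length), carry]);
  }
}

// Adapter over fs/promises so the scan never blocks the event loop.
async function lastLineBeforeInFile(path: string, offset: number): Promise<string | null> {
  const handle = await open(path, "r");
  try {
    return await lastLineBefore(async (position, length) => {
      const buffer = Buffer.alloc(length);
      let filled = 0;
      while (filled < length) {
        const { bytesRead } = await handle.read(buffer, filled, length - filled, position + filled);
        if (bytesRead === 0) break; // end of file
        filled += bytesRead;
      }
      return buffer.subarray(0, filled);
    }, offset);
  } finally {
    await handle.close();
  }
}
```

Splitting on the newline byte is safe for UTF-8 because `0x0a` never appears inside a multi-byte sequence. A single very long line still pays for repeated prepends, so the sketch reduces reads but does not remove that cost entirely.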

The bootstrap append-only fast path additionally short-circuits before
the expensive backward scan when latestDbHash !== lastProcessedEntryHash.
That is the common case during active sessions (the DB frontier advances
past the checkpoint between bootstraps), and the matcher can never find a
matching tail entry in that state, so we skip straight to the async
full-read slow path.

Tests in bootstrap-message-only.test.ts are updated to await the
now-async function; full vitest suite (695 tests, 40 files) stays green.

 via Claude Code

Co-authored-by: jet <dev@jetd.one>
…an-Engineering#355)

* feat: unified inline image detection and tool result string format fix

1. Detect and externalize base64 image data (JPEG, PNG, GIF, WebP, SVG)
   in messages of any role (user/tool/assistant). Images are saved as
   binary files. Handles both OpenClaw "[media attached:]" user pattern
   and pure base64 payloads in any message content.

2. Normalize string-format tool result content to array before processing
   so large string tool outputs are properly externalized.
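
Both steps can be illustrated with a small sketch. It is not the code from this commit: the `ContentPart` shape, the function names, and the prefix table are assumptions. The prefixes themselves are the base64 encodings of the usual magic bytes for each format.

```typescript
// Hypothetical minimal content-part shape.
type ContentPart = { type: "text"; text: string };

// Step 2 above: treat a bare string as a single text part so later stages
// only ever see the array form.
function normalizeToolResultContent(content: string | ContentPart[]): ContentPart[] {
  return typeof content === "string" ? [{ type: "text", text: content }] : content;
}

// Base64 renderings of common image signatures. "UklGR" is the RIFF container,
// which WebP shares with other formats, so a real detector would need to check
// further. "PHN2Z" corresponds to a payload that starts with "<svg".
const BASE64_IMAGE_PREFIXES: ReadonlyArray<readonly [string, string]> = [
  ["/9j/", "image/jpeg"],
  ["iVBORw0KGgo", "image/png"],
  ["R0lGOD", "image/gif"],
  ["UklGR", "image/webp"],
  ["PHN2Z", "image/svg+xml"],
];

// Step 1 above: guess the MIME type of a base64 payload, or null if it does
// not look like one of the listed image formats.
function sniffBase64Image(payload: string): string | null {
  const trimmed = payload.trimStart();
  for (const [prefix, mime] of BASE64_IMAGE_PREFIXES) {
    if (trimmed.startsWith(prefix)) return mime;
  }
  return null;
}

const parts = normalizeToolResultContent("iVBORw0KGgoAAAANSUhEUgAA");
console.log(parts.map((part) => sniffBase64Image(part.text))); // [ 'image/png' ]
```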

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: preserve tool image externalization and configurable storage

Route externalized payloads through a shared large-files directory, add
largeFilesDir/LCM_LARGE_FILES_DIR to the config surface, and prevent
already-externalized image references from being re-routed through the
generic large tool-result text externalizer.

Add focused regressions for inline image storage, structured tool-result
image payloads, string-content tool-result image payloads, and the new
config resolution path. Sync the plugin schema, docs, skill reference,
and add a changeset for the user-visible behavior.
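
One plausible shape for the new resolution path is sketched below. The commit states only that `LCM_LARGE_FILES_DIR` overrides `largeFilesDir`; the function name, the trimming, and the fallback to `OPENCLAW_STATE_DIR` or `~/.openclaw` are assumptions pieced together from the help text quoted elsewhere in this PR.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical slice of the plugin config.
interface LargeFilesConfig {
  largeFilesDir?: string;
}

// Assumed precedence: environment override, then plugin config, then a
// default under the state directory.
function resolveLargeFilesDir(
  config: LargeFilesConfig,
  env: Record<string, string | undefined>,
): string {
  const fromEnv = env.LCM_LARGE_FILES_DIR?.trim();
  if (fromEnv) return fromEnv;

  const fromConfig = config.largeFilesDir?.trim();
  if (fromConfig) return fromConfig;

  const stateDir = env.OPENCLAW_STATE_DIR?.trim() || join(homedir(), ".openclaw");
  return join(stateDir, "lcm-files");
}

console.log(resolveLargeFilesDir({ largeFilesDir: "/data/lcm" }, {}));
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the precedence rules easy to cover in a focused test.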

Regeneration-Prompt: |
  Address the PR review findings on inline image externalization in
  lossless-claw. Keep the change additive and local: preserve the
  existing large tool-result externalization behavior for real text
  payloads, but stop tool-result image references from being
  externalized a second time as .txt files. Make tool-message image
  detection operate on the original message shape so structured
  tool_result/function_call_output payloads still round-trip through
  message_parts and assembly. Also stop hard-coding the large-files
  storage location by introducing a configurable largeFilesDir with an
  LCM_LARGE_FILES_DIR env override, then sync the manifest, docs, skill
  reference, tests, and release notes entry so the new storage behavior
  is fully documented.

---------

Co-authored-by: Lanic <lanic@LanicdeMac-mini.local>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Josh Lehman <josh@martian.engineering>
Adds a daily/weekly/monthly rollup system on top of the existing LCM
summary DAG. Pre-built daily rollups give fast answers to temporal
questions ('what did we do today?', 'catch me up on yesterday') without
requiring keyword search or LLM calls.

Architecture:
- Schema: lcm_rollups, lcm_rollup_sources, lcm_rollup_state tables
  (additive, no existing table modifications)
- RollupStore: data access layer with proper UPSERT semantics
- RollupBuilder: timezone-aware daily builder with fingerprinted
  idempotent rebuilds, keyword-based outcome extraction
- lcm_recent tool: period-based temporal recall with grep fallback
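
A fingerprinted idempotent rebuild, as mentioned for RollupBuilder, could look roughly like this. The PR does not show which inputs feed the fingerprint, so the `SourceSummary` shape, the choice of SHA-256, and both function names are assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical view of a source summary that contributes to a daily rollup.
interface SourceSummary {
  id: string;
  updatedAt: string;
}

// Order-independent digest over the contributing summaries.
function fingerprintSources(sources: SourceSummary[]): string {
  const sorted = [...sources].sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  const hash = createHash("sha256");
  for (const source of sorted) {
    hash.update(`${source.id}\u0000${source.updatedAt}\n`);
  }
  return hash.digest("hex");
}

// A rebuild is only needed when the stored fingerprint no longer matches.
function needsRebuild(storedFingerprint: string | null, sources: SourceSummary[]): boolean {
  return storedFingerprint !== fingerprintSources(sources);
}

const sources = [
  { id: "sum_b", updatedAt: "2026-04-13T10:00:00Z" },
  { id: "sum_a", updatedAt: "2026-04-13T09:00:00Z" },
];
const stored = fingerprintSources(sources);
console.log(needsRebuild(stored, [...sources].reverse())); // false
```

Sorting before hashing means a rebuild is not triggered merely because the query returned the same rows in a different order.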

Safety:
- All writes use ON CONFLICT DO UPDATE (not INSERT OR REPLACE)
- Rebuilds wrapped in BEGIN IMMEDIATE transactions
- Rollup tables created before FTS5 guard (FTS5-independent)
- New partial index on summaries(conversation_id, kind) WHERE kind='leaf'

Tested against 1.9GB production LCM database. Adversarial review
completed (R-463). Scenario validation against 15 use cases (R-465).
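
The period strings listed in the PR summary (`today`, `yesterday`, `7d`, `date:YYYY-MM-DD`) have to be turned into a day range before a rollup can be looked up. The sketch below shows one way to do that. It is an assumption-laden illustration: it works in UTC rather than a configured timezone, treats `Nd` as "the last N days including today", and the `DayRange` type and `resolvePeriod` name are hypothetical.

```typescript
// Inclusive range of calendar days, formatted YYYY-MM-DD in UTC.
interface DayRange {
  startDay: string;
  endDay: string;
}

const MS_PER_DAY = 86_400_000;

function toDay(date: Date): string {
  return date.toISOString().slice(0, 10);
}

function addDays(date: Date, days: number): Date {
  return new Date(date.getTime() + days * MS_PER_DAY);
}

// Returns null for anything that is not a recognised period string.
function resolvePeriod(period: string, now: Date): DayRange | null {
  if (period === "today") {
    return { startDay: toDay(now), endDay: toDay(now) };
  }
  if (period === "yesterday") {
    const day = toDay(addDays(now, -1));
    return { startDay: day, endDay: day };
  }
  const lastDays = /^(\d+)d$/.exec(period);
  if (lastDays) {
    const count = Number(lastDays[1]);
    if (count < 1) return null;
    return { startDay: toDay(addDays(now, -(count - 1))), endDay: toDay(now) };
  }
  const explicit = /^date:(\d{4}-\d{2}-\d{2})$/.exec(period);
  if (explicit) {
    const parsed = new Date(`${explicit[1]}T00:00:00Z`);
    // Reject unparseable dates and dates that roll over into another month.
    if (Number.isNaN(parsed.getTime()) || toDay(parsed) !== explicit[1]) return null;
    return { startDay: explicit[1], endDay: explicit[1] };
  }
  return null;
}

console.log(resolvePeriod("7d", new Date("2026-04-13T11:45:00Z")));
```

A timezone-aware builder would replace `toDay` with a conversion into the user's zone; the parsing structure would stay the same.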
Copilot AI review requested due to automatic review settings April 13, 2026 11:45
@coderabbitai

coderabbitai Bot commented Apr 13, 2026


Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough


This PR introduces version 0.8.2 of the lossless-claw plugin with significant feature additions and architectural improvements: native /lcm plugin commands with diagnostics/repair capabilities, configurable fresh-tail token capping and prompt-aware assembly eviction, conversation pruning, daily rollup construction, compaction telemetry tracking, improved FTS5 search with sorting, async bootstrap operations, and extensive configuration schema expansions. Build output is now distributed as pre-built dist/index.js with esbuild.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Changesets**<br>`.changeset/async-bootstrap-sync-io.md`, `.changeset/wise-bears-doubt.md`, `.changeset/*-removed` | Added new patch (async bootstrap) and minor (tool-result externalization) changesets; removed 5 obsolete entries. |
| **Package & Build**<br>`package.json`, `Dockerfile`, `.github/workflows/publish.yml` | Version bumped 0.5.3→0.8.2, `main` and `openclaw.extensions` now point to `dist/index.js`, added esbuild build script, Docker build now fails on `npm run build` errors, publish workflow adds build step. |
| **Core Configuration**<br>`openclaw.plugin.json`, `src/db/config.ts`, `docs/configuration.md` | Extended plugin config schema with 20+ new knobs (freshTail/leaf/condensed sizing, summary timeout, transcript GC, fallback providers, cache-aware/dynamic-leaf compaction, circuit breaker, large-files dir). Updated defaults and precedence documentation. |
| **Plugin Commands & Diagnostics**<br>`src/plugin/lcm-command.ts`, `src/plugin/lcm-doctor-apply.ts`, `src/plugin/lcm-doctor-cleaners.ts`, `src/plugin/lcm-doctor-shared.ts` | Implemented `/lcm` native command with status/doctor/doctor-clean subcommands; added repair logic that rewrites broken summaries, cleaner filters for archived/cron/orphaned conversations, and doctor marker detection. |
| **Lazy Engine & Shared Init**<br>`src/plugin/index.ts`, `src/plugin/shared-init.ts` | Refactored plugin initialization to defer DB/engine creation until gateway start (when DB lock is likely released), cache per-dbPath singletons across register calls, support eager-first with deferred retry on lock detection, and multi-profile state-dir selection. |
| **Assembly & Token Management**<br>`src/assembler.ts`, `src/estimate-tokens.ts`, `src/compaction.ts` | Added fresh-tail token cap (`freshTailMaxTokens`), prompt-aware relevance-based eviction, CJK-aware token estimation (per-codepoint weights), truncation utilities, and enhanced context caching during compaction passes. |
| **Retrieval & Sorting**<br>`src/retrieval.ts`, `src/store/full-text-sort.ts`, `src/store/conversation-store.ts`, `src/store/summary-store.ts` | Added FTS5 sorting options (recency/relevance/hybrid with age-decay), refactored search input to thread sort parameter through stores, improved timestamp parsing with UTC normalization. |
| **New Storage & State**<br>`src/store/compaction-telemetry-store.ts`, `src/store/rollup-store.ts`, `src/store/parse-utc-timestamp.ts` | Added telemetry tracking (cache state, activity band, compaction timing), rollup builder/storage for daily summaries, and centralized UTC timestamp parsing. |
| **Database Migrations & Features**<br>`src/db/migration.ts`, `src/db/features.ts`, `src/db/connection.ts`, `src/transaction-mutex.ts` | Extended migrations for rollup/telemetry/FTS tables, added trigram tokenizer detection, improved connection config (cache size, synchronous pragma, OPTIMIZE before close), introduced async transaction serialization via mutex. |
| **Retrieval Tools**<br>`src/tools/lcm-grep-tool.ts`, `src/tools/lcm-expand-tool.ts`, `src/tools/lcm-describe-tool.ts`, `src/tools/lcm-expand-query-tool.ts`, `src/tools/lcm-recent-tool.ts`, `src/tools/lcm-expansion-recursion-guard.ts` | Converted all tools to support optional lazy LCM engine acquisition (`getLcm` async callback), added sort parameter to grep/expand-query, implemented concurrency guard for multi-conversation expansion, new recent-tool with rollup/fallback support. |
| **Utilities & Cleanup**<br>`src/lcm-log.ts`, `src/prune.ts`, `src/rollup-builder.ts`, `src/startup-banner-log.ts`, `src/summarize.ts` | Added logger factory with NOOP fallback, conversation pruning by duration, daily rollup construction with fingerprinting, new startup banners (transcript GC, fallback providers, state dir), improved summarizer error handling and fallback provider integration. |
| **Documentation**<br>`README.md`, `README_zh.md`, `docs/agent-tools.md`, `docs/architecture.md`, `AGENTS.md`, `CHANGELOG.md`, `skills/lossless-claw/*` | Added Chinese README, extended agent-tools guidance with sort/breakdown docs, updated architecture for largeFilesDir, comprehensive SKILL guide with diagnostics/config/architecture/recall-tools references, updated AGENTS principles for lossless data handling. |
| **Git & Local Metadata**<br>`.gitignore`, `.pebbles/*`, `.local/lcm-pretyping-latency-memo.md`, `specs/*` | Added `.pebbles` and `docs/plans` to ignore list, removed pebbles config/events, added latency investigation memo and tool-result-externalization/incremental-bootstrap spec. |
| **Test Coverage**<br>`test/*.test.ts` | Added 1000+ lines of tests: command execution, doctor repair, bootstrap flood regression, message-only offset handling, FTS sanitization, expansion concurrency/multi-conversation, reasoning selection, estimate-tokens, lcm-integration assembly/compaction behaviors, and config resolution. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: Significant architectural changes across 90+ files spanning multiple domains (plugin commands, database schema, tool refactoring, new state management). High logic density in doctor repair and expansion orchestration logic. Introduces new concurrency/guard patterns and lazy initialization with deferred DB setup. New FTS search sorting and token estimation require careful validation. Extensive test coverage helps, but interdependencies between config, migration, compaction telemetry, and tool initialization warrant careful trace-through of critical paths.


Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a temporal rollup foundation and improves LCM’s operational/recall ergonomics (sorting, token estimation, transaction serialization, config/manifest updates, and diagnostic tooling) to make recency-aware recall and large-DB operation safer and faster.

Changes:

  • Added new storage/infra primitives (rollup store, transaction mutex, UTC timestamp parsing, token estimator, full-text sort helpers).
  • Expanded tool and retrieval behavior (FTS query sanitization, full-text sort modes, lazy LCM engine acquisition in tools).
  • Updated packaging/manifest/docs/tests to ship a bundled skill + command surface and to validate new config/auth/search behaviors.

Reviewed changes

Copilot reviewed 84 out of 101 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `test/lcm-tools.test.ts` | Adds grep metadata/sort assertions |
| `test/lcm-summarizer-reasoning.test.ts` | Covers reasoning defaults + retries |
| `test/lcm-expand-tool.test.ts` | Aligns deps with new config fields |
| `test/index-secret-ref-auth-profiles.test.ts` | Ensures runtime provider config merged |
| `test/index-complete-provider-config.test.ts` | Covers provider config overrides/errors |
| `test/index-complete-model-auth.test.ts` | Covers runtime model-auth precedence |
| `test/fts5-sanitize.test.ts` | Adds quoted-phrase sanitize cases |
| `test/fts-fallback.test.ts` | Extends CJK/mixed-language search coverage |
| `test/expansion.test.ts` | Updates baseline config to new shape |
| `test/estimate-tokens.test.ts` | Adds unit tests for token estimator |
| `test/circuit-breaker.test.ts` | Updates baseline config to new shape |
| `test/bootstrap-message-only.test.ts` | Covers message-only JSONL tail reading |
| `test/bootstrap-flood-regression.test.ts` | Adds integration regression for bootstrap flood |
| `test/assembler-blocks.test.ts` | Covers externalized tool output blocks + scoring |
| `src/types.ts` | Extends deps/complete types (diagnostics/auth/reasoning) |
| `src/transaction-mutex.ts` | Serializes DB transactions + savepoints |
| `src/tools/lcm-grep-tool.ts` | Adds sort param + lazy engine acquisition |
| `src/tools/lcm-expansion-recursion-guard.ts` | Adds concurrency guard for expansion |
| `src/tools/lcm-expand-tool.ts` | Adds lazy engine acquisition + copy tweaks |
| `src/tools/lcm-describe-tool.ts` | Adds lazy engine acquisition |
| `src/summarize.ts` | Uses shared token estimator, adds timeout/config fallback providers, logging improvements |
| `src/store/rollup-store.ts` | Introduces rollup/state/source persistence APIs |
| `src/store/parse-utc-timestamp.ts` | Fixes SQLite UTC timestamp parsing |
| `src/store/index.ts` | Re-exports new stores/types |
| `src/store/full-text-sort.ts` | Adds FTS ORDER BY builder incl hybrid mode |
| `src/store/fts5-sanitize.ts` | Preserves quoted phrases in FTS sanitization |
| `src/store/conversation-store.ts` | Uses transaction mutex, UTC parsing, adds sort support |
| `src/store/compaction-telemetry-store.ts` | Adds persisted cache telemetry store |
| `src/startup-banner-log.ts` | Adds new banner keys for config diagnostics |
| `src/retrieval.ts` | Plumbs sort + uses shared token estimator |
| `src/prune.ts` | Adds conversation pruning utility |
| `src/plugin/shared-init.ts` | Adds process-global shared init for plugin register |
| `src/plugin/lcm-doctor-shared.ts` | Adds doctor marker detection + stats |
| `src/plugin/lcm-doctor-apply.ts` | Adds doctor repair application flow |
| `src/lcm-log.ts` | Adds unified logger + error formatter |
| `src/estimate-tokens.ts` | Adds Unicode-aware token estimation utilities |
| `src/db/features.ts` | Adds trigram tokenizer probe |
| `src/db/connection.ts` | Adds path normalization helpers + PRAGMA tuning + safer connection setup |
| `src/db/config.ts` | Adds largeFilesDir, transcriptGcEnabled, fallbacks, diagnostics, more parsing helpers |
| `src/assembler.ts` | Adds prompt-aware eviction + fresh-tail token cap + externalized tool result handling |
| `specs/tool-result-externalization-and-incremental-bootstrap.md` | Documents transcript GC/bootstrap/externalization design |
| `specs/lossless-claw-mvp-skill-and-commands.md` | Documents skill/command MVP plan |
| `skills/lossless-claw/SKILL.md` | Adds bundled skill root |
| `skills/lossless-claw/references/session-lifecycle.md` | Adds session lifecycle reference |
| `skills/lossless-claw/references/recall-tools.md` | Adds recall tool guidance |
| `skills/lossless-claw/references/diagnostics.md` | Adds diagnostics guidance |
| `skills/lossless-claw/references/config.md` | Adds config reference (synced to runtime) |
| `skills/lossless-claw/references/architecture.md` | Adds architecture reference |
| `package.json` | Switches to built dist entry + adds build script and files list |
| `openclaw.plugin.json` | Declares context-engine kind + skills + expands UI/config schema |
| `docs/architecture.md` | Updates largeFilesDir docs |
| `docs/agent-tools.md` | Documents full_text sorting + grep tips |
| `README_zh.md` | Adds Chinese README |
| `README.md` | Adds commands/skill section + config updates |
| `Dockerfile` | Requires build (no longer best-effort) |
| `AGENTS.md` | Adds repo principles + schema/doc sync rules |
| `.pebbles/config.json` | Removed local tool config |
| `.pebbles/.gitignore` | Removed local tool ignore |
| `.local/lcm-pretyping-latency-memo.md` | Adds internal perf memo |
| `.github/workflows/publish.yml` | Adds build step prior to publish |
| `.changeset/wise-bears-doubt.md` | Adds release note for largeFilesDir/externalization |
| `.changeset/plugin-config-schema-sync.md` | Removed old changeset |
| `.changeset/new-reset-lifecycle.md` | Removed old changeset |
| `.changeset/lucky-pianos-learn.md` | Removed old changeset |
| `.changeset/loud-ravens-cheer.md` | Removed old changeset |
| `.changeset/calm-walls-hear.md` | Removed old changeset |
| `.changeset/bootstrap-context-budget.md` | Removed old changeset |
| `.changeset/async-bootstrap-sync-io.md` | Adds release note for async bootstrap IO |


Comment thread openclaw.plugin.json
Comment on lines +74 to 79
"help": "Directory for persisting large-file text payloads (default: <stateDir>/lcm-files). Uses OPENCLAW_STATE_DIR when set."
},
"largeFilesDir": {
"label": "Large Files Directory",
"help": "Directory for externalized large files and inline-image payloads (default: ~/.openclaw/lcm-files)"
},

Copilot AI Apr 13, 2026


Duplicate JSON keys are not valid in practice (the latter silently overwrites the former in most parsers). Remove one of the duplicated largeFilesDir entries (and similarly ensure configSchema.properties.largeFilesDir is defined only once) so UI hints and schema descriptions are deterministic.

Suggested change
- "help": "Directory for persisting large-file text payloads (default: <stateDir>/lcm-files). Uses OPENCLAW_STATE_DIR when set."
- },
- "largeFilesDir": {
- "label": "Large Files Directory",
- "help": "Directory for externalized large files and inline-image payloads (default: ~/.openclaw/lcm-files)"
- },
+ "help": "Directory for persisting externalized large-file text and inline-image payloads (default: <stateDir>/lcm-files; falls back to ~/.openclaw/lcm-files). Uses OPENCLAW_STATE_DIR when set."
+ },

Comment thread src/store/rollup-store.ts
Comment on lines +274 to +291
replaceRollupSources(rollupId: string, sources: RollupSourceInput[]): void {
void withDatabaseTransaction(this.db, "BEGIN", () => {
this.db.prepare(`DELETE FROM lcm_rollup_sources WHERE rollup_id = ?`).run(rollupId);

if (sources.length === 0) {
return;
}

const insert = this.db.prepare(
`INSERT INTO lcm_rollup_sources (rollup_id, source_type, source_id, ordinal)
VALUES (?, ?, ?, ?)`,
);

for (const source of sources) {
insert.run(rollupId, source.type, source.id, source.ordinal);
}
});
}

Copilot AI Apr 13, 2026


withDatabaseTransaction(...) is async and returns a Promise, but this method returns void and explicitly discards the Promise. This can cause callers to observe sources not yet written and can also swallow transaction errors. Make replaceRollupSources async and await withDatabaseTransaction(...) (or use a synchronous transaction wrapper if you intend this to be sync).
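
The difference between discarding and awaiting the transaction promise can be shown with an in-memory stand-in. Nothing here is the repository's code: `InMemoryRollupSources`, its private `transaction` helper, and the duplicate check are hypothetical substitutes for the store, `withDatabaseTransaction`, and a constraint violation.

```typescript
interface RollupSourceInput {
  type: string;
  id: string;
  ordinal: number;
}

class InMemoryRollupSources {
  private rows = new Map<string, RollupSourceInput[]>();

  // Stand-in for an async transaction wrapper: it yields once (as acquiring
  // a mutex would) and restores the previous state if the work throws.
  private async transaction<T>(work: () => T): Promise<T> {
    await Promise.resolve();
    const snapshot = new Map(this.rows);
    try {
      return work();
    } catch (error) {
      this.rows = snapshot;
      throw error;
    }
  }

  // Async signature plus `await`: callers see the write finish or fail.
  async replaceRollupSources(rollupId: string, sources: RollupSourceInput[]): Promise<void> {
    await this.transaction(() => {
      this.rows.delete(rollupId);
      const seen = new Set<string>();
      const next: RollupSourceInput[] = [];
      for (const source of sources) {
        const key = `${source.type}:${source.id}`;
        if (seen.has(key)) throw new Error(`duplicate source ${key}`);
        seen.add(key);
        next.push(source);
      }
      this.rows.set(rollupId, next);
    });
  }

  getSources(rollupId: string): RollupSourceInput[] {
    return this.rows.get(rollupId) ?? [];
  }
}
```

With `void this.transaction(...)` in place of `await`, the method would return before the write ran and the duplicate error would surface only as an unhandled rejection, which is the hazard the comment describes.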

Comment on lines +12 to +21
export function buildFtsOrderBy(sort: SearchSort | undefined, createdAtExpr: string): string {
switch (sort ?? "recency") {
case "relevance":
return `rank ASC, ${createdAtExpr} DESC`;
case "hybrid":
return `(rank / (1 + ((julianday('now') - julianday(${createdAtExpr})) * 24 * ${AGE_DECAY_RATE}))) ASC, ${createdAtExpr} DESC`;
default:
return `${createdAtExpr} DESC`;
}
}

Copilot AI Apr 13, 2026


The hybrid formula divides rank by an age factor. If rank is non-negative (common BM25 behavior), dividing by a larger value makes older rows look more relevant (smaller), which is the opposite of an age penalty. Use a monotonic penalty that increases the score with age regardless of sign (e.g., multiply by the age factor, or add a penalty term) so hybrid consistently prefers newer items when relevance ties.
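
Whether the division helps or hurts depends on the sign of `rank`, which a few lines of arithmetic make concrete. The SQLite FTS5 documentation describes `bm25()` as returning values where a better match is numerically smaller, so scores from that function are typically negative; that is worth confirming against the query actually used here. The decay value and the additive alternative below are assumptions for illustration only.

```typescript
// Hypothetical value; the real AGE_DECAY_RATE constant is not shown in the review.
const AGE_DECAY_RATE = 0.01;

// Mirrors the SQL expression: rank / (1 + ageHours * AGE_DECAY_RATE).
// Rows are ordered ASC, so a smaller result sorts first.
function hybridScore(rank: number, ageHours: number, decay = AGE_DECAY_RATE): number {
  return rank / (1 + ageHours * decay);
}

// One sign-independent alternative: add a penalty that grows with age.
function additiveScore(rank: number, ageHours: number, decay = AGE_DECAY_RATE): number {
  return rank + ageHours * decay;
}

// Negative rank: the older row moves toward zero and sorts later.
console.log(hybridScore(-2, 0), hybridScore(-2, 100));

// Non-negative rank: the older row shrinks and sorts earlier, the inversion
// the comment warns about.
console.log(hybridScore(2, 0), hybridScore(2, 100));

// The additive form pushes older rows later for either sign.
console.log(additiveScore(-2, 100) > additiveScore(-2, 0), additiveScore(2, 100) > additiveScore(2, 0));
```

So the concern applies only if `rank` can be non-negative in this query. Note that multiplying by the age factor would have the mirror-image problem for negative ranks, which is why the sketch uses an additive penalty.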

Comment thread src/prune.ts
Comment on lines +351 to +366
let batchCount = 0;
db.exec("BEGIN IMMEDIATE");
try {
const batch = loadPruneCandidates(db, cutoffDate, batchSize);
batchCount = batch.length;
if (batch.length === 0) {
db.exec("COMMIT");
break;
}
deleted += deleteCandidates(db, batch);
candidates.push(...batch);
db.exec("COMMIT");
} catch (error) {
db.exec("ROLLBACK");
throw error;
}

Copilot AI Apr 13, 2026


This introduces explicit BEGIN IMMEDIATE transaction handling without using the new per-db withDatabaseTransaction() mutex/savepoint mechanism. If pruneConversations() is ever called from an async path that may already be inside a transaction scope (or concurrently with other async transactional LCM work), it can reintroduce the nested-transaction error this PR is addressing elsewhere. Consider switching the confirm-mode delete loop to use withDatabaseTransaction(db, \"BEGIN IMMEDIATE\", ...) so pruning participates in the same serialization and nested savepoint behavior.
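
The serialization the comment refers to can be pictured as a promise-chain mutex. This is a generic sketch of that pattern, not the contents of `src/transaction-mutex.ts`; the `TransactionMutex` class and its `run` method are hypothetical, and savepoint handling for nested scopes is left out.

```typescript
// Runs submitted work strictly one at a time, in submission order.
class TransactionMutex {
  private tail: Promise<void> = Promise.resolve();

  run<T>(work: () => Promise<T> | T): Promise<T> {
    // Start only after everything queued earlier has settled.
    const result = this.tail.then(() => work());
    // Keep the chain alive even when this work fails.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

const mutex = new TransactionMutex();
const order: string[] = [];
const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// The second task is queued immediately but cannot interleave with the first.
void mutex.run(async () => {
  order.push("prune:begin");
  await pause(5);
  order.push("prune:commit");
});
void mutex
  .run(() => {
    order.push("rollup");
  })
  .then(() => console.log(order.join(" -> ")));
```

A prune loop that issues its own `BEGIN IMMEDIATE` outside such a queue is not ordered relative to work inside it, which is the gap the review comment points at.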

Comment thread package.json
Comment on lines 2 to +6
  "name": "@martian-engineering/lossless-claw",
- "version": "0.5.3",
+ "version": "0.8.2",
  "description": "Lossless Context Management plugin for OpenClaw — DAG-based conversation summarization with incremental compaction",
  "type": "module",
- "main": "index.ts",
+ "main": "dist/index.js",

Copilot AI Apr 13, 2026


The package now ships only bundled JS (dist/index.js) but does not declare a types entry or publish .d.ts files. If any consumers import this package as a library (not just as an OpenClaw plugin extension), this becomes a breaking change for TypeScript users. Consider generating and publishing type declarations (or keeping TS sources in files) and adding a types field so downstream TS builds remain stable.

Comment thread package.json
Comment on lines 25 to +27
"files": [
"index.ts",
"src/**/*.ts",
"dist/",
"skills/",

Copilot AI Apr 13, 2026


The package now ships only bundled JS (dist/index.js) but does not declare a types entry or publish .d.ts files. If any consumers import this package as a library (not just as an OpenClaw plugin extension), this becomes a breaking change for TypeScript users. Consider generating and publishing type declarations (or keeping TS sources in files) and adding a types field so downstream TS builds remain stable.

Comment thread package.json
  {
  "name": "@martian-engineering/lossless-claw",
- "version": "0.5.3",
+ "version": "0.8.2",

Copilot AI Apr 13, 2026


The PR title/description focuses on the new lcm_recent rollup layer, but the diff includes a broad set of additional features/changes (transaction mutex, doctor flows, pruning, bundled skills/commands, packaging changes, token estimation, etc.). This makes it hard to review/release-note accurately. Consider splitting the non-rollup changes into separate PRs, or updating the PR description to explicitly enumerate these additional scope items.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13a964c3a4


Comment on lines +356 to +357
query: "",
mode: "full_text",


P1: Use non-empty fallback query for all-conversation recent search

In the allConversations fallback path, lcm_recent calls retrieval.grep with mode: "full_text" and an empty query. The full-text sanitizer turns empty input into "", which produces zero matches, so this branch reports no activity even when the requested time window has messages/summaries. That makes cross-conversation recency recap unreliable whenever a prebuilt rollup is absent.


Comment thread src/rollup-builder.ts
Comment on lines +89 to +91
if (state && state.pending_rebuild === 0 && !forceCurrentDay) {
result.skipped += daysBack;
return result;


P1: Remove permanent skip gate when pending_rebuild is unset

This early return skips all rollup work whenever state exists with pending_rebuild = 0, but the new rollup flow only writes pending_rebuild: 0 and has no corresponding write to 1. After the first successful build creates state, later buildDailyRollups() calls (without forceCurrentDay) will keep skipping indefinitely, so daily rollups stop updating as new summaries arrive.
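
The missing half of the gate can be sketched as a tiny state machine. The field and function names below are hypothetical (only `pending_rebuild` and `forceCurrentDay` come from the snippet under review), and where the real code should set the flag is a design decision for the authors.

```typescript
// Hypothetical in-memory view of a row in lcm_rollup_state.
interface RollupState {
  pendingRebuild: 0 | 1;
  lastBuiltAt: string | null;
}

// The skip condition quoted in the review.
function shouldSkipBuild(state: RollupState | null, forceCurrentDay: boolean): boolean {
  return state !== null && state.pendingRebuild === 0 && !forceCurrentDay;
}

// The write the comment says is absent: new summaries mark the state dirty.
function markSummaryWritten(state: RollupState | null): RollupState {
  return { pendingRebuild: 1, lastBuiltAt: state?.lastBuiltAt ?? null };
}

// A successful build clears the flag again.
function completeBuild(now: string): RollupState {
  return { pendingRebuild: 0, lastBuiltAt: now };
}

let state: RollupState | null = null;
console.log(shouldSkipBuild(state, false)); // false: the first build runs
state = completeBuild("2026-04-13T11:45:00Z");
console.log(shouldSkipBuild(state, false)); // true: nothing new since the build
state = markSummaryWritten(state);
console.log(shouldSkipBuild(state, false)); // false: a new summary reopens the gate
```

Without a call equivalent to `markSummaryWritten`, the second line's `true` would persist indefinitely, which is the permanent skip the comment describes.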


@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 11 potential issues.


Comment thread openclaw.plugin.json
Comment on lines +76 to 79
"largeFilesDir": {
"label": "Large Files Directory",
"help": "Directory for externalized large files and inline-image payloads (default: ~/.openclaw/lcm-files)"
},


🔴 Duplicate largeFilesDir key in openclaw.plugin.json uiHints silently discards the first entry

The uiHints object in openclaw.plugin.json declares largeFilesDir twice at lines 72-75 and 76-79 with conflicting help text. JSON parsers take the last value, so the first entry — which documents OPENCLAW_STATE_DIR behavior — is silently discarded. The same duplication also appears in configSchema.properties at lines 256 and 375. This violates the AGENTS.md Config Schema Sync rule: "Keep the manifest configSchema and uiHints aligned with every supported plugins.entries.lossless-claw.config field."

Suggested change
- "largeFilesDir": {
- "label": "Large Files Directory",
- "help": "Directory for externalized large files and inline-image payloads (default: ~/.openclaw/lcm-files)"
- },

Comment thread src/store/rollup-store.ts
Comment on lines +274 to +291
replaceRollupSources(rollupId: string, sources: RollupSourceInput[]): void {
void withDatabaseTransaction(this.db, "BEGIN", () => {
this.db.prepare(`DELETE FROM lcm_rollup_sources WHERE rollup_id = ?`).run(rollupId);

if (sources.length === 0) {
return;
}

const insert = this.db.prepare(
`INSERT INTO lcm_rollup_sources (rollup_id, source_type, source_id, ordinal)
VALUES (?, ?, ?, ?)`,
);

for (const source of sources) {
insert.run(rollupId, source.type, source.id, source.ordinal);
}
});
}


🔴 replaceRollupSources fire-and-forgets the transaction promise, silently swallowing errors

void withDatabaseTransaction(...) at src/store/rollup-store.ts:275 discards the Promise returned by the async transaction wrapper. If any INSERT inside the callback throws (e.g., a constraint violation or SQLITE_BUSY), the error becomes an unhandled promise rejection instead of propagating to the caller. When called from buildDayRollup (src/rollup-builder.ts:215), the outer transaction continues as if the source replacement succeeded, leaving the rollup in an inconsistent state (rollup row committed but source links missing or incomplete).

Suggested change
- replaceRollupSources(rollupId: string, sources: RollupSourceInput[]): void {
- void withDatabaseTransaction(this.db, "BEGIN", () => {
- this.db.prepare(`DELETE FROM lcm_rollup_sources WHERE rollup_id = ?`).run(rollupId);
- if (sources.length === 0) {
- return;
- }
- const insert = this.db.prepare(
- `INSERT INTO lcm_rollup_sources (rollup_id, source_type, source_id, ordinal)
- VALUES (?, ?, ?, ?)`,
- );
- for (const source of sources) {
- insert.run(rollupId, source.type, source.id, source.ordinal);
- }
- });
- }
+ replaceRollupSources(rollupId: string, sources: RollupSourceInput[]): void {
+ const callback = () => {
+ this.db.prepare(`DELETE FROM lcm_rollup_sources WHERE rollup_id = ?`).run(rollupId);
+ if (sources.length === 0) {
+ return;
+ }
+ const insert = this.db.prepare(
+ `INSERT INTO lcm_rollup_sources (rollup_id, source_type, source_id, ordinal)
+ VALUES (?, ?, ?, ?)`,
+ );
+ for (const source of sources) {
+ insert.run(rollupId, source.type, source.id, source.ordinal);
+ }
+ };
+ callback();
+ }

Comment thread src/rollup-builder.ts
Comment on lines +507 to +513
function estimateTokens(content: string): number {
const text = content.trim();
if (!text) {
return 0;
}
return Math.max(1, Math.ceil(text.length / 4));
}


🟡 rollup-builder.ts uses naive text.length / 4 token estimation instead of the PR's shared CJK-aware estimator

The local estimateTokens function at src/rollup-builder.ts:507 uses the old Math.ceil(text.length / 4) formula, while the rest of the codebase (compaction, assembler, retrieval) was migrated to the CJK-aware estimateTokens from src/estimate-tokens.ts. For CJK-heavy conversations, the rollup builder will underestimate token counts by ~6×, causing rollup content to significantly exceed the configured dailyMaxTokens budget before the trimming loop catches it, or potentially not trimming at all when the naive estimate stays under budget.
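
The gap between the two estimators is easy to see with a toy version. The weights and code-point ranges below are assumptions chosen for illustration; the actual `src/estimate-tokens.ts` is not shown in this review and may weight characters differently.

```typescript
// Hypothetical per-code-point weights.
const CJK_WEIGHT = 1.5;
const OTHER_WEIGHT = 0.25;

// A deliberately small set of ranges; a production estimator would cover more.
function isCjk(codePoint: number): boolean {
  return (
    (codePoint >= 0x3040 && codePoint <= 0x30ff) || // Hiragana and Katakana
    (codePoint >= 0x4e00 && codePoint <= 0x9fff) || // CJK Unified Ideographs
    (codePoint >= 0xac00 && codePoint <= 0xd7af) // Hangul syllables
  );
}

// The formula currently local to rollup-builder.ts.
function naiveEstimate(content: string): number {
  const text = content.trim();
  if (!text) return 0;
  return Math.max(1, Math.ceil(text.length / 4));
}

// Sketch of a CJK-aware estimate that weights each code point.
function cjkAwareEstimate(content: string): number {
  const text = content.trim();
  if (!text) return 0;
  let total = 0;
  for (const char of text) {
    total += isCjk(char.codePointAt(0)!) ? CJK_WEIGHT : OTHER_WEIGHT;
  }
  return Math.max(1, Math.ceil(total));
}

const sample = "今日は天気がいい";
console.log(naiveEstimate(sample), cjkAwareEstimate(sample)); // 2 12
```

Under these assumed weights the eight-character sample differs by a factor of six, while plain ASCII text produces the same number from both functions. Since `package.json` declares `"type": "module"`, sharing the real estimator would normally be done with a static `import` rather than `require`.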

Suggested change
- function estimateTokens(content: string): number {
- const text = content.trim();
- if (!text) {
- return 0;
- }
- return Math.max(1, Math.ceil(text.length / 4));
- }
+ function estimateTokens(content: string): number {
+ const { estimateTokens: sharedEstimate } = require("./estimate-tokens.js");
+ return sharedEstimate(content);
+ }

Comment thread src/compaction.ts
Comment on lines +541 to +542
// Delta tracking: compute token change from pass results instead of re-querying DB
const tokensAfterLeaf = tokensBefore - leafResult.removedTokens + leafResult.addedTokens;

📝 Info: Delta-based compaction token tracking diverges slightly from DB truth for corrupt messages

The compaction engine now uses arithmetic delta tracking (tokensBefore - removedTokens + addedTokens) instead of re-querying getContextTokenCount() after each pass. This avoids N+1 DB queries during multi-pass sweeps. The removedTokens calculation at src/compaction.ts:1525 uses resolveMessageTokenCount which falls back to estimateTokens(content) for messages with token_count <= 0, while getContextTokenCount() would sum the stored 0. The divergence is bounded to corrupt/empty messages and is documented in the code comment at lines 1519-1523. This is not a bug — it's a known tradeoff that makes stopping decisions slightly more conservative for pathological data.

Comment thread src/compaction.ts
Comment on lines 1788 to +1793
  condensedPassOccurred: boolean;
}): Promise<void> {
  const content = `LCM compaction ${input.pass} pass (${input.level}): ${input.tokensBefore} -> ${input.tokensAfter}`;
  const metadata = JSON.stringify({
    conversationId: input.conversationId,
    pass: input.pass,
    level: input.level,
    tokensBefore: input.tokensBefore,
    tokensAfter: input.tokensAfter,
    createdSummaryId: input.createdSummaryId,
    createdSummaryIds: input.createdSummaryIds,
    condensedPassOccurred: input.condensedPassOccurred,
  });

  const writeEvent = async (): Promise<void> => {
    const seq = (await this.conversationStore.getMaxSeq(input.conversationId)) + 1;
    const eventMessage = await this.conversationStore.createMessage({
      conversationId: input.conversationId,
      seq,
      role: "system",
      content,
      tokenCount: estimateTokens(content),
    });

    const parts: CreateMessagePartInput[] = [
      {
        sessionId: input.sessionId,
        partType: "compaction",
        ordinal: 0,
        textContent: content,
        metadata,
      },
    ];
    await this.conversationStore.createMessageParts(eventMessage.messageId, parts);
  };

  try {
    await this.conversationStore.withTransaction(() => writeEvent());
  } catch {
    // Compaction should still succeed if event persistence fails.
  }
  this.log.info(
    `[lcm] ${content} conversation=${input.conversationId} summary=${input.createdSummaryId}`,
  );

📝 Info: Compaction event persistence replaced with log-only telemetry — intentional data-loss-safe change

The old persistCompactionEvent at src/compaction.ts:1788 wrote synthetic system messages with compaction metadata into the conversation's canonical message history. The new version replaces this with a log.info() call. This removes compaction markers from the persisted conversation record, which is consistent with the AGENTS.md 'lossless means lossless' principle — synthetic system messages about internal compaction operations were arguably not user data. The tests were updated accordingly (test/lcm-integration.test.ts:1569-1591). Not a bug, but operators who relied on querying message_parts WHERE part_type = 'compaction' for compaction audit trails will find those rows no longer created.

Comment thread src/compaction.ts
Comment on lines +347 to +396
   * entry points. null when inactive; external callers (e.g., engine.ts
   * evaluateLeafTrigger) get uncached reads.
   *
   * Uses a reference count so concurrent compactions on different
   * conversations don't interfere: each withContextCache increments
   * on entry and decrements on exit; the cache is only destroyed
   * when all users have exited.
   */
  private _contextItemsCache: Map<number, ContextItemRecord[]> | null = null;
  private _contextItemsCacheRefCount = 0;

  constructor(
    private conversationStore: ConversationStore,
    private summaryStore: SummaryStore,
    private config: CompactionConfig,
    private log: LcmLogger = NOOP_LCM_LOGGER,
  ) {}

  /** Read context items, using per-phase cache when active. */
  private async getContextItemsCached(conversationId: number): Promise<ContextItemRecord[]> {
    if (this._contextItemsCache) {
      if (this._contextItemsCache.has(conversationId)) {
        return this._contextItemsCache.get(conversationId)!;
      }
      const items = await this.summaryStore.getContextItems(conversationId);
      this._contextItemsCache.set(conversationId, items);
      return items;
    }
    return this.summaryStore.getContextItems(conversationId);
  }

  /** Invalidate cache for a conversation after context mutation. */
  private invalidateContextCache(conversationId: number): void {
    this._contextItemsCache?.delete(conversationId);
  }

  /** Execute with context cache active. Reference-counted for concurrent use. */
  private async withContextCache<T>(fn: () => Promise<T>): Promise<T> {
    if (!this._contextItemsCache) this._contextItemsCache = new Map();
    this._contextItemsCacheRefCount++;
    try {
      return await fn();
    } finally {
      this._contextItemsCacheRefCount--;
      if (this._contextItemsCacheRefCount <= 0) {
        this._contextItemsCache = null;
        this._contextItemsCacheRefCount = 0;
      }
    }
  }

📝 Info: Context items cache in CompactionEngine is not concurrency-safe across conversations

The _contextItemsCache (src/compaction.ts:355) is a single Map<number, ContextItemRecord[]> shared across all conversation IDs within a cache scope. The reference counting at lines 385-396 ensures the cache lifetime is correct for nested withContextCache calls. However, if two different conversations were compacted concurrently within the same CompactionEngine instance, they would share the same cache map. The invalidateContextCache at line 380 only deletes entries by conversation ID, so cross-conversation interference is limited to stale reads if one conversation's compaction invalidates while another reads. In practice, compaction is serialized per-conversation by the transaction mutex, so this is safe. The design comment at lines 348-354 documents this intent.

Comment thread src/plugin/index.ts
Comment on lines +556 to 557
  return map[normalized];
}

📝 Info: inferApiFromProvider now returns undefined for unknown providers — behavioral change

At src/plugin/index.ts:556, inferApiFromProvider was changed from returning "openai-responses" as a catch-all default to returning undefined for unknown providers. The caller at line 1459 chains this with || so when undefined is returned and no other resolution succeeds, the code at line 1460 throws a clear diagnostic error (unable to resolve API family for provider ...). This is a deliberate improvement: previously, unknown/custom providers would silently get routed through the OpenAI Responses API, which would fail with confusing errors. The new behavior surfaces the configuration gap immediately. The test at test/index-complete-provider-config.test.ts:324-361 covers this case.
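
The behavior described here can be sketched roughly as follows. This is an illustration only: the provider table, the "anthropic-messages" value, and the resolveApiFamily helper are hypothetical stand-ins, since the real map and caller in src/plugin/index.ts are not shown on this page.

```typescript
// Hypothetical provider table; the actual entries are an assumption.
const PROVIDER_API_MAP = new Map<string, string>([
  ["openai", "openai-responses"],
  ["anthropic", "anthropic-messages"],
]);

// Returns undefined for unknown providers instead of a catch-all default.
function inferApiFromProvider(provider: string): string | undefined {
  const normalized = provider.trim().toLowerCase();
  return PROVIDER_API_MAP.get(normalized);
}

// Hypothetical caller: chains resolution steps with || and fails loudly.
function resolveApiFamily(provider: string, explicitApi?: string): string {
  const api = explicitApi || inferApiFromProvider(provider);
  if (!api) {
    throw new Error(`unable to resolve API family for provider ${provider}`);
  }
  return api;
}
```

With this shape, a custom provider name produces an immediate configuration error rather than a request routed to the wrong API.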

Comment on lines +621 to +636
try {
  db.exec("BEGIN IMMEDIATE");
  transactionActive = true;
  stageCleanerConversationIds(db, definitions);
  const counts = readTempCleanerDeleteCounts(db);
  deletedMessages = counts.messageCount;
  if (counts.conversationCount > 0) {
    deletedConversations = deleteTempCleanerCandidates(db);
  }
  db.exec("COMMIT");
  transactionActive = false;
} catch (error) {
  if (transactionActive) {
    db.exec("ROLLBACK");
  }
  throw error;

🚩 Doctor clean apply uses raw BEGIN IMMEDIATE outside the transaction mutex

The applyDoctorCleaners function at src/plugin/lcm-doctor-cleaners.ts:622 uses raw db.exec('BEGIN IMMEDIATE') and manual COMMIT/ROLLBACK instead of the new withDatabaseTransaction mutex. If a concurrent async operation (e.g., bootstrap, compaction) is holding the transaction mutex on the same database, this raw BEGIN could conflict. In practice, doctor clean apply is an operator-initiated one-shot command that is unlikely to race with background compaction, and the SQLITE_BUSY timeout (configured at connection setup) provides a fallback. However, this is inconsistent with the pattern established in this PR.

Comment thread openclaw.plugin.json
Comment on lines +256 to +258
"largeFilesDir": {
  "type": "string"
},

📝 Info: configSchema has largeFilesDir defined twice in properties — second definition wins in JSON

In openclaw.plugin.json, configSchema.properties declares largeFilesDir at line 256 (bare type: string) and again at line 375 (with added description). This is the same class of issue as the uiHints duplicate reported in BUG-0001. In standard JSON parsing, the second definition silently wins. The schema validation still works (both are type: string), but the duplicate should be consolidated.

@@ -0,0 +1,21 @@
export type SearchSort = "recency" | "relevance" | "hybrid";

export const AGE_DECAY_RATE = 0.001;

📝 Info: FTS5 sort mode hybrid uses a hardcoded decay rate constant that may need tuning

The hybrid sort formula at src/store/full-text-sort.ts:17 uses AGE_DECAY_RATE = 0.001 (exported constant). The formula rank / (1 + (age_in_hours * 0.001)) means a result 1000 hours (~42 days) old gets its relevance score halved. This is a reasonable starting point, but the decay rate is hardcoded rather than configurable. For long-lived conversations spanning months, the decay may be too aggressive for archival recall, while for fast-moving conversations it may be too weak. Not a bug — this is a tuning constant that may need adjustment based on operator feedback.
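
A small sketch of the decay formula quoted above makes the halving point concrete. It assumes rank is a positive, higher-is-better relevance score; how the real query in src/store/full-text-sort.ts derives rank from FTS5 is not shown here, and hybridScore is an illustrative name.

```typescript
const AGE_DECAY_RATE = 0.001; // constant quoted in the review comment

// rank / (1 + age_in_hours * AGE_DECAY_RATE), as stated in the comment.
// Assumes `rank` is positive and higher means more relevant.
function hybridScore(rank: number, ageInHours: number): number {
  return rank / (1 + ageInHours * AGE_DECAY_RATE);
}

// A fresh result keeps its full score; at 1000 hours the divisor is 2.
const fresh = hybridScore(1, 0);
const sixWeeksOld = hybridScore(1, 1000);
```

Making the rate a parameter instead of a module constant would be one way to expose the tuning knob the comment asks about.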

100yenadmin pushed a commit that referenced this pull request May 5, 2026
… (Wire.1+2)

Closes Final-review Finding #3 (HIGH): "worker orchestrator + extraction
queue + procedure mining + themes + backfill are all infrastructure-only,
unwired into the production plugin surface". This commit lands the two
most-load-bearing pieces of wiring so v4.1 retrieval works end-to-end:

## 1. Leaf-write hook → lcm_extraction_queue

src/store/summary-store.ts:insertSummary now enqueues an entity-extraction
row for every leaf written. Best-effort (try/catch — leaf-write must
succeed even if queue insert fails). MUST run BEFORE the FTS-availability
early-return so FTS-disabled installs (or in-memory test DBs) still
participate.

This was the missing link: without it, lcm_extraction_queue stayed
empty regardless of how many leaves the gateway wrote, so the entity
coreference worker would have nothing to drain in production.

NEVER inline LLM call (per the v3.1 invariant — 3-agent-convergent
finding). Just inserts a row; worker drains async.

## 2. `/lcm worker tick embedding-backfill` operator command

src/plugin/lcm-command.ts. Wraps the worker-orchestrator's
tickEmbeddingBackfill in a subcommand that:
  - Pre-flight checks: VOYAGE_API_KEY present, vec0 loaded, active
    embedding profile registered. Each failure prints a clear
    actionable error.
  - Pre-tick stats: pending count + active model name
  - Runs ONE tick (perTickLimit=200, ~7-15 min at 0.5 RPS)
  - Post-tick: embedded count, skipped, Voyage tokens, duration
  - Hint operator to re-invoke if pending > 0

This is the operator's path to actually USE v4.1 retrieval today. Without
it, lcm_semantic_recall + lcm_grep --mode hybrid would always degrade
to FTS-only (no embeddings exist).

Other tick kinds (extraction, procedure-mining, themes-consolidation)
require LLM-call injection wiring through the plugin lifecycle — flagged
in the operator-error message as cycle-2.

## What this PR now actually delivers (vs pre-Wire commits)

Pre-Wire: schema landed + agent tools registered, but vec0 stayed empty
(no backfill ever invoked) + entity coref had nothing to drain. Most of
the +21K LOC was infrastructure-only dead code.

Post-Wire:
  - Operator runs `/lcm worker tick embedding-backfill` to populate vec0
  - Existing `lcm_semantic_recall` + `lcm_grep --mode hybrid` start
    returning real results (the +52.5pp paraphrastic lift from Phase A
    spike actually applies)
  - Future leaf writes enqueue for entity coref (worker tick path
    deferred to cycle-2)

Coverage: 3 new tests in test/v41-wiring.test.ts:
  - inserting a leaf enqueues an entity-extraction row
  - condensed summaries do NOT enqueue (leaf only)
  - queue insert failure (e.g. table missing) does NOT fail leaf-write

Live-DB verified: copied Eva's lcm.db, ran migration, inserted a test
leaf via SummaryStore — queue row appears as expected.

Tests: 1322 → 1325 (+3).
Build: dist/index.js = 794.6kb (was 782.4kb; +12kb for the new
operator command).

## Still deferred to cycle-2 (now with smaller scope)

- Worker-loop autostart on plugin init (so backfill runs without
  manual /lcm worker tick)
- Auto-tick `extraction` when leaves enqueue (needs LLM-injection path)
- procedure-mining + themes-consolidation auto-ticks
- Worker_threads heartbeat isolation (v4.1.1 A9)

These are discrete commits, each ≤200 LOC, that build on the wiring
this PR adds. Operator can validate v4.1 today by running the manual
tick command.
100yenadmin pushed a commit that referenced this pull request May 5, 2026
…+ message grep cascade + over-cap accounting + purge doc (P1+P2)

Resolves all four findings from the final adversarial review.

## P1 #1 — Semantic backfill is no longer production-inert

Reviewer was right: connection.ts opened DatabaseSync without
allowExtension=true, so production never loaded sqlite-vec, never
registered an embedding profile, never created the vec0 table.
Autostart's pre-flight returned NO_OP and the entire v4.1 semantic
feature was silently inert despite the PR claim "set VOYAGE_API_KEY
and redeploy."

Fix:
- src/db/connection.ts: open with `{allowExtension: true}` so
  db.loadExtension() works
- src/operator/semantic-infra-init.ts (NEW): tryLoadSqliteVec +
  registerEmbeddingProfile + ensureEmbeddingsTable, all best-effort
  with graceful degrade
- src/plugin/index.ts: call initSemanticInfraIfPossible BEFORE
  tryStartBackfillAutostart so the pre-flight checks actually pass

Configurable via env: LCM_EMBEDDING_MODEL (default voyage-4-large),
LCM_EMBEDDING_DIM (default 1024), LCM_DISABLE_SEMANTIC=true to opt out.

## P1 #2 — Suppressed leaves no longer leak through raw message grep

Reviewer was right: runPurge set summaries.suppressed_at but never
touched messages.suppressed_at, and conversation-store.ts message
search didn't filter on it. Operator hard-purges a leaf for
confidentiality → raw message grep still surfaces the underlying
content. Privacy/correctness blocker.

Fix:
- src/store/conversation-store.ts: 3 search paths now filter
  `WHERE suppressed_at IS NULL` (FTS5, LIKE, regex paths)
- src/operator/purge.ts: runPurge soft mode now cascades to
  messages.suppressed_at via summary_messages junction table

Privacy contract: "purge leaf" = both summary AND raw messages
become invisible to every agent surface.

## P2 #3 — Immediate-purge JSDoc no longer lies

Reviewer was right: doc said "UNRECOVERABLE hard-DELETE" but
implementation only does suppress + enqueue (because FK RESTRICT
prevents direct DELETE).

Fix: rewrote module docstring + PurgeOptions docstring to accurately
describe the two-step process with explicit CYCLE-3 GAP warning that
the rebuild worker doesn't exist yet. Suggests VACUUM/DB-level scrub
for compliance-driven disk-removal needs.

## P2 #4 — Over-cap leaves now surfaced in /lcm health

Reviewer was right: countPendingDocs filters BETWEEN min AND max, so
oversized leaves (>30K tokens, mostly legacy from before A.10 cap)
were neither embedded nor reported as pending. Health could show
"pending=0" while semantic coverage had permanent blind spots.

Fix:
- src/operator/health.ts: added overCapPending counter to
  EmbeddingsHealth — counts leaves with token_count > 30000 that have
  no embedding meta row
- src/plugin/lcm-command.ts: /lcm health now surfaces this when
  count > 0, with operator hint to re-summarize at lower cap

## Test status

1373 passing (no test count delta — fixes are surgical; the
suppression-cascade behavior was already tested in
v41-finalreview-suppression.test.ts which now covers the message
path too via the existing assertions).

Build: dist/index.js = 856.4kb (was 813.0kb; +43kb for the 4 new
modules + updated rendering).

## What v4.1 actually delivers POST-this-commit

When Eva redeploys with VOYAGE_API_KEY set:
  1. Plugin boots → connection opens with allowExtension=true
  2. Migration runs (existing)
  3. initSemanticInfraIfPossible loads sqlite-vec + registers profile
     + ensures vec0 table (NEW — was missing, autostart was inert)
  4. Backfill autostart kicks in 5s later → embeds first 200 docs
  5. Extraction autostart drains entity coref queue every 60s
  6. After ~1 hour: full corpus embedded; semantic surfaces return
     real results

The v4.1 "set VOYAGE_API_KEY and redeploy" promise from the PR
description is now ACTUALLY TRUE (was false before this commit).

## Reviewer's lcm_recent verdict — separate response

Will post a comment on the PR clarifying that lcm_recent was
intentionally rejected based on Eva's user testing (concatenation
rollups were repetitive content dumps, not useful), and
lcm_synthesize_around is the better successor (LLM-driven synthesis
with per-tier model dispatch). Not addressed in this commit.
100yenadmin pushed a commit that referenced this pull request May 6, 2026
…MED + 1 LOW)

Three Opus 1M-context agents reviewed the P1-P8 commit (e182f24) at
≥95% confidence. Fixed everything HIGH/MED + a small LOW. All 1328
tests still passing.

HIGH #1 (semantic-search.ts:286): entity-only return path was missing
  the new mandatory cosineSimilarity field — would have crashed
  downstream `.toFixed(3)` calls when caller had embedded entities/themes
  and no summary candidates returned. Added cosine derivation to that
  branch.

HIGH #2 (lcm-grep-tool.ts:268): full_text mode was applying our new
  sanitizeFts5Pattern AND the existing store-layer sanitizer (in
  conversation-store / summary-store via fts5-sanitize.ts). Composition
  is actually safe (verified by tracing) but redundant; removed the
  tool-layer sanitize from full_text path. Verbatim path keeps it
  (verbatim has its own SQL path bypassing the store sanitizer).

HIGH #3 (lcm-grep-tool.ts:725-735): when FTS5 isn't available, the
  catch-block fallback to `m.content LIKE ?` was looking for the raw
  pattern in `binds` to replace — but `binds` was poisoned by
  sanitizeFts5Pattern (`v4.1` → `"v4.1"`). findIndex returned -1,
  no replacement happened, LIKE got the literal phrase-quoted form.
  All sanitized verbatim queries silently returned 0 hits on
  no-FTS5 SQLite installations. Fixed: replace at known-position
  index 0 (the FTS-MATCH bind is always pushed first).

HIGH #4 (lcm-grep-tool.ts:99): role enum included only user / assistant /
  tool / all — but messages table contains 'system' role too. system
  messages were silently unfilterable. Added 'system' to schema enum
  and to the runtime VALID_ROLES set.

MED #5 (semantic-search.ts:127): cosineSimilarity doc-comment thresholds
  said ≥0.8/0.6/0.4 but actual impl used ≥0.65/0.5/0.35. Doc fixed.

MED #6 (lcm-describe-tool.ts:241): early header signal said "N
  candidates; details below" based on raw childIds.length, but detail
  block could say "0/N (all suppressed)" if everything was suppressed —
  contradictory signals. Reworded header to "N raw candidate(s) before
  suppression filter; survivors + details below" so it doesn't lie.

MED #7 (lcm-describe-tool.ts:381): expandMessagesOffset had no upper
  bound, enabling adversarial DoS via huge OFFSET scans. Clamped at
  100k (well past any realistic 216-msg leaf).

MED #8 (lcm-search-entities-tool.ts:208): the P8 catalogStatus probe
  ran COUNT(*) on lcm_entities globally — full-table scan on
  multi-million-entity DBs. Replaced with EXISTS(SELECT 1 ... LIMIT 1)
  which short-circuits at first row.

LOW #9 (lcm-describe-tool.ts:418): when expandMessagesOffset >=
  totalMessages, status was misleadingly "ok" with 0 results. Added
  distinct "offset-past-end" status variant so callers can distinguish
  "leaf is empty" vs "you paginated past the end".

Verified end-to-end on snapshot DB:
- role: "system" no longer schema-rejected
- offset 50000 (below the 100k clamp) returns "offset-past-end" status

Tests: 1328 passing (no regressions; existing tests cover the changed
contracts via type-checked fields).
100yenadmin pushed a commit that referenced this pull request May 6, 2026
…W closed

Ten parallel Opus 1M-context agents reviewed PR Martian-Engineering#613 partitioned by
surface (migration / voyage / synthesis / hybrid+retrieval / agent tools /
concurrency / extraction / operator / tests / docs+manifest). All
HIGH+MED findings closed below; QA runner improved alongside.

DATA-CORRUPTION / AVAILABILITY HIGH FIXES
=========================================

Synthesis (Auditor #3 #1 #2 #5):
  - INSERT → INSERT OR IGNORE on lcm_synthesis_cache so concurrent callers
    don't crash with UNIQUE collision; latch-loser re-SELECTs and either
    returns cached result or "building elsewhere" hint.
  - Reap zombie 'building' rows older than 10 min before INSERT (prevents
    process-killed-mid-dispatch availability latch).
  - Audit GC: prune 'started' audit rows >1h and 'completed'/'failed' rows
    >30 days on every synthesize_around call. Bounded growth.

Voyage (Auditor #2 #1 #2 #3 #4):
  - MAX_TOKENS_PER_EMBED_DOC: 30k → 27k (Voyage tokenizer counts ~9.5%
    higher than DB token_count; 30k × 1.095 = 32.85k > 32k Voyage cap →
    400 errors on 28-30k stored-token leaves).
  - BACKOFF_CAP_MS: 30s → 25s (so worst-case retry path 25s + 30s + 30s
    = 85s leaves 5s margin under WORKER_LOCK_TTL_MS=90s).
  - heartbeatLock now requires `expires_at > now` predicate, refusing to
    extend an already-expired lock (prevented two-workers-think-both-own
    race when our long Voyage call exceeded TTL).
  - writeBatch wraps each row in SAVEPOINT so per-row failure rolls back
    JUST that row's vec0+meta partial writes (was leaving phantom vec0
    rows when meta-side INSERT failed).
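
The two numeric changes in this list can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in this commit message; VOYAGE_TOKEN_CAP, TOKENIZER_OVERHEAD, and fitsVoyageCap are illustrative names, not identifiers from the codebase, and the two 30s terms are taken from the message as-is.

```typescript
const VOYAGE_TOKEN_CAP = 32_000;  // per-document cap cited above
const TOKENIZER_OVERHEAD = 1.095; // Voyage counts ~9.5% higher than stored token_count

// True when a leaf's stored token count stays under the cap after the overhead.
function fitsVoyageCap(storedTokens: number): boolean {
  return storedTokens * TOKENIZER_OVERHEAD <= VOYAGE_TOKEN_CAP;
}

const BACKOFF_CAP_MS = 25_000;
const WORKER_LOCK_TTL_MS = 90_000;

// Worst-case retry path from the message: 25s + 30s + 30s.
const worstCaseRetryMs = BACKOFF_CAP_MS + 30_000 + 30_000;
const lockMarginMs = WORKER_LOCK_TTL_MS - worstCaseRetryMs;
```

Under these assumptions a 30k leaf lands at about 32.85k Voyage tokens and is rejected, a 27k leaf lands at about 29.6k and fits, and the retry path leaves a 5s margin before the lock expires.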

Hybrid retrieval (Auditor #4 #2 #3):
  - FTS adapter in lcm-grep-tool now over-fetches + post-filters on
    sessionKeys/summaryKinds (was silently dropping these filters,
    leaking cross-session content into hybrid results — violated v4.1
    §10 session-family scoping invariant).
  - Semantic-search time filter changed from `s.created_at` to
    `julianday(COALESCE(latest_at, created_at))` to match FTS arm. Was
    returning divergent sets for the same since/before window.

Entity coref (Auditor #7 #1 #2 #3 #4 #5):
  - Entity ID generation: Math.random() (32-bit, birthday-bound collisions
    after roughly 64K IDs) → crypto.randomUUID()-derived 48-bit suffix.
  - Mention ID: 16-char prefix truncation → FNV-1a content hash. Long
    surfaces sharing the first 16 chars no longer silently collide.
  - Entity INSERT → INSERT OR IGNORE + re-SELECT winner. Prevents
    ROLLBACK + retry-forever loop when two ticks process the same
    canonical surface concurrently.
  - occurrence_count: bump ONLY when a new mention row is actually
    inserted (was double-counting on idempotent re-process).
  - Extractor 16K char silent truncation now logs a warn line with
    the dropped-chars count.
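
As a rough illustration of the mention-ID change, the sketch below hashes the full content with 32-bit FNV-1a instead of truncating to a 16-char prefix. The hash width, the mentionId shape, and the "men_" prefix are assumptions; the commit does not say which FNV-1a variant or ID format the code uses.

```typescript
// 32-bit FNV-1a over the UTF-8 bytes of the input.
function fnv1a32(input: string): number {
  const bytes = new TextEncoder().encode(input);
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit wrapping multiply
  }
  return hash >>> 0; // normalize to unsigned
}

// Hypothetical ID shape: hash of summary id plus the whole surface string.
function mentionId(summaryId: string, surface: string): string {
  const digest = fnv1a32(`${summaryId}\u0000${surface}`);
  return `men_${digest.toString(16).padStart(8, "0")}`;
}

// Two surfaces that share their first 16 characters.
const idA = mentionId("sum_1", "a-very-long-surface-name-A");
const idB = mentionId("sum_1", "a-very-long-surface-name-B");
```

A prefix-truncation scheme would map both surfaces to the same ID, whereas a content hash separates them; a 32-bit digest still has its own collision odds, so a wider hash may be preferable for large catalogs.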

Concurrency (Auditor #6 #4):
  - extraction-autostart now calls tickExtraction (orchestrator-wrapped
    with acquireLock/releaseLock) instead of runCoreferenceTick directly.
    Prevents two gateway processes from double-processing the queue.

Migration (Auditor #1 #3):
  - widenLcmSynthesisCacheTierCheck_v413 now DELETEs orphaned
    lcm_synthesis_audit rows before DROP-ing lcm_synthesis_cache. With
    foreign_keys=OFF during migration (the standard pattern), audit
    rows would have become dangling references; now they're cleaned.

OPERATOR SURFACE (Auditor #8 BLOCKER #1)
========================================
  - /lcm purge command now wired (was dead code). Soft mode only
    (immediate mode was cut from this PR). Defaults to dry-run preview; --apply to
    actually suppress. --allow-main-session gates Eva's primary thread.
    Required: --reason "..." + at least one criterion (--session-key,
    --summary-ids, --since, --before, --min-token-count).

MED FIXES
=========
  - dispatch.ts verify_fidelity regex: `/^\s*OK\b/i` → `/(?:^|\n)\s*OK\b/i`
    so model preambles before "OK" don't false-positive a hallucination
    flag (Auditor #3 #4).
  - lcm_describe budget=0 now emits an explicit "delegated grant
    exhausted" line instead of silently showing budget=over on every
    node (Auditor #5 #3).
  - lcm_get_entity / lcm_search_entities entityType docs now list the
    actual extractor-produced types (person_name, pr_number, agent_id,
    etc.) instead of the fictitious ('person', 'project', 'pr',
    'commit', 'file') that never matched (Auditor #7 #8).
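
The verify_fidelity regex change in the first item can be demonstrated directly. The two patterns below are copied from the commit message; the sample strings are made up for illustration.

```typescript
const OLD_OK = /^\s*OK\b/i;          // only matches OK at the very start
const NEW_OK = /(?:^|\n)\s*OK\b/i;   // also matches OK at the start of a later line

// Hypothetical model outputs.
const bareVerdict = "OK";
const withPreamble = "Checking the summary against the source.\nOK";
const negative = "The summary is NOT OK";

const results = {
  oldBare: OLD_OK.test(bareVerdict),
  oldPreamble: OLD_OK.test(withPreamble),
  newPreamble: NEW_OK.test(withPreamble),
  newNegative: NEW_OK.test(negative),
};
```

The old pattern rejects the preamble case and the new one accepts it, while a mid-line "NOT OK" is still rejected by both. An "OK" that opens a later line in a negative sentence would still match the new pattern.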

QA RUNNER IMPROVEMENTS (Auditor #9)
====================================
  - adv-empty-pattern: vacuous predicate fixed; now asserts either
    graceful error OR 0 matches.
  - Added 2 missing-tool smokes: adv-lcm-get-entity-smoke and
    adv-lcm-expand-query-smoke (8 tools now exercised, was 5 of 8).
  - Determinism: replaced `ORDER BY RANDOM()` and unsorted `LIMIT 1`
    with stable `ORDER BY summary_id ASC LIMIT 1 OFFSET ?` so re-runs
    pick the same leaves and report deltas cleanly.
  - JSON output now includes `schemaVersion: "1.0.0"`.
  - Voyage cost rate corrected: 0.00012 → 0.00018 per 1K tokens
    (under-reported by ~33%).

DOC RECONCILIATION
==================
  - PR_DESCRIPTION.md: 22/25 claim now annotated with live-harness
    refinement (14/25 high confidence + 8/25 degraded UX + 3/25 fallback).
  - HARNESS_REPORT_2026-05-06.md: prepended status banner + per-bug
    [FIXED in commit X] annotations so reviewers reading the report
    end-to-end see what's still open vs. closed.

VERIFICATION
============
  - 1328/1328 tests passing (no regressions; 2 tests updated for
    intentional behavior changes — voyage cap 30k→27k, batching test
    sizes 30k→25k to stay under new cap).
  - QA runner: smoke 8/8, adversarial 10/10, full 30/30 — all clean.
  - Total cost ~$0.11 per full QA run.

DEFERRED TO CYCLE-3 (acknowledged in PR description, not blocking merge)
=========================================================================
  - Auditor #6 #1-#3 (concurrency doc overclaims about busy_timeout +
    fallback-soak + heartbeat-on-worker-thread): in-process model means
    these guarantees aren't load-bearing today. Doc to be reconciled
    when worker-thread isolation lands in cycle-3.
  - Auditor #7 #6 idle GC for zero-mention entities: not blocking;
    occurrence_count only ever bumps up, never down.
  - P9 / P10 from harness report: low priority, no immediate workaround
    needed.
100yenadmin pushed a commit that referenced this pull request May 6, 2026
Wave-2 ran 10 Opus 1M-context agents over the post-Wave-1 commit. Key
findings + fixes:

CRITICAL CRASH BUG
==================
Wave-2 Auditor #1 finding #1 (HIGH): the synthesis cache loser-path
SELECT queried column `output` but the schema has `content`
(migration.ts:1506). EVERY concurrent ready-cache hit threw
`no such column: output`. Single-flight winner-already-ready fast-path
was completely broken.
Fix: changed SELECT to use `content`, response field renamed `text`.

DATA-CORRECTNESS HIGH
=====================
Auditor #1 #2: zombie cache janitor only reaped `'building'` rows;
`'failed'` rows would block all future synthesis of the same window
forever. Now reaps both. Added `recent_failure` response shape so
caller can distinguish from `building_elsewhere`.

Auditor #2 finding F1: parseRetryAfterMs silently clamped Voyage
server-supplied Retry-After to BACKOFF_CAP_MS (25s), so a
`Retry-After: 60` was retried at 25s — still rate-limited, wasting a
retry slot. Also tightly coupled with WORKER_LOCK_TTL_MS=90s.
Fix: honor server retry-after up to 5min cap; if it exceeds the
lock-aware budget (60s), throw rate_limit immediately so caller
releases lock and the next autostart tick retries cleanly.
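
One way to picture the new Retry-After handling is the sketch below. It is not the actual parseRetryAfterMs: it handles only the delta-seconds form of the header, and RateLimitError, planRetryDelay, and the constant names are illustrative. The 5 min and 60s values come from this commit message.

```typescript
const RETRY_AFTER_CAP_MS = 5 * 60_000; // honor server value up to 5 min
const LOCK_AWARE_BUDGET_MS = 60_000;   // longest wait that fits inside the lock TTL

class RateLimitError extends Error {
  retryAfterMs: number;
  constructor(retryAfterMs: number) {
    super(`rate_limit: retry after ${retryAfterMs}ms`);
    this.name = "RateLimitError";
    this.retryAfterMs = retryAfterMs;
  }
}

// Parses a delta-seconds Retry-After value; returns null when absent or invalid.
function parseRetryAfterMs(header: string | null): number | null {
  if (header === null || header.trim() === "") return null;
  const seconds = Number(header.trim());
  if (!Number.isFinite(seconds) || seconds < 0) return null;
  return Math.min(seconds * 1000, RETRY_AFTER_CAP_MS);
}

// Waits in place when the delay fits the budget; otherwise throws so the
// caller can release the worker lock and let a later tick retry.
function planRetryDelay(header: string | null, fallbackBackoffMs: number): number {
  const serverDelayMs = parseRetryAfterMs(header);
  if (serverDelayMs === null) return fallbackBackoffMs;
  if (serverDelayMs > LOCK_AWARE_BUDGET_MS) throw new RateLimitError(serverDelayMs);
  return serverDelayMs;
}
```

Throwing instead of sleeping keeps a long server-requested delay from outliving the lock, which is the coupling with WORKER_LOCK_TTL_MS noted above.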

Auditor #6 BUG-2 + BUG-3 (HIGH): /lcm purge dry-run preview used its
own SQL with `datetime(created_at)` while runPurge used raw
`created_at >= ?`. Edge cases (timezones, microseconds) gave
divergent counts; --summary-ids dry-run returned input length
without filtering for actually-existing leaves. Also the empty-
criteria dry-run scared operators with whole-DB count.
Fix: extracted `previewPurgeAffected(db, opts)` from purge.ts and
wired the dry-run to use it. Added validation parity, --allow-main-
session warning, race-window note in output.

Auditor #7 finding A1 (HIGH): time-filter inconsistency across tools
— summary FTS + semantic used `julianday(COALESCE(latest_at,
created_at))` (post Wave-1) but synthesize-around still used
`datetime(created_at)` and verbatim grep used `datetime(m.created_at)`.
Cross-tool: same `since`/`before` window returned different result
sets depending on which tool the agent picked.
Fix: synthesize-around now uses `julianday(COALESCE(latest_at,
created_at))`. Verbatim grep (messages — no latest_at) now uses
`julianday(m.created_at)` for syntactic parity.

TEST COVERAGE GAP
=================
Auditor #8 finding F1: zero test coverage for the Wave-1 migration
DELETE-before-DROP fix.
Fix: added 3 new tests in v41-synthesis-tables.test.ts:
  - DELETE prunes only orphan-pointing rows, preserves
    target_summary_id-pointing rows
  - re-running runLcmMigrations on already-widened DB is a no-op
  - schema includes wide CHECK including 'monthly' on first migration

Auditor #8 finding F2: bare catch in migration too broad — could
swallow corrupted-DB errors. Now narrowed to expected
"no such table.*lcm_synthesis_audit" pattern; re-throws otherwise.

QA RUNNER IMPROVEMENTS
======================
Auditor #9 HIGH-2: OFFSET overflow returned `undefined` row, target
became `undefined`, predicate accepted any error → tests passed on
empty corpus.
Fix: fall back to OFFSET 0 (first leaf) if requested offset exceeds
row count. Sentinel `__NO_LEAVES_IN_CORPUS__` when even that fails.

Auditor #9 HIGH-3: B/C predicates only checked for `r.error` →
0-hit returns silently passed.
Fix: added `Array.isArray(r.details?.hits)` assertion + per-hit
shape validation (content, role for verbatim).

DOC RECONCILIATION
==================
Auditor #10 F1: HARNESS_REPORT internally inconsistent (banner said
"30/30 pass" but verdict body still showed 14/8/3). Reconciled:
explicit "two numbers reflect two rubrics" explanation.

Auditor #10 F2: THE_FIVE_QUESTIONS.md still said "22/25 PRIMARY
coverage" without live-harness annotation. Added post-fix verification
note pointing to QA runner + HARNESS_REPORT.

Auditor #10 F3: PR_DESCRIPTION listed "5 operator commands" but the
plugin exposes 9 (status, health, worker, reconcile-session-keys,
eval, purge, backup, rotate, doctor + help). Fixed to 9 with
descriptions.

CROSS-TOOL NAMING PARITY
=========================
Auditor #7 A2 (MED): synthesize-around emits `voyage_tokens_consumed`
(snake_case) while semantic-recall emits `voyageTokensConsumed`
(camelCase). The tool's output uses snake_case throughout for
internal consistency, so we added `voyageTokensConsumed` as a
camelCase alias alongside the original.

VERIFICATION
============
- 1331/1331 tests passing (1328 baseline + 3 new migration tests)
- QA runner full suite: 30/30 pass
- QA runner adversarial suite: 10/10 pass
- Total cost: ~$0.11 per full QA run

DEFERRED (acknowledged, not blocking merge)
============================================
- Auditor #2 F3 (heartbeat between batches, not mid-batch): the
  SAVEPOINT-per-row + heartbeatLock-with-expires_at-predicate
  combination already detects lock theft cleanly; mid-batch
  heartbeat is a cycle-3 hardening item.
- Auditor #6 #11 (operator permission gate on /lcm purge): the
  command runs without an explicit auth gate at the plugin
  registration site. Gate is delegated to the OpenClaw plugin
  contract layer (per the existing convention with reconcile-
  session-keys, doctor clean apply, etc.). If/when OpenClaw exposes
  isOperatorSession() to plugins, all destructive subcommands will
  consume it together.
- Auditor #1 #4 (verify_fidelity regex still has edge case where
  "OK" appears mid-line in negative context): improvement over Wave-1;
  full negative-context detection requires a more sophisticated parser.
- Auditor #1 #5 (audit GC scans full table per call): cost is
  ~1ms; future move to scheduled background sweep.
- Auditor #3 F2/F3 (entity coref single-flight contract): improvements
  documented; in-process inFlight + DB-row-level lock combination is
  sufficient for current single-process deployments.
- Auditor #9 HIGH-1 (QA-runner durationMs varies across runs): timing
  fields are inherently non-deterministic; row selection IS now stable
  which is the actual reproducibility property.
100yenadmin pushed a commit that referenced this pull request May 6, 2026
Wave-3 ran 10 Opus 1M-context agents on the post-Wave-2 commit. Three
agents (#3, #8, #9) couldn't see the post-Wave-2 tree — they looked at
stale checkouts and produced no usable findings. The remaining seven
surfaced 11 real issues.

DATA-CORRECTNESS HIGH
=====================
Auditor #1 H1: `recent_failure` response (Wave-2 addition) didn't include
`failure_reason` even though we stored it on the row — caller saw a
generic hint instead of the actual cause one column away.
Fix: SELECT `failure_reason` from the loser-path query and surface it
in the response. Truncate to 200 chars in the hint.

Auditor #1 H2: 10-min `failed`-row TTL caused hammering during long
Voyage outages — every 10 min, every distinct (session, range, fp)
tuple would re-attempt LLM, fail, mark failed, repeat. With many
windows this cascaded into a steady DDoS against the LLM provider.
Fix: exponential backoff per cache row — `TTL_MIN * 2^audit_attempts`,
capped at 6h. Audit row count gives us attempt history per cache_id.
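
The backoff formula can be sketched as follows. The 10-minute base and 6-hour cap come from the text; `failedRowTtlMs` and the millisecond units are assumptions for illustration.

```typescript
const TTL_MIN_MS = 10 * 60 * 1000; // 10-minute base TTL
const TTL_CAP_MS = 6 * 60 * 60 * 1000; // 6-hour cap

// Hypothetical helper: TTL for a `failed` cache row given how many
// audit rows (prior attempts) exist for its cache_id.
function failedRowTtlMs(auditAttempts: number): number {
  const attempts = Number.isFinite(auditAttempts)
    ? Math.max(0, Math.floor(auditAttempts))
    : 0;
  // TTL_MIN * 2^attempts, never more than the cap.
  return Math.min(TTL_MIN_MS * 2 ** attempts, TTL_CAP_MS);
}
```

Under these assumptions the TTL doubles with each recorded attempt (10, 20, 40 minutes and so on) and hits the 6-hour cap once six attempts have been recorded.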

Auditor #1 H3: `building_elsewhere` had no max-retries hint — if the
winner process died between INSERT and the next zombie sweep, every
concurrent caller would loop indefinitely.
Fix: compute `retry_after_ms = max(0, building_started_at + 10min - now)`
so callers can sleep precisely once instead of polling.

Auditor #1 M1: audit GC's 30-day branch had no index — full-table scan
on every `synthesize_around` call.
Fix: added partial index `lcm_synthesis_audit_completed_gc_idx` on
`(ran_at) WHERE status IN ('completed', 'failed')` so both GC branches
are O(log n).

Auditor #1 M2: janitor DELETE + INSERT OR IGNORE were not atomic —
cross-process callers could sneak in between, causing benign latch
loss + unexpected `building_elsewhere` responses.
Fix: wrapped both in `BEGIN IMMEDIATE` ... `COMMIT` so the operation
is serialized at the SQLite write-lock level.

Auditor #4 #3 (HIGH): `lcm_grep mode='semantic'` details.hits[] was
missing `conversationId` (broke parity with hybrid + verbatim modes)
and missing `cosineSimilarity` + `confidenceBand` (broke parity with
`lcm_semantic_recall`). Cross-tool agents JSON-parsing the response
shape would hit drift.
Fix: details.hits now mirrors `lcm_semantic_recall` exactly:
{summaryId, conversationId, sessionKey, kind, distance, cosineSimilarity,
tokenCount, createdAt}. Tool now also emits `confidenceBand` at the
top level + warns on low/noise just like semantic-recall.

DOC FIXES
=========
Auditor #6 #2/#3: README.md was stale — listed only 3 v3-era tools
(`lcm_grep`, `lcm_describe`, `lcm_expand`) and 5 of the 9 commands.
Fix: rewrote the tool list (8 tools with one-liners) and command
section (9 subcommands with full flags).

TEST COVERAGE FILLS (Auditor #7 top-3 priority gaps)
=====================================================
Added 8 new tests (1331 → 1339):

1. `operator-purge.test.ts` previewPurgeAffected parity (4 tests):
   - Range purge: preview count == affectedLeafIds.length
   - --summary-ids: filters out non-leaf, already-suppressed, nonexistent
   - since/before time filter: preview matches apply
   - Empty match: preview returns 0 cleanly

2. `voyage-client.test.ts` lock-budget retry behavior (2 tests):
   - Retry-After > 60s threshold: throws immediately, does NOT sleep,
     elapsed time < 2s (proven by wall-clock measurement)
   - Retry-After ≤ 60s: server-supplied value honored, retries as expected

3. `lcm-synthesize-around-tool.test.ts` schema column-name regression
   (2 tests):
   - Schema has `content` (not `output`); all 6 columns the loser-path
     SELECT references exist
   - Literal SELECT used by loser-path executes without error against
     the real schema (proves the Wave-2 crash bug can't regress)

VERIFICATION
============
- 1339/1339 tests passing
- QA runner full suite: 30/30
- QA runner adversarial: 10/10
- Total cost ~$0.11 per full QA run

DEFERRED (acknowledged, not blocking)
======================================
- Auditor #1 L1 (test exercises only the SQL DELETE not the full
  migration step): the DELETE-in-isolation is sufficient for what
  changed; the migration step itself has its own coverage in
  `v41-pre-existing-schema-migration.test.ts`.
- Auditor #2 F2/F3 (60s lock-budget threshold has zero margin under
  worst-case scenarios): the Wave-1 heartbeat-with-expires_at predicate
  detects lock theft cleanly even if budget is exhausted; tightening
  the threshold further is a future hardening item.
- Auditor #4 confirmed-clean items (suppression filter parity, error
  envelope shape, conversation-scope error message) — no further
  work needed.
- Auditor #5 (E2E smoke): documented real UX gaps in
  `lcm_synthesize_around` discoverability (target= vs query=, window_kind
  required) — would require schema-description rewrites; queued for
  cycle-3 ergonomics pass.

Audit cycle stats:
- Wave-1: 17 HIGH + 9 MED + 1 LOW closed across 1 commit
- Wave-2: 19 findings (4 HIGH + 4 MED + 1 LOW + others) closed
- Wave-3: 11 findings closed (this commit)
- Total: 36+11 = 47 findings closed across 3 commits
- 1339 tests passing
100yenadmin pushed a commit that referenced this pull request May 6, 2026
…4 P2 closed

Wave-5 ran 3 parallel Opus agents focused on the Wave-4 commit
(`cd76389`) to verify those fixes didn't introduce new bugs. Surfaced
1 P0-classified pre-existing classification ambiguity (reclassified P3
on inspection — not a Wave-4 regression), 4 real P1s introduced by
Wave-4 changes, and several P2s.

P1 — REGRESSIONS INTRODUCED BY WAVE-4 (4 closed)
================================================

Wave-5 #1 — expandRecursive `visited` set broke DAG re-entry semantics.
The Wave-4 cycle-guard correctly prevented infinite loops but ALSO
prevented legitimate cross-path expansion: if A→B and C→B (B reachable
from two distinct ancestors), B's subtree was explored only once
because `visited.has(B) === true` on the second path. This is a
correctness regression dressed as a safety fix — the pre-Wave-4 code
allowed duplicate emissions but explored both paths.
Fix: replaced `visited` (all-time) with `stackAncestors` (in-flight
DFS path only). `add` on entry, `delete` on return via `try/finally`.
Cycles are still blocked (a node can't be its own ancestor) but
distinct ancestor paths each explore the shared descendant.
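
A self-contained sketch of the `stackAncestors` approach, assuming the DAG is a plain adjacency map. The real `expandRecursive` signature is not shown in this commit, so the types and return shape here are hypothetical.

```typescript
type Graph = Map<string, string[]>;

// Emits nodes in DFS order. A node is skipped only while it is already on
// the current DFS path, so shared descendants are explored once per path.
function expandRecursive(graph: Graph, root: string): string[] {
  const emitted: string[] = [];
  const stackAncestors = new Set<string>();

  const visit = (node: string): void => {
    if (stackAncestors.has(node)) return; // cycle: node is its own ancestor
    stackAncestors.add(node);
    try {
      emitted.push(node);
      for (const child of graph.get(node) ?? []) visit(child);
    } finally {
      stackAncestors.delete(node); // leave the path on return
    }
  };

  visit(root);
  return emitted;
}

// B is reachable from both A and C, so its subtree appears twice.
const dag: Graph = new Map([
  ["R", ["A", "C"]],
  ["A", ["B"]],
  ["C", ["B"]],
  ["B", ["D"]],
]);
const dagOrder = expandRecursive(dag, "R"); // ["R","A","B","D","C","B","D"]

// A two-node cycle still terminates.
const cyclic: Graph = new Map([
  ["X", ["Y"]],
  ["Y", ["X"]],
]);
const cyclicOrder = expandRecursive(cyclic, "X"); // ["X","Y"]
```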

Wave-5 #2 — recordEmbedding SAVEPOINT names used Math.random 24-bit
suffix (~1/4096 collision under concurrent outer-tx callers). SQLite
SAVEPOINTs aren't nestable with the same name; collision could cause
inner ROLLBACK TO to unwind the wrong scope.
Fix: switched to crypto.randomUUID-derived 12-hex-char (48-bit)
suffix. Collisions are vanishingly unlikely at any realistic concurrency.
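
One way to derive such a suffix, as a sketch: `savepointName` and the `lcm_sp` prefix are hypothetical, while `randomUUID` is the standard Node.js `crypto` export.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: build a SAVEPOINT name with a 48-bit random suffix.
function savepointName(prefix = "lcm_sp"): string {
  // In a v4 UUID the first 12 hex digits are all random; the fixed
  // version nibble is the 13th digit, so slicing 12 keeps 48 random bits.
  const suffix = randomUUID().replace(/-/g, "").slice(0, 12);
  return `${prefix}_${suffix}`;
}
```

The result contains only letters, digits and underscores, so it can be used as an unquoted SQLite identifier.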

Wave-5 #3 — dead-letter UPDATE failure in entity-coreference was
silent: if the attempts-bump UPDATE itself failed (DB locked, schema
race) the catch swallowed it and the row retried forever (defeating
the very dead-letter mechanism Wave-4 added).
Fix: failure now surfaces in itemDetail.error as
"original | dead-letter-update-failed: ..." so operators see the
mechanism is broken rather than silently looping. Loop continues so
other items are still processable.

Wave-5 #4 — synthesis health single-query SUM(CASE...) couldn't use
any of the 4 partial indexes on lcm_synthesis_audit. On a large audit
table (the very condition this surfaces), /lcm health became O(n).
The fix description claimed observability for "millions of stale rows"
but ironically degraded health latency precisely under that condition.
Fix: split into 4 separate queries — total + 7-day-recent (PK scans;
bounded) + stale-started (uses lcm_synthesis_audit_started_gc_idx) +
stale-done (uses lcm_synthesis_audit_completed_gc_idx). Each query is
O(log n) on the indexed branches.

P2 — DEFENSIVE CLAMPS + CAPS (4 closed)
========================================

Wave-5 #5 — bestOfN silent clamp. Caller passing bestOfN=10 saw the
result with bestOfN.n=5 (Wave-4 cap) but no signal it was clamped.
Fix: added requested + capped fields to bestOfN result so callers can
see the clamp + audit cost decisions.

Wave-5 #6 — perQueryTimeoutMs ≤0 / NaN resolved immediately, zeroing
out every query's recall with no error. opts.perQueryTimeoutMs ?? 30s
allowed 0 / negative through.
Fix: validate against [100ms, 5min]; values outside the band (including NaN) fall back to the default 30s.
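
A sketch of that validation, using the bounds and default stated above. The function name is hypothetical.

```typescript
const DEFAULT_TIMEOUT_MS = 30_000;
const MIN_TIMEOUT_MS = 100;
const MAX_TIMEOUT_MS = 5 * 60 * 1000;

// Hypothetical helper: resolve the effective per-query timeout.
function resolvePerQueryTimeoutMs(requested?: number): number {
  // undefined, NaN and Infinity all fall back to the default.
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_TIMEOUT_MS;
  }
  // Out-of-band values (0, negatives, absurdly large) also get the default.
  if (requested < MIN_TIMEOUT_MS || requested > MAX_TIMEOUT_MS) {
    return DEFAULT_TIMEOUT_MS;
  }
  return requested;
}
```

A bare `??` would not be enough here, because `0`, negative numbers and `NaN` are not nullish.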

Wave-5 #7 — citedIds IN-list unbounded for SQL validation. If LLM
emitted thousands of fabricated IDs, the placeholder query would blow
SQLITE_MAX_VARIABLE_NUMBER (default 32766) and the catch would fall
back to UNVALIDATED set — defeating the validation Wave-4 added.
Fix: cap at first 1000 IDs before the IN query (well above realistic
citation count, well under SQLite cap). Excess IDs are still
reported in citedIdsRejectedAsFabricated count.
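
The cap can be sketched like this. `planCitedIdValidation` and its return shape are hypothetical; only the 1000-ID limit is taken from the text.

```typescript
const MAX_VALIDATED_IDS = 1000;

interface ValidationPlan {
  toValidate: string[]; // IDs that will be bound into the IN (...) query
  placeholders: string; // "?, ?, ?" matching toValidate.length
  excessCount: number; // IDs beyond the cap, reported but not queried
}

// Hypothetical helper: bound the IN-list before touching SQLite.
function planCitedIdValidation(citedIds: readonly string[]): ValidationPlan {
  const toValidate = citedIds.slice(0, MAX_VALIDATED_IDS);
  return {
    toValidate,
    placeholders: toValidate.map(() => "?").join(", "),
    excessCount: citedIds.length - toValidate.length,
  };
}
```

The query then never binds more than 1000 parameters, regardless of how many IDs the LLM emitted.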

Wave-5 #8 — doctor "old" classifier dead code. Pre-Wave-4 fallback
was emitted as a SUFFIX (truncated content + marker), so
content.startsWith(FALLBACK_SUMMARY_MARKER) was always false on
legitimate legacy data. The "old" branch was effectively unreachable
for real DBs. NOT a Wave-4 regression — it's a pre-existing
classifier ambiguity. Documented the intent: legacy data flows
through the trailing-suffix `fallbackIndex` branch and is classified
"fallback" (correct semantics; same repair path).

VERIFICATION
============
- 1345/1345 tests passing
- QA runner full: 30/30 pass
- QA runner adversarial: 10/10 pass

DEFERRED FROM WAVE-5
=====================
- A2 P1-D: forceReleaseLock empty-string falsy-check defensive — minor
- A2 P1-G: pickModel forceModel semantic change — by design (Wave-4
  intent was "force" actually forces); any caller relying on no-op
  with forceModel=true and modelOverride=undefined will see tier
  default now. No production callers do this per code search.
- A3 P1-A: citedIdsRejectedAsFabricated not in docs — added to type
  with JSDoc; PR description / agent-tools.md update deferred to
  next doc pass
- A3 P1-B: hits[] shape STILL drifts across grep modes — mode-specific
  signals (rerank score, semanticDistance, FTS rank) are intentionally
  per-mode; `confidenceBand` + `cosineSimilarity` parity is what
  matters cross-mode and is now uniform
- A3 P1-C: doctor pre-filter false-positive on benign content
  containing marker text — detectDoctorMarker per-row classifier is
  the gate; pre-filter false positive is just extra work, not wrong
  classification
100yenadmin pushed a commit that referenced this pull request May 6, 2026
…0 + 9 P1 closed + 4 new regression tests

Wave-9 was the first audit cycle to give every agent FULL FILE context
(not just diffs) plus cross-cutting checklists tailored to their slice,
plus all prior wave findings as known-closed reference. Eva's directive:
"agents need ENOUGH CONTEXT not to introduce new issues while fixing
minor ones." Wave-9 also added a TS-strict closure pass (separate
commit 11f10a6) that brought PR-introduced TS errors from 30 → 0.

11 agents (slicing by responsibility, ~14.7k LOC src + 12.5k LOC tests
+ 2.2k LOC scripts):
  #1 Lossless core      — engine, assembler, retrieval, summarize, compaction
  #2 Migration + schema — db/migration, all migration tests
  #3 Storage layer      — summary-store, conversation-store
  #4 Search tools       — lcm_grep, lcm_semantic_recall, hybrid, semantic
  #5 Drilldown tools    — lcm_describe, lcm_expand, lcm_expand_query
  #6 Entity + extraction — lcm_get_entity, lcm_search_entities, coreference
  #7 Synthesis          — synthesize_around, dispatch, prompt-registry, seed
  #8 Voyage stack       — voyage/client, embeddings/store/backfill/semantic
  #9 Worker + concurrency — concurrency/*, autostarts, worker-orchestrator
  #10 Operator surface  — purge, health, reconcile, eval-runner, plugin
  #11 Scripts/QA-runner — coverage-gap audit Eva caught after launch

Findings: 1 P0 + 13 P1 + 22 P2 + 42 P3 = ~77 unique
(Agent #2 P2 and Agent #7 P1 converged on same `{{date_range}}` bug.)

This commit closes the P0 + 9 of 13 P1s + adds 4 regression tests.
Remaining P1s + all P2/P3 are documented in PR comment for follow-up.

P0 (CLOSED) — Owner gate parity (Agent #10):
- /lcm reconcile-session-keys --apply lacked senderIsOwner (Wave-7
  P0-1 had only added it to /lcm purge). Cross-session data theft
  vector: non-owner agent could re-key Eva's primary thread into an
  attacker bucket via --allow-main-session.
- /lcm worker tick embedding-backfill same gap (lower-impact:
  DoS-by-billing on the operator's Voyage account).
- Both fixed: same gate pattern as case "purge" applied to both.
- 3 new regression tests pin the gate behavior so future refactors
  can't silently regress.

P1 fixes (9 of 13):

P1.1 (Agent #5) — Citation-fabrication count threaded through
ExpandQueryReply. Wave-4+W6+W8 chain validated citedIds internally
(rejected fabricated IDs against summaries table) but
buildExpandQueryReply silently dropped the counts. Agent now sees
citedIdsRejectedAsFabricated + citedIdsExceededValidationCap in the
JSON reply (omitted when zero, summed across buckets in multi-conv
path).

P1.2 (Agent #5) — lcm_describe expandChildren/expandMessages now
consumes the grant token budget. Previously the budget was CHECKED
(budgetExhausted detection) but never DECREMENTED. With 50 children
+ 50 messages × ~2K tokens each = ~100K tokens delivered per call
without grant cap touching. Now sums consumed tokens and calls
authManager.consumeTokenBudget() for sub-agent sessions. Closes the
unbudgeted side-channel that defeated the W4/W6 expansion budget.

P1.3 (Agent #4) — lcm_grep --mode semantic VoyageError contract
parity. Previously caught only `auth` and SemanticSearchUnavailable;
let rate_limit/server_error/network/bad_request/unexpected propagate
as unhandled tool errors. lcm_semantic_recall correctly catches all
VoyageError kinds. Now mirrored — both surfaces routed for Question B
have an identical error contract.

P1.4 (Agent #4) — lcm_grep --mode verbatim CJK fallback. messages_fts
uses tokenize='porter unicode61' which can't segment CJK ideographs
— MATCH on 中文 returned 0 rows WITHOUT throwing, so the
exception-driven LIKE fallback never fired. Now containsCjk(pattern)
is checked at the JS layer and CJK patterns route directly to a LIKE
substring match (skipping the FTS join entirely). 1 new regression test
covers Chinese characters.
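
A possible shape for that check, as a sketch. The commit only names `containsCjk(pattern)`; the exact code-point ranges and the `chooseVerbatimStrategy` wrapper below are assumptions.

```typescript
// Assumed coverage: Hiragana/Katakana, CJK Extension A, CJK Unified
// Ideographs, Hangul syllables. The real implementation may differ.
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

function containsCjk(pattern: string): boolean {
  return CJK_PATTERN.test(pattern);
}

// Hypothetical router: decide up front instead of waiting for FTS to throw,
// because a MATCH on unsegmented CJK text returns 0 rows without an error.
function chooseVerbatimStrategy(pattern: string): "like" | "fts" {
  return containsCjk(pattern) ? "like" : "fts";
}
```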

P1.5 (Agent #10) — reconcileSessionKeys TOCTOU race. affectedConvs
snapshot was taken OUTSIDE BEGIN IMMEDIATE; a conversation inserted or
updated between snapshot and tx-acquire could be UPDATE-moved without
an audit row, leaving no undo record for a destructive op.
Same pattern as Wave-8 P1's runSoftPurgeAtomic fix. Refactored:
active-conflict pre-check + affectedConvs SELECT + UPDATEs all run
inside the same BEGIN IMMEDIATE.

P1.6 (Agent #10) — runRecallEval setTimeout leak. Promise.race
spawned a timer that was never cleared on adapter resolve. With N=100
queries, each leaving a 30s timer pending, the event loop stayed alive
for up to 30s after the last query resolved (scripts appeared to hang
on exit). Added try/finally with clearTimeout.
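
The general pattern looks roughly like this. `withTimeout` is a hypothetical name, not the helper used in runRecallEval.

```typescript
// Race `work` against a timer, and always clear the timer afterwards so a
// fast result does not leave a pending timeout keeping the event loop alive.
async function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // runs on resolve, reject and timeout alike
  }
}
```

With the `finally` in place, a script that awaits only fast queries can exit as soon as the last one resolves.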

P1.8 (Agent #1) — Compaction fallback marker regression. Wave-4 P0
fix in summarize.ts tagged fallback content with "[LCM fallback
summary - model unavailable]" — but because the marker adds ~25
tokens, the resulting summary is LARGER than the source, so
summarizeWithEscalation rejected it as "didn't compress" and fell
through to compaction.ts's OWN buildDeterministicFallback which
emitted raw truncated content with NO marker, silently undoing the
W4 fix for any source <= max(targetTokens*4, 256) chars (i.e. most
leaves under LLM outage). Fix: prepend the same marker in
compaction.ts's fallback. Empty-source path tagged for parity.

P1.9 (Agent #2 + #7 convergence) — {{date_range}} placeholder
orphaned in seed prompts vs renderer. dispatch.renderPrompt only
substituted source_text/tier/memory_type. Seeded daily/weekly/
monthly templates used {{date_range}} literally; SynthesizeRequest
had no dateRange field. Currently latent (synthesize_around clamps
to custom/filtered) but becomes P0 the moment a daily/weekly/monthly
synthesis worker wires up. Same class as Final.review.3 Loop 4 Bug
4.2. Fix: dropped {{date_range}} from seeded templates (use
"from a single day/week/month" phrasing instead). Caller can bake
explicit ranges into sourceText if needed.

P1.10-P1.13 (Agent #11) — QA harness coverage gaps:

P1.10 — process.chdir("/tmp/lossless-claw-upstream") hardcoded made
the QA harness unrunnable anywhere except that exact path. Replaced
with a sentinel-file existence check that errors fast with a clear
"run from repo root" message.

P1.11 — adv-lcm-expand-query-smoke was vacuous: predicate returned
null unconditionally, args omitted required `prompt` field. Now
exercises full dispatch path with real prompt + asserts response
shape (answer + citedIds, or graceful LLM-unavailable error).

P1.12 — Period mode (lcm_recent replacement, most reviewer-debated
capability) had ZERO harness coverage. Added 2 new test cases:
period='yesterday' and period='last-7d' (covers the W7-tightened
hyphenated parser).

P1.13 — lcm_grep regex/full_text modes had ZERO harness coverage
(2 of 5 documented modes). Added 2 new test cases asserting the
regex/full_text response shape (totalMatches/messageCount/
summaryCount, not details.hits which is hybrid-only).

Verifications:
- npx tsc --noEmit → 739 errors (exactly matches origin/main baseline;
  ZERO PR-introduced TS errors)
- npx vitest run → 1353/1353 passing (1349 baseline + 3 owner-gate
  + 1 CJK regression tests)
- All Wave-9 fixes verified at code level on real file paths

Deferred P1s (4 of 13) — handled in follow-up commits / cycle-3:

- P1.7: TOCTOU between affectedConvs and active-conflict pre-check
  is now closed (folded into P1.5 fix above).
- Agent #5 P2 multi-bucket DEFAULT_MAX_CONVERSATION_BUCKETS=3 silent
  drop is documented but deferred (ergonomic, not safety).
- Agent #4 cosineSimilarity not clamped in hybrid mode: trivial 2-line
  fix but not safety.
- Agent #5 dead `runDelegatedExpansionLoop` in lcm_expand: cleanup
  task, no behavior change.

Pattern observation: Wave-9's full-file-context approach paid off —
caught the same class of bug (missing owner gate) on the SISTER case
of a previously-fixed P0, which a narrow-diff audit could not have
spotted. Future audits should keep this approach.
100yenadmin pushed a commit that referenced this pull request May 6, 2026
… 4 sub-agent test layers + 8 source bugs closed

A separate reviewer raised 12 findings on PR Martian-Engineering#613 with the strategic bar
"don't just make the findings disappear; make the PR truthful under real
operator scenarios." User correctly noted "wasn't sure if verified" so
I verified each before fixing. Verification result: 12-for-12 real bugs.

This commit also includes work from 4 parallel test-quality sub-agents
addressing antipatterns A8 (concurrency) + A9 (schema drift) + A1/A4
(adversarial scenarios + fixture-test circularity) + A4-at-scale
(stress fixture).

# Reviewer findings (all 12 closed)

## P1 (5)

- **#1 Period synthesis timezone** (src/tools/lcm-synthesize-around-tool.ts):
  parsePeriodShortcut anchored "today/yesterday/this-week/last-week/
  this-month/last-month" at UTC midnight. A Bangkok operator (UTC+7) at
  02:00 local asking "yesterday" got UTC-yesterday — ~17 hours off.
  Operator-trust violation. Now uses Intl.DateTimeFormat to compute
  local-day boundaries in lcm.timezone (configured IANA TZ); samples
  the offset at local noon to avoid DST-fold ambiguity. Relative forms
  (last-Nh, last-Nd) stay UTC-anchored (now-minus-N, not day-anchored).

- **#2 Synthesis cache key** (src/db/migration.ts +
  src/tools/lcm-synthesize-around-tool.ts): UNIQUE index keyed only on
  (session_key, range_start, range_end, leaf_fingerprint, grep_filter).
  Two correctness bugs: (a) tier='custom' then tier='filtered' for same
  range/leaves silently returned wrong-tier cached text, (b) registerPrompt
  changing the active prompt left cache serving stale text from the old
  prompt. Now includes tier_label + prompt_id in both the UNIQUE index
  and the lookup SELECT. Cache is rebuildable so wiping under the new key
  is safe.

- **#4 /lcm eval owner gate** (src/plugin/lcm-command.ts): /lcm eval
  mutates lcm_eval_run + lcm_eval_query_result tables AND can use Voyage
  in hybrid mode (small but non-zero quota cost). Wave-9 Agent #10 had
  classified it as READ_ONLY — the reviewer correctly challenged that
  classification. Now gated on senderIsOwner and added to the
  authorization-invariant test's DESTRUCTIVE_OPERATOR_CASES list.

- **#5 Voyage rerank token budget** (src/embeddings/hybrid-search.ts):
  rerank sent ALL candidates' full content with no enforcement of the
  ~600K-token cap. Realistic queries with many large condensed summaries
  hit Voyage 400 → silent RRF degradation, losing the +52.5pp paraphrastic
  recall lift. Now packs candidates into rerank input cumulatively until
  85% of MAX_TOKENS_PER_RERANK_CALL, dropping tail when over budget.
  Surfaces rerankPackTruncated + rerankPackedCount in HybridSearchResult.

- **#6 lcm_describe base content not charged**
  (src/tools/lcm-describe-tool.ts): Wave-9 P1.2 fix added
  consumeTokenBudget for expandedChildren + expandedMessages but skipped
  the base summary's s.content (which lines.push()es ALL of it). A
  sub-agent could lcm_describe a 30K-token condensed summary with NO
  expansion flags and drain context for free. Now charges base s.tokenCount
  too.
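
To make the Bangkok case in finding #1 concrete, here is a small sketch that reads the calendar date in a given IANA zone via `Intl.DateTimeFormat`. `localDate` is a hypothetical helper for illustration, not the PR's `parsePeriodShortcut`.

```typescript
// Hypothetical helper: calendar date (YYYY-MM-DD) of `at` in `timeZone`.
function localDate(at: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(at);
  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")}`;
}

// 02:00 on 7 May in Bangkok (UTC+7) is still 19:00 on 6 May in UTC.
const at = new Date("2026-05-06T19:00:00Z");
const utcDay = localDate(at, "UTC"); // "2026-05-06"
const bangkokDay = localDate(at, "Asia/Bangkok"); // "2026-05-07"
```

At that instant a UTC-anchored "yesterday" means 2026-05-05, while the operator's local "yesterday" is 2026-05-06.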

## P2 (5)

- **#3 Suppressed entity leakage** (src/tools/lcm-get-entity-tool.ts +
  src/tools/lcm-search-entities-tool.ts): when ALL mentions of an entity
  were suppressed via /lcm purge, the entity row in lcm_entities still
  leaked canonical_text + alternate_surfaces + metadata via both tools.
  The reviewer's framing: "suppression means invisible to agents, period."
  Both tools now require at least one unsuppressed mention via EXISTS
  guard. The "not found" branch now covers both "no such entity" AND
  "all mentions suppressed" indistinguishably (so an attacker can't
  infer entity existence). Updated test fixtures' insertEntity helpers
  to auto-create a default visible mention; tests that explicitly want
  the all-suppressed case opt out via noDefaultMention: true.

- **#7 Pending-extractions count** (src/extraction/entity-coreference.ts):
  countPendingExtractions filtered only on (kind, completed_at IS NULL),
  but runCoreferenceTick's selector ALSO requires (attempts < 5,
  summaries.suppressed_at IS NULL). Mismatch caused autostart to spin
  forever on rows the tick would never select. Predicate now exactly
  matches the selector.

- **#8 QA runner period coverage + exit semantics** (scripts/v41-qa-runner.mjs):
  period test cases I added in Wave-9 P1.12 omitted window_kind="period"
  (required by the tool), so they only hit schema-validation early-return
  and the regex match on 'period' made them trivially pass. Added the
  required field. Plus failedImportant had no exit branch — runner exited
  0 on any "important" failure, advisory-only. Added exit code 1 for
  important failures so the runner can act as a release gate.

- **#9 sqlite-vec install honesty** (package.json + semantic-infra-init.ts):
  sqlite-vec wasn't in any dependencies block, init log was log.info
  (low visibility), and PR_DESCRIPTION emphasized VOYAGE_API_KEY alone.
  Added to optionalDependencies; bumped log to log.warn with explicit
  install instructions + clear "what becomes unavailable" message.

- **#10 Backfill complete message lies** (src/plugin/lcm-command.ts):
  countBackfillPending excludes leaves with token_count >
  MAX_TOKENS_PER_EMBED_DOC, so an over-cap leaf was neither pending nor
  backfilled. Worker-tick output printed "✅ Backfill complete" even when
  over-cap leaves remained unembedded. Added countOverCapPendingForBackfill
  helper; completion message now distinguishes "in-range complete +
  over-cap remain" from full coverage.

## P3 (2)

- **#11 lcm_synthesize_around description** (src/tools/lcm-synthesize-around-tool.ts):
  agent-tool description still said "Two modes" (time + semantic) while
  schema declared three. Rewrote description + JSDoc to mention all three
  (period, time, semantic) and explicitly call out 'period' as the
  lcm_recent replacement / "what did we work on yesterday" surface.

- **#12 NUL byte in source** (src/tools/lcm-synthesize-around-tool.ts:331):
  fingerprintLeaves used a literal NUL byte (\x00) as a hashing separator,
  making the file binary to grep. Replaced with the escape sequence "\0"
  (functionally identical at runtime, readable in source). File is now
  searchable.
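
A sketch of why the separator matters in finding #12 and how the escaped form reads in source. The SHA-256 choice and this exact signature are assumptions; only the function name and the `"\0"` separator come from the text.

```typescript
import { createHash } from "node:crypto";

// Sketch: hash leaf IDs joined by a NUL separator, written as the escape
// sequence "\0" so the source file stays plain text.
function fingerprintLeaves(leafIds: readonly string[]): string {
  return createHash("sha256").update(leafIds.join("\0")).digest("hex");
}

// Without a separator, ["ab", "c"] and ["a", "bc"] would hash identically.
const fpLeft = fingerprintLeaves(["ab", "c"]);
const fpRight = fingerprintLeaves(["a", "bc"]);
```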

# Sub-agent test layers (4 in parallel)

## Sub-agent #1 — Concurrency / TOCTOU (test/v41-concurrency-invariants.test.ts, ~1044 LOC, 8 tests)

Worker-thread-based parallel-writer harness reproduces and pins
race-condition fixes: reconcileSessionKeys race (Wave-9 P1.5),
runSoftPurgeAtomic race (Wave-8 P1), worker-lock acquire (5-way),
heartbeat-during-LLM-call (Wave-9 Agent #8 P2), recordEmbedding
DELETE-before-INSERT atomicity. Verified regression-detection by
simulating pre-fix code. 0 new bugs found.

## Sub-agent #2 — Schema/placeholder drift (test/v41-schema-drift-invariants.test.ts, ~654 LOC, 19 tests)

Static-analysis tests via readFileSync + regex. Catches: placeholder
drift in seeded prompts vs renderer (Wave-9 P1.9 class), tier_label
CHECK constraint coverage vs TS union (Final.review.3 Bug 4.4 class),
manifest-vs-registered-tool drift (Wave-9 vapor-tools class),
parser/handler symmetry, FK ON-DELETE explicitness. **Found 3 P3 FK
drift bugs** — 3 declarations missing explicit ON DELETE clauses.
Closed in this commit (lcm_synthesis_cache.prompt_id,
lcm_synthesis_audit.prompt_id, lcm_embedding_meta.embedding_model →
all now `ON DELETE RESTRICT`).

## Sub-agent #3 — Adversarial scenarios + fixture-test circularity audit (test/v41-adversarial-scenarios.test.ts, ~1149 LOC, 37 tests)

Audit of the original 26 tests (25 scenarios + 1 sentinel): 16 strong,
9 weak ("only totalMatches > 0"). Strengthened 6 weak tests in
v41-five-questions.test.ts (B1-B5, E2) to assert specific summary
IDs. **Found 1 real fixture bug**: summaries_fts insert used `rowid`
but schema declares `(summary_id UNINDEXED, content)` — original
B1-B5 tests "passed" only because they matched at the messages
layer, never actually exercising summary FTS. Fixed in fixture; the
strengthened B1-B5 tests now actually exercise summary FTS. 37 hard
adversarial scenarios spanning paraphrase, ambiguity/ranking, compound
queries, negative queries, content injection (placeholder/XML/script/
SQL-injection), ranking sensitivity, cross-tool composition,
suppression boundary.

## Sub-agent #4 — Stress fixture (test/fixtures/v41-stress-corpus.ts + test/v41-stress-fixture.test.ts, ~898 LOC, 11 tests)

Deterministic generator for 1500-2500 leaves with realistic distribution
(30% last-7-days, dense days with 100+ leaves, 5-10% suppressed, 5% CJK,
near-duplicates, 5 adversarial-content leaves). 11 stress tests cover
build smoke, determinism, distribution, dense-day query, suppression
cascade, FTS5 perf, vec0 KNN (graceful no-op when vec0 unavailable),
adversarial-content non-breaking, near-duplicate handling, recency floor.

# Wave-10 reviewer regression coverage (test/v41-wave10-reviewer-regressions.test.ts, 6 tests)

Pins fixes for #2 (cache UNIQUE index w/ tier+prompt), #3 (suppressed
entity invisibility), #7 (pending count predicate), #10 (over-cap
counting). #1 has its own dedicated v41-period-timezone.test.ts (8
tests). #4 covered by extending v41-authorization-invariants.test.ts
DESTRUCTIVE_OPERATOR_CASES.

# Verification

- **1490/1490 tests passing** (1401 pre-Wave-10 + 89 new from this commit)
- **677 TS errors** (FEWER than the 739 main baseline — type-tightening
  fixes cascaded from the source changes)
- 4 sub-agent test files all green
- 6 reviewer-regression tests all green
- Authorization invariant test now covers `eval` → catches future
  removal of the gate

# What's NOT in this commit (future work)

- Mutation testing CI integration (stryker is too slow for per-PR;
  config exists for ad-hoc invocation)
- Wave-1-9 antipattern tabulation update with Wave-10 findings
100yenadmin pushed a commit that referenced this pull request May 6, 2026
…ed 12/12 real)

Fresh re-audit at 37e2b71 found 12 issues; 11 closed in this commit, 1
documented as known limitation. Reviewer was 12-for-12 real (Wave-10
was also 12-for-12; reviewer track record: 24-for-24).

# CI blockers

- **#1 (P1)** Auth invariant test hardcoded `/tmp/lossless-claw-upstream`
  path. CI failed because that path doesn't exist on GitHub runners;
  local runs accidentally succeeded by reading whatever stale checkout
  was at that path. Now resolves via `import.meta.url` →
  `__dirname/../src/plugin/lcm-command.ts`. Works in any worktree.

- **#10 (P2)** `pnpm-lock.yaml` was stale after the Wave-10
  `optionalDependencies` addition. Regenerated via `pnpm install
  --lockfile-only`; verified `pnpm install --frozen-lockfile` succeeds.

# Security parity

- **#2 (P1)** `/lcm doctor apply` and `/lcm doctor clean apply` lacked
  `senderIsOwner` gate. Wave-9 Agent #10 had classified the doctor
  cases as READ_ONLY, but the `apply` flag inside dispatches to the
  summarizer (cost) AND mutates summaries (state) for `doctor apply`,
  and DELETEs cleaner matches for `doctor clean apply`. Mirror the
  purge / reconcile / worker-tick / eval gate pattern. Read-only
  variants (no `--apply`) stay open.

  Plus updated `test/lcm-command.test.ts`'s `createCommandContext`
  helper to default `senderIsOwner: true` so existing tests for the
  doctor mutating paths continue passing — Wave-9 negative tests
  still explicitly pass `senderIsOwner: false` via overrides.

  Plus added 4 new tests to `v41-authorization-invariants.test.ts`
  pinning the Wave-11 doctor-apply gate behavior (apply-rejected,
  read-only-allowed for both `doctor` and `doctor clean`).

- **#5 (P1)** `lcm_describe` early-budget-gate. The Wave-10 fix charged
  base summary tokens against the grant AFTER emitting `s.content`.
  For a sub-agent at zero remaining budget, the content was already
  disclosed before accounting could prevent it. Added an EARLY gate:
  if delegated session AND base summary tokens > remaining grant,
  redact `s.content` with a clear "[REDACTED — base summary content
  is N tokens but grant has only M remaining]" message and skip the
  charge. Closes the disclosure-before-accounting path.
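
The early gate in #5 can be sketched as a pure function. The shapes and names here (`DescribeGateInput`, `applyEarlyBudgetGate`) are assumptions made for illustration, and the redaction wording is abbreviated; only the ordering (check the grant before emitting content) comes from the description above.

```typescript
// Hypothetical input shape; the real tool reads these from its own state.
interface DescribeGateInput {
  delegated: boolean;           // true for sub-agent (delegated) sessions
  baseSummaryTokens: number;    // token cost of the base summary content
  remainingGrantTokens: number; // what is left of the sub-agent grant
  content: string;              // the summary content that would be emitted
}

interface DescribeGateResult {
  content: string;
  chargedTokens: number;
  redacted: boolean;
}

function applyEarlyBudgetGate(input: DescribeGateInput): DescribeGateResult {
  const { delegated, baseSummaryTokens, remainingGrantTokens, content } = input;
  // Decide BEFORE emitting: over-budget content is never disclosed.
  if (delegated && baseSummaryTokens > remainingGrantTokens) {
    return {
      content:
        `[REDACTED: base summary content is ${baseSummaryTokens} tokens ` +
        `but grant has only ${remainingGrantTokens} remaining]`,
      chargedTokens: 0, // the charge is skipped when content is redacted
      redacted: true,
    };
  }
  // Assumption: only delegated sessions are charged against a grant.
  return { content, chargedTokens: delegated ? baseSummaryTokens : 0, redacted: false };
}
```

In this sketch a sub-agent at zero remaining budget gets the redaction message and no charge, while a main-agent call passes through unchanged.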

# Correctness

- **#3 (P1)** Timezone fractional offsets + DST. Wave-10's "sample
  offset at noon" approach broke on:
    - Half-hour zones: Asia/Kolkata (UTC+5:30) → showed +5 not +5:30
    - Quarter-hour zones: Asia/Kathmandu (UTC+5:45)
    - DST transition days: LA spring-forward 2026-03-08 → noon is in
      PDT (-7) but local midnight was in PST (-8); my function used
      the noon offset for the whole day → wrong by 1 hour
  Replaced with iterative converge-to-midnight algorithm:
    1. Format `at` in target tz to get y/m/d
    2. Probe = naive `Date.UTC(y, m-1, d, 0, 0, 0)`
    3. Format probe in target tz; compute delta from target midnight
    4. Adjust probe; repeat until delta=0 (typically 1-2 iters)
  Handles all IANA timezones, DST transitions, and arbitrary offsets.

  Added 3 new regression tests:
    - Asia/Kolkata 'yesterday' (UTC+5:30) — half-hour offset
    - Asia/Kathmandu 'today' (UTC+5:45) — quarter-hour offset
    - America/Los_Angeles 2026-03-08 — spring-forward day, asserting
      'today' duration is exactly 23h

- **#6 (P1)** Hybrid rerank now skips individually oversized
  candidates instead of bailing. Pre-fix: when the FIRST candidate
  exceeded the 510K-token (85% of 600K) rerank budget, the packer
  set `rerankPacked=[]` and broke out, disabling rerank for the
  whole result set. Now: oversized candidates are individually
  skipped (counted in `rerankPackSkippedOversized`) and packing
  continues with later candidates that fit. Result: a single huge
  FTS hit no longer takes down the whole rerank.

- **#7 (P1)** Voyage `output_dimension` not forwarded. Embedding
  dimensions are configurable (`LCM_EMBEDDING_DIM=2048` registers a
  2048-dim profile in `lcm_embedding_profile`), but `embedTexts()` never
  sent `output_dimension` to Voyage, so Voyage returned its default
  (1024). The vec0 INSERT then failed with a dimension mismatch on the
  per-model table.
  Added `outputDimension?: number` to `VoyageEmbedOptions`; forwarded
  via backfill (`opts.voyageOutputDimension`) and semantic-search
  query embed (`active.dim`). Default unchanged (omit → Voyage 1024).
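
The converge-to-midnight steps listed under #3 can be sketched with `Intl.DateTimeFormat`. This is a minimal sketch under assumptions, not the code in the commit: the function names are hypothetical, the iteration cap of 4 is arbitrary, and a zone whose local midnight does not exist on a given day would simply stop at the last probe.

```typescript
interface ZonedParts {
  year: number; month: number; day: number;
  hour: number; minute: number; second: number;
}

// Wall-clock fields of `at` as seen in `timeZone`.
function zonedParts(at: Date, timeZone: string): ZonedParts {
  const fmt = new Intl.DateTimeFormat("en-US", {
    timeZone, hour12: false,
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", second: "2-digit",
  });
  const found = new Map<string, number>();
  for (const part of fmt.formatToParts(at)) {
    if (part.type !== "literal") found.set(part.type, Number(part.value));
  }
  const get = (key: string): number => found.get(key) ?? 0;
  return {
    year: get("year"), month: get("month"), day: get("day"),
    hour: get("hour") % 24, // some engines print midnight as 24
    minute: get("minute"), second: get("second"),
  };
}

// UTC instant (ms) at which the local date y-m-d begins in `timeZone`.
function localMidnightUtcMs(y: number, m: number, d: number, timeZone: string): number {
  const target = Date.UTC(y, m - 1, d, 0, 0, 0); // naive probe (step 2)
  let probe = target;
  for (let i = 0; i < 4; i++) {
    const p = zonedParts(new Date(probe), timeZone);
    const shown = Date.UTC(p.year, p.month - 1, p.day, p.hour, p.minute, p.second);
    const delta = shown - target; // distance from target midnight (step 3)
    if (delta === 0) break;
    probe -= delta;               // adjust and repeat (step 4)
  }
  return probe;
}

// Start and end (exclusive) of the local day containing `at`.
function localDayBounds(at: Date, timeZone: string): { startMs: number; endMs: number } {
  const { year, month, day } = zonedParts(at, timeZone); // step 1
  const next = new Date(Date.UTC(year, month - 1, day + 1));
  return {
    startMs: localMidnightUtcMs(year, month, day, timeZone),
    endMs: localMidnightUtcMs(
      next.getUTCFullYear(), next.getUTCMonth() + 1, next.getUTCDate(), timeZone),
  };
}
```

Computing the end bound as the next day's own local midnight, rather than start plus 24 hours, is what lets a spring-forward day come out as 23 hours.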

# Documentation accuracy

- **#4 (P1)** Synthesis dispatch model claim. Tool description said
  "per-tier dispatch (haiku/sonnet/opus/thinking)" but actual LLM call
  routes through the configured summarizer chain (which ignores
  `args.model`). The source code already had an honest comment in
  `buildLlmCallFromSummarizer` ("the summarizer wrapper ignores the
  dispatch-supplied model"); the tool description and PR description
  overclaimed. Updated tool description to be accurate: dispatch
  records the per-tier model name in the audit table, but the
  actual LLM call uses the operator's configured summarizer chain.

# Polish

- **#9 (P2)** Health archive filter. `readActiveProfile` selected on
  `active = 1` alone and did not exclude rows where `archive_after IS
  NOT NULL`. Semantic retrieval already filters archived profiles, so
  during a model cutover health could report a profile that semantic
  search would not actually use.
  Now matches: `WHERE active = 1 AND archive_after IS NULL`.

- **#11 (P2)** Changeset rewritten. Old changeset only mentioned
  session-family recall. New changeset documents the full v4.1
  release surface: 8 agent tools (with new modes), 2 worker autostarts,
  9 operator commands (with owner-gating), schema changes, sqlite-vec
  optionalDependency, configuration env vars, and what was cut to Martian-Engineering#616.

- **#12 (P3)** Stale entity-search docblock. The header comment said
  "entities with all-suppressed mentions can still appear here";
  Wave-10 added the EXISTS guard so they no longer can. Updated
  comment to reflect the actual filter behavior.

# Known limitation (deferred)

- **#8 (P2)** Cache key still ignores resolved model. Adding `model_used`
  to the UNIQUE index doesn't help because model resolution is dynamic
  (the summarizer chain picks at call time, not before INSERT). The
  proper fix is invalidate-on-mismatch at cache-hit time, which is a
  larger refactor. Documented in the entry above + tracked for follow-up.

# Verification

- `npx vitest run`: **1513 / 1513 tests passing** (1502 → 1513;
  +11 new regression tests for Wave-11 fixes)
- `npx tsc --noEmit`: **677 errors** (still below 739 main baseline;
  no PR-introduced TS errors)
- `pnpm install --frozen-lockfile --ignore-scripts --lockfile-only`:
  **succeeds** (was failing pre-fix with ERR_PNPM_OUTDATED_LOCKFILE)
- Authorization invariant test: now resolves the source path relative
  to the test file via `__dirname`, so it works in any checkout location
100yenadmin pushed a commit that referenced this pull request May 7, 2026
… net)

Wire #3 of 3 for the agent context-management architecture (Wave-14).

# What this lands

When `afterTurn` records deferred compaction debt AND the current
context ratio is at critical pressure (>= criticalBudgetPressureRatio,
default 0.70), the drain runs SYNCHRONOUSLY inline instead of
scheduling via setImmediate.

This guarantees the next assemble() call (run by the loop hook between
LLM iterations) sees the compacted state — closing the cache-hot
deferred-drain race that previously let context overflow into
openclaw's overflow-recovery path. That recovery path can engage
LOSSY tool-result truncation (run.ts:1743) which breaks the lossless
guarantee — agents lose the actual content of past tool calls.

Below critical pressure, deferred-async behavior is unchanged
(preserves cache-aware throttling).

# Why this matters

Before this fix, the layered defenses had a gap:
  Layer 1 (loop hook afterTurn) — fires per-iteration
  Layer 2 (deferred drain via setImmediate) — async, RACE-PRONE at
                                              high pressure
  Layer 3 (overflow recovery) — kicks in at API rejection
  Layer 4 (tool-result truncation) — LOSSY last resort

At critical pressure, the cache-hot gate would defer the drain.
setImmediate would schedule it. But the next LLM call could fire
BEFORE setImmediate completes — sees un-compacted state — overflows.
Layer 3+4 then kick in, and Layer 4 truncates tool results = lossless
guarantee broken.

This commit makes Layer 2 SYNC at critical pressure. The afterTurn
caller waits until compaction lands before returning. assemble()
runs after afterTurn returns and reads the compacted state. No race.
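
A minimal sketch of the branch described above, assuming the two drain functions are injected as callbacks (their real signatures are not shown in this commit message, so `DrainDeps` and `triggerDeferredDrain` are hypothetical):

```typescript
// Hypothetical dependency shape standing in for
// drainDeferredCompactionDebtIfIdle and scheduleDeferredCompactionDebtDrain.
interface DrainDeps {
  drainInline: () => Promise<void>; // awaited drain
  scheduleAsync: () => void;        // setImmediate-style deferred drain
}

type DrainMode = "sync" | "async" | "async-fallback";

async function triggerDeferredDrain(
  contextRatio: number,
  criticalRatio: number, // criticalBudgetPressureRatio, default 0.70
  deps: DrainDeps,
): Promise<DrainMode> {
  if (contextRatio >= criticalRatio) {
    try {
      // Critical pressure: the caller waits until compaction has landed.
      await deps.drainInline();
      return "sync";
    } catch {
      // Failure path: fall back to the async schedule.
      deps.scheduleAsync();
      return "async-fallback";
    }
  }
  // Below critical: unchanged deferred behavior.
  deps.scheduleAsync();
  return "async";
}
```

Because the critical branch is awaited, anything that runs after `afterTurn` resolves observes the compacted state in this sketch.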

# Files

EDITED:
- src/engine.ts — afterTurn deferred-drain trigger now branches:
  - critical pressure → await drainDeferredCompactionDebtIfIdle inline
  - below critical → scheduleDeferredCompactionDebtDrain (async, unchanged)
  Failure path falls back to async if sync fails.

- test/engine.test.ts:
  - "afterTurn records deferred cold-cache catchup" — fixture
    tokenBudget raised to 100K (was 4K, accidentally critical) so
    the test still exercises the deferred-async path it intends to.
  - NEW: "afterTurn drains deferred debt SYNCHRONOUSLY at critical
    pressure (Wave-14 safety net)" — pins new behavior. Asserts
    pending=false WITHOUT vi.waitFor (would need it if drain were
    async).

# Architecture summary (3 commits combined)

Layer 1 — needsCompact pre-call gate (Commit 2): tools refuse before
overflow when projected result > REFUSAL_THRESHOLD (0.92). Agent
calls lcm_compact, retries — natural negotiation pattern.

Layer 2 — token state cache (Commit 1): llm_output hook + per-tool
self-update keeps the cache accurate within iterations + across
parallel-tool-call sequences.

Layer 3 — sync-at-critical (this commit): system safety net for
when the agent ignores all gates OR can't see them (no telemetry).
Engine guarantees compaction lands before next LLM call.

Layer 4 — agent-explicit lcm_compact tool (already shipped): rare
manual lever when agent KNOWS it needs space.

# Verification

- 1593/1593 tests passing (1592 baseline + 1 new sync-pressure test)
- 7/7 release-readiness preflight checks pass
- 330 TS errors (under 700 baseline; PR introduced none)
100yenadmin pushed a commit that referenced this pull request May 7, 2026
…describe cap

W1A1 #2 — estimator HARD_CAP was hard-coded at 10_000 but the per-tool
char cap (LCM_TOOL_RESULT_TOKEN_BUDGET) is operator-tunable. With env
raised to 30K, tools could emit 30K but the gate's projection still
capped at 10K, so needsCompact under-projected by up to 3× and refusals
that should have fired were missed.

W1A8 #3 — lcm_describe was truly unbounded. Worst case (Wave-12
estimator already noted this in a code comment): a single
describe(condensed_id, expandChildren=true) on a wide condensed summary
could emit ~210K tokens (10K base + 20×10K children). Sub-agent grant ledger
(consumeTokenBudget, Wave-9 P1) protected delegated sessions; main-
agent calls had no per-tool char cap.

Single source of truth
- New src/plugin/result-budget.ts owns the env knob resolution. Exports:
  - MAX_RESULT_TOKENS — used by needs-compact-gate as HARD_CAP_TOKENS
  - MAX_RESULT_CHARS — used by tools for truncation
  - truncationNotice(reasonHint) — standard message format
- needs-compact-gate.ts pulls HARD_CAP from MAX_RESULT_TOKENS so the
  estimator and per-tool cap stay in lockstep.
- lcm-grep-tool.ts drops its local resolveMaxResultChars (now imports
  from result-budget). Behavior identical at the default; no change to
  truncation messages. (Existing per-grep messages preserved.)
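
The env knob resolution can be sketched as below, using the behaviors listed in the tests for this commit (10_000 default, 2_000 floor, fallback on garbage, 4 chars per token). The function and constant names are hypothetical, and whether the real module also applies an upper clamp is not stated, so none is assumed.

```typescript
const DEFAULT_RESULT_TOKENS = 10_000;
const MIN_RESULT_TOKENS = 2_000;
const CHARS_PER_TOKEN = 4;

// Hypothetical resolver for the raw LCM_TOOL_RESULT_TOKEN_BUDGET value.
function resolveMaxResultTokens(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_RESULT_TOKENS;
  const parsed = Number(raw);
  // Unparseable values fall back to the default.
  if (!Number.isFinite(parsed)) return DEFAULT_RESULT_TOKENS;
  // Too-small values are clamped up to the floor (anti-misconfig).
  return Math.max(MIN_RESULT_TOKENS, Math.floor(parsed));
}

// Deriving chars from tokens in one place keeps the two in lockstep.
function resolveMaxResultChars(raw: string | undefined): number {
  return resolveMaxResultTokens(raw) * CHARS_PER_TOKEN;
}

console.log(resolveMaxResultTokens("30000")); // 30000
console.log(resolveMaxResultChars("30000"));  // 120000
console.log(resolveMaxResultTokens("100"));   // 2000
```

Any module that needs either number imports it from this one place, which is the point of the single-source-of-truth change.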

lcm_describe truncation
- truncateLinesToCap helper at top of file. Mirrors lcm_grep's pattern:
  walk lines, accumulate char count (incl. join newlines), append the
  truncation notice and stop when over cap.
- Applied at both return sites (summary describe + file describe).
- details.manifest.truncated boolean flag exposed for programmatic
  callers; details.truncated on the file branch.
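
The line-walking truncation can be sketched as follows. The signature and return shape are assumptions; the real `truncateLinesToCap` may differ. As in the description above, the notice is appended once the cap would be exceeded, so the final text can be longer than the cap by the length of the notice.

```typescript
// Sketch: keep whole lines while the joined length stays within maxChars.
function truncateLinesToCap(
  lines: string[],
  maxChars: number,
  notice: string,
): { text: string; truncated: boolean } {
  const kept: string[] = [];
  let used = 0;
  for (const line of lines) {
    // Every line after the first also costs one joining newline.
    const cost = line.length + (kept.length > 0 ? 1 : 0);
    if (used + cost > maxChars) {
      kept.push(notice);
      return { text: kept.join("\n"), truncated: true };
    }
    kept.push(line);
    used += cost;
  }
  return { text: kept.join("\n"), truncated: false };
}

const capped = truncateLinesToCap(["aaaa", "bbbb", "cccc"], 9, "[truncated]");
console.log(capped.truncated); // true
```

Returning the `truncated` flag alongside the text is what allows a caller to surface it programmatically, as `details.manifest.truncated` does.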

Tests (6 new, total 15 in suite)
- env=30000 → MAX_RESULT_TOKENS=30K, MAX_RESULT_CHARS=120K, estimator
  projection rises above 10_000 for verbatim mode (proves no longer
  pinned at the old hard-coded ceiling)
- env unset → 10_000 default
- env=100 → clamped UP to 2_000 floor (anti-misconfig)
- env=garbage → falls back to 10_000 default
- describe with 30K-char content + env=2000 → bounded under 10K + emits
  truncation marker
- describe with small content → emits full content, no truncation marker

Verified
- 1593/1593 vitest passing (was 1587, added 6 regression tests)
100yenadmin pushed a commit that referenced this pull request May 7, 2026
Wave-12 found 9 of 10 bugs that escaped 1593 tests. Each bug was
hidden by a distinct antipattern. This commit adds 4 new test layers
that pin the antipatterns so each bug class fails LOUDLY on regression.

A. Wiring/registration smoke (14 tests)
- test/v41-tool-wiring-smoke.test.ts
- For each tool documented as wrapped in needs-compact-gate.ts: assert
  the factory file calls `runWithTokenGate(`. For each documented-exempt
  tool: assert it does NOT call `runWithTokenGate(`. Catches the W2A1 P0
  bug class (synthesize_around silently dropped off the bus).
- For each registered tool in plugin/index.ts: assert getRuntimeContext
  is wired. Catches the half of the bug where the wrapper is present
  but not given runtime context.
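
The static wiring check in layer A amounts to a substring scan over factory sources. A sketch under assumptions: `findWiringViolations` is a hypothetical helper, sources are passed in memory rather than read from disk, and which tools are wrapped or exempt is up to the caller.

```typescript
const GATE_CALL = "runWithTokenGate(";

// Returns one message per file whose source contradicts its documented status.
function findWiringViolations(
  sources: Map<string, string>, // file name -> source text
  wrapped: string[],            // files documented as gated
  exempt: string[],             // files documented as exempt
): string[] {
  const violations: string[] = [];
  for (const file of wrapped) {
    const src = sources.get(file);
    if (src === undefined || !src.includes(GATE_CALL)) {
      violations.push(`${file}: documented as wrapped but never calls ${GATE_CALL}`);
    }
  }
  for (const file of exempt) {
    const src = sources.get(file);
    if (src !== undefined && src.includes(GATE_CALL)) {
      violations.push(`${file}: documented as exempt but calls ${GATE_CALL}`);
    }
  }
  return violations;
}
```

A test would assert that the returned list is empty, so a tool that silently loses its wrapper produces a named failure.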

B. Adversarial output bounds (3 tests)
- test/v41-adversarial-output-bounds.test.ts
- lcm_get_entity with 200 mentions × 1000-char surface_forms: bound check
- lcm_search_entities with 500 entities × 200-char canonical: bound check
- lcm_search_entities respects schema-bounded limit even with caller=500
- Catches W1A8 #3 sister cases (any tool that emits content without
  per-tool char cap).

C. Cross-module invariants (6 tests)
- test/v41-cross-module-invariants.test.ts
- estimateResultTokens projection ceiling === MAX_RESULT_TOKENS
  (caller-tunable env knob). Catches the W1A1 #2 bug class where two
  modules pin the same constant in isolation and drift apart.
- MAX_RESULT_CHARS = MAX_RESULT_TOKENS × 4 ratio
- REFUSAL_THRESHOLD calibration sanity vs MAX_RESULT_TOKENS
- Every src/tools/lcm-*-tool.ts factory referenced in plugin/index.ts
- summaryKinds reaches BOTH semantic and hybrid dispatch (W1A5 #1
  schema-vs-implementation drift)
- Sub-agent expansion-auth gate consistency (lcm_expand + lcm_describe
  both consult same manager)

D. QA-runner antipattern static scan (26 tests)
- test/v41-qa-runner-antipatterns.test.ts
- Extracts each `expect: (r) => {...}` closure from qa-runner.mjs.
  For tools with external deps (Voyage / LLM), assert the graceful-
  degradation regex check appears BEFORE bare `if (r.error) return`.
  Catches the W1 F5 bug class (inverted predicate making graceful
  branch dead code).
- Pins F1 has no entityType filter (catalog browse) AND F4 has
  entityType: pr_number (W1 F1/F4 args swap regression).
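
The ordering check in layer D can be sketched as a comparison of match positions inside the extracted closure text. The helper name and the graceful-degradation pattern passed in are hypothetical; the real scan may extract and match differently.

```typescript
// True when the graceful-degradation check occurs before any bare
// `if (r.error) return`, or when there is no bare return at all.
function gracefulCheckComesFirst(closureSource: string, gracefulCheck: RegExp): boolean {
  const bareReturnAt = closureSource.search(/if\s*\(\s*r\.error\s*\)\s*return/);
  if (bareReturnAt === -1) return true; // nothing can shadow the graceful branch
  const gracefulAt = closureSource.search(gracefulCheck);
  return gracefulAt !== -1 && gracefulAt < bareReturnAt;
}

// Assumed pattern: a regex `.test(` applied to r.error.
const GRACEFUL = /\.test\(\s*r\.error/;

const goodClosure =
  `(r) => { if (/unavailable/i.test(r.error ?? "")) return true; if (r.error) return false; return r.ok; }`;
const badClosure =
  `(r) => { if (r.error) return false; if (/unavailable/i.test(r.error ?? "")) return true; return r.ok; }`;

console.log(gracefulCheckComesFirst(goodClosure, GRACEFUL)); // true
console.log(gracefulCheckComesFirst(badClosure, GRACEFUL));  // false
```

In `badClosure` the bare error return fires first, which leaves the graceful branch unreachable; that is the dead-code shape the scan is meant to flag.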

Verified
- 1642/1642 vitest passing (was 1593, +49 new tests; 0 bugs surfaced
  by the new layers — the patterns pin the existing post-Wave-12 fixes
  rather than uncovering new issues).