feat: add daily lcm_recent rollups#13
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
📝 WalkthroughWalkthroughAdds DB schema and views for rollups, a RollupStore and RollupBuilder to produce per-conversation daily markdown rollups with fingerprinting and token-budget trimming, engine hooks to mark/trigger rebuilds, a new lcm_recent tool with SQL fallback, tests, and a changeset. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Plugin as Plugin/\nToolHandler
participant Engine as LcmContext\nEngine
participant Builder as RollupBuilder
participant RollupStore as RollupStore
participant DB as Database
Client->>Plugin: call lcm_recent(period)
Plugin->>RollupStore: listRollups(conversationId, periodKind)
alt rollups exist & ready
RollupStore->>DB: SELECT lcm_rollups...
DB-->>RollupStore: rollup rows
RollupStore-->>Plugin: rollup content + metadata
Plugin-->>Client: return rollup-based markdown
else missing/incomplete/stale
Plugin->>RollupStore: getLeafSummariesForDay(dateRange)
RollupStore->>DB: SELECT summaries(kind='leaf')...
DB-->>RollupStore: leaf summaries
RollupStore-->>Plugin: leaf summaries
Plugin-->>Client: return fallback markdown
end
rect rgba(100,150,200,0.5)
Note over Engine,Builder: Background maintenance flow
Engine->>Engine: ingestSingle(message)
Engine->>RollupStore: upsertState(conversationId, pending_rebuild=1, last_message_at)
Engine->>Engine: maintain()
Engine->>RollupStore: getState(conversationId)
RollupStore-->>Engine: state
alt pending_rebuild
Engine->>Builder: buildDailyRollups(conversationId, daysBack=7, forceCurrentDay=true)
Builder->>RollupStore: getLeafSummariesForDay(day)
RollupStore->>DB: SELECT summaries...
DB-->>RollupStore: summaries
RollupStore-->>Builder: summary rows
Builder->>Builder: computeFingerprint()/render/trim tokens
Builder->>DB: BEGIN IMMEDIATE
Builder->>RollupStore: upsertRollup(...)
Builder->>RollupStore: replaceRollupSources(ordered list)
Builder->>RollupStore: upsertState(pending_rebuild=0)
Builder->>DB: COMMIT
Builder-->>Engine: BuildResult
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
Pushed hardening fixes in Addressed the blocking review points on the daily-rollup foundation:
Verification:
|
|
Pushed follow-up fixes in Closed the remaining mergeability gaps from review:
Verification:
At this point I don’t see any remaining blocking concerns for PR #13. |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/db/migration.ts (1)
1046-1048: 🧹 Nitpick | 🔵 TrivialDrop legacy
summaries_leaf_temporal_idx; keep expression indexes only.
- Line 1046-1048 keeps a raw-column index that doesn’t match the current
julianday(coalesce(...))temporal predicates.- This adds avoidable write/update index maintenance on
summaries.Proposed migration diff
- CREATE INDEX IF NOT EXISTS summaries_leaf_temporal_idx - ON summaries (conversation_id, kind, latest_at, earliest_at, created_at) - WHERE kind = 'leaf';#!/bin/bash set -euo pipefail echo "Find temporal leaf-summary predicates used by query code:" rg -n -C2 "julianday\\(coalesce\\(earliest_at, latest_at, created_at\\)\\)|julianday\\(coalesce\\(latest_at, earliest_at, created_at\\)\\)" src --type ts echo echo "Find any leaf-summary query still relying on raw latest/earliest ordering/filtering:" rg -n -C3 "kind\\s*=\\s*'leaf'|kind\\s*=\\s*\"leaf\"" src --type ts | rg -n "latest_at|earliest_at|created_at"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/db/migration.ts` around lines 1046 - 1048, The migration currently creates a legacy raw-column index summaries_leaf_temporal_idx on summaries (conversation_id, kind, latest_at, earliest_at, created_at) which is redundant and harmful because query code uses expression indexes based on julianday(coalesce(...)); remove the CREATE INDEX for summaries_leaf_temporal_idx and replace it with a DROP INDEX IF EXISTS summaries_leaf_temporal_idx in the migration so only the expression indexes remain, and verify no code relies on raw latest_at/earliest_at ordering before applying the change (search for uses of latest_at/earliest_at/created_at in leaf queries and the julianday(coalesce(...)) expression references to confirm compatibility).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/db/migration.ts`:
- Around line 1046-1048: The migration currently creates a legacy raw-column
index summaries_leaf_temporal_idx on summaries (conversation_id, kind,
latest_at, earliest_at, created_at) which is redundant and harmful because query
code uses expression indexes based on julianday(coalesce(...)); remove the
CREATE INDEX for summaries_leaf_temporal_idx and replace it with a DROP INDEX IF
EXISTS summaries_leaf_temporal_idx in the migration so only the expression
indexes remain, and verify no code relies on raw latest_at/earliest_at ordering
before applying the change (search for uses of latest_at/earliest_at/created_at
in leaf queries and the julianday(coalesce(...)) expression references to
confirm compatibility).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 70c6a54a-ac15-4fe2-a7ec-3ff4c8f4e41a
📒 Files selected for processing (1)
src/db/migration.ts
📜 Review details
🔇 Additional comments (2)
src/db/migration.ts (2)
989-1033: Rollup table schema is coherent and constraint-safe.
- Good use of
CHECKconstraints and FK links forlcm_rollups,lcm_rollup_sources, andlcm_rollup_state.UNIQUE (conversation_id, period_kind, period_key)correctly enforces one logical rollup per period key.
1067-1078: Compatibility views are clean and useful.
- The day/week/month views provide stable query surfaces without duplicating storage logic.
Current-head triage summaryHead: Validation:
Thread matrix:
Remaining concerns: none known for the daily MVP slice. |
Latest review-thread triage after
|
| Comment ID | Status | Evidence | Action |
|---|---|---|---|
| 3153906664 | fixed | buildDailyRollups() now derives candidate days with shiftDateKey(todayKey, -offset), and buildDayRollup() uses getLocalDayBoundsForDateKey(dateKey, timezone) rather than a UTC-noon surrogate. Added UTC+13 regression coverage. |
Resolving. |
| 3150297222 | fixed | buildDailyRollups() now updates last_rollup_check_at and clears pending_rebuild after a successful full scan, including skipped/empty scans. It preserves pending when last_message_at advanced during the scan. |
Resolving. |
| 3150297116 | fixed/stale | Tool description now says direct SQL leaf-summary fallback and explicitly scopes week/month aggregate rollups to the follow-up PR. | Resolving. |
| 3150297284 | fixed | Tests now cover schema/views, daily rollup use, SQL fallback, invalid dates, empty-scan state updates, content-change fingerprints, and UTC+13 daily bounds. | Resolving. |
| 3150297080, 3150297242 | stale after newer commit | Fallback and leaf-summary scans use julianday(...) comparisons and canonical UTC output formatting. |
Resolving. |
| 3150297148, 3150297193 | stale after newer commit | Source replacement errors now propagate through awaited transaction flow. | Resolving. |
| 3150307392, 3153797461 | valid non-blocking polish | All-conversation fallback remains intentionally bounded in the MVP; larger budgeted all-conversation controls arrive in the aggregate/deepening slice. | Leaving explained in summary; not a blocker. |
| 3153544741 | delegated to PR #12 | Skill/reference guidance is intentionally owned by the docs/spec PR in this stack, which now gates lcm_recent availability and documents recap-vs-proof. |
Not a code blocker for this slice. |
| 3151696627, 3153907149 | incorrect / safe by construction | pending_rebuild=0 is the intended maintenance short-circuit; ingest and timezone/maintenance paths mark the state dirty when rebuild work exists. |
Resolving. |
| 3153906868 | safe by construction | Intl hour 24 is normalized in the shared local-time formatter path and covered by the temporal parser tests in descendant slices. |
Resolving. |
Remaining concerns: no current-head blocker identified for the daily MVP slice.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/rollup-builder.ts`:
- Around line 179-187: The current logic clears pending_rebuild based only on
last_message_at vs scannedAt which can overwrite a newer rebuild flag set after
scannedAt; change the update to only clear pending_rebuild when you can verify
the row is the same causal version that you inspected—e.g., read a
generation/version from latestState (latestState.dirty_generation) when
computing shouldClearPending and pass that generation into the write to
this.store (either via a new compare-and-swap API like
this.store.compareAndSwapState(conversationId, expectedGeneration, update) or by
extending this.store.upsertState to accept an expectedGeneration), and set
pending_rebuild to 0 only if the stored generation equals the expected
generation; otherwise leave pending_rebuild as 1. Ensure to reference
latestState, scannedAt, result.errors, pending_rebuild, and conversationId when
implementing the conditional write.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 32008203-7297-4d61-85b5-d162169a4a45
📒 Files selected for processing (2)
src/rollup-builder.tstest/rollup-store-builder.test.ts
|
Follow-up after |
|
Follow-up after |
|
Follow-up hardening landed in |
|
Follow-up hardening landed in |
| this.config.timezone | ||
| ); | ||
| if (existing) { | ||
| this.store.deleteRollup(existing.rollup_id); |
There was a problem hiding this comment.
🔴 Automatic rollup deletion in daily sweep also violates AGENTS.md lossless rule
Same issue as the buildDayRollup path: in buildDailyRollups at line 138, when an empty-day is detected during the sweep, the existing rollup is deleted instead of marked stale. This is the second call site for the same AGENTS.md violation — automatic maintenance code performing destructive recovery instead of using a stale marker.
| this.store.deleteRollup(existing.rollup_id); | |
| this.store.markStale(existing.rollup_id); |
Was this helpful? React with 👍 or 👎 to provide feedback.
| export interface RollupBuilderConfig { | ||
| timezone: string; | ||
| dailyTargetTokens?: number; | ||
| dailyMaxTokens?: number; | ||
| } |
There was a problem hiding this comment.
📝 Info: No new plugin config options added — openclaw.plugin.json update not needed
The RollupBuilderConfig type has dailyTargetTokens and dailyMaxTokens fields, but these are only used internally by RollupBuilder and are not exposed as user-facing plugin config options. The engine constructs the builder with only timezone (src/engine.ts:1507-1509), using default values for the token settings. Per the AGENTS.md Config Schema Sync rule, no openclaw.plugin.json or docs update is needed for this PR.
Was this helpful? React with 👍 or 👎 to provide feedback.
|
Consolidation note: this public fork PR remains review provenance for the LCM temporal-memory stack. I consolidated the docs + daily + sub-day + aggregate/debug layers onto current upstream main in Martian-Engineering#516 so upstream reviewers can inspect one clean current-main diff. This PR should be read as historical/layered context for that upstream PR rather than the current merge target. |
| const summaries = this.getLeafSummariesForDay(conversationId, start, end) | ||
| .filter((summary) => summary.kind === "leaf") | ||
| .sort(compareSummariesChronologically); | ||
|
|
||
| if (summaries.length === 0) { | ||
| const existing = this.store.getRollup( | ||
| conversationId, | ||
| PERIOD_KIND, | ||
| dateKey, | ||
| this.config.timezone | ||
| ); | ||
| if (existing) { | ||
| this.store.deleteRollup(existing.rollup_id); | ||
| return true; | ||
| } | ||
| return false; | ||
| } | ||
|
|
||
| const totalSourceTokens = summaries.reduce( | ||
| (sum, summary) => sum + safeTokenCount(summary.tokenCount), | ||
| 0 | ||
| ); | ||
| const sourceMessageCount = summaries.reduce( | ||
| (sum, summary) => sum + safeTokenCount(summary.sourceMessageCount), | ||
| 0 | ||
| ); | ||
| const fingerprint = computeFingerprint( | ||
| summaries.map((summary) => ({ | ||
| id: summary.summaryId, | ||
| tokenCount: summary.tokenCount, | ||
| content: summary.content, | ||
| updatedAt: summary.updatedAt, | ||
| createdAt: summary.createdAt, | ||
| earliestAt: summary.earliestAt, | ||
| latestAt: summary.latestAt, | ||
| sourceCount: summary.sourceMessageCount, | ||
| })) | ||
| ); | ||
| const draft = buildDailyRollupContent({ | ||
| dateKey, | ||
| summaries, | ||
| timezone: this.config.timezone, | ||
| maxTokens: this.dailyMaxTokens, | ||
| }); | ||
| const builtAt = new Date(); | ||
| const sourceSummaryIds = JSON.stringify( | ||
| summaries.map((summary) => summary.summaryId) | ||
| ); | ||
| const coverageStart = minSummaryTime(summaries)?.toISOString() ?? null; | ||
| const coverageEnd = maxSummaryTime(summaries)?.toISOString() ?? null; | ||
|
|
There was a problem hiding this comment.
📝 Info: Summaries read outside the transaction in buildDayRollup
In src/rollup-builder.ts:226-228, the leaf summaries are fetched and processed OUTSIDE the withDatabaseTransaction block at line 277. The fingerprint, draft content, and source metadata are all computed from this pre-transaction read. If summaries change concurrently between the read and the write transaction, the rollup would be built on stale data. This is mitigated by: (1) the source_fingerprint is stored, so the next sweep detects the mismatch and triggers a rebuild; (2) the engine's maintenance code serializes per-conversation work; (3) BEGIN IMMEDIATE on the write path prevents concurrent writers. This is an eventual-consistency trade-off rather than a correctness bug, but worth noting for future reviewers.
Was this helpful? React with 👍 or 👎 to provide feedback.
|
Closing this local/fork PR as superseded by the canonical upstream temporal-spine PR: The consolidated architecture and sprint map is now recorded in:
This local PR remains useful as historical source material only; future work should happen upstream. |
|
Closed as part of LCM PR-stack cleanup. See the preceding comment for the canonical upstream PR or follow-up issue. |
|
Fork cleanup note: daily lcm_recent rollup scope is superseded by upstream Martian-Engineering/lossless-claw PR Martian-Engineering#516. Follow-up hardening and observed-work integration live in PR Martian-Engineering#526 and descendants. |
… + 8 P1 closed Wave 8 ran 22 parallel Opus 1M-context agents at ~1k LOC each across the full PR — verifying Wave-7 fixes don't break untouched surfaces + catching pre-existing bugs Wave-7 missed. Surfaced 1 NEW P0 + ~10 P1s + many P2/P3. P0 (CRITICAL — Wave-4+W6 anti-fabrication validation was no-op for months) ========================================================================== Wave-8 Auditor #7-12 P0-1: lcm_expand_query citedIds validation has been silently broken since Wave-4 introduced it. The `runDelegatedExpandQuery` function calls `params.lcm.getDb()` for citation validation against the summaries table, BUT `RunDelegatedExpandQueryParams` never declared a `lcm` field, AND neither call site (lines 1232, 1283) passed `lcm`. Every validation call threw `TypeError: Cannot read properties of undefined (reading 'getDb')`. The empty `catch {}` on the validation block silently swallowed the error → `citedIdsValidated` stayed as the unvalidated set, `citedIdsRejectedAsFabricated` and `citedIdsExceededValidationCap` stayed at 0 forever. This means: an LLM that hallucinates `sum_xyz` IDs has those IDs surfaced unchanged to the caller, with telemetry that LOOKS like validation ran. The entire W4 + W6 anti-fabrication contract was fictional in production. Fix: 1. Added `lcm: LcmContextEngine` to `RunDelegatedExpandQueryParams`. 2. Both call sites now pass `lcm: input.lcm` (the tool's LcmContextEngine). 3. Replaced empty `catch {}` with logging + graceful degrade — so any future regression in validation is visible in logs instead of silently no-op. P1 — HIGH IMPACT (8 closed) ============================ Auditor #7-12 P1-1: conversation-store.searchRegex lacked SQL LIMIT (W8 R1 fixed summary-store but missed the parallel conversation-store). Without LIMIT, regex grep on broad scope materialized the entire messages table into memory before the JS-side MAX_ROW_SCAN cap. Fix: bind SQL_SCAN_BOUND=10000 LIMIT. Auditor #7-12 P1-4: lcm_describe budget=0 status reused "skipped-non-summary" for "delegated-grant exhausted" — programmatic consumers couldn't distinguish file-target skips from budget-exhausted skips. Fix: distinct `"budget-exhausted"` status added to both expandChildrenStatus + expandMessagesStatus union types. Auditor #7-12 P1-5: verbatim grep FTS-fallback hard-coded `ftsBindIndex = 0` with a brittle "always pushed first" comment. Any future refactor moving the FTS bind would silently break the LIKE fallback substitution. Fix: track `ftsBindIndex = binds.length` AT THE PUSH SITE so it's correct regardless of bind order. Auditor #13-18 E-P1: purge.ts resolved targetLeaves OUTSIDE the BEGIN IMMEDIATE transaction — race window where a concurrent suppression update between resolve + apply caused audit-trail loss (stale `suppress_reason` overwrite). Fix: extracted runSoftPurgeAtomic that opens BEGIN IMMEDIATE first, resolves INSIDE the tx, then runs the cascade. Old runSoftPurge kept as a backward-compat shim. Auditor #19-22 B-P1-1: PR_DESCRIPTION Scenario 1 ("yesterday") said to use `window_kind: 'time'` — but every other doc routes "yesterday" through `window_kind: 'period'` (the lcm_recent replacement). Fix: corrected to period mode. Auditor #19-22 A-P1-1: v41-live-db-harness cost comment was 100× wrong ($0.05 vs actual $0.001 for 5K tokens at $0.18/1M). Fix: corrected math + reasoning. Wave-8 R1 (already shipped earlier in this session, included for completeness): - best-of-N 1-survivor short-circuit was returning a malformed SynthesizeResult missing 4 required fields (primaryPromptId, auditIds, totalLatencyMs, totalCostCents) — TYPE CONTRACT VIOLATION - judge `parseJudgeOutput(_, ctx.bestOfN)` should be `parseJudgeOutput(_, candidateResults.length)` — judge can pick out-of-bounds index when survivors < requested N - summary-store.searchRegex SQL LIMIT - semantic-search filteredAfterJoin: false on entity-only branch - getLeafSummaryLinksForMessageIds suppression filter REGRESSION TESTS ADDED (4 new) ================================ Wave-7 P0-2 message-orphan: 2 tests covering shared-message suppression invariants (only suppress when ALL referencing leaves purged together). Wave-8 P0 best-of-N short-circuit: 2 tests covering the 1-survivor-returns-complete-SynthesizeResult contract + the judge-survivor-count fix. VERIFICATION ============ - 1349/1349 tests passing (1345 + 4 new regression) - QA runner full: 30/30 pass - Total cost ~$0.11 per full QA run DEFERRED (acknowledged) ======================== - Auditor #13-18 L-P1: reconcileSessionTail partial-failure semantics (pre-existing, not introduced by Wave-7 ingestSingle wrap). - Auditor #13-18 J-P1: eval timeout indistinguishable from genuine zero-recall — needs RecallResult shape change; cycle-3. - Auditor #7-12 P1-2: expandRecursive token-cap overshoot — minor contract violation, in-flight; defer. - Auditor #7-12 P1-3: describe raw-count COUNT(*) performance — needs schema-level (parent_summary_id) index check, deferred. - Auditor #7-12 P0-2 hybrid confidenceBand mislabel on rerank — W7 added confidenceBandSource field (cosine|rerank); thresholds were re-evaluated as P3 calibration concern. - Auditor #19-22 D: 7 of 11 W7+W8 fixes still lack regression tests (4 added in this commit; 7 deferred to a follow-up backfill PR). - Various P2/P3 cosmetic items. Audit cycle stats: - Waves 1-3: 32 P1+ findings closed - Wave 4 (full PR, 22 agents): 7 P0 + 30 P1 closed - Wave 5: 4 P1 + 4 P2 closed (focused on W4 changes) - Wave 6: 0 P1 + 4 P2 closed (focused on W5 changes + integration) - Wave 7 (full PR re-audit, 22 agents): 7 P0 + 15 P1 closed - Wave 8 (full PR re-audit, 22 agents): 1 P0 + 8 P1 closed TOTAL: ~140 unique findings closed across 8 waves
Summary
Adds the daily LCM temporal-rollup foundation and the first
lcm_recentimplementation. This PR is the lowest implementation slice: daily rollups, source provenance, compatibility views, strict date handling, and bounded direct SQL fallback over leaf summaries.lcm_recentis intentionally a recap/window-entry tool. It helps locate likely time windows and source IDs; exact claims still requirelcm_describeorlcm_expand_query.What Changed
lcm_rollups,lcm_rollup_sources,lcm_rollup_state, compatibility views, and rollup indexes.lcm_recentfortoday,yesterday,date:YYYY-MM-DD, and rolling7d/30drecap windows.lcm_grep.julianday(...), canonical UTC timestamp outputs, and expression indexes aligned to fallback queries.Out Of Scope
lcm_rollup_debugare in feat: add weekly and monthly lcm rollups #15.Validation
npm test -- --run test/rollup-store-builder.test.ts test/lcm-tools.test.ts— 18 tests passed.npm run build— passed.git diff --check— clean.Reviewer Notes
Known earlier review concerns addressed here: strict invalid-date rejection, SQL fallback wording, rollup-miss fallback coverage, compatibility views, content-change fingerprint invalidation, source-message counts, transaction propagation, TOCTOU lookup inside
BEGIN IMMEDIATE, full-scan pending-state clearing, mixed timestamp comparisons, expression indexes, timezone-safe stored rollup lookup, UTC+13 date-key preservation, race-safe pending-state clearing, and deletion of stale direct day rollups and empty-day daily sweep rollups, day-scoped empty cleanup errors, and monotonic daily sweeplast_rollup_check_atupdates, final sweep-state failure reporting, resolved engine timezone wiring, source-ID redaction in tool details, explicit invalid-calendar-date errors, and multi-day output caps, and 30-day maintenance backfill for the30drecall surface.Summary by CodeRabbit
New Features
Performance
Tests
Chores