feat: session-family retrieval scope (cherry-pick of upstream PR #338) + extend to v0.9.4 tools#63
Conversation
…ian-Engineering#338) Cherry-picks Martian-Engineering#338. Treats multiple physical conversation segments that share a stable session identity as one logical retrieval family. Beyond the upstream scope (which only covered lcm_grep / lcm_describe / lcm_expand_query), this commit also extends family scope to the v0.9.4 stack tools that are conversation-scoped: - lcm_recent (already family-aware via relatedConversationIds — verified) - lcm_work_density (observed-work density spans family) - lcm_event_search (events span family) - lcm_task_suggestions (suggestions derived from family-wide getDensity) This closes the correctness gap that PR Martian-Engineering#592 (auto-rotate session files) introduced: without family scope, retrieval after auto-rotate silently misses data from archived segments. Conflict resolution: - src/store/conversation-store.ts: union of v0.9.4 listConversationsBySessionKey (no-cap) + PR Martian-Engineering#338 getConversationFamilyIds (works for both session_key and session_id paths). Both retained. - src/tools/lcm-conversation-scope.ts: LcmConversationScope now carries BOTH conversationIds (PR Martian-Engineering#338-staged grep/describe/expand-query consumers) and relatedConversationIds (v0.9.4 recent/task-suggestions consumers). Both populated from a single family lookup that prefers the new getConversationFamilyIds shim and falls back to the older listConversationsBySessionKey for test mocks. allConversations rejection + family resolution kept together. Store extensions: - ObservedWorkStore.getDensity: accept conversationIds[] via the shared appendConversationScopeConstraint helper. Falls back to conversationId when no list is supplied. - EventObservationStore.listObservations / listEpisodes: same pattern.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (7)
Cache: Disabled due to Reviews > Disable Cache setting Disabled knowledge base sources:
📝 WalkthroughWalkthroughAdds optional multi-conversation scoping (conversationIds?: number[]) across retrieval, stores, and LCM tools; introduces appendConversationScopeConstraint to normalize/apply conversation-family SQL filters; wires conversation-family IDs from the scope resolver into grep, summary search, density, event queries, and related tooling. ChangesSession Family Scoping Across Retrieval and Storage
Sequence DiagramsequenceDiagram
participant Tool as LCM Tool
participant Retrieval as RetrievalEngine
participant Store as Summary/Work/Event Store
participant DB as Database
Tool->>Retrieval: grep/search with conversationIds[]
Retrieval->>Store: searchMessages/searchSummaries(conversationIds[])
Store->>DB: SQL query with WHERE ... IN (conversationIds) / appended constraint
DB-->>Store: rows
Store-->>Retrieval: shaped results (messages/summaries)
Retrieval-->>Tool: aggregated results (including family metadata)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
Pull request overview
This PR extends “session-family” scoping so retrieval-style reads can span multiple physical conversation rows created by session-file rotation (/new, /reset), avoiding silent context loss when a session is split into archived segments.
Changes:
- Adds a
conversationIds(session-family) scope alongside the existing singleconversationId, and threads it through multiple tools. - Extends stores/queries (summaries, observed work density, event search) to accept
conversationIdsand apply a shared SQL scoping helper. - Updates/extends tests to cover session-family scoping behavior for
lcm_grepandlcm_expand_query.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/lcm-tools.test.ts | Adds test coverage for lcm_grep passing session-family conversationIds and updated describe error wording. |
| test/lcm-expand-query-tool.test.ts | Adds family-scope tests for lcm_expand_query selection and out-of-scope rejection. |
| src/tools/lcm-work-density-tool.ts | Passes session-family conversationIds into observed-work density queries. |
| src/tools/lcm-task-suggestions-tool.ts | Threads session-family conversationIds into density reads used by task suggestions. |
| src/tools/lcm-grep-tool.ts | Passes conversationIds into retrieval grep and updates rendered scope string for families. |
| src/tools/lcm-expand-query-tool.ts | Updates expand-query candidate resolution to consider family scopes and adds scope enforcement. |
| src/tools/lcm-event-search-tool.ts | Passes conversationIds to event observation listing for family-aware event search. |
| src/tools/lcm-describe-tool.ts | Enforces describe scoping against the allowed conversation family and updates the error message. |
| src/tools/lcm-conversation-scope.ts | Introduces conversationIds on the scope object and resolves family IDs via store helpers. |
| src/store/summary-store.ts | Adds conversationIds support to summary search paths (FTS/LIKE/regex/CJK variants). |
| src/store/observed-work-store.ts | Adds conversationIds support for density scoping (family-aware reads). |
| src/store/event-observation-store.ts | Adds conversationIds scoping for listing observations/episodes. |
| src/store/conversation-scope.ts | Adds appendConversationScopeConstraint() helper to apply (conversationId vs conversationIds) SQL constraints consistently. |
| src/retrieval.ts | Extends grep input to carry conversationIds through to store search calls. |
Comments suppressed due to low confidence (1)
src/retrieval.ts:244
conversationIdsis added toGrepInput, butRetrievalEngine.grep()passes the combinedsearchInputintoconversationStore.searchMessages(), whoseMessageSearchInput/implementation only honorsconversationId. In session-family mode (where callers may setconversationIdto undefined and rely onconversationIds), message searches become effectively unscoped (all conversations) and can leak/return results outside the intended session family. Fix by extendingConversationStore.searchMessages(andMessageSearchInput) to acceptconversationIdsand apply the same scope constraint logic asSummaryStore, or by explicitly fan-out searching per conversationId whenconversationIdsis provided.
async grep(input: GrepInput): Promise<GrepResult> {
const { query, mode, scope, conversationId, conversationIds, since, before, limit, sort } = input;
const searchInput = { query, mode, conversationId, conversationIds, since, before, limit, sort };
let messages: MessageSearchResult[] = [];
let summaries: SummarySearchResult[] = [];
if (scope === "messages") {
messages = await this.conversationStore.searchMessages(searchInput);
} else if (scope === "summaries") {
summaries = await this.summaryStore.searchSummaries(searchInput);
} else {
// scope === "both" — run in parallel
[messages, summaries] = await Promise.all([
this.conversationStore.searchMessages(searchInput),
this.summaryStore.searchSummaries(searchInput),
]);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const familyScopedConversationId = | ||
| (conversationScope.conversationIds?.length ?? 0) > 1 | ||
| ? undefined | ||
| : conversationScope.conversationId; | ||
| let scopedConversationId = familyScopedConversationId; | ||
| if ( | ||
| !conversationScope.allConversations && | ||
| scopedConversationId == null && | ||
| (conversationScope.conversationIds?.length ?? 0) <= 1 && | ||
| callerSessionKey |
| const firstAllowed = params.candidates.find((candidate) => | ||
| allowedConversationIds.has(candidate.conversationId), | ||
| ); | ||
| if (firstAllowed) { | ||
| return firstAllowed.conversationId; | ||
| } |
| const explicitConversationId = | ||
| typeof params.conversationId === "number" && Number.isFinite(params.conversationId) | ||
| ? Math.trunc(params.conversationId) | ||
| : undefined; | ||
| if (explicitConversationId != null) { | ||
| return { | ||
| conversationId: explicitConversationId, | ||
| conversationIds: [explicitConversationId], | ||
| allConversations: false, | ||
| relatedConversationIds: [], | ||
| }; |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/tools/lcm-work-density-tool.ts (1)
389-409:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReport the effective family scope in the payload.
The query is family-wide when
scope.conversationIdshas multiple entries, butconversationScopestill serializes onlyscope.conversationId. That makes the response metadata disagree with the counts and items it contains.Possible fix
- conversationScope: scope.allConversations ? "all" : scope.conversationId, + conversationScope: + scope.allConversations + ? "all" + : scope.conversationIds && scope.conversationIds.length > 1 + ? scope.conversationIds + : scope.conversationId,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tools/lcm-work-density-tool.ts` around lines 389 - 409, The payload's conversationScope is incorrect for family-wide queries: when store.getDensity was called with scope.conversationIds containing multiple IDs the metadata still serialized scope.conversationId. Update the conversationScope argument passed into applyOutputBudget (where density is set) to reflect the effective scope — keep "all" when scope.allConversations is true, return "family" (or another explicit family label) when scope.conversationIds exists and has length > 1, and otherwise return scope.conversationId so the response metadata matches the actual query made by store.getDensity.src/retrieval.ts (1)
228-243:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftFamily-scoped grep still drops message hits.
searchInputnow carriesconversationIds, butsrc/store/conversation-store.ts:778-816still scopessearchMessages(...)with onlyconversationId.- That means
scope === "messages"andscope === "both"stay single-conversation for message hits while summaries span the whole family.Please mirror the
conversationIdssupport intoConversationStore.searchMessages(...), or keep grep family-scoped for summaries only until message search is updated.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/retrieval.ts` around lines 228 - 243, The message-search path is still only honoring conversationId while searchInput now includes conversationIds, so update ConversationStore.searchMessages (the method and its internal use of conversationId) to accept and handle conversationIds the same way SummaryStore.searchSummaries does: change the signature to accept conversationIds, apply family-scoped filtering/grep over the provided conversationIds (falling back to conversationId or all conversations as before), and ensure callers (where searchMessages is invoked with searchInput) pass through the conversationIds field; alternatively, if you prefer not to implement family-scoped message search yet, explicitly restrict the top-level retrieval logic to not pass conversationIds into searchMessages and only support family-scoped searches for summaries until message search is updated.src/tools/lcm-event-search-tool.ts (1)
112-126:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReturn family scope metadata when results span multiple conversations.
listObservations(...)now reads acrossscope.conversationIds, but the payload still reports onlyscope.conversationId. For family-scoped searches, that makes the top-level metadata misleading even though each observation carries its ownconversationId.Possible fix
- conversationScope: scope.allConversations ? "all" : scope.conversationId, + conversationScope: + scope.allConversations + ? "all" + : scope.conversationIds && scope.conversationIds.length > 1 + ? scope.conversationIds + : scope.conversationId,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tools/lcm-event-search-tool.ts` around lines 112 - 126, The response metadata currently always uses scope.conversationId even when listObservations (via lcm.getEventObservationStore().listObservations) was run across scope.conversationIds; update the jsonResult payload to compute conversationScope from the actual observations: after calling listObservations, collect distinct observation.conversationId values and if more than one distinct id is present (or if scope.conversationIds.length > 1) set conversationScope to a family/aggregate value (e.g. "family" or "all") otherwise set it to scope.conversationId; change the assignment where jsonResult is built (the conversationScope key) to use this computed value so top-level metadata reflects multi-conversation results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/tools/lcm-conversation-scope.ts`:
- Around line 204-234: collectFamilyConversationIds currently returns
getConversationFamilyIds results verbatim while listConversationsBySessionKey
results are mapped newest-first, causing unstable ordering for
relatedConversationIds; normalize ordering in collectFamilyConversationIds by
applying a consistent sort before returning: call
store.getConversationFamilyIds(...) (or map records from
listConversationsBySessionKey) and then sort the resulting number[] into the
documented newest-first order (use conversation timestamp or inferred ordering
field from the returned records if available), ensuring both paths
(getConversationFamilyIds and listConversationsBySessionKey) produce the same
stable newest-first sequence for relatedConversationIds.
---
Outside diff comments:
In `@src/retrieval.ts`:
- Around line 228-243: The message-search path is still only honoring
conversationId while searchInput now includes conversationIds, so update
ConversationStore.searchMessages (the method and its internal use of
conversationId) to accept and handle conversationIds the same way
SummaryStore.searchSummaries does: change the signature to accept
conversationIds, apply family-scoped filtering/grep over the provided
conversationIds (falling back to conversationId or all conversations as before),
and ensure callers (where searchMessages is invoked with searchInput) pass
through the conversationIds field; alternatively, if you prefer not to implement
family-scoped message search yet, explicitly restrict the top-level retrieval
logic to not pass conversationIds into searchMessages and only support
family-scoped searches for summaries until message search is updated.
In `@src/tools/lcm-event-search-tool.ts`:
- Around line 112-126: The response metadata currently always uses
scope.conversationId even when listObservations (via
lcm.getEventObservationStore().listObservations) was run across
scope.conversationIds; update the jsonResult payload to compute
conversationScope from the actual observations: after calling listObservations,
collect distinct observation.conversationId values and if more than one distinct
id is present (or if scope.conversationIds.length > 1) set conversationScope to
a family/aggregate value (e.g. "family" or "all") otherwise set it to
scope.conversationId; change the assignment where jsonResult is built (the
conversationScope key) to use this computed value so top-level metadata reflects
multi-conversation results.
In `@src/tools/lcm-work-density-tool.ts`:
- Around line 389-409: The payload's conversationScope is incorrect for
family-wide queries: when store.getDensity was called with scope.conversationIds
containing multiple IDs the metadata still serialized scope.conversationId.
Update the conversationScope argument passed into applyOutputBudget (where
density is set) to reflect the effective scope — keep "all" when
scope.allConversations is true, return "family" (or another explicit family
label) when scope.conversationIds exists and has length > 1, and otherwise
return scope.conversationId so the response metadata matches the actual query
made by store.getDensity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e7f35339-1d8a-4e35-911e-170d7bb9c5ec
📒 Files selected for processing (14)
src/retrieval.tssrc/store/conversation-scope.tssrc/store/event-observation-store.tssrc/store/observed-work-store.tssrc/store/summary-store.tssrc/tools/lcm-conversation-scope.tssrc/tools/lcm-describe-tool.tssrc/tools/lcm-event-search-tool.tssrc/tools/lcm-expand-query-tool.tssrc/tools/lcm-grep-tool.tssrc/tools/lcm-task-suggestions-tool.tssrc/tools/lcm-work-density-tool.tstest/lcm-expand-query-tool.test.tstest/lcm-tools.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: smoke-latest-openclaw
- GitHub Check: Agent
🔇 Additional comments (13)
src/tools/lcm-task-suggestions-tool.ts (1)
202-215: LGTM!The addition of
conversationIds: scope.conversationIdscorrectly extends density queries to span the session family. The dual-field approach (conversationId+conversationIds) aligns with the PR's stated conflict resolution strategy, and the underlyingappendConversationScopeConstraintsafely handles undefined/empty arrays.src/tools/lcm-grep-tool.ts (2)
150-160: LGTM!Correctly forwards
conversationIdsto the retrieval layer, enabling family-scoped grep. The dual-field pattern is consistent with other tools in this PR.
168-175: LGTM!The output header logic correctly handles:
undefined/empty arrays → falls back to single conversationId display- Multiple segments → shows family root + segment count
Clean user feedback for the new scoping behavior.
src/tools/lcm-describe-tool.ts (1)
105-120: LGTM!The scope validation correctly:
- Prioritizes
conversationIdsarray when populated- Falls back to single
conversationIdwhen array is empty/undefined- Uses Set membership for O(1) lookup
- Updates error messaging to reflect session-scoped semantics
The guard at line 105 ensures we only build the allowed set when a valid scope exists.
src/tools/lcm-expand-query-tool.ts (3)
1051-1104: LGTM!
resolveSourceConversationIdcorrectly extended:
- Validates all candidates against
allowedConversationIdswhen provided- Selects first allowed candidate when multiple family IDs exist
- Falls through to existing single-conversation logic otherwise
The validation at lines 1073-1082 ensures we never reach the multi-select logic with out-of-scope candidates.
1140-1254: LGTM!The
resolveSummaryCandidatesfallback path correctly handles session-family scoping:
- Computes
fallbackConversationIdsfrom array or single ID- Checks max summary depth per conversation before allowing message fallback
- Groups message hits by
conversationIdfor per-conversation leaf link lookups- Maps results back to correct
conversationIdviamessageConversationByIdThe "all must be shallow" check at lines 1200-1202 is appropriately conservative.
1579-1614: LGTM!The tool wrapper correctly integrates family scoping:
familyScopedConversationIdonly set when family is singular (avoids ambiguity)- Falls back to requester resolution only when family doesn't provide scope
- Error path guards against missing scope when family is singular
- Threads
conversationIdsthrough to downstream functionstest/lcm-expand-query-tool.test.ts (3)
136-168: LGTM!The test engine mock correctly extended to support
conversationFamilyIds:
- Returns explicit family IDs when provided
- Falls back to wrapping single
conversationIdin array- Returns empty array when no conversation context exists
This accurately mirrors the production
getConversationFamilyIdsbehavior.
303-423: LGTM!Good coverage for session-family scoping in expand-query:
"resolves a single source conversation from a session family query"verifies grep receives family IDs and the tool correctly routes to the first allowed conversation"rejects explicit summaryIds that fall outside the allowed session family scope"confirms the scope validation throws the expected errorBoth tests validate the assertions against the specific error messages and call signatures.
2005-2125: LGTM!Comprehensive test for the multi-segment fallback path:
- Verifies grep is called twice (summaries then messages) with full family scope
- Confirms per-conversation depth checks via
getConversationMaxSummaryDepthcalls- Validates per-conversation leaf link lookups with correct message ID grouping
- Asserts final result routes to the expected family segment
Well-structured mock setup with distinct conversation segments.
test/lcm-tools.test.ts (3)
86-141: LGTM!
buildLcmEnginemock correctly extended:
- Accepts optional
conversationFamilyIdsparameter- Mocks
getConversationFamilyIdswith proper fallback chain- Consistent with the expand-query test file pattern
307-347: LGTM!Good test coverage for session-family grep:
- Line 310: Updated assertion to include
conversationIds: [42]alongside existing single ID check- Lines 315-347: New test validates multi-segment family search, verifying both the call signature and user-facing output format
Clear and targeted assertions.
399-401: LGTM!Error message assertion correctly updated to match the new session-scope wording in
lcm-describe-tool.ts.
Adversarial audits 1-4 found these issues. Addressed: - MAJOR-1: lcm_recent freshness gate now ORs pending_rebuild across all family conversations. Without this, an auto-rotated session with conv A pending and conv B clean could serve A's stale rollup when queried via B. - MAJOR-2: collectFamilyConversationIds normalizes ordering across both lookup paths (getConversationFamilyIds + listConversationsBySessionKey fallback). User-visible 'session family rooted at X' string is now stable. - MAJOR-3: appendConversationScopeConstraint singleton path now Number.isFinite validates + Math.trunc, matching the array path. - MAJOR-4: listConversationsBySessionKey now caps at 256 most-recent. Defends against IN-list overflow if a session_key accumulates 1000+ archived rows. - MINOR: dropped unimplemented getConversationFamilyIds optional shape; collectFamilyConversationIds standardizes on listConversationsBySessionKey. Added regression test: family pending_rebuild OR'd correctly.
Cherry-picks Martian-Engineering#338. Extends the upstream session-family scope work to cover the v0.9.4 stack tools that the upstream PR didn't know about.
Why this matters
PR Martian-Engineering#592 (auto-rotate oversized session files), which we landed in v0.9.5, can split a long-running session into multiple physical conversation rows. Without session-family scope, retrieval against the post-rotate session only sees the active segment — silently missing context from archived segments.
This PR closes that correctness gap.
What this includes
From upstream PR Martian-Engineering#338
Net-new (v0.9.4 stack tool extensions)
Conflict resolution
Tests
Concerns flagged for review (audit will probe these)
Summary by CodeRabbit
New Features
Bug Fixes / Reliability
Tests