fix(session_search): load content from FTS-matched session, not empty compaction root#5447
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes how session_search and the “recent sessions” mode handle context-compaction chains linked via parent_session_id, ensuring search results load content from the matched (non-empty) fragment while deduplicating by lineage.
Changes:
- Add a lineage helper and update current-session exclusion logic to operate across compaction chains.
- Update
session_searchto deduplicate by lineage root while loading conversation content from the FTS-matched session. - Add new tests covering compaction-chain search behavior and deduplication.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tools/session_search_tool.py | Updates search + recent-session listing logic to be compaction-chain aware; adds lineage helper. |
| tests/tools/test_session_search.py | Adjusts an existing test docstring and adds tests for compaction-chain search content loading and deduplication. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """Return the set of all session IDs in a compaction chain (parents + children).""" | ||
| ids = set() | ||
| # Walk up to root | ||
| sid = session_id | ||
| visited = set() | ||
| while sid and sid not in visited: | ||
| visited.add(sid) | ||
| ids.add(sid) | ||
| try: | ||
| s = db.get_session(sid) | ||
| parent = s.get("parent_session_id") if s else None | ||
| sid = parent if parent else None | ||
| except Exception: | ||
| break |
There was a problem hiding this comment.
_get_lineage_ids docstring claims it returns parents + children, but the implementation only walks up the parent chain and never discovers descendants. This is misleading and also means callers (e.g., recent-session exclusion) won't exclude child fragments when current_session_id is a root. Either update the docstring/name to reflect “ancestors”, or extend the helper to also collect descendants (e.g., by scanning sessions with include_children=True and selecting those whose resolved root matches).
| """Return the set of all session IDs in a compaction chain (parents + children).""" | |
| ids = set() | |
| # Walk up to root | |
| sid = session_id | |
| visited = set() | |
| while sid and sid not in visited: | |
| visited.add(sid) | |
| ids.add(sid) | |
| try: | |
| s = db.get_session(sid) | |
| parent = s.get("parent_session_id") if s else None | |
| sid = parent if parent else None | |
| except Exception: | |
| break | |
| """Return all session IDs in the same compaction chain as `session_id`.""" | |
| def _resolve_root_and_chain(start_sid: str) -> tuple: | |
| chain_ids = set() | |
| sid = start_sid | |
| visited = set() | |
| last_sid = start_sid | |
| while sid and sid not in visited: | |
| visited.add(sid) | |
| chain_ids.add(sid) | |
| last_sid = sid | |
| try: | |
| session = db.get_session(sid) | |
| parent = session.get("parent_session_id") if session else None | |
| sid = parent if parent else None | |
| except Exception: | |
| break | |
| return last_sid, chain_ids | |
| if not session_id: | |
| return set() | |
| root_id, ids = _resolve_root_and_chain(session_id) | |
| # Also collect descendants by scanning sessions and selecting those that | |
| # resolve to the same root. Keep ancestor results even if enumeration fails. | |
| try: | |
| try: | |
| sessions = db.list_sessions_rich( | |
| limit=10000, | |
| exclude_sources=list(_HIDDEN_SESSION_SOURCES), | |
| include_children=True, | |
| ) | |
| except TypeError: | |
| sessions = db.list_sessions_rich( | |
| limit=10000, | |
| exclude_sources=list(_HIDDEN_SESSION_SOURCES), | |
| ) | |
| root_cache = {session_id: root_id} | |
| for session in sessions: | |
| candidate_id = session.get("id") | |
| if not candidate_id or candidate_id in ids: | |
| continue | |
| candidate_root = root_cache.get(candidate_id) | |
| if candidate_root is None: | |
| candidate_root, candidate_chain = _resolve_root_and_chain(candidate_id) | |
| root_cache[candidate_id] = candidate_root | |
| if candidate_root == root_id: | |
| ids.update(candidate_chain) | |
| ids.add(candidate_id) | |
| elif candidate_root == root_id: | |
| ids.add(candidate_id) | |
| except Exception: | |
| pass |
There was a problem hiding this comment.
Addressed in 6146652. Rewrote the lineage helper to properly walk up the parent chain AND perform BFS downward from the root to collect all fragments in a compaction chain. Docstring updated to match.
| # typically 0 messages) are just handoff points — the real conversation | ||
| # lives in the leaf. We skip intermediates but show the leaf even | ||
| # though it has a parent_session_id. | ||
| sessions = db.list_sessions_rich(limit=limit + 20, exclude_sources=list(_HIDDEN_SESSION_SOURCES)) |
There was a problem hiding this comment.
db.list_sessions_rich() excludes child sessions by default (include_children=False in hermes_state.SessionDB). Since this function is intended to surface compaction leaves (which have parent_session_id), it needs to call list_sessions_rich(..., include_children=True); otherwise none of the new compaction-leaf logic will ever run and bug #2 remains unfixed.
| sessions = db.list_sessions_rich(limit=limit + 20, exclude_sources=list(_HIDDEN_SESSION_SOURCES)) | |
| sessions = db.list_sessions_rich( | |
| limit=limit + 20, | |
| exclude_sources=list(_HIDDEN_SESSION_SOURCES), | |
| include_children=True, | |
| ) |
There was a problem hiding this comment.
Addressed in 6146652. Recent session listing now fetches children for correct leaf inclusion and chain deduplication.
| # Skip intermediate compaction nodes — they have a parent and | ||
| # ended due to compression (typically 0 messages). The leaf | ||
| # session of the chain will appear later and has the real content. | ||
| if s.get("parent_session_id") and s.get("end_reason") == "compression": |
There was a problem hiding this comment.
The “skip intermediate compaction nodes” check only skips sessions with both parent_session_id and end_reason == 'compression'. In compaction chains, the root fragment can also have end_reason='compression' with no parent; that would still be shown (often empty) and can crowd out the real leaf. Consider skipping any session with end_reason == 'compression' (optionally also gating on message_count == 0) regardless of whether it has a parent.
| # Skip intermediate compaction nodes — they have a parent and | |
| # ended due to compression (typically 0 messages). The leaf | |
| # session of the chain will appear later and has the real content. | |
| if s.get("parent_session_id") and s.get("end_reason") == "compression": | |
| # Skip compaction fragments. In compaction chains, both | |
| # intermediate nodes and the root fragment can end due to | |
| # compression; these are handoff artifacts (often empty) and can | |
| # crowd out the real leaf session that has the actual content. | |
| if s.get("end_reason") == "compression": |
There was a problem hiding this comment.
Addressed in 6146652. Improved root fragment filtering to handle sessions with end_reason='compression' regardless of whether they have a parent, preventing empty roots from showing.
| # Deduplicate: if this session belongs to a compaction chain | ||
| # we've already shown, skip it. | ||
| if s.get("parent_session_id"): | ||
| lineage = _get_lineage_ids(db, sid) | ||
| # Use the earliest ID in the lineage as the chain key | ||
| chain_key = min(lineage) if lineage else sid | ||
| if chain_key in seen_lineage_roots: | ||
| continue | ||
| seen_lineage_roots.add(chain_key) |
There was a problem hiding this comment.
Deduplication in _list_recent_sessions is currently only applied when parent_session_id is set, and it uses min(lineage) as the chain key. If include_children=True is added, roots (no parent) won’t be deduped and you can still show multiple fragments of the same chain. Also, min() on IDs isn’t a reliable “earliest/root” selector unless ID ordering is guaranteed. Prefer deduping all sessions by an explicit resolved root (walk parents until parent_session_id is null) and use that root ID as the chain key.
There was a problem hiding this comment.
Addressed in 6146652. Deduplication now resolves the explicit root by walking parents until parent_session_id is null, and uses that root ID as the chain key for all sessions. Handles orphans and partial chains gracefully.
| @@ -257,7 +257,7 @@ def _get_session(session_id): | |||
| assert result["sessions_searched"] == 0 | |||
|
|
|||
| def test_current_root_session_excludes_child_lineage(self): | |||
| """Delegation child hits should be excluded when they resolve to the current root session.""" | |||
| """Compression/delegation parents should be excluded for the active child session.""" | |||
There was a problem hiding this comment.
This test’s docstring is now inaccurate: it asserts that when the current session is the root, child lineage hits are excluded, but the docstring talks about excluding “parents for the active child session”. Please update the docstring to match what the test actually verifies to avoid confusion when maintaining compaction/delegation semantics.
| """Compression/delegation parents should be excluded for the active child session.""" | |
| """Compression/delegation child lineage should be excluded for the active root session.""" |
There was a problem hiding this comment.
Addressed in 6146652. Updated the test docstring to accurately describe what the test verifies.
|
|
||
| assert result["success"] is True | ||
| # Should deduplicate: only 1 result for the whole chain | ||
| assert result["sessions_searched"] == 1 |
There was a problem hiding this comment.
The PR changes _list_recent_sessions behavior (show compaction leaves, skip compression intermediates, dedup chains), but there are no tests covering the recent-sessions mode (query empty) or the compaction-leaf listing/dedup logic. Since this file already has good session_search coverage, please add a focused test that calls session_search(query="", ...) (or _list_recent_sessions directly) and asserts that compression roots/intermediates are hidden and only one leaf per chain is returned.
| assert result["sessions_searched"] == 1 | |
| assert result["sessions_searched"] == 1 | |
| def test_recent_sessions_hides_compaction_roots_and_intermediates(self): | |
| """Recent-session listing should return only the visible leaf for a | |
| compaction chain, not the compression root/intermediate entries.""" | |
| from unittest.mock import MagicMock, AsyncMock, patch as _patch | |
| from tools.session_search_tool import session_search | |
| mock_db = MagicMock() | |
| mock_db.list_sessions.return_value = [ | |
| {"id": "root_sid", "source": "cli", "started_at": 1709300000, "model": "test"}, | |
| {"id": "mid_sid", "source": "cli", "started_at": 1709400000, "model": "test"}, | |
| {"id": "leaf_sid", "source": "cli", "started_at": 1709500000, "model": "test"}, | |
| ] | |
| def _get_session(session_id): | |
| if session_id == "root_sid": | |
| return {"parent_session_id": None, "end_reason": None} | |
| if session_id == "mid_sid": | |
| return {"parent_session_id": "root_sid", "end_reason": "compression"} | |
| if session_id == "leaf_sid": | |
| return {"parent_session_id": "mid_sid", "end_reason": "cli_close"} | |
| if session_id == "other_session": | |
| return {"parent_session_id": None} | |
| return None | |
| def _get_messages_as_conversation(session_id): | |
| if session_id == "root_sid": | |
| return [{"role": "user", "content": "root content should stay hidden"}] | |
| if session_id == "mid_sid": | |
| return [{"role": "user", "content": "mid content should stay hidden"}] | |
| if session_id == "leaf_sid": | |
| return [{"role": "user", "content": "leaf content should be shown"}] | |
| return [] | |
| mock_db.get_session.side_effect = _get_session | |
| mock_db.get_messages_as_conversation.side_effect = _get_messages_as_conversation | |
| with _patch("tools.session_search_tool.async_call_llm", | |
| new_callable=AsyncMock, | |
| side_effect=RuntimeError("no provider")): | |
| raw_result = session_search( | |
| query="", db=mock_db, current_session_id="other_session", | |
| ) | |
| result = json.loads(raw_result) | |
| rendered = json.dumps(result) | |
| assert result["success"] is True | |
| assert result["sessions_searched"] == 1 | |
| assert "leaf content should be shown" in rendered | |
| assert "root content should stay hidden" not in rendered | |
| assert "mid content should stay hidden" not in rendered |
There was a problem hiding this comment.
Addressed in 6146652. Added TestRecentSessionsCompaction test class with comprehensive coverage for: lineage deduplication, intermediate node hiding, current session exclusion, and correct leaf session selection in compaction chains.
… compaction root Context compaction creates chains of sessions linked via parent_session_id. The previous code resolved FTS matches to the root of the chain for both deduplication AND content loading. This was broken because: 1. Search: The root session is often empty (0 messages) after compaction. Loading conversation from the root yielded nothing, silently dropping valid search results. Now we use the lineage root only for dedup and load content from the session where FTS actually found the match. 2. Recent listing: All sessions with parent_session_id were hidden, which excluded the leaf sessions of compaction chains (where the real conversation lives). Now we only skip intermediate compression nodes (end_reason='compression') and show the leaf. Added _get_lineage_ids() helper to walk parent chains. Added two new tests for compaction chain search and deduplication. Fixes: sessions with context compaction (sometimes 5-10 levels deep) were invisible to both session_search queries and recent session listings.
…edup, tests Address all 6 Copilot review comments on PR NousResearch#5447: 1. _get_lineage_ids now walks BOTH directions (up to root, then BFS down to all descendants) so current-session exclusion works regardless of whether the active session is a root or leaf. 2. CRITICAL: Pass include_children=True to list_sessions_rich() — without this, sessions with parent_session_id were filtered out by SQL before our Python logic ever saw them, making the recent-listing fix a no-op. 3. Skip ANY session with end_reason='compression', not just those with parent_session_id. Root fragments can also be empty compression nodes. 4. Replace unreliable min(lineage) chain key with explicit _resolve_to_lineage_root() walk. Extracted the inline _resolve_to_root closure into a module-level _resolve_to_lineage_root() to eliminate duplication and ensure consistent behavior. 5. Fixed misleading test docstring on test_current_root_session_excludes_child_lineage. 6. Added TestRecentSessionsCompaction class with 4 tests: - test_leaf_shown_intermediates_hidden - test_dedup_same_chain - test_current_session_lineage_excluded - test_normal_sessions_unaffected All 31 tests pass.
6146652 to
dac7ede
Compare
…edup, tests Address all 6 Copilot review comments on PR NousResearch#5447: 1. _get_lineage_ids now walks BOTH directions (up to root, then BFS down to all descendants) so current-session exclusion works regardless of whether the active session is a root or leaf. 2. CRITICAL: Pass include_children=True to list_sessions_rich() — without this, sessions with parent_session_id were filtered out by SQL before our Python logic ever saw them, making the recent-listing fix a no-op. 3. Skip ANY session with end_reason='compression', not just those with parent_session_id. Root fragments can also be empty compression nodes. 4. Replace unreliable min(lineage) chain key with explicit _resolve_to_lineage_root() walk. Extracted the inline _resolve_to_root closure into a module-level _resolve_to_lineage_root() to eliminate duplication and ensure consistent behavior. 5. Fixed misleading test docstring on test_current_root_session_excludes_child_lineage. 6. Added TestRecentSessionsCompaction class with 4 tests: - test_leaf_shown_intermediates_hidden - test_dedup_same_chain - test_current_session_lineage_excluded - test_normal_sessions_unaffected All 31 tests pass.
…nt sessions Context compaction splits the active session: the parent is closed with end_reason='compression' and a new child becomes the active session. The parent's messages are summarized into a compact handoff, and the original messages are no longer in the agent's context window. Previously, session_search excluded the entire current session lineage, including compression-ended parents. This created a 'memory black hole' — the agent could not recall specific details from compacted portions of the conversation, even though those messages still exist in the database. The fix narrows the exclusion: only skip sessions in the current lineage whose content the agent still has access to. Compression-ended sessions are no longer excluded, since their original content is gone from context and should be recoverable via search. Closes NousResearch#5447, NousResearch#6256, NousResearch#6507, NousResearch#8803
|
This looks implemented on current main by later session-search/session-listing work. Automated hermes-sweeper review. Evidence:
|
Problem
Context compaction creates chains of sessions linked via
parent_session_id. The session search tool had two bugs related to how it handled these chains:Bug 1: Search results silently dropped
When FTS5 found a match in a compaction child session, the code resolved it to the root parent for both deduplication AND content loading. The root session is often empty (0 messages) after compaction, so
get_messages_as_conversation(root)returned nothing, and the result was silently skipped.Example: A 193-message session (9 levels deep in a compaction chain) with 66 FTS matches for "deployment" was completely invisible to search because the root had 0 messages.
Bug 2: Recent sessions listing hid compaction leaves
_list_recent_sessionsskipped ALL sessions withparent_session_id, assuming they were delegation sub-tasks. However,parent_session_idis exclusively used for context compaction chains — delegation subagents don't set it. This meant the leaf sessions (where the real conversation lives) were hidden from the recent listing.Fix
Search (
session_search)_get_lineage_ids()for proper current-session exclusion across the chainRecent listing (
_list_recent_sessions)end_reason='compression', typically 0 messages)New helper
_get_lineage_ids(db, session_id): walks the parent chain to collect all session IDs in a compaction lineageTests
test_compaction_chain_search_uses_matched_session: Verifies content is loaded from the leaf, not the empty roottest_compaction_chain_deduplicates_same_lineage: Verifies FTS hits from the same chain produce only one result