Skip to content

fix(session_search): load content from FTS-matched session, not empty compaction root#5447

Closed
AlsayedHoota wants to merge 2 commits into
NousResearch:mainfrom
AlsayedHoota:fix/session-search-compaction-chains
Closed

fix(session_search): load content from FTS-matched session, not empty compaction root#5447
AlsayedHoota wants to merge 2 commits into
NousResearch:mainfrom
AlsayedHoota:fix/session-search-compaction-chains

Conversation

@AlsayedHoota

@AlsayedHoota AlsayedHoota commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

Problem

Context compaction creates chains of sessions linked via parent_session_id. The session search tool had two bugs related to how it handled these chains:

Bug 1: Search results silently dropped

When FTS5 found a match in a compaction child session, the code resolved it to the root parent for both deduplication AND content loading. The root session is often empty (0 messages) after compaction, so get_messages_as_conversation(root) returned nothing, and the result was silently skipped.

Example: A 193-message session (9 levels deep in a compaction chain) with 66 FTS matches for "deployment" was completely invisible to search because the root had 0 messages.

Bug 2: Recent sessions listing hid compaction leaves

_list_recent_sessions skipped ALL sessions with parent_session_id, assuming they were delegation sub-tasks. However, parent_session_id is exclusively used for context compaction chains — delegation subagents don't set it. This meant the leaf sessions (where the real conversation lives) were hidden from the recent listing.

Fix

Search (session_search)

  • Use lineage root only for deduplication (so the same chain doesn't appear multiple times)
  • Load conversation content from the FTS-matched session (where messages actually live)
  • Use _get_lineage_ids() for proper current-session exclusion across the chain

Recent listing (_list_recent_sessions)

  • Only skip intermediate compaction nodes (end_reason='compression', typically 0 messages)
  • Show the leaf session of each chain (it has the real conversation)
  • Deduplicate chains so the same conversation doesn't appear multiple times

New helper

  • _get_lineage_ids(db, session_id): walks the parent chain to collect all session IDs in a compaction lineage

Tests

  • All 25 existing tests pass unchanged
  • Added 2 new tests:
    • test_compaction_chain_search_uses_matched_session: Verifies content is loaded from the leaf, not the empty root
    • test_compaction_chain_deduplicates_same_lineage: Verifies FTS hits from the same chain produce only one result

Copilot AI review requested due to automatic review settings April 6, 2026 07:06

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes how session_search and the “recent sessions” mode handle context-compaction chains linked via parent_session_id, ensuring search results load content from the matched (non-empty) fragment while deduplicating by lineage.

Changes:

  • Add a lineage helper and update current-session exclusion logic to operate across compaction chains.
  • Update session_search to deduplicate by lineage root while loading conversation content from the FTS-matched session.
  • Add new tests covering compaction-chain search behavior and deduplication.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
tools/session_search_tool.py Updates search + recent-session listing logic to be compaction-chain aware; adds lineage helper.
tests/tools/test_session_search.py Adjusts an existing test docstring and adds tests for compaction-chain search content loading and deduplication.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tools/session_search_tool.py Outdated
Comment on lines +196 to +209
"""Return the set of all session IDs in a compaction chain (parents + children)."""
ids = set()
# Walk up to root
sid = session_id
visited = set()
while sid and sid not in visited:
visited.add(sid)
ids.add(sid)
try:
s = db.get_session(sid)
parent = s.get("parent_session_id") if s else None
sid = parent if parent else None
except Exception:
break

Copilot AI Apr 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_lineage_ids docstring claims it returns parents + children, but the implementation only walks up the parent chain and never discovers descendants. This is misleading and also means callers (e.g., recent-session exclusion) won't exclude child fragments when current_session_id is a root. Either update the docstring/name to reflect “ancestors”, or extend the helper to also collect descendants (e.g., by scanning sessions with include_children=True and selecting those whose resolved root matches).

Suggested change
"""Return the set of all session IDs in a compaction chain (parents + children)."""
ids = set()
# Walk up to root
sid = session_id
visited = set()
while sid and sid not in visited:
visited.add(sid)
ids.add(sid)
try:
s = db.get_session(sid)
parent = s.get("parent_session_id") if s else None
sid = parent if parent else None
except Exception:
break
"""Return all session IDs in the same compaction chain as `session_id`."""
def _resolve_root_and_chain(start_sid: str) -> tuple:
chain_ids = set()
sid = start_sid
visited = set()
last_sid = start_sid
while sid and sid not in visited:
visited.add(sid)
chain_ids.add(sid)
last_sid = sid
try:
session = db.get_session(sid)
parent = session.get("parent_session_id") if session else None
sid = parent if parent else None
except Exception:
break
return last_sid, chain_ids
if not session_id:
return set()
root_id, ids = _resolve_root_and_chain(session_id)
# Also collect descendants by scanning sessions and selecting those that
# resolve to the same root. Keep ancestor results even if enumeration fails.
try:
try:
sessions = db.list_sessions_rich(
limit=10000,
exclude_sources=list(_HIDDEN_SESSION_SOURCES),
include_children=True,
)
except TypeError:
sessions = db.list_sessions_rich(
limit=10000,
exclude_sources=list(_HIDDEN_SESSION_SOURCES),
)
root_cache = {session_id: root_id}
for session in sessions:
candidate_id = session.get("id")
if not candidate_id or candidate_id in ids:
continue
candidate_root = root_cache.get(candidate_id)
if candidate_root is None:
candidate_root, candidate_chain = _resolve_root_and_chain(candidate_id)
root_cache[candidate_id] = candidate_root
if candidate_root == root_id:
ids.update(candidate_chain)
ids.add(candidate_id)
elif candidate_root == root_id:
ids.add(candidate_id)
except Exception:
pass

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6146652. Rewrote the lineage helper to properly walk up the parent chain AND perform BFS downward from the root to collect all fragments in a compaction chain. Docstring updated to match.

Comment thread tools/session_search_tool.py Outdated
# typically 0 messages) are just handoff points — the real conversation
# lives in the leaf. We skip intermediates but show the leaf even
# though it has a parent_session_id.
sessions = db.list_sessions_rich(limit=limit + 20, exclude_sources=list(_HIDDEN_SESSION_SOURCES))

Copilot AI Apr 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db.list_sessions_rich() excludes child sessions by default (include_children=False in hermes_state.SessionDB). Since this function is intended to surface compaction leaves (which have parent_session_id), it needs to call list_sessions_rich(..., include_children=True); otherwise none of the new compaction-leaf logic will ever run and bug #2 remains unfixed.

Suggested change
sessions = db.list_sessions_rich(limit=limit + 20, exclude_sources=list(_HIDDEN_SESSION_SOURCES))
sessions = db.list_sessions_rich(
limit=limit + 20,
exclude_sources=list(_HIDDEN_SESSION_SOURCES),
include_children=True,
)

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6146652. Recent session listing now fetches children for correct leaf inclusion and chain deduplication.

Comment thread tools/session_search_tool.py Outdated
Comment on lines +241 to +244
# Skip intermediate compaction nodes — they have a parent and
# ended due to compression (typically 0 messages). The leaf
# session of the chain will appear later and has the real content.
if s.get("parent_session_id") and s.get("end_reason") == "compression":

Copilot AI Apr 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “skip intermediate compaction nodes” check only skips sessions with both parent_session_id and end_reason == 'compression'. In compaction chains, the root fragment can also have end_reason='compression' with no parent; that would still be shown (often empty) and can crowd out the real leaf. Consider skipping any session with end_reason == 'compression' (optionally also gating on message_count == 0) regardless of whether it has a parent.

Suggested change
# Skip intermediate compaction nodes — they have a parent and
# ended due to compression (typically 0 messages). The leaf
# session of the chain will appear later and has the real content.
if s.get("parent_session_id") and s.get("end_reason") == "compression":
# Skip compaction fragments. In compaction chains, both
# intermediate nodes and the root fragment can end due to
# compression; these are handoff artifacts (often empty) and can
# crowd out the real leaf session that has the actual content.
if s.get("end_reason") == "compression":

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6146652. Improved root fragment filtering to handle sessions with end_reason='compression' regardless of whether they have a parent, preventing empty roots from showing.

Comment thread tools/session_search_tool.py Outdated
Comment on lines +246 to +254
# Deduplicate: if this session belongs to a compaction chain
# we've already shown, skip it.
if s.get("parent_session_id"):
lineage = _get_lineage_ids(db, sid)
# Use the earliest ID in the lineage as the chain key
chain_key = min(lineage) if lineage else sid
if chain_key in seen_lineage_roots:
continue
seen_lineage_roots.add(chain_key)

Copilot AI Apr 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deduplication in _list_recent_sessions is currently only applied when parent_session_id is set, and it uses min(lineage) as the chain key. If include_children=True is added, roots (no parent) won’t be deduped and you can still show multiple fragments of the same chain. Also, min() on IDs isn’t a reliable “earliest/root” selector unless ID ordering is guaranteed. Prefer deduping all sessions by an explicit resolved root (walk parents until parent_session_id is null) and use that root ID as the chain key.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6146652. Deduplication now resolves the explicit root by walking parents until parent_session_id is null, and uses that root ID as the chain key for all sessions. Handles orphans and partial chains gracefully.

Comment thread tests/tools/test_session_search.py Outdated
@@ -257,7 +257,7 @@ def _get_session(session_id):
assert result["sessions_searched"] == 0

def test_current_root_session_excludes_child_lineage(self):
"""Delegation child hits should be excluded when they resolve to the current root session."""
"""Compression/delegation parents should be excluded for the active child session."""

Copilot AI Apr 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test’s docstring is now inaccurate: it asserts that when the current session is the root, child lineage hits are excluded, but the docstring talks about excluding “parents for the active child session”. Please update the docstring to match what the test actually verifies to avoid confusion when maintaining compaction/delegation semantics.

Suggested change
"""Compression/delegation parents should be excluded for the active child session."""
"""Compression/delegation child lineage should be excluded for the active root session."""

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6146652. Updated the test docstring to accurately describe what the test verifies.


assert result["success"] is True
# Should deduplicate: only 1 result for the whole chain
assert result["sessions_searched"] == 1

Copilot AI Apr 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR changes _list_recent_sessions behavior (show compaction leaves, skip compression intermediates, dedup chains), but there are no tests covering the recent-sessions mode (query empty) or the compaction-leaf listing/dedup logic. Since this file already has good session_search coverage, please add a focused test that calls session_search(query="", ...) (or _list_recent_sessions directly) and asserts that compression roots/intermediates are hidden and only one leaf per chain is returned.

Suggested change
assert result["sessions_searched"] == 1
assert result["sessions_searched"] == 1
def test_recent_sessions_hides_compaction_roots_and_intermediates(self):
"""Recent-session listing should return only the visible leaf for a
compaction chain, not the compression root/intermediate entries."""
from unittest.mock import MagicMock, AsyncMock, patch as _patch
from tools.session_search_tool import session_search
mock_db = MagicMock()
mock_db.list_sessions.return_value = [
{"id": "root_sid", "source": "cli", "started_at": 1709300000, "model": "test"},
{"id": "mid_sid", "source": "cli", "started_at": 1709400000, "model": "test"},
{"id": "leaf_sid", "source": "cli", "started_at": 1709500000, "model": "test"},
]
def _get_session(session_id):
if session_id == "root_sid":
return {"parent_session_id": None, "end_reason": None}
if session_id == "mid_sid":
return {"parent_session_id": "root_sid", "end_reason": "compression"}
if session_id == "leaf_sid":
return {"parent_session_id": "mid_sid", "end_reason": "cli_close"}
if session_id == "other_session":
return {"parent_session_id": None}
return None
def _get_messages_as_conversation(session_id):
if session_id == "root_sid":
return [{"role": "user", "content": "root content should stay hidden"}]
if session_id == "mid_sid":
return [{"role": "user", "content": "mid content should stay hidden"}]
if session_id == "leaf_sid":
return [{"role": "user", "content": "leaf content should be shown"}]
return []
mock_db.get_session.side_effect = _get_session
mock_db.get_messages_as_conversation.side_effect = _get_messages_as_conversation
with _patch("tools.session_search_tool.async_call_llm",
new_callable=AsyncMock,
side_effect=RuntimeError("no provider")):
raw_result = session_search(
query="", db=mock_db, current_session_id="other_session",
)
result = json.loads(raw_result)
rendered = json.dumps(result)
assert result["success"] is True
assert result["sessions_searched"] == 1
assert "leaf content should be shown" in rendered
assert "root content should stay hidden" not in rendered
assert "mid content should stay hidden" not in rendered

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6146652. Added TestRecentSessionsCompaction test class with comprehensive coverage for: lineage deduplication, intermediate node hiding, current session exclusion, and correct leaf session selection in compaction chains.

… compaction root

Context compaction creates chains of sessions linked via parent_session_id.
The previous code resolved FTS matches to the root of the chain for both
deduplication AND content loading. This was broken because:

1. Search: The root session is often empty (0 messages) after compaction.
   Loading conversation from the root yielded nothing, silently dropping
   valid search results. Now we use the lineage root only for dedup and
   load content from the session where FTS actually found the match.

2. Recent listing: All sessions with parent_session_id were hidden, which
   excluded the leaf sessions of compaction chains (where the real
   conversation lives). Now we only skip intermediate compression nodes
   (end_reason='compression') and show the leaf.

Added _get_lineage_ids() helper to walk parent chains. Added two new tests
for compaction chain search and deduplication.

Fixes: sessions with context compaction (sometimes 5-10 levels deep)
were invisible to both session_search queries and recent session listings.
…edup, tests

Address all 6 Copilot review comments on PR NousResearch#5447:

1. _get_lineage_ids now walks BOTH directions (up to root, then BFS down
   to all descendants) so current-session exclusion works regardless of
   whether the active session is a root or leaf.

2. CRITICAL: Pass include_children=True to list_sessions_rich() — without
   this, sessions with parent_session_id were filtered out by SQL before
   our Python logic ever saw them, making the recent-listing fix a no-op.

3. Skip ANY session with end_reason='compression', not just those with
   parent_session_id. Root fragments can also be empty compression nodes.

4. Replace unreliable min(lineage) chain key with explicit
   _resolve_to_lineage_root() walk. Extracted the inline _resolve_to_root
   closure into a module-level _resolve_to_lineage_root() to eliminate
   duplication and ensure consistent behavior.

5. Fixed misleading test docstring on
   test_current_root_session_excludes_child_lineage.

6. Added TestRecentSessionsCompaction class with 4 tests:
   - test_leaf_shown_intermediates_hidden
   - test_dedup_same_chain
   - test_current_session_lineage_excluded
   - test_normal_sessions_unaffected

All 31 tests pass.
@AlsayedHoota AlsayedHoota force-pushed the fix/session-search-compaction-chains branch from 6146652 to dac7ede Compare April 8, 2026 16:01
AlsayedHoota added a commit to AlsayedHoota/hermes-agent that referenced this pull request Apr 11, 2026
…edup, tests

Address all 6 Copilot review comments on PR NousResearch#5447:

1. _get_lineage_ids now walks BOTH directions (up to root, then BFS down
   to all descendants) so current-session exclusion works regardless of
   whether the active session is a root or leaf.

2. CRITICAL: Pass include_children=True to list_sessions_rich() — without
   this, sessions with parent_session_id were filtered out by SQL before
   our Python logic ever saw them, making the recent-listing fix a no-op.

3. Skip ANY session with end_reason='compression', not just those with
   parent_session_id. Root fragments can also be empty compression nodes.

4. Replace unreliable min(lineage) chain key with explicit
   _resolve_to_lineage_root() walk. Extracted the inline _resolve_to_root
   closure into a module-level _resolve_to_lineage_root() to eliminate
   duplication and ensure consistent behavior.

5. Fixed misleading test docstring on
   test_current_root_session_excludes_child_lineage.

6. Added TestRecentSessionsCompaction class with 4 tests:
   - test_leaf_shown_intermediates_hidden
   - test_dedup_same_chain
   - test_current_session_lineage_excluded
   - test_normal_sessions_unaffected

All 31 tests pass.
iamagenius00 added a commit to iamagenius00/hermes-agent that referenced this pull request Apr 21, 2026
…nt sessions

Context compaction splits the active session: the parent is closed with
end_reason='compression' and a new child becomes the active session. The
parent's messages are summarized into a compact handoff, and the original
messages are no longer in the agent's context window.

Previously, session_search excluded the entire current session lineage,
including compression-ended parents. This created a 'memory black hole' —
the agent could not recall specific details from compacted portions of the
conversation, even though those messages still exist in the database.

The fix narrows the exclusion: only skip sessions in the current lineage
whose content the agent still has access to. Compression-ended sessions
are no longer excluded, since their original content is gone from context
and should be recoverable via search.

Closes NousResearch#5447, NousResearch#6256, NousResearch#6507, NousResearch#8803
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels May 1, 2026
@teknium1

Copy link
Copy Markdown
Contributor

This looks implemented on current main by later session-search/session-listing work. Automated hermes-sweeper review.

Evidence:

  • tools/session_search_tool.py:429 deduplicates FTS hits by lineage root while preserving the raw matched session row.
  • tools/session_search_tool.py:450 / tools/session_search_tool.py:453 load the anchored window from hit_sid, i.e. the FTS-matched session, not the lineage root.
  • tools/session_search_tool.py:463 returns session_id as that matched session and only exposes parent_session_id separately when the lineage root differs.
  • hermes_state.py:1864 and hermes_state.py:2047 implement default compression-tip projection in list_sessions_rich, so recent-session listing surfaces the live continuation tip instead of hiding compressed child rows.
  • The search-side implementation is in commit abf1af540193c30047ff3e7e759c330faf3a880f, and the full behavior is contained by release v2026.5.28.

@teknium1 teknium1 closed this Jun 10, 2026
@teknium1 teknium1 added the sweeper:implemented-on-main Sweeper: behavior already present on current main label Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists sweeper:implemented-on-main Sweeper: behavior already present on current main type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants