Context compression creates orphan sessions missing from state.db #33907

@MiserableKnight

Problem

When automatic context compression fires during a long WebUI session, the agent rotates its session_id to create a continuation session. Occasionally, the new session's JSON file is created on disk but no corresponding row is written to state.db. The WebUI sidebar then shows this continuation as an orphan entry with a branch icon (↺) instead of nesting it under its parent with the "N child" badge.

Expected behavior

After compression rotation, state.db should always have a record for the new session with parent_session_id pointing to the old session, so the sidebar can properly collapse the lineage.

Actual behavior

The orphan session appears as a separate entry in the sidebar. It has the same title as the parent but displays with the ↺ icon because the frontend cannot locate its parent in state.db.

Root cause

Two code paths handle compression rotation, and neither guarantees the state.db write:

1. Agent core (run_agent.py:10418-10455)

if self._session_db:
    try:
        old_title = self._session_db.get_session_title(self.session_id)
        self.commit_memory_session(messages)
        self._session_db.end_session(self.session_id, "compression")
        old_session_id = self.session_id
        self.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
        # ...
        self._session_db.create_session(
            session_id=self.session_id,
            source=self.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
            model=self.model,
            model_config=self._session_init_model_config,
            parent_session_id=old_session_id,
        )
    except Exception as e:
        logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)

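The entire block is wrapped in a single try/except. Because end_session() and the session_id reassignment both run before create_session(), a failure in create_session() (SQLite lock contention, disk I/O error, WSL unclean shutdown, etc.) leaves the old session already ended and the agent running under a new session_id that state.db has no record of. The warning log is the only signal.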

2. WebUI (streaming.py:3598-3694)

The WebUI detects the rotation by checking whether agent.session_id != session_id, then updates the in-memory session object and writes the JSON file. However, it never independently writes the new session to state.db. It relies entirely on the agent core having done so.

Reproduction conditions

  • Long conversation triggering automatic context compression (multiple rounds)
  • WSL environment with potential unclean shutdowns (most common trigger)
  • SQLite lock contention between gateway and WebUI (both access the same state.db)
  • Any transient disk IO failure during create_session()
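
The lock-contention condition can be reproduced in isolation with the standard-library sqlite3 module. The sketch below is not Hermes code: the sessions table and its columns are hypothetical stand-ins for whatever state.db actually contains, and the "gateway" and "agent" roles are just two connections in one process. One connection holds an exclusive transaction while the other tries to insert with a zero busy timeout.

```python
import os
import sqlite3
import tempfile


def try_insert_while_locked(db_path: str) -> str:
    """Return 'ok' or the SQLite error text from a write made during a lock."""
    # Hypothetical schema; the real state.db layout is not shown in this issue.
    setup = sqlite3.connect(db_path)
    setup.execute(
        "CREATE TABLE IF NOT EXISTS sessions "
        "(id TEXT PRIMARY KEY, parent_session_id TEXT)"
    )
    setup.commit()
    setup.close()

    # "Gateway": autocommit mode so BEGIN EXCLUSIVE is issued exactly as written.
    holder = sqlite3.connect(db_path, isolation_level=None)
    holder.execute("BEGIN EXCLUSIVE")

    # "Agent": timeout=0 makes the write fail at once instead of waiting.
    writer = sqlite3.connect(db_path, timeout=0)
    try:
        writer.execute(
            "INSERT INTO sessions (id, parent_session_id) VALUES (?, ?)",
            ("new_session", "old_session"),
        )
        writer.commit()
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        writer.close()
        holder.execute("ROLLBACK")
        holder.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(try_insert_while_locked(os.path.join(tmp, "state.db")))
```

The insert should fail with a "database is locked" error. That is the kind of exception the try/except in run_agent.py would catch and downgrade to a warning.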

Observed in the wild

Session 20260528_104535_fdc667 was created as a compression continuation of 9cc6e43a9bf4. The JSON file exists on disk with 114 messages, but state.db has no row for this session. Two other sibling continuations (20260528_102637_6e5377, 20260528_102818_59222f) were correctly registered.
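
Orphans like this one can be found by diffing the session JSON files on disk against the rows in state.db. The helper below is a sketch under assumptions: it presumes one <session_id>.json file per session in a single directory and a sessions table keyed by an id column, neither of which is confirmed by this issue.

```python
import sqlite3
from pathlib import Path


def find_orphan_sessions(sessions_dir: str, db_path: str) -> list[str]:
    """Return session IDs that have a JSON file but no row in the database."""
    # Assumption: each session is stored as <session_id>.json in sessions_dir.
    on_disk = {p.stem for p in Path(sessions_dir).glob("*.json")}
    conn = sqlite3.connect(db_path)
    try:
        # Table and column names are assumptions; adjust to the real schema.
        indexed = {row[0] for row in conn.execute("SELECT id FROM sessions")}
    finally:
        conn.close()
    return sorted(on_disk - indexed)
```

Run against a copy of state.db rather than the live file, so the check does not add to the lock contention described above.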

Suggested fixes

Option A — WebUI defensive write (preferred, belt-and-suspenders)

In streaming.py, after detecting rotation and before saving, add a state.db write:

# streaming.py, inside the `_agent_sid != session_id` block (~line 3677)
s.parent_session_id = old_sid

# Ensure state.db has the new session (guard against agent-side write failure)
try:
    from api.state_sync import _get_state_db
    _db = _get_state_db()
    if _db:
        try:
            # For the sidebar to nest the continuation, the row also needs
            # parent_session_id=old_sid (assumes ensure_session can accept it).
            _db.ensure_session(session_id=new_sid, source='webui', model=s.model)
        finally:
            # Close the handle even if the write raises
            _db.close()
except Exception:
    logger.warning("Failed to register compression continuation in state.db", exc_info=True)
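
For this defensive write to be safe, the call has to be idempotent: a no-op when the agent already created the row, and an insert carrying the parent link when it did not. How the real ensure_session behaves is not shown in this issue. The sketch below illustrates one way to get that behavior in SQLite with INSERT OR IGNORE; the function name ensure_session_row and the sessions schema are hypothetical.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    source TEXT,
    model TEXT,
    parent_session_id TEXT
)
"""


def ensure_session_row(conn, session_id, source, model, parent_session_id=None):
    """Insert the session row if absent; return True only if a row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO sessions (id, source, model, parent_session_id) "
        "VALUES (?, ?, ?, ?)",
        (session_id, source, model, parent_session_id),
    )
    conn.commit()
    # rowcount is 0 when the primary key already existed and the insert was skipped.
    return cur.rowcount == 1
```

Because an existing row is left untouched, a late WebUI write cannot overwrite the source or model that the agent recorded.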

Option B — Agent retry

In run_agent.py:10436, add retry logic around create_session() instead of logging a warning and carrying on. At minimum, fail the compression (keep the old session_id) rather than continuing with an unregistered new one.
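
A minimal sketch of Option B, assuming create_session() raises on failure and that adopting the new ID can be deferred until the row is confirmed. The names rotate_session and FlakyDB, and the backoff values, are illustrative and do not come from run_agent.py. The key point is ordering: the new ID is only returned after the write succeeds, so a persistent failure leaves the caller on the old, still-indexed session.

```python
import time
import uuid
from datetime import datetime


def rotate_session(db, old_session_id, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Return the new session ID, or old_session_id if registration never succeeds."""
    new_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
    for attempt in range(attempts):
        try:
            db.create_session(session_id=new_id, parent_session_id=old_session_id)
            return new_id  # adopt the new ID only after the row exists
        except Exception:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff
    return old_session_id  # fail the rotation rather than run unindexed


class FlakyDB:
    """Test double that fails a fixed number of times before succeeding."""

    def __init__(self, failures):
        self.failures = failures
        self.rows = {}

    def create_session(self, session_id, parent_session_id):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("database is locked")
        self.rows[session_id] = parent_session_id
```

In the real code path, the end_session() call on the old ID would also need to move after the successful create_session(). Otherwise the fallback resumes a session that has already been marked as ended.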

Option C — Both A and B

Option A catches the gap from the WebUI side. Option B addresses the root cause in the agent. Together they should close the failure mode from both sides.

Environment

  • Hermes Agent WebUI on localhost:8787
  • WSL2 (Ubuntu)
  • state.db shared between gateway and WebUI processes

Metadata

Labels

  • P2: Medium, degraded but workaround exists
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • duplicate: This issue or pull request already exists
  • type/bug: Something isn't working
