Skip to content

fix(compression): eliminate session duplication -- adopt in-place compaction like Claude Code and Codex #38763

@gnarzadigital

Description

@gnarzadigital

Problem

Every time Hermes compresses context, it creates a new session chained via parent_session_id. This spawns duplicate entries in the WebUI sidebar (My Chat, My Chat #2, My Chat #3, etc.) for a single conversation. Over time this accumulates hundreds of orphan child sessions in state.db.

This is not how other agentic coding tools handle it, and it causes a cascade of downstream bugs:

How Claude Code and Codex solve this

Both tools keep the same session/thread ID for the entire conversation. Compression never creates a new session.

Claude Code (Anthropic Messages API)

  • Single session_id for the entire conversation. No parent_session_id field exists in the schema.
  • When context fills up, Claude Code calls the API with context_management containing a compact_20260112 strategy.
  • The API handles it server-side: summarizes old messages into a compaction content block, returns it alongside the response.
  • Client appends the compaction block to the messages array and keeps going in the same session.
  • All messages before the latest compaction block are transparently ignored by the API on subsequent requests.
  • Evidence: 9,131 sessions in a Claude Code install, none with parent chains. Longest session has 4,033 messages in a single session_id.

Codex (OpenAI Responses API)

  • Single thread_id for the entire conversation. No parent_thread_id field (only thread_spawn_edges for actual subagent delegation).
  • When context fills up, Codex calls the API with compact_threshold set.
  • The API returns a compaction item that the client appends to the conversation.
  • Everything before the compaction item gets dropped on the next request.
  • Thread ID stays the same forever. Sessions with 500M+ tokens exist as single rows in the threads table.
  • Compaction blobs are AES-encrypted server-side so the client never even sees the summary text.

Key Insight

Both tools leverage their provider's native server-side compaction API. The provider returns a compaction artifact that the client injects into the message stream. The session/thread ID never changes.

Proposed Fix

Hermes should adopt in-place compaction for all models, not just providers with native compaction APIs.

For providers with native compaction (Anthropic, OpenAI)

Use the provider's built-in compaction API (Claude's context_management, OpenAI's compact_threshold). This is the ideal path -- the provider handles summarization and the client just manages message truncation. Zero custom summarization logic needed.

For providers without native compaction (z.ai/GLM, Google, etc.)

Replicate the same pattern in client code:

  1. Generate summary -- use the model to summarize the conversation history (similar to what Hermes already does in conversation_compression.py)
  2. Inject summary as a system message -- prepend a compaction block to the messages array
  3. Truncate old messages -- drop all messages before the summary block from the in-memory messages array
  4. Keep the same session_id -- do NOT end the session, do NOT create a new session row, do NOT set end_reason="compression", do NOT chain via parent_session_id
  5. Continue conversation -- the next user message appends after the summary block as normal

Config changes

Add a new option to control the strategy:

compression:
  enabled: true
  threshold: 0.95
  mode: inplace  # "inplace" (default, no session split) | "split" (legacy, creates new session)

This preserves backward compat if anyone relies on the split behavior, but defaults to the new in-place approach.

What changes in the code

In conversation_compression.py, the compress_context function currently:

1. Generate summary
2. end_session(end_reason="compression")
3. create_session(title=f"{title} #{n}", parent_session_id=old_id)
4. Reset flush cursors

It should instead:

1. Generate summary (existing logic, keep this)
2. Inject summary as system message at start of messages array
3. Truncate all messages before the summary
4. Update session title if desired (same session_id)
5. Continue in the same session

Lines 381-396 (the session split logic) get replaced with the in-place truncation. Everything else (summary generation, threshold calculation, protection of recent messages) stays the same.

Schema changes

  • parent_session_id column becomes unused for compression. Can be deprecated or kept for backward compat with existing session chains.
  • No new columns needed. The same messages storage in the session works -- compaction is just another set of messages in the array.

Benefits

  1. No more duplicate sessions -- one conversation = one session row in the DB, from first message to last
  2. Fixes all related bugs -- session ID desync, goal loss, preflight loops, orphan sessions, gateway desync all become impossible since the session_id never changes mid-conversation
  3. Cleaner WebUI -- sidebar shows actual conversations, not chains of numbered copies
  4. Simpler state management -- no parent_session_id lineage to track, no session chain queries needed
  5. Consistent with industry standard -- Claude Code and Codex both work this way
  6. No data loss -- compaction summaries are preserved as messages in the session, not lost when old sessions get pruned

Environment

  • Hermes v0.14.0
  • Claude Code v2.1.143 (for reference behavior)
  • Codex CLI v0.130.0 (for reference behavior)

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium — degraded but workaround existscomp/agentCore agent loop, run_agent.py, prompt buildertype/featureNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions