Skip to content

session commit recursively summarizes messages.jsonl via generic file summary #564

@dddgogogo

Description

@dddgogogo

Summary

During session commit, OpenViking recursively enqueues the session temp directory into SemanticQueue, which causes messages.jsonl (the archived session transcript) to be summarized via the generic file-summary prompt and sent to the configured VLM (ov-llm).

From a production usage perspective, this looks wasteful and low-value:

  • it re-sends large chunks of already-known chat history to the model,
  • increases token cost and latency,
  • adds load to ov-llm,
  • and does not seem to provide meaningful additional retrieval value for messages.jsonl specifically.

What I observed

The current flow appears to be:

  1. session.commit() archives current messages.
  2. It enqueues session / user / agent temp trees into SemanticQueue with recursive=True.
  3. SemanticProcessor walks files and summarizes them.
  4. messages.jsonl is treated as a generic text file and goes through semantic.file_summary.
  5. The resulting prompt is literally:
Please generate a summary for the following file:

【File Name】
messages.jsonl

【File Content】
...

At runtime this produced repeated ov-llm requests containing historical session transcript content from messages.jsonl.

Why this seems problematic

messages.jsonl is not a normal user document. It is already the canonical session transcript / archive. Summarizing it again as a generic file seems to duplicate work already covered by:

  • session structured summary / compression,
  • memory extraction,
  • and directory-level semantic generation.

So the current behavior feels like a side effect of the generic recursive semantic pipeline, rather than an intentional high-value feature for session archives.

Question

Is this behavior intentional?

If yes, what retrieval / indexing benefit is expected from generating a generic file summary for messages.jsonl after every commit?

Suggestion

Possible options:

  • skip messages.jsonl in semantic file-summary generation,
  • or special-case session archive files so they do not go through generic file summarization,
  • or make this behavior configurable.

I think this would reduce unnecessary VLM traffic and make session commit cheaper / faster in production.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions