Skip to content

fix(session): skip messages.jsonl in semantic file summary generation#617

Merged
MaojiaSheng merged 1 commit intovolcengine:mainfrom
mvanhorn:osc/564-skip-messages-jsonl-semantic-summary
Mar 15, 2026
Merged

fix(session): skip messages.jsonl in semantic file summary generation#617
MaojiaSheng merged 1 commit intovolcengine:mainfrom
mvanhorn:osc/564-skip-messages-jsonl-semantic-summary

Conversation

@mvanhorn
Copy link
Copy Markdown
Contributor

Summary

Skip messages.jsonl from the semantic file summary pipeline during session commit. This file is the canonical session transcript archive and should not be re-summarized as a generic document.

Why this matters

During session.commit(), the session temp directory is enqueued into SemanticQueue with recursive=True. SemanticDagExecutor processes all files uniformly, so messages.jsonl goes through VLM-based file summary generation and vectorization - wasting tokens, adding latency, and providing no retrieval value.

  • #564 - Original report with root cause analysis
  • Collaborator @qin-ctx confirmed the issue: "Traced through the code and confirmed the issue"
  • @yash27-lab traced the code path to SemanticDagExecutor._file_summary_task() in semantic_dag.py

Changes

  • Added _SKIP_FILENAMES frozenset in semantic_dag.py containing messages.jsonl
  • Added filtering in _list_dir() to exclude these files from the DAG before they reach _file_summary_task()
  • The filter sits alongside the existing dotfile skip logic for consistency

Testing

  • Added tests/storage/test_semantic_dag_skip_files.py with 2 test cases:
    • messages.jsonl excluded at root level while other files are still processed
    • messages.jsonl excluded in nested subdirectories

Fixes #564

This contribution was developed with AI assistance (Claude Code).

During session commit, messages.jsonl (the archived session transcript)
was unnecessarily summarized by the semantic file summary pipeline,
wasting tokens and adding latency. Add messages.jsonl to a skip list in
_list_dir() so it never enters the semantic DAG.

Closes volcengine#564
@MaojiaSheng MaojiaSheng merged commit b040afc into volcengine:main Mar 15, 2026
6 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

session commit recursively summarizes messages.jsonl via generic file summary

2 participants