perf: speed up session search with lightweight text extraction and caching #49

@matt1398

Problem

Session search (and especially global search across projects) is slow because every search re-parses and
re-processes each JSONL file from scratch.

Per session file searched, the current code:

  1. Full file I/O — streams entire JSONL
  2. Full JSON parsing — parses every line into ParsedMessage
  3. Full chunk building — buildChunks() including message classification, semantic step extraction, tool
    execution tracking, subagent linking
  4. Then finally performs the text match

For a project with 100 sessions, each search runs 100 full parse+build cycles. Global search repeats this
for every project.

Root Cause

SessionSearcher.searchSessionFile() calls parseJsonlFile() + buildChunks() on every search. The
DataCache exists but is not used for search.

Search only needs user message text and AI output text — it doesn't need full chunk building.
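
To illustrate how little work search actually needs, here is a minimal sketch of a text-only extraction pass. The `RawMessage` and `ContentBlock` shapes and the `extractSearchableText` name are assumptions made for illustration; the real `ParsedMessage` type in this codebase may look different.

```typescript
// Hypothetical message shapes, assumed for illustration only.
type ContentBlock = { type: string; text?: string };

interface RawMessage {
  type?: string; // e.g. "user" or "assistant"
  message?: { content?: string | ContentBlock[] };
}

// Collects only the text that search needs: user message strings and
// assistant "text" content blocks. No chunk building, no classification.
function extractSearchableText(jsonl: string): string[] {
  const texts: string[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;

    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // skip malformed lines instead of failing the whole search
    }
    if (typeof parsed !== "object" || parsed === null) continue;

    const msg = parsed as RawMessage;
    const content = msg.message?.content;

    if (msg.type === "user" && typeof content === "string") {
      texts.push(content);
    } else if (msg.type === "assistant" && Array.isArray(content)) {
      for (const block of content) {
        if (block && block.type === "text" && typeof block.text === "string") {
          texts.push(block.text);
        }
      }
    }
  }
  return texts;
}
```

The text match can then run over the returned strings directly, so tool calls, tool results, and other non-text blocks never need to be processed for search.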

Proposed Fix

  1. Lightweight text extraction — skip buildChunks() entirely for search. Extract only searchable text
    (user message strings + assistant text content blocks) directly from parsed messages
  2. Search cache — cache extracted searchable text per session, invalidate via mtime check. Extend existing
    DataCache or add a dedicated search cache
  3. (Stretch) ripgrep for initial filtering — use rg to search JSONL files as raw text first, then only
    parse matching files for result metadata
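
The mtime-based cache in item 2 could be sketched roughly as follows. `SearchTextCache` is a hypothetical name, and the sketch assumes an in-memory map keyed by file path; whether this should instead extend the existing `DataCache` is left open, as in the proposal.

```typescript
import { promises as fs } from "node:fs";

interface CacheEntry {
  mtimeMs: number; // modification time observed when the entry was built
  texts: string[]; // extracted searchable text for that file
}

// Hypothetical search cache: re-reads a session file only when its mtime changes.
class SearchTextCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly extract: (raw: string) => string[];

  constructor(extract: (raw: string) => string[]) {
    this.extract = extract;
  }

  async get(filePath: string): Promise<string[]> {
    const { mtimeMs } = await fs.stat(filePath);
    const cached = this.entries.get(filePath);
    if (cached && cached.mtimeMs === mtimeMs) {
      return cached.texts; // file unchanged since last extraction
    }

    // Cache miss or stale entry: read and extract again, then store.
    const raw = await fs.readFile(filePath, "utf8");
    const texts = this.extract(raw);
    this.entries.set(filePath, { mtimeMs, texts });
    return texts;
  }
}
```

With this shape, a repeated search over an unchanged session costs one `stat` call instead of a full read and parse. A production version would probably also need a size bound or eviction policy, which this sketch leaves out.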
