feat: implement SearchTextCache and SearchTextExtractor for efficient…#53
… text extraction and caching

- Added SearchTextCache for LRU caching of extracted search text with mtime invalidation.
- Introduced SearchTextExtractor for lightweight extraction of searchable text from session messages.
- Updated SessionSearcher to use the new extractor and cache for improved search performance.
- Added tests for SearchTextCache and SearchTextExtractor to ensure functionality and correctness.
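For readers unfamiliar with the pattern, an LRU cache with mtime invalidation can be sketched roughly as follows. This is a minimal illustration, not the SearchTextCache from this PR: the class name `LruMtimeCache`, its method signatures, and the `maxEntries` parameter are hypothetical. It relies on `Map` preserving insertion order, so the first key is always the least recently used.

```typescript
// Hypothetical sketch: names and signatures are illustrative, not the PR's API.
interface CacheEntry<T> {
  mtimeMs: number; // modification time recorded when the value was cached
  value: T;
}

class LruMtimeCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(filePath: string, mtimeMs: number): T | undefined {
    const entry = this.entries.get(filePath);
    if (!entry) return undefined;

    // Stale: the file's mtime differs from the one we cached, so drop it.
    if (entry.mtimeMs !== mtimeMs) {
      this.entries.delete(filePath);
      return undefined;
    }

    // Re-insert so this key becomes the most recently used.
    this.entries.delete(filePath);
    this.entries.set(filePath, entry);
    return entry.value;
  }

  set(filePath: string, mtimeMs: number, value: T): void {
    this.entries.delete(filePath);
    this.entries.set(filePath, { mtimeMs, value });

    // Evict the least recently used key (first in insertion order).
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A caller would pass the file's current mtime on every lookup, so a modified file is treated as a miss and re-extracted.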
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 75dfcf2d50
if (!entry) return undefined;

// Stale — file was modified since we cached it
if (entry.mtimeMs !== mtimeMs) {
Use stronger cache invalidation than exact mtime equality
Relying on entry.mtimeMs !== mtimeMs as the sole staleness check can serve stale search data in SSH mode: SshFileSystemProvider converts SFTP mtimes to millisecond values via mtime * 1000 (second precision), so multiple updates within one second keep the same mtimeMs. In that case, a session file can change after being cached, but get() still treats the cache as fresh and search misses newly added text until a later write changes the second-level mtime.
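One possible way to strengthen the check, sketched here under assumptions, is to compare a small fingerprint (mtime plus file size) instead of mtime alone. Whether the provider exposes a size alongside mtimeMs is an assumption, and the names `FileFingerprint`, `sftpMtimeToMs`, and `isFresh` are hypothetical. Size catches appends that land within the same second, but not a same-size rewrite.

```typescript
// Hypothetical sketch: the fingerprint shape and helper names are assumptions.
interface FileFingerprint {
  mtimeMs: number;
  size: number; // bytes; assumed to be available from stat or readdir
}

// Mirrors the second-precision conversion described in the review (mtime * 1000).
function sftpMtimeToMs(mtimeSeconds: number): number {
  return mtimeSeconds * 1000;
}

// An entry is fresh only if both mtime and size are unchanged.
function isFresh(cached: FileFingerprint, current: FileFingerprint): boolean {
  return cached.mtimeMs === current.mtimeMs && cached.size === current.size;
}

// Two writes inside the same second share an mtimeMs but differ in size.
const cachedPrint: FileFingerprint = { mtimeMs: sftpMtimeToMs(1700000000), size: 2048 };
const afterAppend: FileFingerprint = { mtimeMs: sftpMtimeToMs(1700000000), size: 2304 };

const mtimeOnlySaysFresh = cachedPrint.mtimeMs === afterAppend.mtimeMs;
const fingerprintSaysFresh = isFresh(cachedPrint, afterAppend);
```

In this example the mtime-only comparison reports the entry as fresh, while the fingerprint comparison reports it as stale.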
Two performance regressions introduced in recent PRs:

1. readAgentConfigs blocked transcript rendering (PR matt1398#50)

The agent config IPC call was awaited on the critical path of fetchSessionDetail, preventing any transcript data from rendering until the filesystem read completed. On macOS this was especially noticeable due to security checks on first directory access. Fixed by making the call fire-and-forget: the transcript renders immediately and subagent color badges update asynchronously. Also set the project cache key optimistically before the async call to prevent duplicate in-flight requests on rapid navigation.

2. SessionSearcher stat()-called every session file on each search (PR matt1398#53)

LocalFileSystemProvider.readdir() did not populate the optional mtimeMs field on FsDirent entries. The new SearchTextCache-based SessionSearcher fell back to an individual fsProvider.stat() call per session file when mtimeMs was missing, adding N extra filesystem round-trips on every search in local mode. Fixed by statting all entries concurrently inside readdir(), so mtimeMs is always populated and the stat fallback is never triggered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
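The readdir() change described in item 2 can be illustrated with a rough sketch. It assumes a simplified `FsDirent` shape (only `name` and the optional `mtimeMs` come from the description above) and takes the stat function as a parameter so the example stays self-contained. `readdirWithMtime` and `StatFn` are hypothetical names, not the provider's actual API.

```typescript
// Hypothetical sketch: a simplified stand-in for the provider's readdir().
interface FsDirent {
  name: string;
  mtimeMs?: number; // optional, as described in the commit message
}

type StatFn = (path: string) => Promise<{ mtimeMs: number }>;

// Stats every entry concurrently so callers never need a per-file fallback.
async function readdirWithMtime(
  dir: string,
  names: string[],
  stat: StatFn
): Promise<FsDirent[]> {
  return Promise.all(
    names.map(async (name): Promise<FsDirent> => {
      try {
        const { mtimeMs } = await stat(`${dir}/${name}`);
        return { name, mtimeMs };
      } catch {
        // Entry vanished or is unreadable: return it without an mtime.
        return { name };
      }
    })
  );
}
```

Because all stat calls are started before any is awaited, the total latency is closer to that of the slowest call than to the sum of all calls.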
Closes #49