Summary
`src/indexer.ts:294-298` checks whether any rows already exist for a given `archive_path` and skips the entire file if so. There is no comparison against mtime or `MAX(line_end)`. Any `.jsonl` transcript that is appended to after its first indexing pass has its tail permanently excluded from the index. This is the normal case for resumed sessions, and for any long session where a sibling Claude Code session triggered a `SessionStart` sync midway.
Sync still copies the updated archive correctly (mtime check works); only the indexer is at fault.
Evidence (real index, plugin v1.0.15)
| Metric | Value |
| --- | --- |
| Total archived conversations | 2,361 |
| Files with fresh index | 1,173 (49.7%) |
| Files with unindexed tail content | 1,188 (50.3%) |
| Total lines on disk | 394,291 |
| Total lines indexed | 386,526 |
| Total lines unindexed | 7,765 (2.0%) |
Distribution of per-file unindexed-line delta (stale files only):
| Stat | Lines |
| --- | --- |
| median | 5 |
| p95 | 12 |
| max | 1,308 |
| files with delta ≥ 50 | 5 |
| files with delta ≥ 100 | 3 |
| files with delta ≥ 1000 | 1 |
So the typical case is "last few turns of a session never made it in" (consistent with a SessionStart background sync racing the still-running session that produced it). The long tail is the painful case — large multi-hour sessions where most of the conversation is silently missing from semantic search.
Detection query
For anyone wanting to check their own index:
import os
import sqlite3

con = sqlite3.connect(os.path.expanduser("~/.config/superpowers/conversation-index/db.sqlite"))
rows = con.execute("SELECT archive_path, MAX(line_end) FROM exchanges GROUP BY archive_path").fetchall()
for path, last in rows:
    if not os.path.exists(path):
        continue  # archive no longer on disk
    with open(path, "rb") as f:
        n = sum(1 for _ in f)  # lines currently on disk
    last = last or 0  # guard against a NULL MAX(line_end)
    if n > last:
        print(n - last, path)  # unindexed tail length, then the file
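To turn the per-file deltas printed by this script into summary statistics like those in the distribution table above, something along these lines could be used. It is a minimal sketch, not the script that produced the reported numbers: `summarize_deltas` is a hypothetical helper, the sample values are made up for illustration, and the p95 uses the nearest-rank method, which is an assumption because the report does not say how its percentile was computed.

```python
import math
import statistics


def summarize_deltas(deltas, thresholds=(50, 100, 1000)):
    """Summarize per-file unindexed-line counts, ignoring files with no delta."""
    stale = sorted(d for d in deltas if d > 0)
    if not stale:
        return {"files": 0}
    # Nearest-rank percentile: smallest value with at least 95% of the
    # data at or below it (an assumed definition, see the note above).
    p95_index = math.ceil(0.95 * len(stale)) - 1
    summary = {
        "files": len(stale),
        "median": statistics.median(stale),
        "p95": stale[p95_index],
        "max": stale[-1],
    }
    for t in thresholds:
        summary[f"delta>={t}"] = sum(1 for d in stale if d >= t)
    return summary


# Illustrative values only, not taken from the real index.
print(summarize_deltas([2, 4, 5, 5, 9, 14, 120]))
```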
Root cause (current behavior)
// src/indexer.ts:294
const alreadyIndexed = db.prepare(
  'SELECT COUNT(*) as count FROM exchanges WHERE archive_path = ?'
).get(archivePath);
// Any existing row skips the whole file, even if it has grown since.
if (alreadyIndexed.count > 0) continue;
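The effect of this check can be reproduced outside the plugin. The following is a minimal Python sketch, not the plugin's code: it uses an in-memory SQLite table reduced to the three columns this report mentions, and `index_file_skip_if_present` is a hypothetical stand-in for the indexing loop that treats every line as one exchange.

```python
import sqlite3


def index_file_skip_if_present(con, archive_path, lines):
    """Mimic the reported check: skip the file if any row exists for it."""
    (count,) = con.execute(
        "SELECT COUNT(*) FROM exchanges WHERE archive_path = ?", (archive_path,)
    ).fetchone()
    if count > 0:
        return 0  # whole file skipped, even if it has grown
    # Simplification: one exchange per line, 1-based line numbers.
    con.executemany(
        "INSERT INTO exchanges (archive_path, line_start, line_end) VALUES (?, ?, ?)",
        [(archive_path, i, i) for i in range(1, len(lines) + 1)],
    )
    return len(lines)


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE exchanges (archive_path TEXT, line_start INTEGER, line_end INTEGER)"
)

transcript = ["turn 1", "turn 2", "turn 3"]
first_pass = index_file_skip_if_present(con, "session.jsonl", transcript)

transcript += ["turn 4", "turn 5"]  # the session continues after the first sync
second_pass = index_file_skip_if_present(con, "session.jsonl", transcript)

(high_water,) = con.execute(
    "SELECT MAX(line_end) FROM exchanges WHERE archive_path = ?", ("session.jsonl",)
).fetchone()
print(first_pass, second_pass, high_water, len(transcript))  # prints: 3 0 3 5
```

The second pass indexes nothing, so the index stops at line 3 while the file on disk has 5 lines.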
Proposed fix
The schema already stores `line_start` and `line_end` per exchange, so the data model supports incremental indexing without a migration. Replace the boolean skip with a high-water-mark check:
- Read the high-water mark: `SELECT COALESCE(MAX(line_end), 0) FROM exchanges WHERE archive_path = ?`
- Parse the file and index only exchanges whose `line_start > maxIndexedLine`
- Embed and insert only the tail; existing rows stay untouched
The unused last_indexed column hints this was the original intent.
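The steps above could look roughly like this. It is a Python sketch of the logic only, not a patch for `src/indexer.ts`: the table is reduced to the three columns named in this report, `parse_exchanges` and `index_file_incremental` are hypothetical helpers, each line is treated as one exchange, and the embedding step is left out. It also assumes archives are append-only, so lines that were already indexed never change.

```python
import sqlite3


def parse_exchanges(lines):
    """Hypothetical parser: one exchange per line, as (line_start, line_end, text)."""
    return [(i, i, text) for i, text in enumerate(lines, start=1)]


def index_file_incremental(con, archive_path, lines):
    """Index only exchanges that start after the highest line already stored."""
    (max_indexed_line,) = con.execute(
        "SELECT COALESCE(MAX(line_end), 0) FROM exchanges WHERE archive_path = ?",
        (archive_path,),
    ).fetchone()
    tail = [ex for ex in parse_exchanges(lines) if ex[0] > max_indexed_line]
    # A real indexer would embed `tail` here; existing rows are left untouched.
    con.executemany(
        "INSERT INTO exchanges (archive_path, line_start, line_end) VALUES (?, ?, ?)",
        [(archive_path, start, end) for start, end, _text in tail],
    )
    return len(tail)


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE exchanges (archive_path TEXT, line_start INTEGER, line_end INTEGER)"
)

transcript = ["turn 1", "turn 2", "turn 3"]
print(index_file_incremental(con, "session.jsonl", transcript))  # prints: 3

transcript += ["turn 4", "turn 5"]
print(index_file_incremental(con, "session.jsonl", transcript))  # prints: 2
print(index_file_incremental(con, "session.jsonl", transcript))  # prints: 0
```

An unchanged file costs one aggregate query and no embedding work. One case worth checking against the real parser: depending on how it handles a trailing unanswered message, an exchange that began before the high-water mark and finished after it would be skipped by the `line_start` filter.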
Workaround until fixed
Delete affected rows so the next sync re-indexes from scratch (re-embeds the whole file — non-trivial cost on a large archive):
DELETE FROM exchanges WHERE archive_path IN (...stale paths...);
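The list of stale paths can be generated rather than typed by hand. The sketch below reuses the detection logic and issues a parameterized DELETE per path. `find_stale_paths` and `delete_stale_rows` are hypothetical helpers, and the sketch assumes that rows in `exchanges` are the only thing that needs clearing. If the plugin keeps embeddings in a separate table, those rows may need the same treatment, which this report has not verified. Back up the database before running anything like this.

```python
import os
import sqlite3


def find_stale_paths(con):
    """Return archive paths whose file has more lines than the index covers."""
    stale = []
    rows = con.execute(
        "SELECT archive_path, COALESCE(MAX(line_end), 0) "
        "FROM exchanges GROUP BY archive_path"
    ).fetchall()
    for path, last in rows:
        if not os.path.exists(path):
            continue  # archive no longer on disk
        with open(path, "rb") as f:
            line_count = sum(1 for _ in f)
        if line_count > last:
            stale.append(path)
    return stale


def delete_stale_rows(con, paths):
    """Delete all rows for the given paths in one transaction; return rows removed."""
    before = con.total_changes
    with con:  # commits on success, rolls back on error
        con.executemany(
            "DELETE FROM exchanges WHERE archive_path = ?", [(p,) for p in paths]
        )
    return con.total_changes - before


if __name__ == "__main__":
    db = os.path.expanduser("~/.config/superpowers/conversation-index/db.sqlite")
    if os.path.exists(db):
        con = sqlite3.connect(db)
        paths = find_stale_paths(con)
        print(f"removed {delete_stale_rows(con, paths)} rows for {len(paths)} files")
```

Binding each path as a parameter avoids building a long `IN (...)` list by string concatenation and sidesteps quoting problems in file names.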
Happy to open a PR if you'd like — the change is small and well-scoped.