Summary
When indexing, codedb writes index output into the process's current working directory, in addition to the indexed root and the ~/.codedb/projects/<hash>/ data dir. In the benign case this is a stray codedb.snapshot. In the wild I hit a worse variant: the full loose index (trigram.lookup, trigram.postings, word.index, pair_freq.bin) was dumped into a tracked source subdirectory, with ~55 MB showing up as untracked files in git status.
Version: codedb 0.2.5817 (latest; codedb update is a no-op).
Reproduced: stray codedb.snapshot in CWD
Drive the MCP server with its cwd set to a subdirectory of a non-temp git repo, then index the repo root:
T=~/cdbrepro; rm -rf "$T"; mkdir -p "$T/src/feat/deep"; cd "$T"; git init -q
for i in $(seq 1 60); do echo "export function fn$i(){ return $i }" > "src/feat/deep/m$i.ts"; done
cd "$T/src/feat/deep" # cwd = a subdirectory, NOT the indexed root
{
printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"t","version":"1"}}}'
printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'
printf '%s\n' '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"codedb_index","arguments":{"path":"'"$T"'"}}}'
sleep 6
} | codedb mcp >/dev/null 2>&1
ls "$T/src/feat/deep/codedb.snapshot" # <-- stray snapshot in the cwd subdir (bug)
ls "$T/codedb.snapshot" # snapshot at indexed root (expected)
A codedb.snapshot is written into src/feat/deep/ (the cwd) even though that directory is not the indexed root.
(Note: codedb refuses /tmp roots — "refusing to index temporary root" — so the repro must live under a non-temp path like ~.)
Observed in the wild: full index shards in CWD
A real repo ended up with trigram.lookup (1.6M), trigram.postings (34M), word.index (19M) inside a deeply-nested source subdirectory (e.g. src/features/alpha/AlphaPanel/). Evidence it was the whole-repo index (root = repo root, not that subdir): word.index starts with magic CDBW, and its header references sibling paths such as src/features/beta/BetaPanel/.... The central ~/.codedb/projects/<hash>/codedb.snapshot for the repo root was updated at the same second — so the snapshot persisted correctly while the loose shards leaked into the source tree.
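The magic-byte check above can be repeated with standard tools. This is a minimal sketch that assumes only what this report states, namely that word.index begins with the four bytes CDBW. The helper name is hypothetical and nothing else about the file layout is assumed.

```shell
# Hypothetical helper: succeeds if the file starts with the 4-byte magic "CDBW".
# Only the magic is checked; the rest of the header format is not assumed.
is_codedb_word_index() {
  [ -f "$1" ] || return 1
  [ "$(head -c 4 -- "$1" 2>/dev/null)" = "CDBW" ]
}
```

For example, `is_codedb_word_index src/features/alpha/AlphaPanel/word.index && echo "codedb word index"`.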
Suspected mechanism
The binary contains the string "could not create data dir" paired with fallback_cwd (and CwdNotSupported). It looks like codedb falls back to writing index files into the current working directory when the per-project data dir under ~/.codedb/projects/<hash>/ can't be created or used.
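One way to check a given build for the strings named above is a plain text search over the binary. This is a sketch only: the helper name is hypothetical, and the presence of these strings suggests but does not prove the fallback behavior.

```shell
# Hypothetical helper: print which of the strings named in this report occur in a file.
# grep -a treats binary input as text; -o prints only the matched parts.
report_fallback_strings() {
  LC_ALL=C grep -a -o -E 'could not create data dir|fallback_cwd|CwdNotSupported' -- "$1" | sort -u
}
```

For example, `report_fallback_strings "$(command -v codedb)"`.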
Impact
- Large binary index files (tens of MB) appear as untracked files in git status, in arbitrary source folders.
- Easy to accidentally commit; noisy; confusing to diagnose (the files reappear whenever the indexer runs from that cwd).
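To see whether a checkout is already affected, the artifact names mentioned in this report can be searched for directly. This is a sketch with a hypothetical function name. The file-name list comes only from this report and may be incomplete.

```shell
# Hypothetical helper: list files under a directory whose names match the
# codedb artifacts named in this report, skipping the .git directory.
find_stray_codedb_files() {
  find "${1:-.}" -name .git -prune -o -type f \( \
      -name codedb.snapshot -o -name trigram.lookup -o -name trigram.postings \
      -o -name word.index -o -name pair_freq.bin \) -print
}
```

For example, `find_stray_codedb_files ~/cdbrepro`. A codedb.snapshot at the indexed root is expected, so only hits elsewhere in the tree indicate the bug.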
Suggested fix
- Never write index artifacts into the process CWD. Write the portable codedb.snapshot only to the indexed root and/or the ~/.codedb data dir.
- If the data dir can't be created, fail loudly (or fall back to a $TMPDIR location) instead of silently writing the index into CWD.
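The policy suggested above can be sketched in shell. This is an illustration of the intended behavior, not codedb's actual code. The function name, the messages, and the codedb-index.XXXXXX template are all hypothetical.

```shell
# Hypothetical sketch of the suggested policy, not codedb's implementation.
# Prints a writable directory for index artifacts, or fails. Never returns the CWD.
resolve_index_dir() {
  preferred=$1   # e.g. the per-project dir under ~/.codedb/projects/
  if mkdir -p -- "$preferred" 2>/dev/null && [ -w "$preferred" ]; then
    printf '%s\n' "$preferred"
    return 0
  fi
  # Fallback: a fresh private directory under $TMPDIR (or /tmp if unset).
  if fallback=$(mktemp -d "${TMPDIR:-/tmp}/codedb-index.XXXXXX" 2>/dev/null); then
    echo "codedb: cannot use data dir '$preferred', using '$fallback'" >&2
    printf '%s\n' "$fallback"
    return 0
  fi
  # Fail loudly instead of silently writing into the working directory.
  echo "codedb: no usable data dir, refusing to write index into CWD" >&2
  return 1
}
```

The point of the design is that the working directory is never a candidate: each branch either yields an explicit location and reports it, or returns a non-zero status.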