fix(mcp): prevent duplicate projects from raw path IDs (#138)
Root cause: when Claude Code passes a raw filesystem path (e.g. `/home/wayne/git/dingo`) as `project_id`, `resolveProjectID` couldn't find it by name or hash ID, so it was used as-is. `EnsureProject` then created a second project with the path as its ID, duplicating the hash-ID project already registered by the orchestrator.

Fixes:
- `resolveProjectID` now checks project paths as a final fallback before returning unknown input unchanged
- `EnsureProject` checks for existing projects with the same path before inserting, preventing duplicates at the DB level
- Updated the hook test to handle both fallback and matched-project output paths
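The resolution chain described above can be sketched as follows. This is a simplified, in-memory sketch, not the actual implementation (which queries SQLite); the `project` struct and `hashID` helper are illustrative names, though the `sha256(path)[:12]` ID scheme matches the PR description.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// project mirrors the minimal fields needed for resolution.
type project struct {
	ID, Name, Path string
}

// hashID derives a project ID from its path: sha256(path)[:12].
func hashID(path string) string {
	sum := sha256.Sum256([]byte(path))
	return hex.EncodeToString(sum[:])[:12]
}

// resolveProjectID resolves caller input to a known project ID.
// Order: exact ID or name match first, then — the fix in this PR —
// absolute paths are matched against stored project paths as a final
// fallback. Unknown input is still returned unchanged.
func resolveProjectID(input string, projects []project) string {
	for _, p := range projects {
		if p.ID == input || p.Name == input {
			return p.ID
		}
	}
	if filepath.IsAbs(input) {
		for _, p := range projects {
			if p.Path == input {
				return p.ID
			}
		}
	}
	return input
}

func main() {
	projects := []project{{
		ID:   hashID("/home/wayne/git/dingo"),
		Name: "dingo",
		Path: "/home/wayne/git/dingo",
	}}
	// A raw path now resolves to the hash ID instead of leaking through.
	fmt.Println(resolveProjectID("/home/wayne/git/dingo", projects) == projects[0].ID)
	fmt.Println(resolveProjectID("not-a-known-project", projects))
}
```

With the fallback in place, a raw path can no longer reach `EnsureProject` as a fresh ID when a project already owns that path.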
📝 Walkthrough

Test assertions for the session-start hook output were relaxed; project ID resolution now matches absolute filesystem paths against stored project paths; project deduplication in the memory store was changed from a transactional merge call to inline, mutex-held reassignments and deletions with path ownership checks.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/memory/store.go (1)
201-214: Silent error suppression in inline merge may hide data integrity issues.

The inline merge logic ignores all errors from UPDATE and DELETE statements:

```go
_, _ = s.db.ExecContext(ctx, stmt, id, dupID)
```

While this avoids the mutex deadlock (the reason for not calling `mergeProjectLocked`), silent failures could leave orphaned records or partial merges without any indication. Consider at minimum logging errors:

♻️ Suggested improvement: log merge errors without failing

```diff
 for _, stmt := range []string{
 	`UPDATE memories SET project_id = ? WHERE project_id = ?`,
 	`UPDATE conversations SET project_id = ? WHERE project_id = ?`,
 	`UPDATE tasks SET project_id = ? WHERE project_id = ?`,
 	`UPDATE decisions SET project_id = ? WHERE project_id = ?`,
 	`UPDATE token_usage SET project_id = ? WHERE project_id = ?`,
 	`UPDATE audit_log SET project_id = ? WHERE project_id = ?`,
 } {
-	_, _ = s.db.ExecContext(ctx, stmt, id, dupID)
+	if _, err := s.db.ExecContext(ctx, stmt, id, dupID); err != nil {
+		s.logger.Warn("inline merge update failed", "stmt", stmt, "error", err)
+	}
 }
-_, _ = s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, dupID)
-_, _ = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, dupID)
+if _, err := s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, dupID); err != nil {
+	s.logger.Warn("inline merge ghost_state delete failed", "error", err)
+}
+if _, err := s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, dupID); err != nil {
+	s.logger.Warn("inline merge project delete failed", "error", err)
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/memory/store.go` around lines 201 - 214, The inline merge loop in mergeProject (using s.db.ExecContext in the for-loop and subsequent DELETEs) currently swallows all errors which can hide failed UPDATE/DELETE operations; update the loop to capture errors from s.db.ExecContext (and the DELETE calls), and for each non-nil error call s.logger.Warn or s.logger.Error with contextual fields (e.g., stmt, dupID, id, error) so failures are visible while still avoiding the mutex deadlock that prevented calling mergeProjectLocked; keep behavior non-failing (don’t return the error) but ensure every ExecContext call logs any returned error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9c0d7d81-213f-49c5-b010-6d9e3dc96816
📒 Files selected for processing (3)
- internal/mcpinit/init_test.go
- internal/mcpserver/mcpserver.go
- internal/memory/store.go
```go
// Use inline merge to avoid deadlock (we already hold s.mu).
for _, stmt := range []string{
	`UPDATE memories SET project_id = ? WHERE project_id = ?`,
	`UPDATE conversations SET project_id = ? WHERE project_id = ?`,
	`UPDATE tasks SET project_id = ? WHERE project_id = ?`,
	`UPDATE decisions SET project_id = ? WHERE project_id = ?`,
	`UPDATE token_usage SET project_id = ? WHERE project_id = ?`,
	`UPDATE audit_log SET project_id = ? WHERE project_id = ?`,
} {
	_, _ = s.db.ExecContext(ctx, stmt, id, dupID)
}
_, _ = s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, dupID)
_, _ = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, dupID)
s.logger.Info("auto-merged duplicate project", "old_id", dupID, "new_id", id)
```
Inline merge lacks atomicity compared to mergeProjectLocked.
The transactional mergeProjectLocked (lines 229-271) ensures all-or-nothing semantics for project merges. The inline version executes statements independently, meaning a failure partway through could leave the database in an inconsistent state (e.g., some tables updated, others not).
This is an acceptable trade-off to avoid the mutex + BeginTx deadlock, but consider documenting this limitation with a comment for future maintainers.
📝 Suggested documentation

```diff
 if scanErr == nil && dupID != "" {
-	// Use inline merge to avoid deadlock (we already hold s.mu).
+	// Use inline merge to avoid deadlock (we already hold s.mu).
+	// Note: Unlike mergeProjectLocked, this is NOT atomic — partial
+	// failures may leave orphaned records. Acceptable here since this
+	// is a best-effort cleanup of MCP-created duplicates.
 	for _, stmt := range []string{
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Use inline merge to avoid deadlock (we already hold s.mu).
// Note: Unlike mergeProjectLocked, this is NOT atomic — partial
// failures may leave orphaned records. Acceptable here since this
// is a best-effort cleanup of MCP-created duplicates.
for _, stmt := range []string{
	`UPDATE memories SET project_id = ? WHERE project_id = ?`,
	`UPDATE conversations SET project_id = ? WHERE project_id = ?`,
	`UPDATE tasks SET project_id = ? WHERE project_id = ?`,
	`UPDATE decisions SET project_id = ? WHERE project_id = ?`,
	`UPDATE token_usage SET project_id = ? WHERE project_id = ?`,
	`UPDATE audit_log SET project_id = ? WHERE project_id = ?`,
} {
	_, _ = s.db.ExecContext(ctx, stmt, id, dupID)
}
_, _ = s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, dupID)
_, _ = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, dupID)
s.logger.Info("auto-merged duplicate project", "old_id", dupID, "new_id", id)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/memory/store.go` around lines 201 - 214, The inline project-merge
block that runs multiple independent UPDATE/DELETE ExecContext calls while
holding s.mu is not atomic and can leave partial updates if one statement fails;
add a concise comment above this block (reference the inline loop and the
surrounding code that holds s.mu) explaining that this is intentionally
non-transactional to avoid a mutex+BeginTx deadlock, that mergeProjectLocked
provides the transactional all-or-nothing alternative, and note the limitation
for future maintainers so they understand the trade-off and can consider
refactoring later.
When EnsureProject detects a same-path duplicate, merge orphaned child records (memories, tasks, etc.) into the canonical project instead of just logging and returning. This self-heals existing duplicates on next session start without needing a CLI command.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/memory/store.go`:
- Around line 170-184: The inline merge in EnsureProject discards errors from
s.db.ExecContext for multiple UPDATE/DELETE statements (project_id updates,
ghost_state delete, projects delete), then logs success and returns nil; change
these calls to capture and check the error return and propagate the first
non-nil error back to the caller instead of ignoring it. Specifically, in the
block that iterates over the statements (and the subsequent ExecContext calls
for `DELETE FROM ghost_state` and `DELETE FROM projects`) replace the ignored
returns with capturing the error (e.g., _, err := s.db.ExecContext(...)) and if
err != nil return err so EnsureProject halts on failure and surfaces the DB
error for IDs existingID, id, and path; apply the same pattern to the second
inline merge block cited around lines 212-225.
- Around line 160-184: The current duplicate-merging branch (inside the
filepath.IsAbs(path) && path != id check after
s.db.QueryRowContext(...).Scan(&existingID)) incorrectly treats the raw-path row
(existingID == path) as canonical and returns before the later upsert of the
incoming hash-backed project; instead, detect when existingID == path and in
that case first perform the upsert/create for the incoming id (so the
hash-backed project exists), then merge the raw-path project into the incoming
id by running the same UPDATE statements (memories, conversations, tasks,
decisions, token_usage, audit_log) but with parameters swapped to set project_id
= incoming id WHERE project_id = existingID, then delete ghost_state and the
raw-path projects row (existingID), and log the auto-merge with old_id =
existingID and canonical_id = id; otherwise (existingID != path) keep the
current behavior of merging incoming id into existingID. Ensure you reference
the same DB helpers (s.db.ExecContext, s.db.QueryRowContext), variables
(existingID, id, path) and the log call (s.logger.Info) when implementing the
conditional swap.
```go
// Check if another project already owns this path. If so, merge
// any orphaned child records into the canonical project and skip
// creating a duplicate. This self-heals duplicates caused by MCP
// passing raw filesystem paths as project IDs.
if filepath.IsAbs(path) && path != id {
	var existingID string
	scanErr := s.db.QueryRowContext(ctx,
		`SELECT id FROM projects WHERE path = ? AND id != ? LIMIT 1`,
		path, id).Scan(&existingID)
	if scanErr == nil && existingID != "" {
		// Merge any child records from the incoming ID into the canonical project.
		for _, stmt := range []string{
			`UPDATE memories SET project_id = ? WHERE project_id = ?`,
			`UPDATE conversations SET project_id = ? WHERE project_id = ?`,
			`UPDATE tasks SET project_id = ? WHERE project_id = ?`,
			`UPDATE decisions SET project_id = ? WHERE project_id = ?`,
			`UPDATE token_usage SET project_id = ? WHERE project_id = ?`,
			`UPDATE audit_log SET project_id = ? WHERE project_id = ?`,
		} {
			_, _ = s.db.ExecContext(ctx, stmt, existingID, id)
		}
		_, _ = s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, id)
		_, _ = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, id)
		s.logger.Info("auto-merged path duplicate", "old_id", id, "canonical_id", existingID, "path", path)
		return nil
```
Treat the raw-path row as the duplicate, not the canonical project.
When existingID == path, this branch keeps the legacy raw-path project and returns before Lines 188-199 upsert the incoming id. In the normal orchestrator flow (id != path), that means the hash-backed project is never created, so the duplicate does not self-heal. In this case, upsert the incoming id first and merge the existingID == path row into it instead of merging id -> existingID.
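The direction swap the reviewer asks for can be isolated into a small decision helper. This is an illustrative sketch, not code from the PR: the function name and signature are hypothetical, but the rule matches the review comment (a row whose ID equals the raw path is always the duplicate; the hash-backed ID is canonical).

```go
package main

import "fmt"

// chooseMergeDirection decides which of two projects sharing a path
// survives a merge. If the stored row's ID is the raw path itself,
// it is a legacy duplicate: the incoming (hash-backed) ID must be
// upserted first and the raw-path row folded into it. Otherwise the
// stored row is already canonical and the incoming ID is the duplicate.
func chooseMergeDirection(incomingID, existingID, path string) (canonical, duplicate string) {
	if existingID == path {
		return incomingID, existingID
	}
	return existingID, incomingID
}

func main() {
	// Legacy raw-path row in the DB: the incoming hash ID wins.
	c, d := chooseMergeDirection("a7293a04b38a", "/home/wayne/git/dingo", "/home/wayne/git/dingo")
	fmt.Println(c, d)

	// Hash-backed row already stored: the incoming raw path is the duplicate.
	c, d = chooseMergeDirection("/home/wayne/git/dingo", "a7293a04b38a", "/home/wayne/git/dingo")
	fmt.Println(c, d)
}
```

Either way the raw-path row ends up deleted and its child records reassigned to the hash ID, which is what makes the duplicate self-heal.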
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/memory/store.go` around lines 160 - 184, The current
duplicate-merging branch (inside the filepath.IsAbs(path) && path != id check
after s.db.QueryRowContext(...).Scan(&existingID)) incorrectly treats the
raw-path row (existingID == path) as canonical and returns before the later
upsert of the incoming hash-backed project; instead, detect when existingID ==
path and in that case first perform the upsert/create for the incoming id (so
the hash-backed project exists), then merge the raw-path project into the
incoming id by running the same UPDATE statements (memories, conversations,
tasks, decisions, token_usage, audit_log) but with parameters swapped to set
project_id = incoming id WHERE project_id = existingID, then delete ghost_state
and the raw-path projects row (existingID), and log the auto-merge with old_id =
existingID and canonical_id = id; otherwise (existingID != path) keep the
current behavior of merging incoming id into existingID. Ensure you reference
the same DB helpers (s.db.ExecContext, s.db.QueryRowContext), variables
(existingID, id, path) and the log call (s.logger.Info) when implementing the
conditional swap.
```go
// Merge any child records from the incoming ID into the canonical project.
for _, stmt := range []string{
	`UPDATE memories SET project_id = ? WHERE project_id = ?`,
	`UPDATE conversations SET project_id = ? WHERE project_id = ?`,
	`UPDATE tasks SET project_id = ? WHERE project_id = ?`,
	`UPDATE decisions SET project_id = ? WHERE project_id = ?`,
	`UPDATE token_usage SET project_id = ? WHERE project_id = ?`,
	`UPDATE audit_log SET project_id = ? WHERE project_id = ?`,
} {
	_, _ = s.db.ExecContext(ctx, stmt, existingID, id)
}
_, _ = s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, id)
_, _ = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, id)
s.logger.Info("auto-merged path duplicate", "old_id", id, "canonical_id", existingID, "path", path)
return nil
```
Propagate failures from the inline merge path.
All of these ExecContext calls discard their errors, but the code still logs auto-merged... and returns nil. A single failed update/delete will leave records split across two project IDs while reporting success. Even if this stays non-transactional, EnsureProject should stop on the first failure and surface it.
🛠️ Minimal fix pattern

```diff
-	_, _ = s.db.ExecContext(ctx, stmt, existingID, id)
+	if _, err := s.db.ExecContext(ctx, stmt, existingID, id); err != nil {
+		return fmt.Errorf("merge duplicate project children: %w", err)
+	}
```

Apply the same pattern to the ghost_state / projects deletes and to the second inline merge block.
Also applies to: 212-225
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/memory/store.go` around lines 170 - 184, The inline merge in
EnsureProject discards errors from s.db.ExecContext for multiple UPDATE/DELETE
statements (project_id updates, ghost_state delete, projects delete), then logs
success and returns nil; change these calls to capture and check the error
return and propagate the first non-nil error back to the caller instead of
ignoring it. Specifically, in the block that iterates over the statements (and
the subsequent ExecContext calls for `DELETE FROM ghost_state` and `DELETE FROM
projects`) replace the ignored returns with capturing the error (e.g., _, err :=
s.db.ExecContext(...)) and if err != nil return err so EnsureProject halts on
failure and surfaces the DB error for IDs existingID, id, and path; apply the
same pattern to the second inline merge block cited around lines 212-225.
Problem
Ghost had duplicate projects for the same repo:
- `/home/wayne/git/dingo` (2 memories) — raw path used as ID
- `a7293a04b38a` (13 memories) — correct hash ID

Root cause: two code paths create projects with different ID schemes:

- Orchestrator-registered projects use `sha256(path)[:12]` hash IDs
- `ghost_memory_save` — when Claude passes a raw filesystem path, `resolveProjectID` couldn't match it, returned it unchanged, and `EnsureProject` created a duplicate

Fix

- `resolveProjectID` now checks project paths as a final fallback before returning unknown input
- `EnsureProject` checks for existing projects with the same path before inserting — logs and skips if a duplicate is detected

Test plan

- `go test ./...` — all 23 packages pass

Summary by CodeRabbit
Bug Fixes
Tests