
fix(mcp): prevent duplicate projects from raw path IDs#138

Merged
wcatz merged 2 commits into main from fix/resolve-project-by-path on Apr 1, 2026

Conversation


@wcatz wcatz commented Apr 1, 2026

Problem

Ghost had duplicate projects for the same repo:

  • /home/wayne/git/dingo (2 memories) — raw path used as ID
  • a7293a04b38a (13 memories) — correct hash ID

Root cause: Two code paths create projects with different ID schemes:

  1. Orchestrator uses sha256(path)[:12] hash IDs
  2. MCP ghost_memory_save — when Claude passes a raw filesystem path, resolveProjectID couldn't match it, returned it unchanged, and EnsureProject created a duplicate
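For reference, the orchestrator's ID scheme from point 1 can be sketched as below. The function name is illustrative, and whether it reproduces the exact a7293a04b38a ID depends on the exact path string the orchestrator hashes:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashProjectID sketches the sha256(path)[:12] scheme: hash the project
// path and keep the first 12 hex characters as the project ID.
func hashProjectID(path string) string {
	sum := sha256.Sum256([]byte(path))
	return hex.EncodeToString(sum[:])[:12]
}

func main() {
	fmt.Println(hashProjectID("/home/wayne/git/dingo"))
}
```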

Fix

  1. resolveProjectID now checks project paths as final fallback before returning unknown input
  2. EnsureProject checks for existing projects with the same path before inserting — logs and skips if duplicate detected
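A minimal sketch of the resulting fallback order in resolveProjectID, with a simplified Project type assumed for illustration (the real signature and storage access differ):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Project is a simplified stand-in for the stored project record.
type Project struct {
	ID   string
	Name string
	Path string
}

// resolveProjectID tries name and ID matches first, then falls back to
// matching absolute filesystem paths against stored project paths, and
// only returns the input unchanged when nothing matches.
func resolveProjectID(input string, projects []Project) string {
	for _, p := range projects {
		if p.Name == input || p.ID == input {
			return p.ID
		}
	}
	if filepath.IsAbs(input) {
		for _, p := range projects {
			if p.Path == input {
				return p.ID
			}
		}
	}
	return input
}

func main() {
	projects := []Project{{ID: "a7293a04b38a", Name: "dingo", Path: "/home/wayne/git/dingo"}}
	fmt.Println(resolveProjectID("/home/wayne/git/dingo", projects))
	fmt.Println(resolveProjectID("unknown-input", projects))
}
```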

Test plan

  • go test ./... — all 23 packages pass
  • No deadlocks (fixed inline merge to avoid mutex + BeginTx conflict)
  • Hook test updated for both fallback and matched-project paths

Summary by CodeRabbit

  • Bug Fixes

    • Better project recognition when supplying absolute filesystem paths
    • Improved deduplication to prevent creation of duplicate projects
    • More reliable project merge and reassignment behavior to preserve data integrity
  • Tests

    • Strengthened session-start test assertions to reduce flakiness and better validate hook output

Root cause: when Claude Code passes a raw filesystem path (e.g.
/home/wayne/git/dingo) as project_id, resolveProjectID couldn't
find it by name or hash ID, so it was used as-is. EnsureProject
then created a second project with the path as its ID, duplicating
the hash-ID project already registered by the orchestrator.

Fixes:
- resolveProjectID now checks project paths as a final fallback
  before returning unknown input unchanged
- EnsureProject checks for existing projects with the same path
  before inserting, preventing duplicates at the DB level
- Updated hook test to handle both fallback and matched-project
  output paths

coderabbitai Bot commented Apr 1, 2026

📝 Walkthrough

Test assertions for session-start hook output were relaxed; project ID resolution now matches absolute filesystem paths against stored project paths; project deduplication in the memory store was changed from a transactional merge call to inline, mutex-held reassignments and deletions with path ownership checks.

Changes

Cohort / File(s) / Summary

  • Test Assertion Updates (internal/mcpinit/init_test.go): TestHandleSessionStartHook now fails if hook output is empty and accepts either "ghost_memory_save" or "Ghost context" instead of requiring "Ghost memory is active".
  • Path-Based Project Resolution (internal/mcpserver/mcpserver.go): the resolveProjectID fallback now, when the input is an absolute path, compares it against stored project Path fields and returns the matching p.ID, in addition to the existing name/ID checks.
  • Project Merge & Ownership Checks (internal/memory/store.go): EnsureProject now checks absolute-path ownership under s.mu; if a different existing project owns the path, it reassigns child-table project_id values and deletes the duplicate projects row and any ghost_state for the incoming ID inline, replacing the prior mergeProjectLocked call.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Poem

🐰 I hopped through paths both old and new,
Found duplicate homes and stitched them true.
Hooks sing softer, outputs now play,
Projects merged the rabbit way — hooray! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically identifies the main fix (preventing duplicate projects created from raw filesystem path IDs), the core problem addressed across all three modified files.



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/memory/store.go (1)

201-214: Silent error suppression in inline merge may hide data integrity issues.

The inline merge logic ignores all errors from UPDATE and DELETE statements:

_, _ = s.db.ExecContext(ctx, stmt, id, dupID)

While this avoids mutex deadlock (the reason for not calling mergeProjectLocked), silent failures could leave orphaned records or partial merges without any indication. Consider at minimum logging errors:

♻️ Suggested improvement: Log merge errors without failing
 			for _, stmt := range []string{
 				`UPDATE memories SET project_id = ? WHERE project_id = ?`,
 				`UPDATE conversations SET project_id = ? WHERE project_id = ?`,
 				`UPDATE tasks SET project_id = ? WHERE project_id = ?`,
 				`UPDATE decisions SET project_id = ? WHERE project_id = ?`,
 				`UPDATE token_usage SET project_id = ? WHERE project_id = ?`,
 				`UPDATE audit_log SET project_id = ? WHERE project_id = ?`,
 			} {
-				_, _ = s.db.ExecContext(ctx, stmt, id, dupID)
+				if _, err := s.db.ExecContext(ctx, stmt, id, dupID); err != nil {
+					s.logger.Warn("inline merge update failed", "stmt", stmt, "error", err)
+				}
 			}
-			_, _ = s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, dupID)
-			_, _ = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, dupID)
+			if _, err := s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, dupID); err != nil {
+				s.logger.Warn("inline merge ghost_state delete failed", "error", err)
+			}
+			if _, err := s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, dupID); err != nil {
+				s.logger.Warn("inline merge project delete failed", "error", err)
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/memory/store.go` around lines 201 - 214, The inline merge loop in
mergeProject (using s.db.ExecContext in the for-loop and subsequent DELETEs)
currently swallows all errors which can hide failed UPDATE/DELETE operations;
update the loop to capture errors from s.db.ExecContext (and the DELETE calls),
and for each non-nil error call s.logger.Warn or s.logger.Error with contextual
fields (e.g., stmt, dupID, id, error) so failures are visible while still
avoiding the mutex deadlock that prevented calling mergeProjectLocked; keep
behavior non-failing (don’t return the error) but ensure every ExecContext call
logs any returned error.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9c0d7d81-213f-49c5-b010-6d9e3dc96816

📥 Commits

Reviewing files that changed from the base of the PR and between 2966cc5 and a68a327.

📒 Files selected for processing (3)
  • internal/mcpinit/init_test.go
  • internal/mcpserver/mcpserver.go
  • internal/memory/store.go

Comment thread internal/memory/store.go
Comment on lines +201 to +214
// Use inline merge to avoid deadlock (we already hold s.mu).
for _, stmt := range []string{
	`UPDATE memories SET project_id = ? WHERE project_id = ?`,
	`UPDATE conversations SET project_id = ? WHERE project_id = ?`,
	`UPDATE tasks SET project_id = ? WHERE project_id = ?`,
	`UPDATE decisions SET project_id = ? WHERE project_id = ?`,
	`UPDATE token_usage SET project_id = ? WHERE project_id = ?`,
	`UPDATE audit_log SET project_id = ? WHERE project_id = ?`,
} {
	_, _ = s.db.ExecContext(ctx, stmt, id, dupID)
}
_, _ = s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, dupID)
_, _ = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, dupID)
s.logger.Info("auto-merged duplicate project", "old_id", dupID, "new_id", id)

⚠️ Potential issue | 🟡 Minor

Inline merge lacks atomicity compared to mergeProjectLocked.

The transactional mergeProjectLocked (lines 229-271) ensures all-or-nothing semantics for project merges. The inline version executes statements independently, meaning a failure partway through could leave the database in an inconsistent state (e.g., some tables updated, others not).

This is an acceptable trade-off to avoid the mutex + BeginTx deadlock, but consider documenting this limitation with a comment for future maintainers.

📝 Suggested documentation
 		if scanErr == nil && dupID != "" {
-			// Use inline merge to avoid deadlock (we already hold s.mu).
+			// Use inline merge to avoid deadlock (we already hold s.mu).
+			// Note: Unlike mergeProjectLocked, this is NOT atomic — partial
+			// failures may leave orphaned records. Acceptable here since this
+			// is a best-effort cleanup of MCP-created duplicates.
 			for _, stmt := range []string{
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (resulting code):

// Use inline merge to avoid deadlock (we already hold s.mu).
// Note: Unlike mergeProjectLocked, this is NOT atomic — partial
// failures may leave orphaned records. Acceptable here since this
// is a best-effort cleanup of MCP-created duplicates.
for _, stmt := range []string{
	`UPDATE memories SET project_id = ? WHERE project_id = ?`,
	`UPDATE conversations SET project_id = ? WHERE project_id = ?`,
	`UPDATE tasks SET project_id = ? WHERE project_id = ?`,
	`UPDATE decisions SET project_id = ? WHERE project_id = ?`,
	`UPDATE token_usage SET project_id = ? WHERE project_id = ?`,
	`UPDATE audit_log SET project_id = ? WHERE project_id = ?`,
} {
	_, _ = s.db.ExecContext(ctx, stmt, id, dupID)
}
_, _ = s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, dupID)
_, _ = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, dupID)
s.logger.Info("auto-merged duplicate project", "old_id", dupID, "new_id", id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/memory/store.go` around lines 201 - 214, The inline project-merge
block that runs multiple independent UPDATE/DELETE ExecContext calls while
holding s.mu is not atomic and can leave partial updates if one statement fails;
add a concise comment above this block (reference the inline loop and the
surrounding code that holds s.mu) explaining that this is intentionally
non-transactional to avoid a mutex+BeginTx deadlock, that mergeProjectLocked
provides the transactional all-or-nothing alternative, and note the limitation
for future maintainers so they understand the trade-off and can consider
refactoring later.

When EnsureProject detects a same-path duplicate, merge orphaned
child records (memories, tasks, etc.) into the canonical project
instead of just logging and returning. This self-heals existing
duplicates on next session start without needing a CLI command.
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 65e6b728-33e3-481c-868c-2508c8cd9e80

📥 Commits

Reviewing files that changed from the base of the PR and between a68a327 and 6e1a4eb.

📒 Files selected for processing (1)
  • internal/memory/store.go

Comment thread internal/memory/store.go
Comment on lines +160 to +184
// Check if another project already owns this path. If so, merge
// any orphaned child records into the canonical project and skip
// creating a duplicate. This self-heals duplicates caused by MCP
// passing raw filesystem paths as project IDs.
if filepath.IsAbs(path) && path != id {
var existingID string
scanErr := s.db.QueryRowContext(ctx,
`SELECT id FROM projects WHERE path = ? AND id != ? LIMIT 1`,
path, id).Scan(&existingID)
if scanErr == nil && existingID != "" {
// Merge any child records from the incoming ID into the canonical project.
for _, stmt := range []string{
`UPDATE memories SET project_id = ? WHERE project_id = ?`,
`UPDATE conversations SET project_id = ? WHERE project_id = ?`,
`UPDATE tasks SET project_id = ? WHERE project_id = ?`,
`UPDATE decisions SET project_id = ? WHERE project_id = ?`,
`UPDATE token_usage SET project_id = ? WHERE project_id = ?`,
`UPDATE audit_log SET project_id = ? WHERE project_id = ?`,
} {
_, _ = s.db.ExecContext(ctx, stmt, existingID, id)
}
_, _ = s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, id)
_, _ = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, id)
s.logger.Info("auto-merged path duplicate", "old_id", id, "canonical_id", existingID, "path", path)
return nil

⚠️ Potential issue | 🔴 Critical

Treat the raw-path row as the duplicate, not the canonical project.

When existingID == path, this branch keeps the legacy raw-path project and returns before Lines 188-199 upsert the incoming id. In the normal orchestrator flow (id != path), that means the hash-backed project is never created, so the duplicate does not self-heal. In this case, upsert the incoming id first and merge the existingID == path row into it instead of merging id -> existingID.
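The suggested control-flow swap can be sketched as a small helper; the function and its shape are hypothetical, not code from the repository:

```go
package main

import "fmt"

// chooseMergeDirection sketches the reviewer's suggestion: when the
// existing row's ID is the raw path, the incoming hash-backed ID should
// become canonical and the raw-path row merged into it; otherwise the
// established project stays canonical and the incoming ID is merged in.
func chooseMergeDirection(existingID, id, path string) (canonical, duplicate string) {
	if existingID == path {
		// Legacy raw-path row: merge it into the incoming hash ID.
		return id, existingID
	}
	// Normal case: merge the incoming ID into the established project.
	return existingID, id
}

func main() {
	fmt.Println(chooseMergeDirection("/home/wayne/git/dingo", "a7293a04b38a", "/home/wayne/git/dingo"))
	fmt.Println(chooseMergeDirection("a7293a04b38a", "incoming-id", "/home/wayne/git/dingo"))
}
```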

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/memory/store.go` around lines 160 - 184, The current
duplicate-merging branch (inside the filepath.IsAbs(path) && path != id check
after s.db.QueryRowContext(...).Scan(&existingID)) incorrectly treats the
raw-path row (existingID == path) as canonical and returns before the later
upsert of the incoming hash-backed project; instead, detect when existingID ==
path and in that case first perform the upsert/create for the incoming id (so
the hash-backed project exists), then merge the raw-path project into the
incoming id by running the same UPDATE statements (memories, conversations,
tasks, decisions, token_usage, audit_log) but with parameters swapped to set
project_id = incoming id WHERE project_id = existingID, then delete ghost_state
and the raw-path projects row (existingID), and log the auto-merge with old_id =
existingID and canonical_id = id; otherwise (existingID != path) keep the
current behavior of merging incoming id into existingID. Ensure you reference
the same DB helpers (s.db.ExecContext, s.db.QueryRowContext), variables
(existingID, id, path) and the log call (s.logger.Info) when implementing the
conditional swap.

Comment thread internal/memory/store.go
Comment on lines +170 to +184
// Merge any child records from the incoming ID into the canonical project.
for _, stmt := range []string{
`UPDATE memories SET project_id = ? WHERE project_id = ?`,
`UPDATE conversations SET project_id = ? WHERE project_id = ?`,
`UPDATE tasks SET project_id = ? WHERE project_id = ?`,
`UPDATE decisions SET project_id = ? WHERE project_id = ?`,
`UPDATE token_usage SET project_id = ? WHERE project_id = ?`,
`UPDATE audit_log SET project_id = ? WHERE project_id = ?`,
} {
_, _ = s.db.ExecContext(ctx, stmt, existingID, id)
}
_, _ = s.db.ExecContext(ctx, `DELETE FROM ghost_state WHERE project_id = ?`, id)
_, _ = s.db.ExecContext(ctx, `DELETE FROM projects WHERE id = ?`, id)
s.logger.Info("auto-merged path duplicate", "old_id", id, "canonical_id", existingID, "path", path)
return nil

⚠️ Potential issue | 🟠 Major

Propagate failures from the inline merge path.

All of these ExecContext calls discard their errors, but the code still logs auto-merged... and returns nil. A single failed update/delete will leave records split across two project IDs while reporting success. Even if this stays non-transactional, EnsureProject should stop on the first failure and surface it.

🛠️ Minimal fix pattern
-				_, _ = s.db.ExecContext(ctx, stmt, existingID, id)
+				if _, err := s.db.ExecContext(ctx, stmt, existingID, id); err != nil {
+					return fmt.Errorf("merge duplicate project children: %w", err)
+				}

Apply the same pattern to the ghost_state / projects deletes and to the second inline merge block.

Also applies to: 212-225

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/memory/store.go` around lines 170 - 184, The inline merge in
EnsureProject discards errors from s.db.ExecContext for multiple UPDATE/DELETE
statements (project_id updates, ghost_state delete, projects delete), then logs
success and returns nil; change these calls to capture and check the error
return and propagate the first non-nil error back to the caller instead of
ignoring it. Specifically, in the block that iterates over the statements (and
the subsequent ExecContext calls for `DELETE FROM ghost_state` and `DELETE FROM
projects`) replace the ignored returns with capturing the error (e.g., _, err :=
s.db.ExecContext(...)) and if err != nil return err so EnsureProject halts on
failure and surfaces the DB error for IDs existingID, id, and path; apply the
same pattern to the second inline merge block cited around lines 212-225.

@wcatz wcatz merged commit 2ce4edf into main Apr 1, 2026
5 checks passed
@wcatz wcatz deleted the fix/resolve-project-by-path branch April 1, 2026 12:00