Skip to content

fix(pi): add attribution to all insertEvent calls + session_meta fallback#605

Merged
mksglu merged 2 commits into
mksglu:nextfrom
ChengShiLiu16:fix/pi-adapter-attribution
May 18, 2026
Merged

fix(pi): add attribution to all insertEvent calls + session_meta fallback#605
mksglu merged 2 commits into
mksglu:nextfrom
ChengShiLiu16:fix/pi-adapter-attribution

Conversation

@ChengShiLiu16

Copy link
Copy Markdown
Contributor

Fix: Cross-project data leakage via missing attribution in pi adapter

Root Cause

The pi adapter (src/adapters/pi/extension.ts) calls insertEvent() 4 times but omits the 4th parameter attribution on every call. This means all 4 call sites fall through to the default chain:

attribution?.projectDir // undefined (not passed)
  ?? event.project_dir  // undefined (not set on event object)
  ?? ""                 // ← empty string fallback

Result: every event gets project_dir = "". When searchEvents runs WHERE project_dir = ? with a real project path, it excludes those orphans. But when it runs with project_dir = "" (e.g. from a session that somehow resolves to empty), it matches all projects' orphan events — a cross-project data leak.

Three Fixes

P0: extension.ts — Pass attribution to all insertEvent calls

Added _attribution object with projectDir, source: "workspace_root", confidence: 0.98 and passed it as the 4th argument to all 4 insertEvent() calls. This is the source-level fix: new events will always carry the correct project directory.

P1: db.ts — insertEvent fallback via _getSessionProjectDir

Added _getSessionProjectDir(sessionId) method that queries session_meta for the session's project_dir. Inserted into the fallback chain of both insertEvent() and bulkInsertEvents():

attribution?.projectDir
  ?? event.project_dir
  ?? this._getSessionProjectDir(sessionId)  // ← NEW: session_meta fallback
  ?? ""
``

This is defense-in-depth: even if a future caller forgets attribution, events still get the correct project_dir from the session record.

### P1: db.ts  searchEvents SQL includes orphans

Changed `WHERE project_dir = ?` to `WHERE (project_dir = ? OR project_dir = '')`. This ensures search results include legacy orphan events (project_dir='') rather than silently dropping them, while still scoping to the target project.

## Files Changed

| File | Change |
|------|--------|
| `src/adapters/pi/extension.ts` | Import `ProjectAttribution`, add `_attribution` constant, pass to 4 `insertEvent` calls |
| `src/session/db.ts` | Add `_getSessionProjectDir()` method, use in `insertEvent` + `bulkInsertEvents` fallback chains, fix `searchEvents` SQL |

## Impact

Without these fixes:
- All pi adapter events have `project_dir = ""`
- `searchEvents("", query)` matches events from **all projects**
- Cross-project data leakage in any shared SessionDB instance

With these fixes:
- New events get correct `project_dir` from the attribution parameter
- Legacy forgotten-attribution callers get project_dir from `session_meta` 
- Search includes orphan events for backward compatibility

github-actions Bot and others added 2 commits May 17, 2026 14:12
…back

Three bugs fixed:

1. P0 — pi/extension.ts: All 4 insertEvent() calls omitted the 4th
   attribution parameter, causing project_dir to default to empty string
   for every event. This leaks data across projects in shared SessionDB
   instances because searchEvents uses WHERE project_dir = ?.

   Fix: create a _attribution object from resolvePiWorkspaceDir() and
   pass it to all insertEvent calls (PostToolUse × 3, UserPromptSubmit × 1).

2. P1 — session/db.ts: insertEvent fallback chain ended at empty string
   when no attribution was provided. Added _getSessionProjectDir(sessionId)
   that queries session_meta for the session's project_dir as a
   last-resort fallback, preventing orphan events with project_dir=''.

   Applied to both insertEvent and bulkInsertEvents.

3. P1 — session/db.ts: searchEvents SQL used WHERE project_dir = ?
   which excludes historical orphan events (project_dir='') from
   search results. Changed to WHERE (project_dir = ? OR project_dir = '')
   so searches include both properly-attributed and legacy orphan events.

Impact: Without these fixes, searchEvents('query', 50, '/my-project')
would match ALL projects' orphan events via project_dir='', causing
cross-project data leakage.
@mksglu mksglu changed the base branch from main to next May 18, 2026 03:44
@mksglu mksglu merged commit 2d4f7c1 into mksglu:next May 18, 2026
mksglu added a commit that referenced this pull request May 31, 2026
…tore

Adds the cross-DB plumbing required by the ctx_search `project:` filter
(#737) without changing any existing call site. Three layers gain an
opt-in scope hook:

- `ContentStore.searchWithFallback` accepts `sessionIdAllowSet?: Set<string>`.
  When supplied, the RRF candidate pool is fetched at 8x the requested
  limit and post-filtered by `chunks.session_id` membership. Legacy
  unattributed chunks (`session_id=''`) stay visible — they predate the
  attribution wiring landed in 2d4f7c1 (#605) and represent shared
  knowledge surface that must remain reachable in shared-DB mode.
- `SessionDB.getSessionIdsForProject(projectDir)` returns the distinct
  session ids whose events match a `project_dir`. Backed by the
  composite index `idx_session_events_project(session_id, project_dir)`
  introduced alongside the project_dir column in 270a56f (#325), so
  1000-session lookups stay sub-50ms.
- `searchAllSources` gains `projectScope?: string | null`. When a string
  is passed AND a `sessionDB` is available, the resolver looks up the
  allow-set once and threads it into `store.searchWithFallback`. The
  three-state contract (undefined / null / string) matches the resolver
  surfaced in the next commit so the handler and the library agree on
  semantics.

`SearchResult.sessionId` is added to the public type so the post-filter
has the attribution column it needs; the new field is `?: string` and
defaults to `""` for legacy chunks. The eight FTS5 prepared statements
gain the `chunks.session_id` / `chunks_trigram.session_id` column so
`#mapSearchRows` can populate it.

ATTACH DATABASE is intentionally NOT used — the SQLite docs warn that
WAL mode plus ATTACH carries durability trade-offs that the unified
storage layer should not inherit. The two-step IN-clause keeps SessionDB
and ContentStore in their own connections, which also keeps the
search-only path read-only against the events DB.

Refs ead9177 (#367 — searchAllSources unification), 270a56f (#325 —
session_events.project_dir column + idx_session_events_project index).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants