
feat: per-event project attribution for session tracking #325

Merged
mksglu merged 18 commits into mksglu:next from sebastianbreguel:feat/project-attribution
Apr 25, 2026

@sebastianbreguel

Contributor

Summary

  • Adds per-event project attribution so that session activity is no longer pinned to the startup directory
  • Each event gets project_dir, attribution_source, and attribution_confidence (0-1)
  • Auto-migration for existing DBs
  • Insight server aggregates analytics by project with weighted confidence

Problem

When users switched projects mid-session, all events were attributed to the initial directory, which corrupted per-project spending and activity data.

Solution

New project-attribution.ts module with heuristics:

  • workspace_root (0.98) > cwd_event (0.9) > input_cwd (0.88) > session_origin (0.82) > last_seen (0.76) > event_path (0.7)

Confidence propagates through the session: high-confidence attributions inform subsequent events.
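
A minimal sketch of how the ranking above could be applied, assuming each heuristic contributes one candidate directory and the candidate from the highest-ranked source wins. The source labels and scores come from the list above; the `Candidate` and `Attribution` shapes and the `pickAttribution` function are hypothetical names for illustration, not the contents of project-attribution.ts.

```typescript
// Source labels and scores as listed in the PR description.
const CONFIDENCE_BY_SOURCE = {
  workspace_root: 0.98,
  cwd_event: 0.9,
  input_cwd: 0.88,
  session_origin: 0.82,
  last_seen: 0.76,
  event_path: 0.7,
} as const;

type AttributionSource = keyof typeof CONFIDENCE_BY_SOURCE;

// Hypothetical shape: one candidate per heuristic that produced a directory.
interface Candidate {
  source: AttributionSource;
  projectDir: string;
}

// Field names mirror the three per-event columns named in the summary.
interface Attribution {
  projectDir: string;
  attributionSource: AttributionSource;
  attributionConfidence: number;
}

// Pick the candidate whose source carries the highest confidence.
// Returns null when no heuristic produced a candidate.
function pickAttribution(candidates: Candidate[]): Attribution | null {
  let best: Candidate | null = null;
  for (const candidate of candidates) {
    if (
      best === null ||
      CONFIDENCE_BY_SOURCE[candidate.source] > CONFIDENCE_BY_SOURCE[best.source]
    ) {
      best = candidate;
    }
  }
  if (best === null) return null;
  return {
    projectDir: best.projectDir,
    attributionSource: best.source,
    attributionConfidence: CONFIDENCE_BY_SOURCE[best.source],
  };
}

// Usage: the first event has two signals; the second has none of its own,
// so the previous result is carried forward as a last_seen candidate.
const firstEvent = pickAttribution([
  { source: "event_path", projectDir: "/work" },
  { source: "input_cwd", projectDir: "/work/app-a" },
]);
const secondEvent = pickAttribution(
  firstEvent ? [{ source: "last_seen", projectDir: firstEvent.projectDir }] : [],
);
```

Under these assumptions the carried-forward event keeps the project of the earlier event but at the lower `last_seen` score, which is one way to read "high-confidence attributions inform subsequent events".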

Test plan

  • Unit tests pass (188 session tests)
  • Manual test: file_read gets 0.88 confidence, user_prompt gets 0.45
  • Migration works on existing DBs
  • Bundles rebuilt and verified

@mksglu mksglu changed the base branch from main to next April 22, 2026 18:15
@sebastianbreguel

Contributor Author

Before vs After: Per-Event Project Attribution

┌─────────────────────────────────────────────────────────────────────────────┐
│                              BEFORE                                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Session start: ~/projects/                                                │
│                                                                             │
│   ┌──────────────────────────────────────────────────────────────────┐     │
│   │                    SESSION EVENTS                                 │     │
│   ├──────────────────────────────────────────────────────────────────┤     │
│   │  Event 1: edit app-a/src/main.ts    → project: ~/projects/       │     │
│   │  Event 2: edit app-a/src/utils.ts   → project: ~/projects/       │     │
│   │  Event 3: edit app-b/lib/api.ts     → project: ~/projects/       │     │
│   │  Event 4: edit app-b/lib/db.ts      → project: ~/projects/       │     │
│   │  Event 5: edit app-a/test/foo.ts    → project: ~/projects/       │     │
│   └──────────────────────────────────────────────────────────────────┘     │
│                                                                             │
│   Insight Dashboard:                                                        │
│   ┌────────────────────────────┐                                           │
│   │  ~/projects/  ████████ 45m │   ← Everything lumped together            │
│   └────────────────────────────┘                                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│                               AFTER                                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Session start: ~/projects/                                                │
│                                                                             │
│   ┌──────────────────────────────────────────────────────────────────┐     │
│   │                    SESSION EVENTS                                 │     │
│   ├──────────────────────────────────────────────────────────────────┤     │
│   │  Event 1: edit app-a/src/main.ts                                  │     │
│   │           → project: app-a  (source: file_path, conf: 0.95)       │     │
│   │  Event 2: edit app-a/src/utils.ts                                 │     │
│   │           → project: app-a  (source: file_path, conf: 0.95)       │     │
│   │  Event 3: edit app-b/lib/api.ts                                   │     │
│   │           → project: app-b  (source: git, conf: 1.0)              │     │
│   │  Event 4: edit app-b/lib/db.ts                                    │     │
│   │           → project: app-b  (source: git, conf: 1.0)              │     │
│   │  Event 5: edit app-a/test/foo.ts                                  │     │
│   │           → project: app-a  (source: file_path, conf: 0.95)       │     │
│   └──────────────────────────────────────────────────────────────────┘     │
│                                                                             │
│   Insight Dashboard:                                                        │
│   ┌────────────────────────────┐                                           │
│   │  app-a  ██████████████ 30m │   ← Per-project breakdown                 │
│   │  app-b  ████████     15m   │                                           │
│   └────────────────────────────┘                                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Key change: Attribution happens per event, not per session. Each tool call is analyzed, its real project is detected, and its activity is tracked separately.

@sebastianbreguel sebastianbreguel force-pushed the feat/project-attribution branch 2 times, most recently from d2e08c7 to b554a49 Compare April 22, 2026 20:39
github-actions Bot and others added 18 commits April 22, 2026 16:41
Adds heuristics to attribute each session event to its actual project
directory instead of pinning all activity to the startup directory.

- New project-attribution.ts module with confidence-scored attribution
- DB schema extended: project_dir, attribution_source, attribution_confidence
- Auto-migration for existing DBs
- All hooks updated to call resolveProjectAttributions()
- Insight server aggregates by project with weighted confidence

Fix: moved CREATE INDEX for project_dir to migration block to avoid
failure on existing DBs without the column.
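
The ordering fix described above can be sketched as a pure planning function: add whichever attribution columns are missing, and only then create the index that depends on `project_dir`. This is an illustration under stated assumptions, not the PR's migration code. The column names and the `session_events` table come from this PR, and the index name and columns come from a later commit message that references it; the SQL column types and the `planAttributionMigration` helper are hypothetical.

```typescript
// Column names are from the PR; the SQL types are assumptions.
const ATTRIBUTION_COLUMNS = [
  { name: "project_dir", sqlType: "TEXT" },
  { name: "attribution_source", sqlType: "TEXT" },
  { name: "attribution_confidence", sqlType: "REAL" },
] as const;

// Hypothetical helper: given the columns that already exist on
// session_events, return the statements to run, in order.
function planAttributionMigration(existingColumns: string[]): string[] {
  const present = new Set(existingColumns);
  const statements: string[] = [];

  // Step 1: add only the columns that are missing, so the plan is safe
  // to compute against both fresh and already-migrated databases.
  for (const column of ATTRIBUTION_COLUMNS) {
    if (!present.has(column.name)) {
      statements.push(
        `ALTER TABLE session_events ADD COLUMN ${column.name} ${column.sqlType}`,
      );
    }
  }

  // Step 2: create the index last. Running this before project_dir
  // exists is the failure mode the commit message describes.
  statements.push(
    "CREATE INDEX IF NOT EXISTS idx_session_events_project " +
      "ON session_events(session_id, project_dir)",
  );
  return statements;
}

// Usage: an old database without attribution columns gets three
// ALTER TABLE statements followed by the index.
const legacyPlan = planAttributionMigration(["id", "session_id", "type"]);
const currentPlan = planAttributionMigration([
  "id",
  "session_id",
  "project_dir",
  "attribution_source",
  "attribution_confidence",
]);
```
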
@sebastianbreguel sebastianbreguel force-pushed the feat/project-attribution branch from b554a49 to 4950e4c Compare April 22, 2026 20:41
@mksglu

mksglu commented Apr 25, 2026

Owner

Code Review: per-event project attribution

The feature addresses a real problem — monorepo/multi-project sessions crediting all events to the startup project_dir. DB migration is clean, core attribution logic is well-structured, and 43 unit tests is solid coverage. Good work overall.

Blockers (must fix before merge)

1. DRY violation — 7 hook files with identical attribution block

The same ~12 lines are copy-pasted into codex/posttooluse, cursor/posttooluse, kiro/posttooluse, vscode-copilot/posttooluse, gemini-cli/posttooluse, posttooluse.mjs, and userpromptsubmit.mjs:

const stats = db.getSessionStats(sessionId);
const lastProject = db.getLatestAttributedProjectDir(sessionId);
const attributions = resolveProjectAttributions(events, { ... });

Extract this into a shared helper in session-loaders.mjs (e.g., resolveAndInsertWithAttribution(db, sessionId, events, input)). One source of truth, one place to update.
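
A rough sketch of what such a helper might look like. The function name is the one proposed in this review, and `getSessionStats`, `getLatestAttributedProjectDir`, and the resolver call mirror the snippet above. Everything else is an assumption made for illustration: the context fields passed to the resolver (elided as `{ ... }` above), the return shape of `getSessionStats`, the `insertEvent` method, and passing the resolver in as a parameter.

```typescript
// Hypothetical minimal shapes; the real event and DB types are richer.
interface SessionEvent {
  type: string;
}
interface Attribution {
  projectDir: string;
  source: string;
  confidence: number;
}
interface HookInput {
  cwd?: string;
}
// Assumed context fields; the original call elides them.
interface AttributionContext {
  sessionOriginDir: string | null;
  lastKnownProjectDir: string | null;
  inputCwd: string | null;
}
interface SessionDBLike {
  getSessionStats(sessionId: string): { projectDir: string } | null; // return shape assumed
  getLatestAttributedProjectDir(sessionId: string): string | null;
  insertEvent(sessionId: string, event: SessionEvent, attribution: Attribution): void; // hypothetical
}
type Resolver = (events: SessionEvent[], context: AttributionContext) => Attribution[];

// One shared entry point that every hook calls instead of repeating the block.
function resolveAndInsertWithAttribution(
  db: SessionDBLike,
  sessionId: string,
  events: SessionEvent[],
  input: HookInput,
  resolve: Resolver, // stands in for resolveProjectAttributions
): void {
  const stats = db.getSessionStats(sessionId);
  const lastProject = db.getLatestAttributedProjectDir(sessionId);
  const attributions = resolve(events, {
    sessionOriginDir: stats ? stats.projectDir : null,
    lastKnownProjectDir: lastProject,
    inputCwd: input.cwd ?? null,
  });
  // Assumes the resolver returns one attribution per event, in order.
  events.forEach((event, index) => {
    const attribution = attributions[index];
    if (attribution !== undefined) {
      db.insertEvent(sessionId, event, attribution);
    }
  });
}
```

With a helper of this shape, each hook file would reduce to a single call, and a change to the context passed to the resolver would be made in one place.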

2. Magic confidence numbers need named constants

The values 0.98, 0.9, 0.88, 0.82, 0.76, 0.7, 0.45, 0.4, 0.35 appear inline with no explanation. Define them as named constants with a brief rationale:

const CONFIDENCE = {
  WORKSPACE_ROOT: 0.98,   // explicit workspace config — highest signal
  CWD_EVENT: 0.9,         // user navigated here intentionally
  INPUT_CWD: 0.88,        // hook payload cwd — reliable but implicit
  SESSION_ORIGIN: 0.82,   // session startup dir
  LAST_SEEN: 0.76,        // carry-forward from previous event
  EVENT_PATH: 0.7,        // inferred from file path prefix
  // ... fallbacks
} as const;

3. Squash commits

18 commits → 1-3 commits for a clean history.

Missing: Insight UI

The backend API (insight/server.mjs) now queries session_events.project_dir with weighted aggregation — but the Insight dashboard UI (React components in insight/src/) is not updated to display per-project breakdowns. The data flows to the API but there's no way for users to see it. Is this intentional (API-first, UI follow-up) or an oversight?
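
For reference, one plausible reading of "weighted aggregation" is sketched below: each event contributes its confidence as a weight to its project, and the per-project average confidence is reported alongside the raw count. This is an assumption about the general technique, not the query in insight/server.mjs; the `AttributedEvent` and `ProjectSummary` shapes and the `aggregateByProject` function are hypothetical.

```typescript
// Hypothetical row shape, mirroring the project_dir and
// attribution_confidence columns on session_events.
interface AttributedEvent {
  projectDir: string;
  attributionConfidence: number; // 0-1
}

interface ProjectSummary {
  projectDir: string;
  eventCount: number;
  weightedEvents: number; // sum of confidences
  averageConfidence: number; // weightedEvents / eventCount
}

// Group events by project and weight each one by its confidence.
function aggregateByProject(events: AttributedEvent[]): ProjectSummary[] {
  const totals = new Map<string, { count: number; weight: number }>();
  for (const event of events) {
    const entry = totals.get(event.projectDir) ?? { count: 0, weight: 0 };
    entry.count += 1;
    entry.weight += event.attributionConfidence;
    totals.set(event.projectDir, entry);
  }
  return Array.from(totals, ([projectDir, { count, weight }]) => ({
    projectDir,
    eventCount: count,
    weightedEvents: weight,
    averageConfidence: weight / count,
  })).sort((a, b) => b.weightedEvents - a.weightedEvents);
}

// Usage: two events for app-a and one for app-b.
const summaries = aggregateByProject([
  { projectDir: "app-a", attributionConfidence: 0.5 },
  { projectDir: "app-b", attributionConfidence: 0.98 },
  { projectDir: "app-a", attributionConfidence: 0.75 },
]);
```

A UI built on output of this shape could show both the event count and how trustworthy the attribution is for each project.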

Test gaps (nice-to-have, not blockers)

  • No integration test for the full hook → extract → attribution → DB → insight query pipeline
  • No test for the hasColumn() fallback path in insight/server.mjs (221 new lines, zero test coverage)
  • Non-path event types (mcp, user_prompt) fall back to low confidence but aren't tested

Summary

| Aspect | Grade |
| --- | --- |
| Problem solved | ✅ Real need |
| DB migration | ✅ Clean, backward compatible |
| Core logic | ✅ Pure functions, well-tested |
| Performance | ✅ Negligible overhead |
| DRY / code quality | ❌ 7-file copy-paste |
| Constants | ❌ Magic numbers |
| Insight UI | ❌ Not updated |

Looking forward to the next iteration.

@mksglu mksglu merged commit 270a56f into mksglu:next Apr 25, 2026
5 checks passed
@mksglu

mksglu commented Apr 25, 2026

Owner

Merged! We'll follow up with fixes for the DRY violation, magic numbers, and Insight UI on next.

mksglu added a commit that referenced this pull request Apr 25, 2026
…constants

Follow-up to #325 addressing review feedback:

1. Extract duplicated 15-line attribution block from 7 hook files into
   shared `attributeAndInsertEvents()` helper in session-loaders.mjs.
   Net -25 lines — one source of truth, one place to update.

2. Replace all 10 inline magic confidence numbers (0.98, 0.9, 0.88, etc.)
   with named `ATTRIBUTION_CONFIDENCE` constants with JSDoc rationale.

Files: 9 modified (6 posttooluse hooks, userpromptsubmit, session-loaders,
project-attribution.ts). 266 session tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mksglu

mksglu commented Apr 25, 2026

Owner

Follow-up fixes shipped on next

All three blockers from the review are resolved:

| Aspect | Before | After |
| --- | --- | --- |
| DRY / code quality | ❌ 7-file copy-paste | attributeAndInsertEvents() shared helper in session-loaders.mjs (-25 lines) |
| Constants | ❌ Magic numbers | ATTRIBUTION_CONFIDENCE named constants with JSDoc rationale |
| Insight UI | ❌ Not updated | ✅ Match Quality labels (Strong/Fair/Weak), thresholds aligned (≥80/≥55/<55), WCAG accessible icons (✓/~/!) |

Additional UX improvements:

  • "Confidence: 72%" jargon → "Match Quality: Fair"
  • "3s · 42e" → "3 sessions · 42 events"
  • Scary fallback warning → subtle info with Lightbulb icon
  • Developer jargon removed from user-facing text

Commits: 79e0d7e, 773f863, cdce371
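
A small sketch of the Match Quality mapping described above, assuming the thresholds (≥80/≥55/<55) are percentages of the 0-1 attribution confidence and that the labels and icons pair up in the order listed. The `matchQuality` and `matchQualityIcon` functions are hypothetical names, not the Insight UI's components.

```typescript
type MatchQuality = "Strong" | "Fair" | "Weak";

// Map a 0-1 confidence to a user-facing label.
// Thresholds are from the comment above: >=80 Strong, >=55 Fair, <55 Weak.
function matchQuality(confidence: number): MatchQuality {
  const percent = Math.round(confidence * 100);
  if (percent >= 80) return "Strong";
  if (percent >= 55) return "Fair";
  return "Weak";
}

// Icon pairing is assumed from the order in which labels and icons are listed.
function matchQualityIcon(quality: MatchQuality): string {
  const icons: Record<MatchQuality, string> = {
    Strong: "✓",
    Fair: "~",
    Weak: "!",
  };
  return icons[quality];
}

// Usage: render a label for a carried-forward attribution (0.76).
const label = `Match Quality: ${matchQuality(0.76)} ${matchQualityIcon(matchQuality(0.76))}`;
```

Rounding to a whole percent before comparing keeps the label consistent with any percentage shown elsewhere in the UI.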

mksglu added a commit that referenced this pull request May 31, 2026
…tore

Adds the cross-DB plumbing required by the ctx_search `project:` filter
(#737) without changing any existing call site. Three layers gain an
opt-in scope hook:

- `ContentStore.searchWithFallback` accepts `sessionIdAllowSet?: Set<string>`.
  When supplied, the RRF candidate pool is fetched at 8x the requested
  limit and post-filtered by `chunks.session_id` membership. Legacy
  unattributed chunks (`session_id=''`) stay visible — they predate the
  attribution wiring landed in 2d4f7c1 (#605) and represent shared
  knowledge surface that must remain reachable in shared-DB mode.
- `SessionDB.getSessionIdsForProject(projectDir)` returns the distinct
  session ids whose events match a `project_dir`. Backed by the
  composite index `idx_session_events_project(session_id, project_dir)`
  introduced alongside the project_dir column in 270a56f (#325), so
  1000-session lookups stay sub-50ms.
- `searchAllSources` gains `projectScope?: string | null`. When a string
  is passed AND a `sessionDB` is available, the resolver looks up the
  allow-set once and threads it into `store.searchWithFallback`. The
  three-state contract (undefined / null / string) matches the resolver
  surfaced in the next commit so the handler and the library agree on
  semantics.

`SearchResult.sessionId` is added to the public type so the post-filter
has the attribution column it needs; the new field is `?: string` and
defaults to `""` for legacy chunks. The eight FTS5 prepared statements
gain the `chunks.session_id` / `chunks_trigram.session_id` column so
`#mapSearchRows` can populate it.

ATTACH DATABASE is intentionally NOT used — the SQLite docs warn that
WAL mode plus ATTACH carries durability trade-offs that the unified
storage layer should not inherit. The two-step IN-clause keeps SessionDB
and ContentStore in their own connections, which also keeps the
search-only path read-only against the events DB.

Refs ead9177 (#367 — searchAllSources unification), 270a56f (#325 —
session_events.project_dir column + idx_session_events_project index).
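
The over-fetch and post-filter step described in this commit message can be sketched as follows. This is a simplified illustration, not the `ContentStore` implementation: the 8x factor, the `sessionIdAllowSet` parameter, and the rule that legacy chunks with an empty session id stay visible come from the message above, while `fetchCandidates`, `applySessionAllowSet`, and `searchWithAllowSet` are hypothetical names.

```typescript
// Simplified result shape; sessionId is optional as in the commit message.
interface SearchResult {
  title: string;
  sessionId?: string;
}

// From the commit message: the candidate pool is fetched at 8x the limit.
const OVERFETCH_FACTOR = 8;

// Keep results whose session id is in the allow-set, plus legacy
// unattributed chunks (empty session id). No allow-set means no filtering.
function applySessionAllowSet(
  candidates: SearchResult[],
  limit: number,
  sessionIdAllowSet?: Set<string>,
): SearchResult[] {
  const allowSet = sessionIdAllowSet;
  if (allowSet === undefined) return candidates.slice(0, limit);
  return candidates
    .filter((result) => {
      const sessionId = result.sessionId ?? "";
      return sessionId === "" || allowSet.has(sessionId);
    })
    .slice(0, limit);
}

// Hypothetical wrapper: over-fetch only when a filter will be applied,
// so the post-filter still has enough candidates to fill the limit.
function searchWithAllowSet(
  fetchCandidates: (poolSize: number) => SearchResult[],
  limit: number,
  sessionIdAllowSet?: Set<string>,
): SearchResult[] {
  const poolSize = sessionIdAllowSet ? limit * OVERFETCH_FACTOR : limit;
  return applySessionAllowSet(fetchCandidates(poolSize), limit, sessionIdAllowSet);
}
```

Post-filtering in application code is what lets the two databases stay on separate connections: the allow-set is looked up once from the session DB and then applied to results from the content store.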