feat: per-event project attribution for session tracking #325
Before vs After: Per-Event Project Attribution
Key change: attribution happens per event, not per session. Each tool call is analyzed, its real project is detected, and its activity is tracked separately.
d2e08c7 to b554a49
Adds heuristics to attribute each session event to its actual project directory instead of pinning all activity to the startup directory.

- New project-attribution.ts module with confidence-scored attribution
- DB schema extended: project_dir, attribution_source, attribution_confidence
- Auto-migration for existing DBs
- All hooks updated to call resolveProjectAttributions()
- Insight server aggregates by project with weighted confidence

Fix: moved CREATE INDEX for project_dir to migration block to avoid failure on existing DBs without the column.
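The ordering fix described in this commit can be sketched as a small planning function. This is only an illustration: the table and index names (`session_events`, `idx_session_events_project`) appear elsewhere in this thread, but the column types and the function name `planAttributionMigration` are assumptions, not the project's actual migration code.

```typescript
// Hypothetical sketch. Column types and the function name are assumptions;
// table, column, and index names are taken from the PR text.
const ATTRIBUTION_COLUMNS: ReadonlyArray<[string, string]> = [
  ["project_dir", "TEXT"],
  ["attribution_source", "TEXT"],
  ["attribution_confidence", "REAL"],
];

/**
 * Given the column names currently present on session_events (for example
 * read via `PRAGMA table_info(session_events)`), return the SQL statements
 * needed to bring the table up to date.
 */
function planAttributionMigration(existingColumns: readonly string[]): string[] {
  const present = new Set(existingColumns);
  const statements: string[] = [];

  // Add only the columns that are missing, so the migration is idempotent.
  for (const [name, sqlType] of ATTRIBUTION_COLUMNS) {
    if (!present.has(name)) {
      statements.push(`ALTER TABLE session_events ADD COLUMN ${name} ${sqlType}`);
    }
  }

  // The index is emitted after the ALTERs, so it never runs against a
  // table that still lacks project_dir.
  statements.push(
    "CREATE INDEX IF NOT EXISTS idx_session_events_project " +
      "ON session_events(session_id, project_dir)",
  );
  return statements;
}
```

Running the returned statements in order inside one migration step keeps a pre-existing database from failing on an index that references a column it does not have yet.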
b554a49 to 4950e4c
Code Review: per-event project attribution

The feature addresses a real problem: monorepo/multi-project sessions crediting all events to the startup project_dir. DB migration is clean, core attribution logic is well-structured, and 43 unit tests is solid coverage. Good work overall.

Blockers (must fix before merge)

1. DRY violation: 7 hook files with identical attribution block

The same ~12 lines are copy-pasted into each hook file:

    const stats = db.getSessionStats(sessionId);
    const lastProject = db.getLatestAttributedProjectDir(sessionId);
    const attributions = resolveProjectAttributions(events, { ... });

Extract this into a shared helper.

2. Magic confidence numbers need named constants

The values 0.98, 0.9, 0.88, 0.82, 0.76, 0.7, 0.45, 0.4, 0.35 appear inline with no explanation. Define them as named constants with a brief rationale:

    const CONFIDENCE = {
      WORKSPACE_ROOT: 0.98, // explicit workspace config, highest signal
      CWD_EVENT: 0.9,       // user navigated here intentionally
      INPUT_CWD: 0.88,      // hook payload cwd, reliable but implicit
      SESSION_ORIGIN: 0.82, // session startup dir
      LAST_SEEN: 0.76,      // carry-forward from previous event
      EVENT_PATH: 0.7,      // inferred from file path prefix
      // ... fallbacks
    } as const;

3. Squash commits

18 commits → 1-3 commits for a clean history.

Missing: Insight UI

The backend API […]

Test gaps (nice-to-have, not blockers)
Summary
Looking forward to the next iteration.
Merged! We'll follow up with fixes for the DRY violation, magic numbers, and Insight UI.
…constants

Follow-up to #325 addressing review feedback:

1. Extract duplicated 15-line attribution block from 7 hook files into shared `attributeAndInsertEvents()` helper in session-loaders.mjs. Net -25 lines: one source of truth, one place to update.
2. Replace all 10 inline magic confidence numbers (0.98, 0.9, 0.88, etc.) with named `ATTRIBUTION_CONFIDENCE` constants with JSDoc rationale.

Files: 9 modified (6 posttooluse hooks, userpromptsubmit, session-loaders, project-attribution.ts). 266 session tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
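For readers who have not opened the diff, the shape of such a shared helper might look roughly like the sketch below. Only the names `attributeAndInsertEvents`, `getSessionStats`, and `getLatestAttributedProjectDir` come from this thread; every type shape, the `insertEvent` method, the context fields, and the injected resolver parameter are assumptions made so the example is self-contained. The real helper lives in session-loaders.mjs and may differ.

```typescript
// Sketch only: type shapes, insertEvent(), and the context fields are hypothetical.
interface SessionEvent { type: string; data: string }
interface Attribution { projectDir: string; source: string; confidence: number }
interface AttributionContext {
  sessionOriginDir: string | null;
  lastKnownProjectDir: string | null;
}
interface SessionStore {
  getSessionStats(sessionId: string): { projectDir?: string } | null;
  getLatestAttributedProjectDir(sessionId: string): string | null;
  insertEvent(sessionId: string, event: SessionEvent, attribution: Attribution): void;
}
// Stand-in for resolveProjectAttributions; injected so the sketch has no imports.
type Resolver = (events: SessionEvent[], ctx: AttributionContext) => Attribution[];

/** Resolve attributions once for a batch of events, then persist each pair. */
function attributeAndInsertEvents(
  db: SessionStore,
  sessionId: string,
  events: SessionEvent[],
  resolve: Resolver,
): number {
  if (events.length === 0) return 0;

  // The two lookups that were previously repeated in every hook file.
  const stats = db.getSessionStats(sessionId);
  const lastProject = db.getLatestAttributedProjectDir(sessionId);

  const attributions = resolve(events, {
    sessionOriginDir: stats?.projectDir ?? null,
    lastKnownProjectDir: lastProject,
  });

  // Assumes the resolver returns one attribution per event, in order.
  events.forEach((event, i) => db.insertEvent(sessionId, event, attributions[i]));
  return events.length;
}
```

Each hook then reduces to a single call, which is what makes later changes to the attribution context a one-file edit.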
Follow-up fixes shipped:
| Aspect | Before | After |
|---|---|---|
| DRY / code quality | ❌ 7-file copy-paste | ✅ attributeAndInsertEvents() shared helper in session-loaders.mjs (-25 lines) |
| Constants | ❌ Magic numbers | ✅ ATTRIBUTION_CONFIDENCE named constants with JSDoc rationale |
| Insight UI | ❌ Not updated | ✅ Match Quality labels (Strong/Fair/Weak), thresholds aligned (≥80/≥55/<55), WCAG accessible icons (✓/~/!) |
Additional UX improvements:
- "Confidence: 72%" jargon → "Match Quality: Strong"
- "3s · 42e" → "3 sessions · 42 events"
- Scary fallback warning → subtle info with Lightbulb icon
- Developer jargon removed from user-facing text
Commits: 79e0d7e, 773f863, cdce371
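The threshold mapping described in the table (≥80 / ≥55 / <55) could be expressed as a small pure function. This is a sketch under assumptions: the function name, the return shape, and the pairing of each icon with each label (in the order they are listed above) are not confirmed by the thread, and confidence is assumed to arrive on the 0-1 scale used by `attribution_confidence`.

```typescript
// Hypothetical sketch of the Match Quality mapping; names and icon pairing are assumptions.
type MatchQuality = {
  label: "Strong" | "Fair" | "Weak";
  icon: "✓" | "~" | "!";
};

/** Map a 0-1 attribution confidence to a user-facing Match Quality label. */
function matchQuality(confidence: number): MatchQuality {
  // Thresholds from the PR comment, expressed on the 0-1 scale.
  if (confidence >= 0.8) return { label: "Strong", icon: "✓" };
  if (confidence >= 0.55) return { label: "Fair", icon: "~" };
  // Everything else, including NaN, falls through to the weakest label.
  return { label: "Weak", icon: "!" };
}
```

Keeping the thresholds in one function means the UI labels and any server-side aggregation cannot drift apart.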
…tore

Adds the cross-DB plumbing required by the ctx_search `project:` filter (#737) without changing any existing call site. Three layers gain an opt-in scope hook:

- `ContentStore.searchWithFallback` accepts `sessionIdAllowSet?: Set<string>`. When supplied, the RRF candidate pool is fetched at 8x the requested limit and post-filtered by `chunks.session_id` membership. Legacy unattributed chunks (`session_id=''`) stay visible: they predate the attribution wiring landed in 2d4f7c1 (#605) and represent shared knowledge surface that must remain reachable in shared-DB mode.
- `SessionDB.getSessionIdsForProject(projectDir)` returns the distinct session ids whose events match a `project_dir`. Backed by the composite index `idx_session_events_project(session_id, project_dir)` introduced alongside the project_dir column in 270a56f (#325), so 1000-session lookups stay sub-50ms.
- `searchAllSources` gains `projectScope?: string | null`. When a string is passed AND a `sessionDB` is available, the resolver looks up the allow-set once and threads it into `store.searchWithFallback`. The three-state contract (undefined / null / string) matches the resolver surfaced in the next commit so the handler and the library agree on semantics.

`SearchResult.sessionId` is added to the public type so the post-filter has the attribution column it needs; the new field is `?: string` and defaults to `""` for legacy chunks. The eight FTS5 prepared statements gain the `chunks.session_id` / `chunks_trigram.session_id` column so `#mapSearchRows` can populate it.

ATTACH DATABASE is intentionally NOT used: the SQLite docs warn that in WAL mode, transactions spanning attached databases are atomic only within each individual database file, a trade-off that the unified storage layer should not inherit. The two-step IN-clause keeps SessionDB and ContentStore in their own connections, which also keeps the search-only path read-only against the events DB.
Refs ead9177 (#367 — searchAllSources unification), 270a56f (#325 — session_events.project_dir column + idx_session_events_project index).
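The over-fetch and post-filter behaviour described in this commit can be illustrated with a self-contained sketch. The names `searchScoped`, `filterBySession`, `Chunk`, and the injected `fetchCandidates` / `lookupSessionIds` callbacks are hypothetical stand-ins for `searchAllSources`, `searchWithFallback`, and `getSessionIdsForProject`. The commit does not spell out how `undefined` and `null` differ, so this sketch treats both as "no project filter", which is an assumption.

```typescript
// Hypothetical sketch; only the 8x factor and the legacy-chunk rule come from the commit text.
interface Chunk { sessionId: string; score: number; text: string }

const OVERFETCH_FACTOR = 8;

/** Keep chunks from allowed sessions plus legacy chunks with an empty session id. */
function filterBySession(candidates: Chunk[], limit: number, allow?: Set<string>): Chunk[] {
  if (!allow) return candidates.slice(0, limit);
  return candidates
    .filter((c) => c.sessionId === "" || allow.has(c.sessionId))
    .slice(0, limit);
}

function searchScoped(
  fetchCandidates: (n: number) => Chunk[],
  limit: number,
  projectScope: string | null | undefined,
  lookupSessionIds?: (projectDir: string) => string[],
): Chunk[] {
  // No string scope, or no session DB available: behave like an unscoped search.
  if (typeof projectScope !== "string" || !lookupSessionIds) {
    return fetchCandidates(limit).slice(0, limit);
  }
  // Look up the allow-set once, then over-fetch so that filtering
  // still leaves enough candidates to fill the requested limit.
  const allow = new Set(lookupSessionIds(projectScope));
  return filterBySession(fetchCandidates(limit * OVERFETCH_FACTOR), limit, allow);
}
```

Filtering after ranking is what makes the over-fetch necessary: without it, a project with few matching sessions could return fewer results than requested even when more exist further down the candidate pool.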
Summary
project_dir, attribution_source, and attribution_confidence (0-1)
Problem
When users switched projects mid-session, all events were still attributed to the initial directory. This corrupted per-project spending/activity data.
Solution
New project-attribution.ts module with heuristics:
workspace_root (0.98) > cwd_event (0.9) > input_cwd (0.88) > session_origin (0.82) > last_seen (0.76) > event_path (0.7)
Confidence propagates through the session: high-confidence attributions inform subsequent events.
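A minimal sketch of how this priority order and carry-forward might work is shown below. It is not the actual project-attribution.ts logic: the source names and confidence values are taken from the list above, but the `Signals` shape, the function names, and the rule that every resolved project is carried forward as `last_seen` are simplifying assumptions.

```typescript
// Hypothetical sketch; only the source names and confidence values come from the PR.
const SOURCE_CONFIDENCE = {
  workspace_root: 0.98,
  cwd_event: 0.9,
  input_cwd: 0.88,
  session_origin: 0.82,
  last_seen: 0.76,
  event_path: 0.7,
} as const;

type Source = keyof typeof SOURCE_CONFIDENCE;
// Candidate project directory reported by each signal that is available for an event.
type Signals = Partial<Record<Source, string>>;
interface Attribution { projectDir: string; source: Source; confidence: number }

/** Pick the candidate directory backed by the highest-confidence signal. */
function attribute(signals: Signals): Attribution | null {
  let best: Attribution | null = null;
  for (const source of Object.keys(SOURCE_CONFIDENCE) as Source[]) {
    const projectDir = signals[source];
    if (!projectDir) continue;
    const confidence = SOURCE_CONFIDENCE[source];
    if (best === null || confidence > best.confidence) {
      best = { projectDir, source, confidence };
    }
  }
  return best;
}

/** Attribute events in order, carrying the previous result forward as last_seen. */
function attributeSession(events: Signals[]): Array<Attribution | null> {
  let lastSeen: string | undefined;
  return events.map((signals) => {
    const result = attribute({ last_seen: lastSeen, ...signals });
    if (result) lastSeen = result.projectDir;
    return result;
  });
}
```

In this sketch, an event that only carries a weak `event_path` hint stays with the project established by an earlier, stronger signal, which is one plausible reading of "high-confidence attributions inform subsequent events".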
Test plan