feat(search): filter /search by tags (people, projects, topics) by louis030195 · Pull Request #3940 · screenpipe/screenpipe

louis030195 · 2026-06-09T19:21:09Z

What

Makes tags a coherent cross-store link that an AI agent can use end to end. Two pieces:

A tags filter on /search (and the MCP search-content tool): pull every item carrying given tags, e.g. ?tags=person:ada,project:atlas.
The same filter now also covers memories (content_type=memory), so the AI can retrieve memories by tag. Previously the AI could add tags to a memory (update-memory) but had no way to look memories up by tag through search.
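As a rough illustration of the two request shapes described above, the following sketch builds the query strings. The base URL and the helper name `search_url` are assumptions for illustration only; the `tags` and `content_type` parameters come from the description.

```python
from urllib.parse import urlencode

# Assumption: address of a locally running server; adjust to your setup.
BASE_URL = "http://localhost:3030"


def search_url(tags, content_type=None):
    """Build a GET /search URL filtered by tags (hypothetical helper)."""
    params = {}
    if content_type is not None:
        params["content_type"] = content_type
    if tags:
        # All tags travel in one comma-separated query parameter.
        params["tags"] = ",".join(tags)
    # Keep ':' and ',' readable instead of percent-encoding them.
    return f"{BASE_URL}/search?{urlencode(params, safe=':,')}"


captures = search_url(["person:ada", "project:atlas"])
memories = search_url(["person:ada"], content_type="memory")
print(captures)
print(memories)
```

The first URL targets screen and audio captures; the second asks for memories carrying the same tag.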

The model (how the AI links a person to memories / a timeframe)

One string namespace spans three stores. The same tag on different items connects them.

flowchart LR
  subgraph "one tag namespace"
    direction LR
    T["person:ada"]
  end
  F["screen frame<br/>vision_tags"] --- T
  A["audio chunk<br/>audio_tags"] --- T
  M["memory<br/>memories.tags"] --- T
  T --> Q1["GET /search?tags=person:ada<br/>(screen + audio)"]
  T --> Q2["GET /search?content_type=memory&tags=person:ada<br/>(facts)"]

Add a tag to a frame or audio chunk: POST /tags/vision/{id} or POST /tags/audio/{id}. To tag a memory: set the tags field on POST /memories or PUT /memories/{id}.
Link a person to a memory: tag the memory person:ada.
Link a person to a timeframe: tag captures person:ada and query with start_time/end_time, or rely on the memory's created_at + frame_id provenance.
Everything about a person: one call for captures (content_type=all&tags=person:ada) + one for facts (content_type=memory&tags=person:ada).
Frames are pruned by retention, so durable links belong on memories; tag frames/audio for short-term recall. The MCP tool descriptions now teach exactly this.
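The linking model can be pictured with a toy in-memory version of the three stores. This is only a sketch of the idea, not the actual storage layer: the `Store` class and `everything_about` helper are hypothetical, and the ids are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for one store: item id -> set of tags."""
    name: str
    items: dict = field(default_factory=dict)

    def tag(self, item_id, *tags):
        self.items.setdefault(item_id, set()).update(tags)

    def with_tag(self, tag):
        return sorted(i for i, tags in self.items.items() if tag in tags)


frames = Store("vision_tags")
audio = Store("audio_tags")
memories = Store("memories.tags")

frames.tag(101, "person:ada", "project:atlas")
audio.tag(7, "person:ada")
memories.tag(1, "person:ada")  # durable fact about Ada


def everything_about(tag):
    # Two lookups, mirroring the two calls: captures, then facts.
    captures = {"screen": frames.with_tag(tag), "audio": audio.with_tag(tag)}
    facts = memories.with_tag(tag)
    return captures, facts


print(everything_about("person:ada"))

# Simulate retention pruning the frame: the memory link survives.
del frames.items[101]
print(everything_about("person:ada"))
```

After the frame is pruned, the capture side loses its screen entry while the memory is still reachable by the same tag, which is why durable links belong on memories.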

Behavior

tags is comma-separated. Multiple tags AND together (intersection). Matching is exact membership, not substring (person:ada does not match person:adam).
Applies to screen (OCR), audio, and memory. input and accessibility have no tag table and return nothing when a tag filter is set.
content_type=all unions tagged screen + audio (it never includes memory, tagged or not, same as today).
count_search_results agrees with the filtered set, so pagination total stays correct.
Empty / absent tags behaves exactly as before. Strictly opt-in.
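The matching rules above can be restated as a small pure-Python sketch. The function names, the whitespace trimming, and the content_type labels in `TAGGABLE` are assumptions made for illustration; only the semantics (comma-separated, AND, exact membership, opt-in) come from the description.

```python
def parse_tags(raw):
    """Split a comma-separated tags value, dropping blanks (trimming is an assumption)."""
    if not raw:
        return []
    return [t.strip() for t in raw.split(",") if t.strip()]


def matches(item_tags, wanted):
    """True when every wanted tag is an exact member of item_tags (AND)."""
    return set(wanted) <= set(item_tags)


# Assumed content_type labels for the stores that carry tags.
TAGGABLE = {"ocr", "audio", "memory"}


def visible(content_type, item_tags, wanted):
    if not wanted:
        return True   # no filter: behaves exactly as before
    if content_type not in TAGGABLE:
        return False  # input / accessibility: nothing when a filter is set
    return matches(item_tags, wanted)


wanted = parse_tags("person:ada,project:atlas")
print(visible("ocr", ["person:ada", "project:atlas", "topic:ml"], wanted))
print(visible("ocr", ["person:adam", "project:atlas"], wanted))
print(visible("input", ["person:ada"], parse_tags("person:ada")))
```

The second call fails because person:adam is not person:ada: membership is compared on whole strings, never on substrings.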

Implementation

Public search() / count_search_results() signatures kept stable (~60 callers untouched) via search_with_tags() / count_search_results_with_tags() siblings; the route handler calls those.
Screen/audio matching: json_each + HAVING COUNT(DISTINCT) = json_array_length over the junction tables. Memory matching: the same exact-AND shape over the memories.tags JSON array, threaded through new list_memories / count_memories tags_all params (the public GET /memories?tags= keeps its existing single-substring filter).
MCP add-tags, update-memory, search-content descriptions updated to teach namespaced tags and retrieval. screenpipe-api SKILL.md (shipped + repo copies) gains a concise Tags section.
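To make the screen/audio matching shape concrete, here is a minimal sketch against an in-memory SQLite database. The schema (column names such as `vision_id`) is an assumption, and the Python functions only borrow the names of the siblings mentioned above to show how an untagged `search()` can delegate to the tagged variant; the real code may differ. It needs a SQLite build with the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE frames (id INTEGER PRIMARY KEY, timestamp TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE vision_tags (vision_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
                          PRIMARY KEY (vision_id, tag_id));
CREATE INDEX idx_vision_tags_tag_id ON vision_tags(tag_id);
INSERT INTO frames VALUES (1, '2026-06-01T09:00:00Z'),
                          (2, '2026-06-02T09:00:00Z'),
                          (3, '2026-06-03T09:00:00Z');
INSERT INTO tags VALUES (1, 'person:ada'), (2, 'project:atlas'), (3, 'person:adam');
INSERT INTO vision_tags VALUES (1, 1), (1, 2), (2, 1), (3, 3);
""")

# Exact-AND filter: a frame qualifies when it carries every requested tag.
# An empty tag list disables the filter entirely.
FILTER = """
  (json_array_length(:tags) = 0 OR f.id IN (
     SELECT vt.vision_id FROM vision_tags vt
     JOIN tags t ON t.id = vt.tag_id
     WHERE t.name IN (SELECT value FROM json_each(:tags))
     GROUP BY vt.vision_id
     HAVING COUNT(DISTINCT t.name) = json_array_length(:tags)))
"""


def _params(tags, **extra):
    # De-duplicate so the array length equals the number of distinct tags.
    return {"tags": json.dumps(sorted(set(tags))), **extra}


def search_with_tags(conn, tags, limit=10):
    sql = (f"SELECT f.id FROM frames f WHERE {FILTER} "
           "ORDER BY f.timestamp DESC LIMIT :limit")
    return [r[0] for r in conn.execute(sql, _params(tags, limit=limit))]


def count_search_results_with_tags(conn, tags):
    sql = f"SELECT COUNT(*) FROM frames f WHERE {FILTER}"
    return conn.execute(sql, _params(tags)).fetchone()[0]


def search(conn, limit=10):
    # Existing entry point keeps its signature and passes no tags.
    return search_with_tags(conn, [], limit)


print(search_with_tags(conn, ["person:ada"]))
print(search_with_tags(conn, ["person:ada", "project:atlas"]))
print(count_search_results_with_tags(conn, ["person:ada"]))
print(search(conn))
```

Because the count query reuses the same filter fragment, the total stays in step with the filtered page regardless of `limit`.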

Tests

cargo test -p screenpipe-db --test db — 37 pass, including:

test_search_filter_by_tags — single / shared / AND / unknown / no-filter / count.
test_tag_filter_audio_and_cross_modal — audio filter, content_type=all cross-modal union, exact-not-substring, input gated, count.
test_memory_filter_by_tags — memory exact AND, no-substring, FTS compose, all-excludes-memory, count.

cargo clippy clean (no new warnings).

Performance

Benchmarked on the search hot path with tests/tag_filter_bench.rs (ignored; 200k frames / 200k vision_tags / 60k audio / 50k memories, in-memory). Adversarial case: tags are rare and on the oldest rows, fighting ORDER BY timestamp DESC LIMIT.

EXPLAIN QUERY PLAN confirms screen/audio filtering drives off the tag indexes (tags.name UNIQUE + idx_vision_tags_tag_id) and PK-looks-up frames, so it never scans all frames:

| query (200k-frame DB) | best time |
| --- | --- |
| OCR, no tags (baseline full scan + sort) | ~127 ms |
| OCR, `tags=person:ada` | ~7 ms (17x faster than baseline) |
| Audio, `tags=person:ada` | ~1 ms |
| All, `tags=person:ada` | ~8 ms |
| Memory, `tags=person:ada` | ~16 ms |
| counts (OCR / All / Memory) | ~7-12 ms |
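For readers who want to inspect a plan themselves, the sketch below seeds a much smaller database in the same adversarial shape (a rare tag on the oldest rows) and prints the EXPLAIN QUERY PLAN output. The schema, row counts, and single-tag query are assumptions for illustration; this is not the benchmark file, and the plan text varies across SQLite versions, so no specific output is claimed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE frames (id INTEGER PRIMARY KEY, timestamp INTEGER NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE vision_tags (vision_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);
CREATE INDEX idx_vision_tags_tag_id ON vision_tags(tag_id);
INSERT INTO tags VALUES (1, 'person:ada');
""")

# 2000 frames; only the 5 oldest carry the tag (adversarial for ORDER BY ... DESC).
conn.executemany("INSERT INTO frames VALUES (?, ?)",
                 [(i, i) for i in range(1, 2001)])
conn.executemany("INSERT INTO vision_tags VALUES (?, 1)",
                 [(i,) for i in range(1, 6)])

QUERY = """
SELECT f.id FROM tags t
JOIN vision_tags vt ON vt.tag_id = t.id
JOIN frames f ON f.id = vt.vision_id
WHERE t.name = ?
ORDER BY f.timestamp DESC LIMIT 20
"""

# The last column of each plan row is the human-readable step description.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("person:ada",)).fetchall()
for row in plan:
    print(row[-1])

ids = [r[0] for r in conn.execute(QUERY, ("person:ada",))]
print(ids)
```

When reading the plan, the thing to look for is whether the steps search via the tag indexes and look frames up by primary key rather than scanning the whole frames table.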

The tag filter is faster than unfiltered search because it narrows to the tagged set instead of scanning everything. The one linear path is memory: memories.tags is unindexed JSON, so the filter is a full scan + correlated json_each (~0.3 us/row, ~16 ms @ 50k → ~160 ms @ 500k). Fine at realistic memory counts; if memories ever reach millions, add a memory_tags junction table mirroring vision_tags. No-tag queries are unaffected (the json_array_length(?)=0 OR ... guard short-circuits).
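The memory path can be sketched the same way. The table layout and the function name `search_memories` are assumptions; the guard and the correlated json_each follow the shape described above, where every row's JSON array is expanded and compared, which is why the cost grows linearly with the number of memories.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, "
             "tags TEXT NOT NULL, created_at INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO memories VALUES (?, ?, ?)",
    [
        (1, json.dumps(["person:ada", "topic:hiring"]), 1),
        (2, json.dumps(["person:adam"]), 2),
        (3, json.dumps([]), 3),
        (4, json.dumps(["person:ada"]), 4),
    ],
)

SQL = """
SELECT m.id FROM memories m
WHERE json_array_length(:tags) = 0              -- guard: no tags, no filtering
   OR (SELECT COUNT(DISTINCT mt.value)
       FROM json_each(m.tags) AS mt             -- correlated: runs per memory row
       WHERE mt.value IN (SELECT value FROM json_each(:tags))
      ) = json_array_length(:tags)
ORDER BY m.created_at DESC
"""


def search_memories(conn, tags):
    # De-duplicate so the array length equals the number of distinct tags.
    params = {"tags": json.dumps(sorted(set(tags)))}
    return [r[0] for r in conn.execute(SQL, params)]


print(search_memories(conn, ["person:ada"]))
print(search_memories(conn, ["person:ada", "topic:hiring"]))
print(search_memories(conn, []))
```

A junction table for memory tags would replace the per-row json_each with an index lookup, at the cost of keeping the table in sync with the JSON column.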

Not in this PR

A graph-traversal endpoint (co-occurring tags for a given tag). Note /connections is already taken by integrations, so it would need a name like /graph or /related. Natural follow-up.
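To show what "co-occurring tags for a given tag" could mean, here is a purely hypothetical sketch over an in-memory mapping. Nothing like this exists in the PR; the helper name and the item keys are invented for illustration.

```python
from collections import Counter


def co_occurring(items, tag):
    """Count tags that appear on the same items as `tag` (hypothetical helper)."""
    counts = Counter()
    for tags in items.values():
        if tag in tags:
            counts.update(t for t in tags if t != tag)
    return counts.most_common()


# Made-up items from all three stores, keyed by an illustrative id.
items = {
    "frame:1": {"person:ada", "project:atlas"},
    "audio:7": {"person:ada", "topic:hiring"},
    "memory:1": {"person:ada", "project:atlas"},
    "frame:2": {"person:bob"},
}

related = co_occurring(items, "person:ada")
print(related)
```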

Add an optional `tags` filter to the unified /search endpoint and the MCP search-content tool, so an AI agent can pull every screen + audio capture carrying a given tag (e.g. person:ada, project:atlas). Tags are the existing junction-table labels written via POST /tags/:type/:id, and search results already return them; this closes the loop by letting you filter on them.

- `tags` is comma-separated; multiple tags AND together (intersection).
- Backed by vision_tags / audio_tags, so content types without tag tables (input, accessibility, memory) return nothing when a tag filter is set. Memories keep their own tag filter via GET /memories?tags=.
- count_search_results agrees with the filtered result set so pagination totals stay correct.
- Public search() / count_search_results() signatures kept stable via search_with_tags() / count_search_results_with_tags() siblings, so the ~60 existing callers are untouched.

Namespaced tags (person:, project:, topic:) become a universal, AI-authored link primitive: two captures sharing a tag are connected. That is the foundation for an Obsidian-style graph over captures, openable to the AI with no rigid per-entity schema.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Make tags a coherent cross-store link the AI can actually use end to end, not just a screen/audio filter.

- `GET /search?content_type=memory&tags=...` now filters memories by their JSON tags (exact membership, AND across multiple tags), via new list_memories/count_memories `tags_all` params. Previously the AI could tag a memory (update-memory) but had no way to retrieve memories by tag through search. content_type=all still never returns memories.
- One string namespace now spans three stores: vision_tags / audio_tags (screen + audio) and memories.tags (memory). The same `person:ada` on a frame and on a memory links them.
- MCP: add-tags / update-memory / search-content descriptions now teach the namespaced convention (person:/project:/topic:), how to retrieve by tag, and that frames are pruned so durable links belong on memories.
- screenpipe-api SKILL.md (shipped + repo copies): concise Tags section.
- Tests: add test_tag_filter_audio_and_cross_modal (audio + cross-modal all + exact-not-substring + input-gated + count) and test_memory_filter_by_tags (exact AND, no-substring, FTS compose, all-excludes-memory, count). 37 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…mes)

Adds tests/tag_filter_bench.rs (#[ignore], run with --ignored) that seeds a 200k-frame / 60k-audio / 50k-memory DB with rare tags on the OLDEST rows (adversarial vs ORDER BY timestamp DESC LIMIT) and dumps EXPLAIN QUERY PLAN plus timings.

Findings: screen/audio tag filtering drives off the tag indexes (tags.name UNIQUE + idx_vision_tags_tag_id) then PK-looks-up frames, so it never scans all frames. The tag filter is ~7 ms vs ~127 ms for unfiltered OCR search at 200k frames (17x faster, since it narrows to the tagged set). Audio ~1 ms, all ~8 ms, counts ~7-12 ms. Memory is the one linear path (memories.tags is unindexed JSON → full scan + correlated json_each, ~16 ms @ 50k); fine at realistic counts, junction table if memories ever hit millions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-09T23:20:03Z

Diarization eval results

Source: crates/screenpipe-audio-eval/evals/ · VoxConverse dev (CC-BY-4.0) + composed workday templates + screenpipe-shaped LibriSpeech fixtures

| fixture | DER | VAD FA | VAD FN | boundary err (s) | continuity | predicted / true spk |
| --- | --- | --- | --- | --- | --- | --- |
| interrupted_meeting | 0.186 | 0.01 | 0.063 | 20.286 | 0.833 | 9 / 5 |
| long_silence_day | 0.437 | 0.011 | 0.145 | 11.46 | 0.7 | 14 / 10 |
| screenpipe_meeting_rapid_handoffs | 0.241 | 0.196 | 0.099 | 2.305 | 1 | 5 / 3 |
| screenpipe_background_24_7_day | 0.315 | 0.025 | 0.159 | 2.203 | 1 | 4 / 3 |
| screenpipe_short_backchannels | 0.561 | 0.915 | 0.064 | 0.488 | n/a | 3 / 3 |
| screenpipe_mic_system_echo_leakage | 0.275 | 0.198 | 0.084 | 3.045 | 0.667 | 5 / 3 |
| screenpipe_overlap_crosstalk | 0.254 | 0.84 | 0.042 | 0.667 | n/a | 3 / 3 |
| abjxc | 0.016 | 0.098 | 0.002 | 1.151 | n/a | 2 / 1 |
| bxpwa | 0.111 | 0.453 | 0.029 | 20.793 | 0.714 | 8 / 5 |
| dhorc | 0.143 | 0.461 | 0.034 | 3.681 | 1 | 5 / 4 |

DER, VAD FA, VAD FN, boundary err: lower is better. Continuity: higher is better, 1.0 = same hyp cluster across all silence gaps. Composed workday rows and screenpipe_* rows exercise screenpipe-shaped usage: meetings, background gaps, backchannels, echo leakage, and crosstalk. Raw VoxConverse rows score broadcast-quality stems for comparison. See crates/screenpipe-audio-eval/evals/README.md for methodology.

Pipeline replay matrix

Source: generated screenpipe_* fixtures materialized into temp screenpipe SQLite DBs, then read back through search_audio. This catches storage/search regressions that pure DER scoring misses.

| scenarios | passed | failed | skipped | avg background DER | avg background speaker err | Deepgram |
| --- | --- | --- | --- | --- | --- | --- |
| 41 | 40 | 0 | 1 | 0.329 | 0.183 | skip |

The no-secret CI matrix runs local diarization under Parakeet/Whisper engine labels across live/background and mic/system device profiles. Real Deepgram/screenpipe-cloud smoke can be run locally with --deepgram required when credentials are present.

Transcription quality

Source: LibriSpeech test-clean (CC-BY-4.0) · per-model utterance cap · normalized lowercased word-level Levenshtein

| model | utterances | WER | CER | throughput (samples/s) |
| --- | --- | --- | --- | --- |
| tiny | 50 | 0.085 | 0.033 | 68797 |
| whisper-large-v3-turbo-quantized | 20 | 0.042 | 0.009 | 1846 |
| parakeet | 50 | 0.04 | 0.026 | 102813 |

WER + CER on read-aloud speech. Per-model utterance caps keep wall time bounded: tiny/parakeet at 50, the heavier large-v3-turbo-quantized at 20. See README for normalization rules.
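For readers unfamiliar with the metrics, a word-level Levenshtein WER (and its character-level counterpart, CER) can be sketched as follows. The normalization here (lowercasing, stripping punctuation) is an assumption standing in for the README's actual rules, and the function names are illustrative rather than taken from the eval crate.

```python
import re


def normalize(text):
    # Assumed normalization: lowercase, keep letters, digits, apostrophes.
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference, hypothesis):
    ref = " ".join(normalize(reference))
    hyp = " ".join(normalize(hypothesis))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


print(wer("The quick brown fox.", "the quick brown box"))
print(cer("The quick brown fox.", "the quick brown box"))
```

One wrong word out of four gives a WER of 0.25, while the same single-letter error is a much smaller fraction of the characters, which is why CER sits below WER in the table.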

Louis Beaumont and others added 4 commits June 9, 2026 12:19

style: cargo fmt screenpipe-db (tag filter + bench)

2d61e17

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

louis030195 merged commit 1d4a632 into main Jun 9, 2026
15 of 16 checks passed

louis030195 mentioned this pull request Jun 9, 2026

perf(memory): index memory tags via a trigger-maintained junction table #3949

Closed

louis030195 merged 4 commits into main from claude/happy-pare-011c59