feat(search): filter /search by tags (people, projects, topics) #3940

Merged: louis030195 merged 4 commits into main from claude/happy-pare-011c59 on Jun 9, 2026
louis030195 commented Jun 9, 2026

What

Makes tags a coherent cross-store link that an AI agent can use end to end. Two pieces:

  1. A tags filter on /search (and the MCP search-content tool): pull every item carrying given tags, e.g. ?tags=person:ada,project:atlas.
  2. The same filter now also covers memories (content_type=memory), so the AI can retrieve memories by tag. Before this, the AI could add tags to a memory (update-memory) but had no way to get them back by tag.

The model (how the AI links a person to memories / a timeframe)

One string namespace spans three stores. The same tag on different items connects them.

flowchart LR
  subgraph "one tag namespace"
    direction LR
    T["person:ada"]
  end
  F["screen frame<br/>vision_tags"] --- T
  A["audio chunk<br/>audio_tags"] --- T
  M["memory<br/>memories.tags"] --- T
  T --> Q1["GET /search?tags=person:ada<br/>(screen + audio)"]
  T --> Q2["GET /search?content_type=memory&tags=person:ada<br/>(facts)"]
  • Add a tag to a frame/audio: POST /tags/vision/{id} or /tags/audio/{id}. To a memory: tags field on POST /memories / PUT /memories/{id}.
  • Link a person to a memory: tag the memory person:ada.
  • Link a person to a timeframe: tag captures person:ada and query with start_time/end_time, or rely on the memory's created_at + frame_id provenance.
  • Everything about a person: one call for captures (content_type=all&tags=person:ada) + one for facts (content_type=memory&tags=person:ada).
  • Frames are pruned by retention, so durable links belong on memories; tag frames/audio for short-term recall. The MCP tool descriptions now teach exactly this.
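The "everything about a person" pattern above comes down to two GET requests. A minimal sketch of building them follows; the base URL and the `search_url` helper are assumptions for illustration, and only the path and the query parameter names (`content_type`, `tags`, `start_time`, `end_time`) come from this PR.

```python
from urllib.parse import urlencode

# Assumption: the address of a local server; adjust to your setup.
BASE_URL = "http://localhost:3030"


def search_url(tags, content_type="all", start_time=None, end_time=None):
    """Build a /search URL filtered by one or more tags (ANDed together)."""
    params = {"content_type": content_type, "tags": ",".join(tags)}
    if start_time is not None:
        params["start_time"] = start_time
    if end_time is not None:
        params["end_time"] = end_time
    # safe=":," keeps namespaced, comma-separated tags readable in the URL.
    return f"{BASE_URL}/search?{urlencode(params, safe=':,')}"


# One call for captures (screen + audio), one for facts (memories).
captures_url = search_url(["person:ada"])
facts_url = search_url(["person:ada"], content_type="memory")
```

Passing `start_time` / `end_time` to the same helper narrows the capture query to a timeframe; the timestamp format the server expects is not specified here.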

Behavior

  • tags is comma-separated. Multiple tags AND together (intersection). Matching is exact membership, not substring (person:ada does not match person:adam).
  • Applies to screen (OCR), audio, and memory. input and accessibility have no tag table and return nothing when a tag filter is set.
  • content_type=all unions tagged screen + audio (it never includes memory, tagged or not, same as today).
  • count_search_results agrees with the filtered set, so pagination total stays correct.
  • Empty / absent tags behaves exactly as before. Strictly opt-in.
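The matching rules above can be modeled in a few lines. This is a sketch of the semantics only, not the server code; `parse_tags` and `matches` are hypothetical helper names, and trimming whitespace around each tag is an assumption.

```python
def parse_tags(raw):
    """Split a comma-separated tags parameter; empty or missing means no filter."""
    if not raw:
        return []
    # Assumption: surrounding whitespace and empty segments are ignored.
    return [t.strip() for t in raw.split(",") if t.strip()]


def matches(item_tags, required):
    """True when the item carries every required tag (exact membership, AND)."""
    return set(required) <= set(item_tags)


required = parse_tags("person:ada,project:atlas")

both = matches(["person:ada", "project:atlas", "topic:ml"], required)
only_one = matches(["person:ada"], required)
substring = matches(["person:adam"], parse_tags("person:ada"))
no_filter = matches(["person:adam"], parse_tags(""))
```

Here `both` and `no_filter` are true, while `only_one` and `substring` are false: an item must carry all requested tags, a tag must match exactly, and an empty filter excludes nothing.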

Implementation

  • Public search() / count_search_results() signatures kept stable (~60 callers untouched) via search_with_tags() / count_search_results_with_tags() siblings; the route handler calls those.
  • Screen/audio matching: json_each + HAVING COUNT(DISTINCT) = json_array_length over the junction tables. Memory matching: the same exact-AND shape over the memories.tags JSON array, threaded through new list_memories / count_memories tags_all params (the public GET /memories?tags= keeps its existing single-substring filter).
  • MCP add-tags, update-memory, search-content descriptions updated to teach namespaced tags and retrieval. screenpipe-api SKILL.md (shipped + repo copies) gains a concise Tags section.
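To make the screen/audio matching shape concrete, here is a small sketch against an in-memory SQLite database. The table and index names (`tags`, `vision_tags`, `idx_vision_tags_tag_id`) appear in this PR, but the column names, the `frames` layout, and the exact query text are assumptions, not the shipped SQL. It requires an SQLite build with the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE frames (id INTEGER PRIMARY KEY, timestamp TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE vision_tags (vision_id INTEGER, tag_id INTEGER);
CREATE INDEX idx_vision_tags_tag_id ON vision_tags(tag_id);
INSERT INTO frames VALUES (1, '2026-06-01'), (2, '2026-06-02'), (3, '2026-06-03');
INSERT INTO tags VALUES (1, 'person:ada'), (2, 'project:atlas'), (3, 'person:adam');
INSERT INTO vision_tags VALUES (1, 1), (1, 2), (2, 1), (3, 3);
""")

# Exact AND: a frame qualifies only if the number of distinct requested tags
# it carries equals the number of tags requested.
QUERY = """
SELECT f.id FROM frames f
WHERE json_array_length(:tags) = 0
   OR f.id IN (
        SELECT vt.vision_id
        FROM vision_tags vt
        JOIN tags t ON t.id = vt.tag_id
        WHERE t.name IN (SELECT value FROM json_each(:tags))
        GROUP BY vt.vision_id
        HAVING COUNT(DISTINCT t.name) = json_array_length(:tags)
   )
ORDER BY f.timestamp DESC
"""


def frames_with_tags(tags):
    # De-duplicate first so json_array_length matches COUNT(DISTINCT ...).
    payload = json.dumps(sorted(set(tags)))
    return [row[0] for row in conn.execute(QUERY, {"tags": payload})]


one_tag = frames_with_tags(["person:ada"])
two_tags = frames_with_tags(["person:ada", "project:atlas"])
```

In this toy data, `one_tag` contains frames 2 and 1, and `two_tags` contains only frame 1. Frame 3, tagged `person:adam`, never matches `person:ada` because the comparison is on whole tag names.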

Tests

cargo test -p screenpipe-db --test db — 37 pass, including:

  • test_search_filter_by_tags — single / shared / AND / unknown / no-filter / count.
  • test_tag_filter_audio_and_cross_modal — audio filter, content_type=all cross-modal union, exact-not-substring, input gated, count.
  • test_memory_filter_by_tags — memory exact AND, no-substring, FTS compose, all-excludes-memory, count.

cargo clippy clean (no new warnings).

Performance

Benchmarked on the search hot path with tests/tag_filter_bench.rs (ignored; 200k frames / 200k vision_tags / 60k audio / 50k memories, in-memory). Adversarial case: tags are rare and on the oldest rows, fighting ORDER BY timestamp DESC LIMIT.

EXPLAIN QUERY PLAN confirms that screen/audio filtering drives off the tag indexes (tags.name UNIQUE + idx_vision_tags_tag_id) and then looks up frames by primary key, so it never scans all frames:

| query (200k-frame DB) | best |
| --- | --- |
| OCR, no tags (baseline full scan + sort) | ~127 ms |
| OCR, tags=person:ada | ~7 ms (17x faster than baseline) |
| Audio, tags=person:ada | ~1 ms |
| All, tags=person:ada | ~8 ms |
| Memory, tags=person:ada | ~16 ms |
| counts (OCR / All / Memory) | ~7-12 ms |

The tag filter is faster than unfiltered search because it narrows to the tagged set instead of scanning everything. The one linear path is memory: memories.tags is unindexed JSON, so the filter is a full scan + correlated json_each (~0.3 us/row, ~16 ms @ 50k → ~160 ms @ 500k). Fine at realistic memory counts; if memories ever reach millions, add a memory_tags junction table mirroring vision_tags. No-tag queries are unaffected (the json_array_length(?)=0 OR ... guard short-circuits).
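A minimal sketch of the memory path and the no-tag guard, again on an in-memory SQLite database. Only `memories.tags` as a JSON array, the correlated `json_each`, and the `json_array_length(?)=0 OR ...` guard come from this PR; the other columns and the query text are assumptions for illustration. It requires an SQLite build with the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, tags TEXT);
INSERT INTO memories VALUES
  (1, 'Ada leads Atlas', '["person:ada","project:atlas"]'),
  (2, 'Ada likes tea',   '["person:ada"]'),
  (3, 'Adam joined',     '["person:adam"]');
""")

# Every row is visited (tags is unindexed JSON); the correlated json_each
# expands that row's tag array and counts how many requested tags it holds.
QUERY = """
SELECT m.id FROM memories m
WHERE json_array_length(:tags) = 0
   OR (SELECT COUNT(DISTINCT je.value)
       FROM json_each(m.tags) AS je
       WHERE je.value IN (SELECT value FROM json_each(:tags))
      ) = json_array_length(:tags)
ORDER BY m.id
"""


def memories_with_tags(tags):
    # De-duplicate so the array length equals the number of distinct tags.
    payload = json.dumps(sorted(set(tags)))
    return [row[0] for row in conn.execute(QUERY, {"tags": payload})]


ada = memories_with_tags(["person:ada"])
ada_atlas = memories_with_tags(["person:ada", "project:atlas"])
everything = memories_with_tags([])
```

With an empty tag list the guard is true for every row, so `everything` returns all three memories. A `memory_tags` junction table, as suggested above, would replace the per-row `json_each` with an index lookup.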

Not in this PR

A graph-traversal endpoint (co-occurring tags for a given tag). Note /connections is already taken by integrations, so it would need a name like /graph or /related. Natural follow-up.

Louis Beaumont and others added 4 commits June 9, 2026 12:19
Add an optional `tags` filter to the unified /search endpoint and the MCP
search-content tool, so an AI agent can pull every screen + audio capture
carrying a given tag (e.g. person:ada, project:atlas). Tags are the existing
junction-table labels written via POST /tags/:type/:id, and search results
already return them; this closes the loop by letting you filter on them.

- `tags` is comma-separated; multiple tags AND together (intersection).
- Backed by vision_tags / audio_tags, so content types without tag tables
  (input, accessibility, memory) return nothing when a tag filter is set.
  Memories keep their own tag filter via GET /memories?tags=.
- count_search_results agrees with the filtered result set so pagination
  totals stay correct.
- Public search() / count_search_results() signatures kept stable via
  search_with_tags() / count_search_results_with_tags() siblings, so the
  ~60 existing callers are untouched.

Namespaced tags (person:, project:, topic:) become a universal, AI-authored
link primitive: two captures sharing a tag are connected. That is the
foundation for an Obsidian-style graph over captures, openable to the AI
with no rigid per-entity schema.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make tags a coherent cross-store link the AI can actually use end to end,
not just a screen/audio filter.

- `GET /search?content_type=memory&tags=...` now filters memories by their
  JSON tags (exact membership, AND across multiple tags), via new
  list_memories/count_memories `tags_all` params. Previously the AI could
  tag a memory (update-memory) but had no way to retrieve memories by tag
  through search. content_type=all still never returns memories.
- One string namespace now spans three stores: vision_tags / audio_tags
  (screen + audio) and memories.tags (memory). The same `person:ada` on a
  frame and on a memory links them.
- MCP: add-tags / update-memory / search-content descriptions now teach the
  namespaced convention (person:/project:/topic:), how to retrieve by tag,
  and that frames are pruned so durable links belong on memories.
- screenpipe-api SKILL.md (shipped + repo copies): concise Tags section.
- Tests: add test_tag_filter_audio_and_cross_modal (audio + cross-modal all
  + exact-not-substring + input-gated + count) and test_memory_filter_by_tags
  (exact AND, no-substring, FTS compose, all-excludes-memory, count). 37 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mes)

Adds tests/tag_filter_bench.rs (#[ignore], run with --ignored) that seeds a
200k-frame / 60k-audio / 50k-memory DB with rare tags on the OLDEST rows
(adversarial vs ORDER BY timestamp DESC LIMIT) and dumps EXPLAIN QUERY PLAN
plus timings.

Findings: screen/audio tag filtering drives off the tag indexes (tags.name
UNIQUE + idx_vision_tags_tag_id) then PK-looks-up frames, so it never scans
all frames — the tag filter is ~7 ms vs ~127 ms for unfiltered OCR search at
200k frames (17x faster, since it narrows to the tagged set). Audio ~1 ms,
all ~8 ms, counts ~7-12 ms. Memory is the one linear path (memories.tags is
unindexed JSON → full scan + correlated json_each, ~16 ms @ 50k); fine at
realistic counts, junction table if memories ever hit millions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@louis030195 louis030195 merged commit 1d4a632 into main Jun 9, 2026
15 of 16 checks passed
github-actions Bot commented Jun 9, 2026

Diarization eval results

Source: crates/screenpipe-audio-eval/evals/ · VoxConverse dev (CC-BY-4.0) + composed workday templates + screenpipe-shaped LibriSpeech fixtures

| fixture | DER | VAD FA | VAD FN | boundary err (s) | continuity | predicted / true spk |
| --- | --- | --- | --- | --- | --- | --- |
| interrupted_meeting | 0.186 | 0.01 | 0.063 | 20.286 | 0.833 | 9 / 5 |
| long_silence_day | 0.437 | 0.011 | 0.145 | 11.46 | 0.7 | 14 / 10 |
| screenpipe_meeting_rapid_handoffs | 0.241 | 0.196 | 0.099 | 2.305 | 1 | 5 / 3 |
| screenpipe_background_24_7_day | 0.315 | 0.025 | 0.159 | 2.203 | 1 | 4 / 3 |
| screenpipe_short_backchannels | 0.561 | 0.915 | 0.064 | 0.488 | n/a | 3 / 3 |
| screenpipe_mic_system_echo_leakage | 0.275 | 0.198 | 0.084 | 3.045 | 0.667 | 5 / 3 |
| screenpipe_overlap_crosstalk | 0.254 | 0.84 | 0.042 | 0.667 | n/a | 3 / 3 |
| abjxc | 0.016 | 0.098 | 0.002 | 1.151 | n/a | 2 / 1 |
| bxpwa | 0.111 | 0.453 | 0.029 | 20.793 | 0.714 | 8 / 5 |
| dhorc | 0.143 | 0.461 | 0.034 | 3.681 | 1 | 5 / 4 |

DER, VAD FA, VAD FN, boundary err: lower is better. Continuity: higher is better, 1.0 = same hyp cluster across all silence gaps. Composed workday rows and screenpipe_* rows exercise screenpipe-shaped usage: meetings, background gaps, backchannels, echo leakage, and crosstalk. Raw VoxConverse rows score broadcast-quality stems for comparison. See crates/screenpipe-audio-eval/evals/README.md for methodology.

Pipeline replay matrix

Source: generated screenpipe_* fixtures materialized into temp screenpipe SQLite DBs, then read back through search_audio. This catches storage/search regressions that pure DER scoring misses.

| scenarios | passed | failed | skipped | avg background DER | avg background speaker err | Deepgram |
| --- | --- | --- | --- | --- | --- | --- |
| 41 | 40 | 0 | 1 | 0.329 | 0.183 | skip |

The no-secret CI matrix runs local diarization under Parakeet/Whisper engine labels across live/background and mic/system device profiles. Real Deepgram/screenpipe-cloud smoke can be run locally with --deepgram required when credentials are present.

Transcription quality

Source: LibriSpeech test-clean (CC-BY-4.0) · per-model utterance cap · normalized lowercased word-level Levenshtein

| model | utterances | WER | CER | throughput (samples/s) |
| --- | --- | --- | --- | --- |
| tiny | 50 | 0.085 | 0.033 | 68797 |
| whisper-large-v3-turbo-quantized | 20 | 0.042 | 0.009 | 1846 |
| parakeet | 50 | 0.04 | 0.026 | 102813 |

WER + CER on read-aloud speech. Per-model utterance caps keep wall time bounded — tiny/parakeet at 50, the heavier large-v3-turbo-quantized at 20. See README for normalization rules.
