Skip to content

fix: wire --ignored-urls through to capture — flag parsed but never applied#3878

Merged
louis030195 merged 3 commits into
screenpipe:mainfrom
guillaumedeshayes:fix/ignored-urls-wiring
Jun 8, 2026
Merged

fix: wire --ignored-urls through to capture — flag parsed but never applied#3878
louis030195 merged 3 commits into
screenpipe:mainfrom
guillaumedeshayes:fix/ignored-urls-wiring

Conversation

@guillaumedeshayes

Copy link
Copy Markdown
Contributor

description

--ignored-urls is parsed by the CLI and counted in telemetry, but the values never reach the capture layer — so URL privacy filtering silently does nothing:

  • TreeWalkerConfig has no ignored_urls field, and VisionManagerConfig never carries it from RecorderConfig
  • both production WindowFilters::new(...) call sites in event_driven_capture.rs pass &[] as the URL list
  • is_url_blocked is fully implemented and unit-tested, but always runs against an empty list → always returns false

Looks like an incomplete refactor: when window filters moved into TreeWalkerConfig, ignored_windows/included_windows came along but ignored_urls was orphaned. App filtering works; URL filtering doesn't.

This PR:

  1. plumbs ignored_urls through RecorderConfig → VisionManagerConfig → TreeWalkerConfig and fixes both &[] call sites (frames leg — SCK window exclusion + title fallback)
  2. adds URL filtering to the a11y tree walker (new UrlFilteredWalker decorator in create_tree_walker): the walker extracts the focused tab's real URL but never consulted the ignore list, so accessibility text from blocked pages was captured even once (1) is fixed. The decorator drops the snapshot with a new SkipReason::BlockedUrl, uniformly across macOS/Windows/Linux, exact host match (stronger than the title heuristic available to the vision leg)
  3. lifts the domain matcher into screenpipe_a11y::url_filter (dependency direction is screen → a11y) and has WindowFilters::is_url_blocked delegate to it, so the two capture paths can't drift apart. Matching semantics unchanged — all existing is_url_blocked tests pass as-is

related issue: none filed — happy to open one if preferred.

before

$ screenpipe record --data-dir /tmp/sp-url-test --ignored-urls linkedin.com
# browse linkedin.com (focused) for ~20s, then a control site

$ sqlite3 /tmp/sp-url-test/db.sqlite \
    "SELECT COUNT(*) FROM frames WHERE browser_url LIKE '%linkedin%'"
3         ← frames from the "ignored" URL captured anyway (main @ 0.4.8)
$ sqlite3 /tmp/sp-url-test/db.sqlite \
    "SELECT COUNT(*) FROM elements e JOIN frames f ON e.frame_id=f.id
     WHERE f.browser_url LIKE '%linkedin%'"
97        ← full accessibility text of the page captured too

after

$ sqlite3 /tmp/sp-url-test/db.sqlite \
    "SELECT COUNT(*) FROM frames WHERE browser_url LIKE '%linkedin%'"
0
$ sqlite3 /tmp/sp-url-test/db.sqlite \
    "SELECT COUNT(*) FROM elements e JOIN frames f ON e.frame_id=f.id
     WHERE f.browser_url LIKE '%linkedin%'"
0
# control site captured normally in the same session:
$ sqlite3 /tmp/sp-url-test/db.sqlite \
    "SELECT COUNT(*) FROM elements e JOIN frames f ON e.frame_id=f.id
     WHERE f.browser_url LIKE '%github%'"
918

how to test

  1. cargo build --release --bin screenpipe
  2. ./target/release/screenpipe record --data-dir /tmp/sp-url-test --port 3043 --disable-audio --ignored-urls linkedin.com
  3. browse linkedin.com in a focused browser window for ~20s, then any other site
  4. sqlite3 /tmp/sp-url-test/db.sqlite "SELECT COUNT(*) FROM frames WHERE browser_url LIKE '%linkedin%'" → expect 0, while the other site's frames/elements are present

test status

  • cargo test -p screenpipe-a11y --lib → 199 passed (6 test_get_clipboard_* tests skipped — 3 of them fail identically on unpatched main, pre-existing/environment-dependent pasteboard round-trips; see fix(a11y/macos): avoid main-queue deadlock in headless clipboard reads #3877 for the baseline)
  • cargo test -p screenpipe-screen --lib → 118 passed, including all 11 pre-existing is_url_blocked tests now running through the shared matcher
  • cargo test -p screenpipe-engine --lib → 636 passed, 1 failed: sleep_monitor::tests::test_recently_woke_flag — fails identically on unpatched main in the full parallel run and passes in isolation on both (pre-existing concurrency flake, unrelated)
  • new unit tests: 7 for url_filter::is_url_blocked, 3 for the UrlFilteredWalker decorator (block / pass-other-URL / pass-non-browser)

… but never applied

TreeWalkerConfig had no ignored_urls field, so the two WindowFilters
call sites passed &[] and is_url_blocked always returned false. URLs
configured via --ignored-urls produced both frames and accessibility
snapshots.

- plumb ignored_urls RecorderConfig → VisionManagerConfig → TreeWalkerConfig
- block at both legs: SCK/window exclusion (frames) and a new
  UrlFilteredWalker decorator in create_tree_walker (a11y snapshots,
  uniform across platforms)
- lift the domain matcher into screenpipe-a11y::url_filter (dependency
  direction is screen → a11y); WindowFilters delegates to it

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@louis030195 louis030195 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generated by the screenpipe pr-review pipe (https://screenpi.pe), not written by a human — reply and tag @louis030195 if it got something wrong.


// For patterns without a TLD (e.g. "chase" instead of "chase.com"),
// expand across common TLDs at domain boundaries.
if !blocked.contains('.') {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not split host by '.' and check segments? avoids hardcoding tlds

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call — done in f0b8315.

for context: that hardcoded ["com","net","org","bank"] loop wasn't new in this PR, it's the existing WindowFilters::is_url_blocked logic on main that I lifted verbatim into the shared url_filter matcher (kept semantics identical so the 11 existing tests passed as-is). so this fix improves both capture paths at once.

new logic for TLD-less patterns is just a domain-label match:

if !blocked.contains('.') {
    return host_lower.split('.').any(|label| label == blocked);
}

so chase now blocks chase.com, chase.co.uk, chase.io, online.chase.de — any TLD, no allowlist — while domain boundaries still hold (purchase.com splits to ["purchase","com"], never matches). strictly broader than before; for a privacy filter over-blocking is the safe direction. added test coverage for the new TLDs + the purchase.co.uk non-match.

…label

Replace the hardcoded com/net/org/bank TLD list with label-level matching:
a TLD-less pattern (e.g. "chase") now matches any TLD — chase.com,
chase.co.uk, chase.io — without an allowlist, while still respecting
domain boundaries so "chase" never matches "purchase.com".

Per review feedback on screenpipe#3878. Strictly broader than before; all existing
is_url_blocked tests pass unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@louis030195

Copy link
Copy Markdown
Collaborator

Thanks for catching this. The diagnosis is correct and the fix is sound. I traced do_capture -> walk_accessibility_tree -> create_tree_walker and confirmed the UrlFilteredWalker decorator engages in the real capture path, and that Skipped(BlockedUrl) drops the whole capture (frame + elements) via the existing result: None gate, which is why your after-numbers go to 0/0. Lifting the matcher into screenpipe_a11y::url_filter so both legs share one implementation is the right call.

Two capture paths still leak after this PR though, both on the pixel side (the a11y text and the focused frame are properly handled now):

1. HD recorder is a third WindowFilters::new call site, still passing &[]. In crates/screenpipe-engine/src/hd_recorder.rs the recorder builds WindowFilters::new(&config.ignored_windows, &config.included_windows, &[]), and HdRecorderConfig has no ignored_urls field at all. So during an HD session, high-fps video of a blocked site is captured unfiltered (the HD leg has no a11y walk, so the decorator never runs there). The struct comment already claims "privacy parity with the event-driven capture loop" for ignored windows, so URL parity is a reasonable expectation it currently misses. Plumbing ignored_urls into HdRecorderConfig (it is built in vision_manager/manager.rs) closes it.

2. URL-only config skips SCK window exclusion. get_excluded_sck_window_ids early-returns empty when both ignore_patterns and include_patterns are empty, without checking ignored_urls. So a user who sets only --ignored-urls (no ignored/included windows) gets no exclusion of a blocked site sitting in an unfocused but visible window: its pixels still land in the full-monitor screenshot. The focused case is fine (the whole frame is dropped via the decorator). One-liner:

if window_filters.ignore_patterns.is_empty()
    && window_filters.include_patterns.is_empty()
    && window_filters.ignored_urls.is_empty()
{
    return Vec::new();
}

Neither is a regression (on main, ignored_urls did nothing at all), so this is already a clear improvement. I am happy to merge as-is and file a follow-up for the HD leg + the guard, or if you want to fold them into this PR that works too. Your call.

…SCK URL-only)

Follow-up to wiring --ignored-urls through capture. The a11y text and the
focused frame are now filtered, but two pixel paths still leaked:

1. HD recorder ignored ignored_urls entirely. hd_recorder built
   WindowFilters::new(.., .., &[]) and HdRecorderConfig had no ignored_urls
   field, so an HD session recorded high-fps video of a blocked site
   unfiltered (the HD leg has no a11y walk, so the decorator never runs).
   Plumb ignored_urls: VisionManagerConfig -> HdRecorderConfig -> WindowFilters,
   matching the existing ignored/included-window privacy parity.

2. get_excluded_sck_window_ids early-returned empty when both window pattern
   lists were empty, ignoring ignored_urls. A URL-only config (--ignored-urls
   with no window filters) got no SCK exclusion, so a blocked site in an
   unfocused-but-visible window still landed in the full-monitor screenshot.
   Add ignored_urls to the guard; is_valid already calls
   is_title_suggesting_blocked_url, which catches the unfocused window by title.

Neither was a regression (on main ignored_urls did nothing). Per maintainer
review on screenpipe#3878.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@guillaumedeshayes

Copy link
Copy Markdown
Contributor Author

thanks for the careful trace — both confirmed and folded into this PR (72c3964), since they're genuine privacy leaks and the fix feels incomplete leaving the pixel side open.

1. HD recorder — plumbed ignored_urls through VisionManagerConfig → HdRecorderConfig → WindowFilters::new, replacing the &[] at the third call site. Mirrors the existing ignored/included-window parity, so the struct's "privacy parity" comment now holds for URLs too.

2. SCK URL-only guard — added && window_filters.ignored_urls.is_empty() to the get_excluded_sck_window_ids early-return, as you suggested. Worth noting why the one-liner is sufficient: once we don't bail, the loop calls is_valid(), which already invokes is_title_suggesting_blocked_url() — so the unfocused blocked window gets excluded by the title heuristic. No extra logic needed.

Neither is a regression (on main ignored_urls did nothing). screenpipe-engine + screenpipe-screen build clean. Ready for another look.

@louis030195 louis030195 merged commit 70c018e into screenpipe:main Jun 8, 2026
louis030195 pushed a commit that referenced this pull request Jun 10, 2026
The BlockedUrl variant was added to SkipReason (a11y) in #3878 but the
ee/sdk recorder-core tag match was never updated, leaving a
non-exhaustive match (E0004) that broke the SDK build on main. The
a11y crate compiled fine in Rust CI because Rust CI doesn't build
ee/sdk, so the break only surfaced in the SDK workflow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants