fix: wire --ignored-urls through to capture — flag parsed but never applied#3878
Conversation
… but never applied TreeWalkerConfig had no ignored_urls field, so the two WindowFilters call sites passed &[] and is_url_blocked always returned false. URLs configured via --ignored-urls produced both frames and accessibility snapshots. - plumb ignored_urls RecorderConfig → VisionManagerConfig → TreeWalkerConfig - block at both legs: SCK/window exclusion (frames) and a new UrlFilteredWalker decorator in create_tree_walker (a11y snapshots, uniform across platforms) - lift the domain matcher into screenpipe-a11y::url_filter (dependency direction is screen → a11y); WindowFilters delegates to it Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
louis030195
left a comment
There was a problem hiding this comment.
generated by the screenpipe pr-review pipe (https://screenpi.pe), not written by a human — reply and tag @louis030195 if it got something wrong.
|
|
||
| // For patterns without a TLD (e.g. "chase" instead of "chase.com"), | ||
| // expand across common TLDs at domain boundaries. | ||
| if !blocked.contains('.') { |
There was a problem hiding this comment.
why not split host by '.' and check segments? avoids hardcoding tlds
There was a problem hiding this comment.
good call — done in f0b8315.
for context: that hardcoded ["com","net","org","bank"] loop wasn't new in this PR, it's the existing WindowFilters::is_url_blocked logic on main that I lifted verbatim into the shared url_filter matcher (kept semantics identical so the 11 existing tests passed as-is). so this fix improves both capture paths at once.
new logic for TLD-less patterns is just a domain-label match:
if !blocked.contains('.') {
return host_lower.split('.').any(|label| label == blocked);
}so chase now blocks chase.com, chase.co.uk, chase.io, online.chase.de — any TLD, no allowlist — while domain boundaries still hold (purchase.com splits to ["purchase","com"], never matches). strictly broader than before; for a privacy filter over-blocking is the safe direction. added test coverage for the new TLDs + the purchase.co.uk non-match.
…label Replace the hardcoded com/net/org/bank TLD list with label-level matching: a TLD-less pattern (e.g. "chase") now matches any TLD — chase.com, chase.co.uk, chase.io — without an allowlist, while still respecting domain boundaries so "chase" never matches "purchase.com". Per review feedback on screenpipe#3878. Strictly broader than before; all existing is_url_blocked tests pass unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks for catching this. The diagnosis is correct and the fix is sound. I traced Two capture paths still leak after this PR though, both on the pixel side (the a11y text and the focused frame are properly handled now): 1. HD recorder is a third 2. URL-only config skips SCK window exclusion. if window_filters.ignore_patterns.is_empty()
&& window_filters.include_patterns.is_empty()
&& window_filters.ignored_urls.is_empty()
{
return Vec::new();
}Neither is a regression (on |
…SCK URL-only) Follow-up to wiring --ignored-urls through capture. The a11y text and the focused frame are now filtered, but two pixel paths still leaked: 1. HD recorder ignored ignored_urls entirely. hd_recorder built WindowFilters::new(.., .., &[]) and HdRecorderConfig had no ignored_urls field, so an HD session recorded high-fps video of a blocked site unfiltered (the HD leg has no a11y walk, so the decorator never runs). Plumb ignored_urls: VisionManagerConfig -> HdRecorderConfig -> WindowFilters, matching the existing ignored/included-window privacy parity. 2. get_excluded_sck_window_ids early-returned empty when both window pattern lists were empty, ignoring ignored_urls. A URL-only config (--ignored-urls with no window filters) got no SCK exclusion, so a blocked site in an unfocused-but-visible window still landed in the full-monitor screenshot. Add ignored_urls to the guard; is_valid already calls is_title_suggesting_blocked_url, which catches the unfocused window by title. Neither was a regression (on main ignored_urls did nothing). Per maintainer review on screenpipe#3878. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
thanks for the careful trace — both confirmed and folded into this PR (72c3964), since they're genuine privacy leaks and the fix feels incomplete leaving the pixel side open. 1. HD recorder — plumbed 2. SCK URL-only guard — added Neither is a regression (on |
The BlockedUrl variant was added to SkipReason (a11y) in #3878 but the ee/sdk recorder-core tag match was never updated, leaving a non-exhaustive match (E0004) that broke the SDK build on main. The a11y crate compiled fine in Rust CI because Rust CI doesn't build ee/sdk, so the break only surfaced in the SDK workflow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
description
--ignored-urlsis parsed by the CLI and counted in telemetry, but the values never reach the capture layer — so URL privacy filtering silently does nothing:TreeWalkerConfighas noignored_urlsfield, andVisionManagerConfignever carries it fromRecorderConfigWindowFilters::new(...)call sites inevent_driven_capture.rspass&[]as the URL listis_url_blockedis fully implemented and unit-tested, but always runs against an empty list → always returnsfalseLooks like an incomplete refactor: when window filters moved into
TreeWalkerConfig,ignored_windows/included_windowscame along butignored_urlswas orphaned. App filtering works; URL filtering doesn't.This PR:
ignored_urlsthroughRecorderConfig → VisionManagerConfig → TreeWalkerConfigand fixes both&[]call sites (frames leg — SCK window exclusion + title fallback)UrlFilteredWalkerdecorator increate_tree_walker): the walker extracts the focused tab's real URL but never consulted the ignore list, so accessibility text from blocked pages was captured even once (1) is fixed. The decorator drops the snapshot with a newSkipReason::BlockedUrl, uniformly across macOS/Windows/Linux, exact host match (stronger than the title heuristic available to the vision leg)screenpipe_a11y::url_filter(dependency direction is screen → a11y) and hasWindowFilters::is_url_blockeddelegate to it, so the two capture paths can't drift apart. Matching semantics unchanged — all existingis_url_blockedtests pass as-isrelated issue: none filed — happy to open one if preferred.
before
after
how to test
cargo build --release --bin screenpipe./target/release/screenpipe record --data-dir /tmp/sp-url-test --port 3043 --disable-audio --ignored-urls linkedin.comsqlite3 /tmp/sp-url-test/db.sqlite "SELECT COUNT(*) FROM frames WHERE browser_url LIKE '%linkedin%'"→ expect 0, while the other site's frames/elements are presenttest status
cargo test -p screenpipe-a11y --lib→ 199 passed (6test_get_clipboard_*tests skipped — 3 of them fail identically on unpatchedmain, pre-existing/environment-dependent pasteboard round-trips; see fix(a11y/macos): avoid main-queue deadlock in headless clipboard reads #3877 for the baseline)cargo test -p screenpipe-screen --lib→ 118 passed, including all 11 pre-existingis_url_blockedtests now running through the shared matchercargo test -p screenpipe-engine --lib→ 636 passed, 1 failed:sleep_monitor::tests::test_recently_woke_flag— fails identically on unpatchedmainin the full parallel run and passes in isolation on both (pre-existing concurrency flake, unrelated)url_filter::is_url_blocked, 3 for theUrlFilteredWalkerdecorator (block / pass-other-URL / pass-non-browser)