[bug] PII phone regex too aggressive — redacts database IDs and numeric identifiers as [PHONE] #2340

@railapex

Description

I ran into this while digging into some database IDs in a session history. With PII redaction on, numeric IDs can get caught by the phone-number pattern. I'm not sure there's a good way to tell them apart; we could look for phone formatting as Claude suggests below, but that's brittle.

Feel free to close if useless.

One idea that would take more work: finer-grained control over which PII gets redacted, e.g. options to redact phone numbers and emails separately from other PII such as passwords and OAuth credentials.
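A minimal sketch of what per-category options could look like, assuming a split into phone, email, and credential categories. `PiiKind`, `PiiRedactionOptions`, and `should_redact` are hypothetical names, not part of screenpipe. Defaulting every category to on would roughly preserve today's single-switch behavior for existing users.

```rust
/// Hypothetical PII categories; screenpipe's real pii_removal.rs may group them differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PiiKind {
    Phone,
    Email,
    Credential, // passwords, OAuth tokens, etc.
}

/// Hypothetical per-category switches replacing a single on/off flag.
#[derive(Debug, Clone, Copy)]
pub struct PiiRedactionOptions {
    pub phone: bool,
    pub email: bool,
    pub credentials: bool,
}

impl Default for PiiRedactionOptions {
    // Everything on, so enabling redaction behaves like the single switch.
    fn default() -> Self {
        Self { phone: true, email: true, credentials: true }
    }
}

impl PiiRedactionOptions {
    /// Whether a detected span of `kind` should be replaced.
    pub fn should_redact(&self, kind: PiiKind) -> bool {
        match kind {
            PiiKind::Phone => self.phone,
            PiiKind::Email => self.email,
            PiiKind::Credential => self.credentials,
        }
    }
}
```

A user who works with numeric IDs could then write `PiiRedactionOptions { phone: false, ..Default::default() }` and still have credentials redacted.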

Description

The phone number regex in PII removal is too broad: it matches 7-10 digit numeric sequences regardless of context, corrupting legitimate data such as database IDs, timestamps embedded in URLs, and other numeric identifiers.

Evidence

Clipboard event captured via Screenpipe with usePiiRemoval: true:

  • Input: LinkedAccountId: 215805592
  • Stored: 215[PHONE]8

The phone regex fires on the 9-digit database ID and corrupts the value. This is data-destructive: the original value cannot be recovered.

Affected File

crates/screenpipe-core/src/pii_removal.rs — the phone number pattern matches any 7-10 digit sequence.
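For illustration, a hypothetical std-only Rust approximation of a boundary-anchored 7-10 digit matcher (`\b\d{7,10}\b`, with `\b` approximated using ASCII word characters). This is an assumption, not the actual pattern in pii_removal.rs; the `215[PHONE]8` output suggests the real pattern can even match inside a longer digit run. The point is that word boundaries alone would not help: a standalone 9-digit ID is still flagged, so formatting is the signal that actually separates phone numbers from IDs.

```rust
/// Hypothetical approximation of `\b\d{7,10}\b` (ASCII word chars only).
/// Returns every standalone run of 7-10 ASCII digits.
pub fn bare_digit_matches(s: &str) -> Vec<&str> {
    let b = s.as_bytes();
    let is_word = |c: u8| c.is_ascii_alphanumeric() || c == b'_';
    let mut out = Vec::new();
    let mut i = 0;
    while i < b.len() {
        // Leading \b: a digit run that is not preceded by a word character.
        if b[i].is_ascii_digit() && (i == 0 || !is_word(b[i - 1])) {
            let mut j = i;
            while j < b.len() && b[j].is_ascii_digit() {
                j += 1;
            }
            // Trailing \b: the run must not continue into a letter or '_'.
            if (7..=10).contains(&(j - i)) && (j == b.len() || !is_word(b[j])) {
                out.push(&s[i..j]);
            }
            i = j;
        } else {
            i += 1;
        }
    }
    out
}
```

Running it on `LinkedAccountId: 215805592` flags `215805592`, the same ID from the evidence above.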

Impact

  • Corrupts database IDs in clipboard content, OCR text, and accessibility tree text
  • Corrupts numeric identifiers in URLs (entity IDs, autofill params)
  • Corrupts timestamps embedded in URL paths
  • Data loss is permanent — redacted values cannot be reconstructed

Suggested Fix

Tighten the phone regex to require phone-specific context:

  • Require country code prefix, parenthesized area code, or dash/dot separators
  • Exempt pure digit sequences without formatting (likely IDs, not phones)
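  • Consider a word-boundary + formatting heuristic: (?:\+\d{1,3}[\s-])?(?:\(\d{3}\)|\b\d{3})[\s.-]\d{3}[\s.-]\d{4}\b (the boundary applies only to a bare area code, since a \b placed directly before + or ( fails after whitespace or at the start of the text)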
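A std-only Rust sketch of the formatting heuristic, hand-written so it runs without the `regex` crate. It requires separators between the digit groups and only applies the leading word boundary to a bare area code, because a `\b` directly before `+` or `(` would not match after whitespace. `redact_phones` and its helpers are hypothetical names, not screenpipe's API, and `\s` and `\b` are approximated using ASCII only.

```rust
// Hypothetical approximation of:
//   (?:\+\d{1,3}[\s-])?(?:\(\d{3}\)|\b\d{3})[\s.-]\d{3}[\s.-]\d{4}\b
// ASCII-only: `\s` ~ space/tab, `\b` ~ ASCII alphanumerics and '_'.

fn is_word(c: u8) -> bool {
    c.is_ascii_alphanumeric() || c == b'_'
}

/// One separator at `i` (space, tab, '-', plus '.' if `allow_dot`); returns the next index.
fn sep(b: &[u8], i: usize, allow_dot: bool) -> Option<usize> {
    let c = *b.get(i)?;
    (c == b' ' || c == b'\t' || c == b'-' || (allow_dot && c == b'.')).then_some(i + 1)
}

/// Exactly `n` ASCII digits starting at `i`; returns the index past them.
fn digits(b: &[u8], i: usize, n: usize) -> Option<usize> {
    let run = b.get(i..i + n)?;
    run.iter().all(u8::is_ascii_digit).then_some(i + n)
}

/// `(?:\(\d{3}\)|\b\d{3})[\s.-]\d{3}[\s.-]\d{4}\b` anchored at `j`.
fn core_at(b: &[u8], j: usize) -> Option<usize> {
    let k = if b.get(j) == Some(&b'(') {
        let k = digits(b, j + 1, 3)?;
        if b.get(k) != Some(&b')') {
            return None;
        }
        k + 1
    } else {
        // Leading \b for a bare area code: previous byte must not be a word char.
        if j > 0 && is_word(b[j - 1]) {
            return None;
        }
        digits(b, j, 3)?
    };
    let k = sep(b, k, true)?;
    let k = digits(b, k, 3)?;
    let k = sep(b, k, true)?;
    let end = digits(b, k, 4)?;
    // Trailing \b: the next byte must not be a word char.
    if b.get(end).map_or(false, |&c| is_word(c)) {
        return None;
    }
    Some(end)
}

/// Optional `+CC ` prefix (greedy 1-3 digits, backtracking), then the core pattern.
fn match_at(b: &[u8], i: usize) -> Option<usize> {
    if b.get(i) == Some(&b'+') {
        for n in (1..=3).rev() {
            if let Some(end) = digits(b, i + 1, n)
                .and_then(|k| sep(b, k, false))
                .and_then(|k| core_at(b, k))
            {
                return Some(end);
            }
        }
    }
    core_at(b, i)
}

/// Replaces each leftmost, non-overlapping formatted phone number with `[PHONE]`.
pub fn redact_phones(s: &str) -> String {
    let b = s.as_bytes();
    let mut out = String::with_capacity(s.len());
    let (mut last, mut i) = (0, 0);
    while i < b.len() {
        match match_at(b, i) {
            // Matches start and end on ASCII bytes, so slicing stays on char boundaries.
            Some(end) => {
                out.push_str(&s[last..i]);
                out.push_str("[PHONE]");
                last = end;
                i = end;
            }
            None => i += 1,
        }
    }
    out.push_str(&s[last..]);
    out
}
```

With this heuristic `LinkedAccountId: 215805592` passes through unchanged, while `(555) 123-4567`, `555-123-4567`, and `+1 555.123.4567` become `[PHONE]`. The trade-off is that unformatted phone numbers such as `5551234567` are no longer redacted.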

Workaround

None. PII removal runs at capture time, so redacted values are lost immediately, and disabling usePiiRemoval exposes genuinely sensitive data.

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
