[bug] PII phone regex too aggressive: redacts database IDs and numeric identifiers as [PHONE] (#2340)

I ran into this while drilling into some database IDs in a session history. If PII redaction is on, numeric IDs can be picked up as phone numbers. I'm not sure there's a good way to differentiate. We could look for phone-specific formats, as Claude suggests below, but that's brittle.

Feel free to close if useless.

One thought that would take more work: finer-grained options for what PII to redact, so that phone and email redaction can be toggled separately from other PII such as passwords and OAuth credentials.
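
To make the idea concrete, here is a rough Python sketch of per-category toggles. Everything in it is hypothetical: `PiiOptions`, its field names, the `redact` helper, and the placeholder patterns are made up for illustration and are not existing Screenpipe settings or the patterns in `pii_removal.rs`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiOptions:
    """Hypothetical per-category switches (not an existing Screenpipe option)."""
    redact_phone: bool = True
    redact_email: bool = True
    redact_secrets: bool = True


# Deliberately simple placeholder patterns, for illustration only.
PATTERNS = {
    "redact_phone": (re.compile(r"\b\d{3}[\s.-]\d{3}[\s.-]\d{4}\b"), "[PHONE]"),
    "redact_email": (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
    "redact_secrets": (re.compile(r"(?i)\b(password|token)=\S+"), r"\1=[SECRET]"),
}


def redact(text: str, options: PiiOptions) -> str:
    """Apply only the redaction categories that are switched on."""
    for flag, (pattern, replacement) in PATTERNS.items():
        if getattr(options, flag):
            text = pattern.sub(replacement, text)
    return text


# Keep numeric data intact while still redacting emails and secrets.
keep_numbers = PiiOptions(redact_phone=False)
print(redact("id 555-123-4567 bob@example.com password=hunter2", keep_numbers))
```

With something like this, a user who mostly cares about credentials could switch off phone redaction without disabling `usePiiRemoval` entirely.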

## Description

The phone number regex in PII removal is too broad — it matches 7-10 digit numeric sequences regardless of context, corrupting legitimate data like database IDs, timestamps in URLs, and numeric identifiers.

## Evidence

Clipboard event captured via Screenpipe with `usePiiRemoval: true`:

- **Input**: `LinkedAccountId: 215805592`
- **Stored**: `215[PHONE]8`

The phone regex fires on the 9-digit database ID, corrupting the value. This is data-destructive — the original value cannot be recovered.

## Affected File

`crates/screenpipe-core/src/pii_removal.rs` — the phone number pattern matches any 7-10 digit sequence.

## Impact

- Corrupts database IDs in clipboard content, OCR text, and accessibility tree text
- Corrupts numeric identifiers in URLs (entity IDs, autofill params)
- Corrupts timestamps embedded in URL paths
- Data loss is permanent — redacted values cannot be reconstructed

## Suggested Fix

Tighten the phone regex to require phone-specific context:
- Require country code prefix, parenthesized area code, or dash/dot separators
- Exempt pure digit sequences without formatting (likely IDs, not phones)
- Consider a word-boundary + formatting heuristic: `\b(?:\+\d{1,3}[\s-])?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}\b`
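
As a quick sanity check, the pattern quoted in the list above can be exercised in Python. This is only a sketch: the real implementation is in Rust, the helper name `redact_phones` and the sample URL are made up, and the current pattern in `pii_removal.rs` is not reproduced here. The pattern uses no lookaround, so it should carry over to Rust's `regex` crate, but that has not been verified.

```python
import re

# Pattern copied from the suggested fix. Separators are mandatory,
# so an unformatted run of digits never matches.
STRICT_PHONE = re.compile(
    r"\b(?:\+\d{1,3}[\s-])?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}\b"
)


def redact_phones(text: str) -> str:
    """Replace formatted phone numbers with the [PHONE] placeholder."""
    return STRICT_PHONE.sub("[PHONE]", text)


# Bare numeric identifiers pass through unchanged.
print(redact_phones("LinkedAccountId: 215805592"))
print(redact_phones("https://example.com/entity/215805592/1700000000"))

# Formatted numbers are still redacted.
print(redact_phones("call 555-123-4567"))
print(redact_phones("call 555.123.4567 today"))
```

Two caveats showed up while trying this. Identifiers that happen to be formatted like phone numbers (for example `123-456-7890` used as an order number) would still be redacted, which is the brittleness mentioned at the top. Also, at least in Python's `re`, the leading `\b` keeps a `+` or `(` that follows whitespace out of the match, so `+1 555-123-4567` becomes `+1 [PHONE]` and `(555) 123-4567` becomes `([PHONE]`.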

## Workaround

None — PII removal runs at capture time. Disabling `usePiiRemoval` exposes actual sensitive data.
