This is something I ran into while drilling into some database IDs in a session history. If PII redaction is on, numeric IDs can get caught by it. I'm not sure there's a good way to tell them apart; we could look for phone formatting as Claude suggests below, but that's brittle.
Feel free to close if useless.
One idea that would take more work: finer-grained control over which PII gets redacted, e.g. separate options for phone numbers and emails versus other PII such as passwords and OAuth credentials.
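A minimal sketch of what per-category switches could look like, written in Rust because the PII code lives in screenpipe-core. PiiRedactionOptions, PiiKind and should_redact are hypothetical names, not existing screenpipe API; from the evidence below, usePiiRemoval appears to be a single on/off flag today.

```rust
/// Hypothetical per-category redaction switches (illustrative names only;
/// not part of screenpipe's current configuration).
#[derive(Debug, Clone, Copy)]
struct PiiRedactionOptions {
    phone_numbers: bool,
    emails: bool,
    credentials: bool, // passwords, OAuth tokens, API keys
}

impl Default for PiiRedactionOptions {
    /// Redact every category by default, matching the all-or-nothing flag.
    fn default() -> Self {
        Self { phone_numbers: true, emails: true, credentials: true }
    }
}

/// Category of a detected match (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PiiKind {
    Phone,
    Email,
    Credential,
}

impl PiiRedactionOptions {
    /// Decide whether a match of the given category should be replaced.
    fn should_redact(&self, kind: PiiKind) -> bool {
        match kind {
            PiiKind::Phone => self.phone_numbers,
            PiiKind::Email => self.emails,
            PiiKind::Credential => self.credentials,
        }
    }
}
```

With this shape, a user hitting the ID problem could turn off phone redaction alone (PiiRedactionOptions { phone_numbers: false, ..Default::default() }) while passwords and OAuth credentials stay redacted.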
Description
The phone number regex in PII removal is too broad — it matches 7-10 digit numeric sequences regardless of context, corrupting legitimate data like database IDs, timestamps in URLs, and numeric identifiers.
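To make the failure mode concrete, here is a rough std-only Rust approximation of the rule as described above: every maximal run of 7 to 10 ASCII digits becomes [PHONE]. redact_digit_runs is a hypothetical name and this is not the code in pii_removal.rs; the stored value in the evidence below (215[PHONE]8) suggests the real pattern differs in detail. The point it illustrates is that a digits-only rule cannot tell 215805592 apart from an unformatted phone number.

```rust
/// Rough approximation of the over-broad rule (NOT the real pattern):
/// every maximal run of 7 to 10 ASCII digits is replaced with "[PHONE]".
fn redact_digit_runs(text: &str) -> String {
    // Emit the pending digit run, redacted if its length is 7..=10.
    fn flush(run: &mut String, out: &mut String) {
        if (7..=10).contains(&run.len()) {
            out.push_str("[PHONE]");
        } else {
            out.push_str(run.as_str());
        }
        run.clear();
    }

    let mut out = String::with_capacity(text.len());
    let mut run = String::new();
    for c in text.chars() {
        if c.is_ascii_digit() {
            run.push(c);
        } else {
            flush(&mut run, &mut out);
            out.push(c);
        }
    }
    flush(&mut run, &mut out);
    out
}
```

Under this approximation "LinkedAccountId: 215805592" and "call 5551234567" are redacted identically, while a 5-digit value such as "order 12345" is left alone.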
Evidence
Clipboard event captured via Screenpipe with usePiiRemoval: true:
- Input: LinkedAccountId: 215805592
- Stored: 215[PHONE]8
The phone regex fires on the 9-digit database ID, corrupting the value. This is data-destructive — the original value cannot be recovered.
Affected File
crates/screenpipe-core/src/pii_removal.rs — the phone number pattern matches any 7-10 digit sequence.
Impact
- Corrupts database IDs in clipboard content, OCR text, and accessibility tree text
- Corrupts numeric identifiers in URLs (entity IDs, autofill params)
- Corrupts timestamps embedded in URL paths
- Data loss is permanent — redacted values cannot be reconstructed
Suggested Fix
Tighten the phone regex to require phone-specific context:
- Require country code prefix, parenthesized area code, or dash/dot separators
- Exempt pure digit sequences without formatting (likely IDs, not phones)
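- Consider a word-boundary + formatting heuristic. A leading \b fails in front of + or ( when they follow a space or start the text, so the country code and a parenthesized area code would never be included in the match; putting the boundary on the bare area code instead fixes that and also makes the parentheses pair up:
(?:\+\d{1,3}[\s-])?(?:\(\d{3}\)|\b\d{3})[\s.-]\d{3}[\s.-]\d{4}\b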
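The same formatting requirement can be checked without a regex. This is a std-only Rust sketch that assumes some earlier pass has already isolated a candidate span; is_formatted_phone, count_digits and is_sep are hypothetical helpers, not functions in pii_removal.rs. A candidate is accepted only if the whole string is an optional +CC prefix, a 3-digit area code (bare or in matching parentheses), then 3- and 4-digit groups, each preceded by whitespace, '.' or '-'.

```rust
/// Count consecutive ASCII digits in `b` starting at index `i`.
fn count_digits(b: &[u8], i: usize) -> usize {
    b[i.min(b.len())..].iter().take_while(|c| c.is_ascii_digit()).count()
}

/// Separators allowed between digit groups: ASCII whitespace, '.', '-'.
fn is_sep(c: Option<&u8>) -> bool {
    matches!(c, Some(c) if c.is_ascii_whitespace() || *c == b'.' || *c == b'-')
}

/// Hypothetical classifier: true only if all of `s` looks like a formatted
/// phone number. Bare digit runs such as "215805592" are rejected.
fn is_formatted_phone(s: &str) -> bool {
    let b = s.as_bytes();
    let mut i = 0;

    // Optional country code: '+', 1 to 3 digits, then whitespace or '-'.
    if b.first() == Some(&b'+') {
        let n = count_digits(b, 1);
        if !(1..=3).contains(&n) {
            return false;
        }
        i = 1 + n;
        match b.get(i) {
            Some(c) if c.is_ascii_whitespace() || *c == b'-' => i += 1,
            _ => return false,
        }
    }

    // Area code: exactly 3 digits, bare or wrapped in a matching "(...)".
    let paren = b.get(i) == Some(&b'(');
    if paren {
        i += 1;
    }
    if count_digits(b, i) != 3 {
        return false;
    }
    i += 3;
    if paren {
        if b.get(i) != Some(&b')') {
            return false;
        }
        i += 1;
    }

    // Separator + 3-digit group, then separator + 4-digit group.
    for &len in &[3usize, 4] {
        if !is_sep(b.get(i)) {
            return false;
        }
        i += 1;
        if count_digits(b, i) != len {
            return false;
        }
        i += len;
    }

    // Nothing may follow the last group.
    i == b.len()
}
```

With this check, "(555) 123-4567" and "+1 555.123.4567" are treated as phone numbers while "215805592" is left intact. The cost is that unformatted numbers like 5551234567 are no longer redacted, which is the trade-off the suggested fix already accepts by exempting pure digit sequences.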
Workaround
None — PII removal runs at capture time. Disabling usePiiRemoval exposes actual sensitive data.