[EPIC][SECURITY][PLUGINS]: PII Advanced filter (Presidio + pattern library) #2553
Description
🛡️ Epic: PII Advanced Filter Plugin (Presidio + Pattern Library + Compliance)
Goal
Deliver a production‑grade PII detection and anonymization plugin using Microsoft Presidio, with:
- high‑confidence detection for common and regulated entities
- deterministic masking/redaction strategies
- configurable thresholds and allow/deny lists
- strong test coverage and real‑world examples
The plugin should operate as both a native gateway plugin and a standalone MCP server (stdio/HTTP), and be safe to run by default.
Why Now?
PII handling is a top compliance and security requirement for customers, and ContextForge already provides hooks ideal for this:
- Regulatory pressure: GDPR, HIPAA, PCI‑DSS require enforceable masking controls
- Enterprise adoption: Large tenants demand consistent PII handling across prompts, tools, resources
- A2A + Federation: PII needs to be scrubbed before leaving trust boundaries
- Operational safety: Baseline plugin must be reliable and easy to enable without false positives or odd replacements
- Developer velocity: A first‑class PII plugin reduces custom per‑deployment work
📖 User Stories
US‑1: Platform Admin - Enable PII protection globally
As a Platform Administrator
I want to enable a single PII plugin that covers prompts, tools, and resources
So that all traffic is scrubbed automatically before leaving the gateway
Acceptance Criteria:
Given plugins are enabled in the gateway
When I configure PIIAdvancedPlugin with hooks:
- prompt_post_fetch
- tool_post_invoke
- resource_post_fetch
Then PII is detected and anonymized consistently
And the response contains no raw PII
US‑2: Security Engineer - Reliable SSN + Date Handling
As a Security Engineer
I want SSNs to be detected reliably, and dates to be masked clearly
So that users never see odd or misleading replacements
Acceptance Criteria:
Given use_pattern_library=true and US_SSN enabled
When input contains "My SSN is 123-45-6789"
Then SSN is detected and masked to ***-**-6789
Given DATE_TIME masking is configured as [DATE]
When input contains "2024-01-01"
Then it is replaced with [DATE]
US‑3: Compliance Officer - Entity Policies + Audit Trail
As a Compliance Officer
I want entity‑specific strategies and thresholds with audit logging
So that I can prove compliance and tune sensitivity
Acceptance Criteria:
Given entity_thresholds and anonymization_strategies are configured
When the plugin runs
Then only entities above threshold are returned
And each entity uses its configured strategy
And audit logging records decisions
US‑4: Developer - Extend with Custom Recognizers
As a Developer
I want to add custom recognizers via config
So that I can detect organization‑specific identifiers
Acceptance Criteria:
Given a custom recognizer for EMPLOYEE_ID
When input contains EMP-123456
Then EMPLOYEE_ID is detected and masked
✅ Acceptance Criteria (Epic)
- Presidio‑based detection with spaCy NLP integration
- Pattern library for regex‑based detection (SSN, phone, etc.)
- Entity‑specific thresholds and allow/deny lists
- Deterministic masking strategies (incl. DATE_TIME placeholder)
- Configurable anonymization strategies per entity
- Works for prompt/tool/resource hooks
- Standalone usage supported (CLI + MCP server)
- Tests for core detection and edge cases
- Documentation with sample patterns and troubleshooting
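The threshold and allow/deny criteria above can be sketched with the standard library alone. This is an illustration, not the plugin's code: `Detection`, `filter_detections`, and `default_threshold` are hypothetical names, and the semantics are assumptions (allow-listed values are never reported, deny-listed values are always reported, and a score equal to the threshold passes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection record: entity type, matched text, confidence."""
    entity_type: str
    text: str
    score: float


def filter_detections(detections, entity_thresholds, default_threshold=0.5,
                      allow_list=(), deny_list=()):
    """Keep detections that pass their entity threshold or are deny-listed.

    Assumed semantics (not taken from the plugin):
    - allow_list values are never treated as PII
    - deny_list values are always treated as PII, regardless of score
    - comparisons are case-insensitive
    """
    allowed = {value.lower() for value in allow_list}
    denied = {value.lower() for value in deny_list}
    kept = []
    for det in detections:
        value = det.text.lower()
        if value in allowed:
            continue  # explicitly permitted, drop the finding
        if value in denied:
            kept.append(det)  # explicitly forbidden, keep even at low score
            continue
        threshold = entity_thresholds.get(det.entity_type, default_threshold)
        if det.score >= threshold:
            kept.append(det)
    return kept
```

With `entity_thresholds={"PHONE_NUMBER": 0.4}`, a phone number scored 0.45 is kept, while an email scored 0.3 falls below the assumed default of 0.5 and is dropped.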
🧠 Design Notes
Key Config Options
use_pattern_library: true
anonymization_strategies:
  EMAIL_ADDRESS: "mask"
  PHONE_NUMBER: "mask"
  US_SSN: "mask"
  DATE_TIME: "mask"
masking_patterns:
  DATE_TIME: "[DATE]"
entity_thresholds:
  PHONE_NUMBER: 0.4
Behavior Principles
- No odd replacements: avoid static fake values for dates by default
- Deterministic placeholders: allow explicit [DATE], [SSN], etc.
- Safe defaults: pattern library on, key entities enabled
- Extensible: custom recognizers via config
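The deterministic-placeholder principle can be illustrated with a small standard-library sketch. It is not the plugin's implementation: the two regexes and the `mask_ssn` and `mask_text` functions are hypothetical, and the patterns are deliberately simplified compared with what a Presidio recognizer or a production pattern library would use. It shows a partial SSN mask that keeps the last four digits and a fixed `[DATE]` placeholder for ISO-style dates, matching the examples in US‑2.

```python
import re

# Simplified illustrative patterns (assumptions, not the plugin's pattern library).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")
ISO_DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def mask_ssn(match: re.Match) -> str:
    """Hide the first five digits and keep the last four."""
    return f"***-**-{match.group(1)}"


def mask_text(text: str, date_placeholder: str = "[DATE]") -> str:
    """Apply deterministic masking: the same input always yields the same output."""
    text = SSN_RE.sub(mask_ssn, text)
    # A callable replacement avoids backslash-escape processing of the placeholder.
    text = ISO_DATE_RE.sub(lambda _match: date_placeholder, text)
    return text


print(mask_text("My SSN is 123-45-6789"))  # My SSN is ***-**-6789
print(mask_text("Issued on 2024-01-01"))   # Issued on [DATE]
```

A fixed placeholder avoids the "odd replacement" problem: a reader sees that a date was removed instead of a plausible but fake date.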
🧰 THE WORKS! (Implementation Checklist)
- Core detector: Presidio analyzer + spaCy NLP
- Pattern library with common + regulated patterns
- Per‑entity thresholds + allow/deny lists
- Anonymization strategies (mask/redact/hash/encrypt)
- DATE_TIME placeholder support
- Gateway config defaults updated
- Standalone usage (CLI + MCP server integration)
- Test suite expanded (SSN, phone, credit card, date)
- Documentation: README + TESTING guide
- Benchmarks and validation script
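One way the per-entity strategy dispatch from the checklist could look is sketched below. All names (`STRATEGIES`, `anonymize`, the `default` parameter) are hypothetical, and the behavior of each strategy is an assumption: `mask` uses the entity's entry in `masking_patterns` when present and asterisks otherwise, `redact` removes the value, and `hash` returns a SHA-256 hex digest. `encrypt` is left out because it needs key management and a cryptography library, so it is rejected here like any other unknown strategy.

```python
import hashlib


def _mask(value, entity_type, masking_patterns):
    # Use a configured placeholder such as "[DATE]" if one exists for this entity.
    return masking_patterns.get(entity_type, "*" * len(value))


def _redact(value, entity_type, masking_patterns):
    # Assumption: redaction removes the value entirely.
    return ""


def _hash(value, entity_type, masking_patterns):
    # Unsalted SHA-256 for illustration only; real use would likely need a salt or key.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


STRATEGIES = {"mask": _mask, "redact": _redact, "hash": _hash}


def anonymize(value, entity_type, anonymization_strategies,
              masking_patterns=None, default="redact"):
    """Apply the strategy configured for entity_type, falling back to default."""
    name = anonymization_strategies.get(entity_type, default)
    if name not in STRATEGIES:
        raise ValueError(f"unsupported anonymization strategy: {name!r}")
    return STRATEGIES[name](value, entity_type, masking_patterns or {})
```

For example, `anonymize("2024-01-01", "DATE_TIME", {"DATE_TIME": "mask"}, {"DATE_TIME": "[DATE]"})` returns `"[DATE]"`, mirroring the config in the Design Notes.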
🔗 Related
- External plugins via MCP (stdio/HTTP)
- Plugin framework hooks
- PII basic filter