[plan] add content inspection for sensitive data patterns #308

@github-actions

Objective

Implement optional DLP (Data Loss Prevention) scanning in Squid to detect and prevent exfiltration of API keys, tokens, and credentials in outgoing requests.

Context

Current state: Domain allowlisting restricts which hosts can be contacted, but doesn't inspect request content.

Risk: An attacker could embed sensitive data (API keys, tokens) in HTTP requests to allowed domains (e.g., by creating GitHub gists that contain credentials).

Risk level: 🟡 MEDIUM - Information disclosure via allowed domains

Implementation Approach

  1. Add an --enable-dlp flag to enable content inspection (opt-in, because inspection adds latency)
  2. Define regex patterns for common credential formats:
    • GitHub tokens: ghp_[A-Za-z0-9]{36}, gho_[A-Za-z0-9]{36}, ghs_[A-Za-z0-9]{36}
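    • OpenAI API keys: sk-[A-Za-z0-9]{48} (legacy format; newer key formats such as sk-proj-... are longer and may contain - and _, so this pattern alone will miss them)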
    • AWS keys: AKIA[0-9A-Z]{16}
    • Generic patterns: [Aa]pi[_-]?[Kk]ey, [Tt]oken
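  3. Use Squid's content adaptation directives (icap_service or ecap_service, optionally grouped with adaptation_service_set and enabled with adaptation_access) together with an ICAP or eCAP adapter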
  4. Log blocked requests with a [DLP_BLOCKED] prefix
  5. Return 403 Forbidden when a sensitive pattern is detected
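
Steps 2, 4 and 5 can be sketched as a small matcher. This is a minimal illustration, not the adapter itself: the names `DLP_PATTERNS`, `findSensitivePattern` and `inspectRequest` are hypothetical, and the log line layout after the `[DLP_BLOCKED]` prefix is an assumption. The generic patterns (`[Aa]pi[_-]?[Kk]ey`, `[Tt]oken`) are left out of the sketch because they also match ordinary words in legitimate requests.

```typescript
// Hypothetical names; the three regexes are copied from the pattern list above.
interface DlpPattern {
  name: string;
  regex: RegExp;
}

const DLP_PATTERNS: DlpPattern[] = [
  { name: "github-token", regex: /gh[pos]_[A-Za-z0-9]{36}/ },
  { name: "openai-key", regex: /sk-[A-Za-z0-9]{48}/ },
  { name: "aws-access-key", regex: /AKIA[0-9A-Z]{16}/ },
];

interface DlpDecision {
  status: 200 | 403;
  logLine?: string;
}

// Returns the name of the first pattern found in the text, or null if none match.
// The regexes carry no "g" flag, so test() keeps no state between calls.
function findSensitivePattern(text: string): string | null {
  for (const { name, regex } of DLP_PATTERNS) {
    if (regex.test(text)) return name;
  }
  return null;
}

// payload is whatever the adapter can see: request line, headers and body.
function inspectRequest(method: string, host: string, payload: string): DlpDecision {
  const match = findSensitivePattern(payload);
  if (match === null) return { status: 200 };
  // Log the pattern name only, never the matched secret itself.
  return { status: 403, logLine: `[DLP_BLOCKED] ${method} ${host} pattern=${match}` };
}
```

For example, `inspectRequest("POST", "api.github.com", "ghp_" + "a".repeat(36))` yields a 403 decision whose log line starts with `[DLP_BLOCKED]`, while a payload without any of the three patterns yields status 200.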

Files to Modify

  • src/cli.ts - Add --enable-dlp flag
  • src/squid-config.ts - Generate ICAP/eCAP configuration when enabled
  • src/types.ts - Add DLP config to WrapperConfig
  • containers/squid/dlp-adapter.sh - Simple ICAP adapter script for pattern matching
  • containers/squid/Dockerfile - Install ICAP adapter dependencies
  • README.md - Document DLP feature and detected patterns
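
The changes to `src/types.ts`, `src/cli.ts` and `src/squid-config.ts` might fit together roughly as follows. Everything here is a sketch under assumptions: the `DlpConfig` shape, the `parseDlpFlag` and `generateDlpSquidConfig` helpers, the service name `dlp_req`, and the adapter URL on port 1344 are all hypothetical, and the real `WrapperConfig` has other fields that are not shown. With a single adapter the sketch references the service directly in `adaptation_access`; `adaptation_service_set` would become relevant if several interchangeable adapters were configured.

```typescript
// Hypothetical shape; the real WrapperConfig in src/types.ts has other fields.
interface DlpConfig {
  enabled: boolean;
  icapUrl: string; // assumed endpoint of the ICAP adapter
}

interface WrapperConfig {
  dlp?: DlpConfig;
}

// Assumption: the adapter listens locally on the conventional ICAP port 1344.
const DEFAULT_ICAP_URL = "icap://127.0.0.1:1344/reqmod";

// Simplified flag handling; the real CLI parser may look different.
function parseDlpFlag(argv: string[]): DlpConfig {
  return { enabled: argv.includes("--enable-dlp"), icapUrl: DEFAULT_ICAP_URL };
}

// Returns extra squid.conf lines, or an empty string when DLP is disabled.
function generateDlpSquidConfig(config: WrapperConfig): string {
  const dlp = config.dlp;
  if (!dlp || !dlp.enabled) return "";
  const lines = [
    "# DLP content inspection: send requests to the adapter before forwarding",
    "icap_enable on",
    // bypass=off asks Squid to fail the request if the adapter is unreachable
    `icap_service dlp_req reqmod_precache ${dlp.icapUrl} bypass=off`,
    "adaptation_access dlp_req allow all",
  ];
  return lines.join("\n") + "\n";
}
```

Calling `generateDlpSquidConfig({ dlp: parseDlpFlag(process.argv) })` would then produce the fragment only when `--enable-dlp` is present. The sketch assumes the proxy can see request content; for HTTPS traffic tunnelled through CONNECT that requires TLS interception, which this issue does not cover.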

Testing

  • Test detection of GitHub personal access token (ghp_...)
  • Test detection of OpenAI API key (sk-...)
  • Test detection of AWS access key (AKIA...)
  • Verify legitimate requests without credentials pass through
  • Confirm DLP is only active when the --enable-dlp flag is used
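
The detection cases above could be expressed as a table-driven check. This is only a sketch: `fakeSecret`, `DLP_CASES`, `runDlpCases` and `standInDetector` are hypothetical names, the stand-in detector simply combines the three specific regexes from the issue, and the fixtures are synthetic strings of the right shape rather than real credentials. The flag-gating case is not covered here.

```typescript
// Synthetic fixtures only: never put real credentials in a test suite.
const fakeSecret = (prefix: string, fill: string, length: number): string =>
  prefix + fill.repeat(length);

interface DlpCase {
  label: string;
  payload: string;
  shouldBlock: boolean;
}

const DLP_CASES: DlpCase[] = [
  { label: "GitHub PAT", payload: fakeSecret("ghp_", "a", 36), shouldBlock: true },
  { label: "OpenAI key", payload: fakeSecret("sk-", "b", 48), shouldBlock: true },
  { label: "AWS access key", payload: fakeSecret("AKIA", "C", 16), shouldBlock: true },
  // One character short of the required length, so it must pass through.
  { label: "truncated GitHub PAT", payload: fakeSecret("ghp_", "a", 35), shouldBlock: false },
  { label: "ordinary JSON body", payload: '{"title":"hello"}', shouldBlock: false },
];

// Runs every case against a detector and returns the labels of the cases that failed.
function runDlpCases(detect: (payload: string) => boolean): string[] {
  return DLP_CASES
    .filter((c) => detect(c.payload) !== c.shouldBlock)
    .map((c) => c.label);
}

// Stand-in detector; the real check would go through the proxy and adapter.
const standInDetector = (payload: string): boolean =>
  /gh[pos]_[A-Za-z0-9]{36}|sk-[A-Za-z0-9]{48}|AKIA[0-9A-Z]{16}/.test(payload);

const failures = runDlpCases(standInDetector);
console.log(failures.length === 0 ? "all DLP cases passed" : `failed: ${failures.join(", ")}`);
```

Passing the detector in as a function keeps the case table reusable, so the same cases could later be run against an end-to-end check that sends requests through Squid.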

Acceptance Criteria

  • --enable-dlp flag enables content inspection
  • GitHub tokens detected and blocked
  • OpenAI API keys detected and blocked
  • AWS credentials detected and blocked
  • Blocked requests logged with [DLP_BLOCKED] prefix
  • Performance impact documented (< 10% latency increase)
  • Feature documented with security benefits

Related to [plan] security improvements from threat model review #306

AI generated by Plan Command for discussion #302
