Feature Request: Patch Storage Management and Cleanup #1635

@codemedic

Problem

Patch files in ~/.cache/prek/patches/ accumulate indefinitely without cleanup, creating security and storage concerns for users working with proprietary code.

The Issue

Patches created during the stash/restore cycle remain on disk after successful restoration. The original pre-commit maintainer stated they're kept "for exceptional cases to recover" (pre-commit #1969). However, there's no documented recovery procedure, and no command references old patches.

This looks like defensive programming (keeping patches "just in case"), but it trades real problems for a theoretical benefit.

Real-World Impact

Security Concerns

  1. Proprietary code exposure: Patches contain source code diffs from active development
  2. Insecure permissions: The cache directory is created with rwxrwxr-x (0775), so other users on a shared system can read the code
  3. Indefinite retention: Sensitive code accumulates for weeks/months
  4. Backup exposure: Patches persist in backups if ~/.cache is included

Example: After 10 days of development, we had 75 patch files containing proprietary code readable by any user on the system.

Storage Concerns

  • Patches accumulate indefinitely (316KB for us in 10 days; pre-commit users reported 2.4GB caches with 158MB individual patches)
  • No automatic cleanup mechanism
  • prek cache gc doesn't touch patches
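A quick way to quantify the storage problem is to count the patch files and sum their sizes. The Rust sketch below does this for a single directory. `patch_usage` and `PatchUsage` are hypothetical names and are not part of prek. The sketch assumes the caller passes an already-expanded path, because Rust does not expand `~`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Summary of the patch files found in one directory (hypothetical type).
#[derive(Debug, Default, PartialEq)]
struct PatchUsage {
    count: usize,
    total_bytes: u64,
}

/// Count `*.patch` files directly inside `dir` and sum their sizes.
/// A missing directory is reported as empty rather than as an error.
fn patch_usage(dir: &Path) -> io::Result<PatchUsage> {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(PatchUsage::default()),
        Err(e) => return Err(e),
    };
    let mut usage = PatchUsage::default();
    for entry in entries {
        let entry = entry?;
        let path = entry.path();
        let meta = entry.metadata()?;
        // Only regular files with a `.patch` extension are counted
        if meta.is_file() && path.extension().map_or(false, |ext| ext == "patch") {
            usage.count += 1;
            usage.total_bytes += meta.len();
        }
    }
    Ok(usage)
}
```

A built-in report like this would make the growth visible long before it reaches the multi-gigabyte caches reported by pre-commit users.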

Current Workarounds

Users must manually manage patches:

# Fix permissions immediately
chmod 700 ~/.cache/prek/patches/

# Manual cleanup
find ~/.cache/prek/patches/ -name "*.patch" -mtime +7 -delete

# Cron job for automation (runs weekly, Mondays at 13:00)
0 13 * * 1 find ~/.cache/prek/patches/ -name "*.patch" -mtime +7 -delete

# Nuclear option
prek cache clean  # Removes everything, requires re-downloading repos
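The `find ... -mtime +7 -delete` workaround could be built into prek roughly as follows. This is a sketch, and `delete_patches_older_than` is a hypothetical name. The function takes `now` as a parameter, so the age check can be tested without waiting a week. Its cutoff (strictly older than `max_age`) is close to `find`'s `-mtime +7` but not identical, because `find` rounds ages down to whole days.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};

/// Delete `*.patch` files in `dir` whose modification time is more than
/// `max_age` before `now`. Returns the paths that were removed.
fn delete_patches_older_than(
    dir: &Path,
    max_age: Duration,
    now: SystemTime,
) -> io::Result<Vec<PathBuf>> {
    let mut removed = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let meta = entry.metadata()?;
        if !meta.is_file() || path.extension().map_or(true, |ext| ext != "patch") {
            continue; // Only regular *.patch files are candidates
        }
        // A modification time later than `now` is treated as age zero
        let age = now.duration_since(meta.modified()?).unwrap_or(Duration::ZERO);
        if age > max_age {
            fs::remove_file(&path)?;
            removed.push(path);
        }
    }
    Ok(removed)
}
```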

Proposed Solutions

Option 1: Delete patches after successful restoration (recommended)

If the stash/restore cycle completes without error, the patch served its purpose. Delete it immediately.

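// Sketch: assumes `git` is on PATH and the current directory is the work tree
use std::{fs, io, path::Path, process::Command};

fn restore_stashed_changes(patch_file: &Path) -> io::Result<()> {
    // Re-apply the stashed diff; on failure, return early so the patch is kept for recovery
    let status = Command::new("git").arg("apply").arg(patch_file).status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "git apply failed; patch kept"));
    }
    fs::remove_file(patch_file) // Restore succeeded: the patch has served its purpose
}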

Add an optional setting to preserve the current behavior:

# .pre-commit-config.yaml
keep_patches: true  # Explicitly opt in to retention
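The opt-in could gate deletion along the lines of the sketch below. It assumes a hypothetical `PatchConfig` struct that mirrors the proposed `keep_patches` key, which is not an existing prek option. The helper `finish_restore` is meant to run only after the patch has been re-applied successfully, so a failed restore never reaches it.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical subset of the config; `keep_patches` is the proposed key.
#[derive(Debug, Default)]
struct PatchConfig {
    keep_patches: bool, // Defaults to false: delete after a successful restore
}

/// Called only after the patch has been re-applied successfully.
fn finish_restore(patch_file: &Path, config: &PatchConfig) -> io::Result<()> {
    if config.keep_patches {
        return Ok(()); // The user explicitly opted in to retention
    }
    fs::remove_file(patch_file)
}
```

Defaulting the field to `false` makes deletion the behavior users get without configuring anything, which matches the recommendation above.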

Option 2: Automatic retention policy

# .pre-commit-config.yaml
patch_retention:
  max_age_days: 7      # Delete patches older than 7 days
  max_count: 50        # Keep only last 50 patches
  max_size_mb: 100     # Delete oldest when total exceeds limit
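The three proposed limits can be combined into a single pass that always keeps the newest patches. In this sketch, `PatchInfo`, `Retention` and `plan_deletions` are hypothetical names. The sketch assumes `max_age_days` and `max_size_mb` have already been converted to a `Duration` and a byte count. Deciding what to delete is kept separate from the actual deletion, so the policy can be tested without touching the filesystem.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// One patch on disk (hypothetical representation).
#[derive(Debug, Clone)]
struct PatchInfo {
    path: PathBuf,
    age: Duration, // Time since last modification
    bytes: u64,
}

/// Proposed `patch_retention` settings; `None` disables that limit.
struct Retention {
    max_age: Option<Duration>,
    max_count: Option<usize>,
    max_total_bytes: Option<u64>,
}

/// Return the paths to delete. Patches are visited newest first; once any
/// limit is hit, that patch and every older one are deleted.
fn plan_deletions(mut patches: Vec<PatchInfo>, policy: &Retention) -> Vec<PathBuf> {
    patches.sort_by_key(|p| p.age); // Smallest age (newest) first
    let (mut kept, mut kept_bytes, mut size_exceeded) = (0usize, 0u64, false);
    let mut delete = Vec::new();
    for p in patches {
        let too_old = policy.max_age.map_or(false, |max| p.age > max);
        let over_count = policy.max_count.map_or(false, |max| kept >= max);
        // Sticky flag: after the budget is exceeded, older patches are never kept
        size_exceeded = size_exceeded
            || policy.max_total_bytes.map_or(false, |max| kept_bytes + p.bytes > max);
        if too_old || over_count || size_exceeded {
            delete.push(p.path);
        } else {
            kept += 1;
            kept_bytes += p.bytes;
        }
    }
    delete
}
```

The sticky size flag implements "delete oldest when total exceeds limit". Without it, a small old patch could survive while a larger, newer one is deleted.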

Option 3: Enhanced prek cache gc

prek cache gc --patches              # Remove all patches
prek cache gc --patches-older 7d     # Remove patches older than 7 days
prek cache gc --patches-keep 10      # Keep only 10 most recent
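A flag like `--patches-older 7d` needs an age format. Below is a minimal parser that assumes `d`, `h`, `m` and `s` suffixes. Both the flag and the format are part of this proposal, not existing prek options, and `parse_age` is a hypothetical name.

```rust
use std::time::Duration;

/// Parse an age such as "7d", "12h", "30m" or "45s" into a Duration.
/// The accepted suffixes are an assumption of this sketch.
fn parse_age(s: &str) -> Result<Duration, String> {
    let s = s.trim();
    let unit = s.chars().last().ok_or_else(|| "empty age".to_string())?;
    let secs_per_unit = match unit {
        'd' => 86_400,
        'h' => 3_600,
        'm' => 60,
        's' => 1,
        _ => return Err(format!("unknown unit in {s:?}; expected d, h, m or s")),
    };
    // Everything before the unit must be a non-negative integer
    let number: u64 = s[..s.len() - unit.len_utf8()]
        .parse()
        .map_err(|e| format!("invalid number in {s:?}: {e}"))?;
    number
        .checked_mul(secs_per_unit)
        .map(Duration::from_secs)
        .ok_or_else(|| format!("age {s:?} is too large"))
}
```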

Option 4: Secure permissions by default

  • Create patches directory with 700 permissions (owner-only)
  • Warn if directory has loose permissions
  • Document security implications
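Owner-only permissions can be enforced with the Unix-specific extensions in Rust's standard library. The sketch below uses a hypothetical function, `ensure_private_dir`. It creates the directory with mode `0700`, or tightens an existing directory and returns its old mode so the caller can print a warning. Tightening only the directory is enough to stop other users from opening the patches inside it, even if the files themselves are `0644`.

```rust
use std::fs::{self, DirBuilder};
use std::io;
use std::os::unix::fs::{DirBuilderExt, PermissionsExt};
use std::path::Path;

/// Ensure `dir` exists with owner-only permissions. Returns `Some(old_mode)`
/// if an existing directory was too permissive and had to be tightened.
fn ensure_private_dir(dir: &Path) -> io::Result<Option<u32>> {
    if !dir.exists() {
        // Missing parents are also created as 0o700; the umask can only
        // clear bits, so the result is never looser than 0o700
        DirBuilder::new().recursive(true).mode(0o700).create(dir)?;
        return Ok(None);
    }
    let mode = fs::metadata(dir)?.permissions().mode() & 0o777;
    if mode & 0o077 != 0 {
        // Group or other bits are set: drop them
        fs::set_permissions(dir, fs::Permissions::from_mode(0o700))?;
        return Ok(Some(mode));
    }
    Ok(None)
}
```

For a directory that currently has rwxrwxr-x, the function returns `Some(0o775)`. The caller can then warn the user that patches written earlier may already have been readable by others.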

Use Cases

  1. Proprietary/commercial code: Cannot leave sensitive diffs readable on disk
  2. Shared development systems: Multiple users shouldn't see each other's code
  3. CI/CD environments: No need for recovery patches in automated runs
  4. Security compliance: Data retention policies prohibit indefinite code storage
  5. Storage-constrained systems: Large caches become problematic

Context

The original pre-commit tool had the same issue. Its maintainer closed similar requests, treating patches as "implementation details". But when those details contain proprietary code stored with insecure permissions, they become user concerns.

The design rationale for using patch files instead of git stash makes sense: it avoids data loss from stash conflicts (pre-commit #1505). However, that rationale doesn't justify keeping patches indefinitely after a successful restoration.

Related issues:

Environment

  • prek version: 0.3.1
  • OS: Linux
  • Use case: Professional development with proprietary code

Bottom line: Patches serve a purpose during execution, but keeping them indefinitely creates real security risks for theoretical recovery benefits that may not exist in practice.
