jfr-redact

This is an early prototype from the SapMachine team; use it at your own risk. We provide no guarantees regarding functionality or security.

A tool to redact sensitive information from Java Flight Recorder (JFR) recordings and text files, replacing it with ***.

Quick Start

Redact a JFR file with default settings:

# Download the JAR from releases
java -jar jfr-redact.jar redact recording.jfr redacted.jfr

Redact a Java error log (hs_err_pid*.log):

java -jar jfr-redact.jar redact-text hs_err_pid12345.log hs_err_redacted.log

# Use the hserr preset optimized for crash reports:
java -jar jfr-redact.jar redact-text hs_err_pid12345.log --config hserr

That's it! The tool will automatically redact:

  • Passwords, tokens, API keys, and other sensitive properties
  • User home directories and file paths
  • Email addresses and IP addresses
  • System environment variables and process information

Features

  • Property Redaction: Redact sensitive properties in events with key and value fields
    • Patterns: password, passwort, pwd, secret, token, key, ... (case-insensitive)
  • Event Removal: Remove entire event types that could leak information
    • Examples: jdk.OSInformation, SystemProcess, InitialEnvironmentVariable, ProcessStart
  • Event Filtering: Advanced filtering similar to the jfr scrub command
    • Filter by event name, category, or thread name
    • Supports glob patterns (*, ?) and comma-separated lists
    • Include/exclude filters with flexible combinations
  • String Pattern Redaction: Redact sensitive patterns in string fields
    • Home folders: /Users/[^/]+, C:\Users\[a-zA-Z0-9_\-]+, /home/[^/]+
    • Email addresses, UUIDs, IP addresses
    • Configurable to exclude method names, class names, or thread names
  • Two-Pass Discovery: Automatically discover sensitive values and redact them everywhere
    • First pass: Extract usernames, hostnames, and other values from patterns (e.g., extract johndoe from /Users/johndoe)
    • Second pass: Redact discovered values wherever they appear in the file
    • Configurable minimum occurrences and allowlists to reduce false positives
    • Use --discovery-mode=fast for single-pass (faster), --discovery-mode=default for two-pass (more thorough)
  • Words Mode: Discover and redact specific words/identifiers
    • Discover all distinct words in a file: jfr-redact words discover recording.jfr words.txt
    • Create rules to keep or redact specific words
    • Apply rules: jfr-redact words redact app.log redacted.log -r rules.txt
  • Network Redaction: Redact ports and addresses from SocketRead/SocketWrite events
  • Path Redaction: Redact directory paths while keeping filenames (configurable)
  • Pseudonymization: Preserve relationships between values while protecting data
    • Hash mode: Consistent mapping to pseudonyms (e.g., <redacted:a1b2c3>)
    • Counter mode: Sequential numbering (value1→1, value2→2)
    • Realistic mode: Generate plausible alternatives (e.g., john.doe@company.com → alice.smith@test.com)
    • Custom replacements: Define specific mappings in config (e.g., johndoe → alice, /home/johndoe → /home/testuser)
    • Optional, enabled via --pseudonymize flag
  • Text File Redaction: Apply the same redaction patterns to arbitrary text files
    • Perfect for redacting Java error logs (hs_err_pid*.log) which contain system properties, environment variables, and file paths
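
To make the two-pass discovery idea from the list above more concrete, here is a minimal Java sketch. It is not jfr-redact's implementation: the class TwoPassDiscoverySketch, its method names, and the simplified home-directory regex are assumptions made for illustration only. The first pass collects usernames from home-directory paths, applying a minimum-occurrence threshold and an allowlist; the second pass replaces every discovered value with ***.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative two-pass redaction; NOT jfr-redact's actual implementation. */
final class TwoPassDiscoverySketch {

    // Simplified variant of the home-folder patterns from the feature list;
    // the capture group around the username is an addition for this sketch.
    private static final Pattern HOME_DIR =
            Pattern.compile("(?:/Users/|/home/|C:\\\\Users\\\\)([a-zA-Z0-9_\\-]+)");

    /** First pass: count candidate usernames, drop rare and allowlisted ones. */
    static Set<String> discover(List<String> lines, int minOccurrences, Set<String> allowlist) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = HOME_DIR.matcher(line);
            while (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        counts.entrySet().removeIf(
                e -> e.getValue() < minOccurrences || allowlist.contains(e.getKey()));
        return counts.keySet();
    }

    /** Second pass: replace every occurrence of a discovered value with ***. */
    static String redact(String line, Set<String> discovered) {
        String result = line;
        for (String value : discovered) {
            result = result.replace(value, "***");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "java.home=/Users/johndoe/jdk",
                "Thread started by johndoe",
                "cwd=/home/build/workspace");
        Set<String> discovered = discover(lines, 1, Set.of("build"));
        for (String line : lines) {
            System.out.println(redact(line, discovered));
        }
    }
}
```

With this sample input, johndoe is discovered from the path on the first line and is then also redacted on the second line, where no path surrounds it. The name build is left alone because it is on the allowlist.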

As a utility, you can also concatenate multiple JFR files into a single recording without redaction, saving space.

Installation

This tool requires Java 21 or higher.

As a Command-Line Tool

Download the standalone JAR or executable from the releases page.

Using JBang

jbang jfr-redact@parttimenerd/jfr-redact

As a Library

Use jfr-redact as a library to programmatically redact JFR files in your own applications:

<dependency>
  <groupId>me.bechberger</groupId>
  <artifactId>jfr-redact</artifactId>
  <version>0.2.1</version>
</dependency>

Usage

JFR File Redaction (redact command)

The redact command is specifically designed for Java Flight Recorder (JFR) files:

# Use default config (recommended for most cases)
java -jar jfr-redact.jar redact recording.jfr redacted.jfr

# Use strict preset (maximum redaction)
java -jar jfr-redact.jar redact recording.jfr redacted.jfr --config strict

# Use custom configuration file
java -jar jfr-redact.jar redact recording.jfr redacted.jfr --config my-config.yaml

# Enable pseudonymization to preserve relationships between values
java -jar jfr-redact.jar redact recording.jfr redacted.jfr --pseudonymize

# Filter events (similar to jfr scrub command)
# Keep only specific events
java -jar jfr-redact.jar redact recording.jfr redacted.jfr --include-events "jdk.ThreadSleep,jdk.JavaMonitorWait"

# Exclude specific event patterns
java -jar jfr-redact.jar redact recording.jfr redacted.jfr --exclude-events "jdk.GC*"

# Filter by category
java -jar jfr-redact.jar redact recording.jfr redacted.jfr --include-categories "Java Application"

# Filter by thread name
java -jar jfr-redact.jar redact recording.jfr redacted.jfr --exclude-threads "GC Thread*"

# Combine multiple filters
java -jar jfr-redact.jar redact recording.jfr redacted.jfr \
  --include-events "jdk.*" \
  --exclude-categories "Flight Recorder" \
  --exclude-threads "Service Thread"
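
The filter values used above are comma-separated lists of glob patterns. As a rough illustration of how such a filter can be evaluated, here is a small Java sketch. GlobFilterSketch is a hypothetical helper written for this README; the exact matching rules of jfr-redact (and of jfr scrub) may differ.

```java
import java.util.regex.Pattern;

/** Illustrative glob filter; jfr-redact's real matching rules may differ. */
final class GlobFilterSketch {

    /** Converts a glob with * (any run of characters) and ? (one character) into a regex. */
    static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*' -> regex.append(".*");
                case '?' -> regex.append('.');
                // Everything else is matched literally, including '.' in event names
                default -> regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns true if the name matches any glob in a comma-separated filter. */
    static boolean matches(String filter, String name) {
        for (String glob : filter.split(",")) {
            if (globToRegex(glob.trim()).matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("jdk.GC*", "jdk.GCPhasePause"));                               // true
        System.out.println(matches("jdk.ThreadSleep,jdk.JavaMonitorWait", "jdk.JavaMonitorWait")); // true
        System.out.println(matches("jdk.GC*", "jdk.ThreadSleep"));                                 // false
    }
}
```

Quoting every non-wildcard character keeps the dot in names such as jdk.ThreadSleep literal, so a filter like jdk.* does not accidentally match an unrelated name that merely starts with jdk.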

Text File Redaction (redact-text command)

The redact-text command applies the same redaction patterns to arbitrary text files (logs, configuration files, error dumps, etc.). Note: Use this command for text files, not the redact command, which only works with JFR files.

# Redact a Java error log file (hs_err_pid*.log)
# Uses the preset hserr by default
java -jar jfr-redact.jar redact-text hs_err_pid12345.log hs_err_pid12345.redacted.log

# Redact any text file with pseudonymization
java -jar jfr-redact.jar redact-text debug-output.txt debug-output.redacted.txt --pseudonymize

Supports piping from stdin and writing to stdout:

cat hs_err_pid12345.log | java -jar jfr-redact.jar redact-text - -
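
Both redact and redact-text accept --pseudonymize. The following Java sketch illustrates the difference between a stateless hash mode and a stateful counter mode. It is only an approximation under stated assumptions: the use of SHA-256, the way the seed is mixed in, the six-digit truncation, and the class PseudonymizerSketch itself are hypothetical and not taken from the tool's source.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

/** Illustrative pseudonymizers; hash algorithm and output format are assumptions. */
final class PseudonymizerSketch {

    /** Hash mode: stateless, the same seed and value always yield the same pseudonym. */
    static String hashPseudonym(String value, long seed) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest((seed + ":" + value).getBytes(StandardCharsets.UTF_8));
            // Keep the first three bytes (six hex digits) to mimic <redacted:a1b2c3>
            return "<redacted:" + HexFormat.of().formatHex(hash, 0, 3) + ">";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JDK", e);
        }
    }

    /** Counter mode: stateful, values are numbered in order of first appearance. */
    static final class CounterPseudonymizer {
        private final Map<String, Integer> assigned = new HashMap<>();

        String pseudonym(String value) {
            Integer id = assigned.get(value);
            if (id == null) {
                id = assigned.size() + 1;
                assigned.put(value, id);
            }
            return String.valueOf(id);
        }
    }

    public static void main(String[] args) {
        System.out.println(hashPseudonym("johndoe", 42L));
        System.out.println(hashPseudonym("johndoe", 42L)); // same pseudonym as the line above
        CounterPseudonymizer counter = new CounterPseudonymizer();
        System.out.println(counter.pseudonym("alice@example.com")); // 1
        System.out.println(counter.pseudonym("bob@example.com"));   // 2
        System.out.println(counter.pseudonym("alice@example.com")); // 1
    }
}
```

In both variants equal inputs map to equal outputs, which is what preserves relationships between values. The hash variant needs no lookup table, while the counter variant produces shorter output but depends on the order in which values are seen.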

Words Mode

Discover and redact specific words/identifiers manually.

# Discover all distinct words in a file
java -jar jfr-redact.jar words discover recording.jfr words.txt

# Review words.txt and turn it into a rules file (e.g., saved as rules.txt):
# prefix words to redact with '-' and words to keep with '+':
#   - secretpassword
#   - internalhost
#   + safe-to-keep

# Apply redaction rules
java -jar jfr-redact.jar words redact app.log redacted.log -r rules.txt

Concatenate JFR Files

Concatenate multiple JFR recordings into a single file without any redaction. This is useful for combining multiple recording sessions or chunks.

# Concatenate two JFR files
java -jar jfr-redact.jar concat one.jfr two.jfr -o combined.jfr

# Concatenate multiple files
java -jar jfr-redact.jar concat *.jfr -o all-recordings.jfr

# Concatenate with verbose output
java -jar jfr-redact.jar concat first.jfr second.jfr third.jfr -o merged.jfr --verbose

# Ignore empty files (with warning) instead of failing
java -jar jfr-redact.jar concat *.jfr -o merged.jfr -i

Note: The concat command performs no redaction - it simply merges the recordings as-is. If you need to redact the combined file, run the redact command on the output afterwards.

Command-Line Options

Redact Command (default) - Redact JFR recordings
Usage: jfr-redact redact [-hiqvV] [--debug] [--dry-run] [--pseudonymize]
                         [--stats] [--config=<preset|file|url>]
                         [--decisions-file=<file>] [--discovery-mode=<mode>]
                         [--min-occurrences=<count>]
                         [--pseudonymize-mode=<mode>] [--seed=<seed>]
                         [--add-redaction-regex=<pattern>]...
                         [--exclude-categories=<filter>]...
                         [--exclude-events=<filter>]...
                         [--exclude-threads=<filter>]...
                         [--include-categories=<filter>]...
                         [--include-events=<filter>]...
                         [--include-threads=<filter>]...
                         [--remove-event=<type>]... <input-file> [<output-file>]
Redact sensitive information from Java Flight Recorder (JFR) recordings
      <input-file>      Input file to redact
      [<output-file>]   Output file with redacted data (default: auto-generated)
      --add-redaction-regex=<pattern>
                        Add a custom regular expression pattern for string
                          redaction. This option can be specified multiple
                          times to add multiple patterns.
      --config=<preset|file|url>
                        Load configuration from a preset name (default, strict,
                          hserr), YAML file, or URL. If not specified, uses the
                          default preset. You can also create a config file
                          that inherits from a preset using 'parent:
                          <preset-name>'.
      --debug           Enable debug output (DEBUG level logging)
      --decisions-file=<file>
                        Path to file for storing interactive decisions
                          (default: <input>.decisions.yaml)
      --discovery-mode=<mode>
                        Pattern discovery mode. Valid values: none (no
                          discovery, single-pass), fast (on-the-fly discovery),
                          default (two-pass, reads file twice for complete
                          discovery). Default: default (two-pass). Note:
                          Per-pattern discovery is configured in the config
                          file via enable_discovery.
      --dry-run         Process the file without writing output, useful for
                          testing configuration with --stats
      --exclude-categories=<filter>
                        Exclude events matching a category name
                          (comma-separated list, supports glob patterns).
                          Similar to jfr scrub --exclude-categories.
      --exclude-events=<filter>
                        Exclude events matching an event name (comma-separated
                          list, supports glob patterns). Similar to jfr scrub
                          --exclude-events.
      --exclude-threads=<filter>
                        Exclude events matching a thread name (comma-separated
                          list, supports glob patterns). Similar to jfr scrub
                          --exclude-threads.
  -h, --help            Show this help message and exit.
  -i, --interactive     Enable interactive mode. Prompts for decisions about
                          discovered usernames, hostnames, folders, and custom
                          patterns. Decisions are saved to a file for future
                          automatic use. Note: Ignores the 'ignore' list from
                          config in interactive mode.
      --include-categories=<filter>
                        Select events matching a category name (comma-separated
                          list, supports glob patterns). Similar to jfr scrub
                          --include-categories.
      --include-events=<filter>
                        Select events matching an event name (comma-separated
                          list, supports glob patterns). Similar to jfr scrub
                          --include-events.
      --include-threads=<filter>
                        Select events matching a thread name (comma-separated
                          list, supports glob patterns). Similar to jfr scrub
                          --include-threads.
      --min-occurrences=<count>
                        Minimum occurrences required to redact a discovered
                          value (prevents false positives, default: 1)
      --pseudonymize    Enable pseudonymization mode. When enabled, the same
                          sensitive value always maps to the same pseudonym (e.
                          g., <redacted:a1b2c3>), preserving relationships.
                          Without this flag, all values are redacted to ***.
      --pseudonymize-mode=<mode>
                        Pseudonymization mode (requires --pseudonymize). Valid
                          values: hash (default, stateless deterministic),
                          counter (sequential numbers), realistic (plausible
                          alternatives like alice@example.com)
  -q, --quiet           Minimize output (only show errors and completion
                          message)
      --remove-event=<type>
                        Remove an additional event type from the output. This
                          option can be specified multiple times to remove
                          multiple event types.
      --seed=<seed>     Seed for reproducible pseudonymization (only with
                          --pseudonymize)
      --stats           Show statistics after redaction
  -v, --verbose         Enable verbose output (INFO level logging)
  -V, --version         Print version information and exit.

Examples:

  Simple redaction with default config:
    jfr-redact redact recording.jfr
    (creates recording.redacted.jfr using default preset)

  Specify output file:
    jfr-redact redact recording.jfr output.jfr

  Use strict preset:
    jfr-redact redact recording.jfr --config strict

  Use strict preset with pseudonymization:
    jfr-redact redact recording.jfr --config strict --pseudonymize

  Custom config file with additional event removal:
    jfr-redact redact recording.jfr --config my-config.yaml --remove-event jdk.CustomEvent

  Add custom redaction pattern:
    jfr-redact redact recording.jfr --add-redaction-regex '\b[A-Z]{3}-\d{6}\b'
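
To see what the custom pattern in the last example would match, it can be tried out in plain Java first. This sketch assumes the tool interprets the pattern with java.util.regex semantics, which seems likely for a Java tool but is not confirmed here; CustomRegexSketch is a hypothetical helper.

```java
import java.util.regex.Pattern;

/** Tries out the example pattern with java.util.regex; not part of jfr-redact. */
final class CustomRegexSketch {

    // Same pattern as in the example above; in Java source each backslash is doubled
    private static final Pattern TICKET = Pattern.compile("\\b[A-Z]{3}-\\d{6}\\b");

    /** Replaces every match of the pattern with ***. */
    static String redact(String text) {
        return TICKET.matcher(text).replaceAll("***");
    }

    public static void main(String[] args) {
        // The identifier is replaced: "Failed while processing *** for tenant 7"
        System.out.println(redact("Failed while processing ABC-123456 for tenant 7"));
        // Neither token matches because of the \b word boundaries, so the line is unchanged
        System.out.println(redact("XABC-123456 and ABC-1234567 stay as they are"));
    }
}
```

The word boundaries matter: without them, longer identifiers that merely contain three capital letters, a hyphen, and six digits would be partially redacted.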
Redact-Text Command - Redact text files (logs, hs_err, etc.)
Usage: jfr-redact redact-text [-hqvV] [--debug] [--pseudonymize] [--stats]
                              [--config=<preset|file|url>]
                              [--pseudonymize-mode=<mode>] [--seed=<seed>]
                              [--add-redaction-regex=<pattern>]... <input-file>
                              [<output-file>]
Redact sensitive information from text files, especially hserr files, but also
logs, configuration files, etc.
      <input-file>      Input file to redact
      [<output-file>]   Output file with redacted data (default: auto-generated)
      --add-redaction-regex=<pattern>
                        Add a custom regular expression pattern for string
                          redaction. This option can be specified multiple
                          times to add multiple patterns.
      --config=<preset|file|url>
                        Load configuration from a preset name (default, strict,
                          hserr), YAML file, or URL. If not specified, uses the
                          default preset. You can also create a config file
                          that inherits from a preset using 'parent:
                          <preset-name>'.
      --debug           Enable debug output (DEBUG level logging)
  -h, --help            Show this help message and exit.
      --pseudonymize    Enable pseudonymization mode. When enabled, the same
                          sensitive value always maps to the same pseudonym (e.
                          g., <redacted:a1b2c3>), preserving relationships.
                          Without this flag, all values are redacted to ***.
      --pseudonymize-mode=<mode>
                        Pseudonymization mode (requires --pseudonymize). Valid
                          values: hash (default, stateless deterministic),
                          counter (sequential numbers), realistic (plausible
                          alternatives like alice@example.com)
  -q, --quiet           Minimize output (only show errors and completion
                          message)
      --seed=<seed>     Seed for reproducible pseudonymization (only with
                          --pseudonymize)
      --stats           Show statistics after redaction
  -v, --verbose         Enable verbose output (INFO level logging)
  -V, --version         Print version information and exit.

Examples:

  Redact a log file with default config (hserr preset):
    jfr-redact redact-text application.log
    (creates application.redacted.log)

  Redact Java crash reports (uses hserr preset by default):
    jfr-redact redact-text hs_err_pid12345.log

  Read from stdin, write to stdout:
    cat hs_err_pid12345.log | jfr-redact redact-text - -

  Use strict preset:
    jfr-redact redact-text app.log --config strict

  Custom config with pseudonymization:
    jfr-redact redact-text app.log --config my-config.yaml --pseudonymize

  Add custom redaction pattern:
    jfr-redact redact-text app.log --add-redaction-regex '\b[A-Z]{3}-\d{6}\b'

Generate-Config Command - Generate configuration templates
Usage: jfr-redact generate-config [-hqvV] [--debug] [--minimal] [-o=<file>]
                                  [<preset|output.yaml>]
Generate a configuration template for JFR redaction
      [<preset|output.yaml>]
                        Preset name to generate config from (default, strict,
                          hserr), or output file path. If not specified or is a
                          preset name, generates full template.
      --debug           Enable debug output (DEBUG level logging)
  -h, --help            Show this help message and exit.
      --minimal         Generate minimal configuration template
  -o, --output=<file>   Output file for the configuration
  -q, --quiet           Minimize output (only show errors and completion
                          message)
  -v, --verbose         Enable verbose output (INFO level logging)
  -V, --version         Print version information and exit.

Examples:

  Generate default template to stdout:
    jfr-redact generate-config

  Generate template to file:
    jfr-redact generate-config -o my-config.yaml

  Generate config from a preset (default, strict, or hserr):
    jfr-redact generate-config default -o my-config.yaml
    jfr-redact generate-config strict -o strict.yaml

  Generate minimal config:
    jfr-redact generate-config --minimal -o minimal-config.yaml

  Quick way to use a preset:
    echo 'parent: strict' > strict.yaml
    jfr-redact redact recording.jfr --config strict.yaml

Test/Validate Command - Test or validate configuration
Usage: jfr-redact test [-hqvV] [--debug] [--pseudonymize]
                       [--config=<preset|file|url>] [--event=<type>]
                       [--property=<name>] [--pseudonymize-mode=<mode>]
                       [--seed=<seed>] [--thread=<name>] [--value=<value>]
Test configuration by showing how specific values would be redacted
Also validates configuration when run without test values
      --config=<preset|file|url>
                          Load configuration from a preset name (default,
                            strict, hserr), YAML file, or URL. If not
                            specified, uses the default preset. You can also
                            create a config file that inherits from a preset
                            using 'parent: <preset-name>'.
      --debug             Enable debug output (DEBUG level logging)
      --event=<type>      Event type to test (e.g., jdk.JavaMonitorEnter)
  -h, --help              Show this help message and exit.
      --property=<name>   Property/field name to test (e.g., address, message)
      --pseudonymize      Enable pseudonymization mode
      --pseudonymize-mode=<mode>
                          Pseudonymization mode (requires --pseudonymize).
                            Valid values: hash (default, stateless
                            deterministic), counter (sequential numbers),
                            realistic (plausible alternatives like
                            alice@example.com)
  -q, --quiet             Minimize output (only show errors and completion
                            message)
      --seed=<seed>       Seed for reproducible pseudonymization (only with
                            --pseudonymize)
      --thread=<name>     Thread name to test filtering
  -v, --verbose           Enable verbose output (INFO level logging)
  -V, --version           Print version information and exit.
      --value=<value>     Value to test redaction on

Examples:

  Validate a configuration:
    jfr-redact test --config my-config.yaml
    jfr-redact validate --config my-config.yaml

  Test a property redaction:
    jfr-redact test --config my-config.yaml --event jdk.JavaMonitorEnter --property address --value '0x7f8a4c001000'

  Test thread name filtering:
    jfr-redact test --config my-config.yaml --thread 'MyThread-1'

  Test string redaction:
    jfr-redact test --config strict --value 'user@example.com'

Generate-Schema Command - Generate JSON Schema for IDE support
Usage: jfr-redact generate-schema [-hqvV] [--debug] [<output.json>]
Generate JSON Schema for the YAML configuration files
      [<output.json>]   Output file for the JSON schema (default: stdout)
      --debug           Enable debug output (DEBUG level logging)
  -h, --help            Show this help message and exit.
  -q, --quiet           Minimize output (only show errors and completion
                          message)
  -v, --verbose         Enable verbose output (INFO level logging)
  -V, --version         Print version information and exit.

Examples:

  Generate schema to stdout:
    jfr-redact generate-schema

  Generate schema to a file:
    jfr-redact generate-schema config-schema.json

Concat Command - Concatenate multiple JFR files
Usage: jfr-redact concat [-hivV] -o=<output.jfr> <input.jfr>...
Concatenate multiple JFR recordings into a single file without any redaction
      <input.jfr>...   Input JFR files to concatenate
  -h, --help           Show this help message and exit.
  -i, --ignore-empty   Ignore empty files (with a warning) instead of failing
  -o, --output=<output.jfr>
                       Output JFR file (required)
  -v, --verbose        Enable verbose output
  -V, --version        Print version information and exit.

Examples:

  Concatenate two JFR files:
    jfr-redact concat one.jfr two.jfr -o combined.jfr

  Concatenate multiple files:
    jfr-redact concat *.jfr -o all-recordings.jfr

  Ignore empty files (with warning):
    jfr-redact concat *.jfr -o merged.jfr -i

Words Command - Discover and redact words/identifiers
Usage: jfr-redact words [-hV] [COMMAND]
Discover and redact words/strings in JFR events or text files
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
Commands:
  discover  Discover all distinct strings in JFR events or text files
  redact    Apply word redaction rules to JFR events or text files
Usage: jfr-redact words discover [-hV] [--ignore-classes] [--ignore-methods]
                                 [--ignore-modules] [--ignore-packages]
                                 [--ignore-events=<ignoreEventTypes>[,
                                 <ignoreEventTypes>...]]... <inputFile>
                                 <outputFile>
Discover all distinct strings in JFR events or text files
      <inputFile>         Input JFR file or text file to analyze
      <outputFile>        Output file for discovered words
  -h, --help              Show this help message and exit.
      --ignore-classes    Ignore class names (default: true)
      --ignore-events=<ignoreEventTypes>[,<ignoreEventTypes>...]
                          Event types to ignore (comma-separated)
      --ignore-methods    Ignore method names (default: true)
      --ignore-modules    Ignore module names (default: true)
      --ignore-packages   Ignore package names (default: true)
  -V, --version           Print version information and exit.

Examples:

  Discover words from JFR file and save to file:
    jfr-redact words discover recording.jfr words.txt

  Discover words from text file:
    jfr-redact words discover application.log words.txt

  Include method and class names (normally ignored):
    jfr-redact words discover recording.jfr words.txt --ignore-methods=false --ignore-classes=false

  Ignore specific event types:
    jfr-redact words discover recording.jfr words.txt --ignore-events=jdk.GarbageCollection,jdk.ThreadSleep

Usage: jfr-redact words redact [-hV] [-r=<rulesFile>] <inputFile> <outputFile>
Apply word redaction rules to JFR events or text files
      <inputFile>           Input JFR file or text file to redact
      <outputFile>          Output file for redacted content
  -h, --help                Show this help message and exit.
  -r, --rules=<rulesFile>   File containing redaction rules (default: stdin)
  -V, --version             Print version information and exit.

Rule Format (one rule per line):
  - word              Redact this word (replace with ***)
  + word              Keep this word (allowlist, don't redact)
  - prefix*           Redact all words starting with 'prefix'
  - *suffix           Redact all words ending with 'suffix'
  - *contains*        Redact all words containing 'contains'
  - *any*glob*        Redact all words matching the glob (equivalent to the regex '.*any.*glob.*')
  - /regex/           Redact all words matching the given regex pattern
  ! pattern repl      Replace the redaction pattern with 'repl' instead of ***
  # comment           Comment line (ignored)
  (empty lines)       Ignored
  other lines         Ignored (no -, +, or ! prefix)

Examples:

  Redact using rules file:
    jfr-redact words redact app.log redacted.log -r rules.txt

  Redact using rules from stdin:
    echo "- secretpassword" | jfr-redact words redact app.log redacted.log

  Example rules.txt:
    # Redact specific sensitive values
    - secretpassword
    - internalhost.corp.com

    # Redact all words starting with 'secret'
    - secret*

    # Keep safe words (allowlist)
    + localhost
    + example.com
    # Ignore everything else
    nonlocalhost.corp.com
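
The rule format above can be read as a small line-based language. The following Java sketch parses a subset of it (the -, + and /regex/ forms with * globs) and applies the rules to single words. It is not the tool's parser: WordRulesSketch is hypothetical, the ! replacement rule is left out, and letting a keep rule win over a redact rule is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative parser for a subset of the rule format; not the tool's own parser. */
final class WordRulesSketch {
    private final List<Pattern> redact = new ArrayList<>();
    private final List<Pattern> keep = new ArrayList<>();

    WordRulesSketch(List<String> ruleLines) {
        for (String raw : ruleLines) {
            String line = raw.strip();
            if (line.length() < 3 || line.charAt(1) != ' ') {
                continue; // empty lines and lines without a "<marker><space>" prefix
            }
            String body = line.substring(2).strip();
            switch (line.charAt(0)) {
                case '-' -> redact.add(toPattern(body));
                case '+' -> keep.add(toPattern(body));
                default -> { } // '#' comments and '!' rules are skipped in this sketch
            }
        }
    }

    /** /regex/ is used as-is; otherwise * is a wildcard and everything else is literal. */
    private static Pattern toPattern(String body) {
        if (body.length() > 2 && body.startsWith("/") && body.endsWith("/")) {
            return Pattern.compile(body.substring(1, body.length() - 1));
        }
        StringBuilder regex = new StringBuilder();
        for (String literal : body.split("\\*", -1)) {
            regex.append(Pattern.quote(literal)).append(".*");
        }
        regex.setLength(regex.length() - 2); // drop the ".*" appended after the last part
        return Pattern.compile(regex.toString());
    }

    /** Returns *** for words matching a redact rule unless a keep rule also matches. */
    String apply(String word) {
        boolean kept = keep.stream().anyMatch(p -> p.matcher(word).matches());
        boolean redacted = redact.stream().anyMatch(p -> p.matcher(word).matches());
        return redacted && !kept ? "***" : word;
    }

    public static void main(String[] args) {
        WordRulesSketch rules = new WordRulesSketch(List.of(
                "# Redact specific sensitive values",
                "- secretpassword",
                "- secret*",
                "+ localhost",
                "nonlocalhost.corp.com"));
        for (String word : List.of("secretpassword", "secrets", "localhost", "nonlocalhost.corp.com")) {
            System.out.println(word + " -> " + rules.apply(word));
        }
    }
}
```

In this run the first two words are replaced by ***, while localhost and nonlocalhost.corp.com are printed unchanged: the former is allowlisted and the latter appears only on a line without a rule prefix, which is ignored.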

Configuration

A configuration can be referenced in any of the following forms:

  • Preset names: default, strict, hserr
  • File paths: ./my-parent-config.yaml, /absolute/path/to/config.yaml
  • URLs: https://example.com/configs/base.yaml, file:///path/to/config.yaml

A customizable template is available at config-template.yaml:

# Save as: my-config.yaml

# You can base your configuration on a preset and override specific options
# Or build from scratch by commenting out the parent line
#   parent: default

# ============================================================================
# Pattern Discovery - Automatically discover and redact sensitive values
# ============================================================================
# Discovery mode controls HOW discovery is performed (globally)
# Per-pattern settings (min_occurrences, case_sensitive, allowlist) are configured
# individually for each pattern type under strings.patterns
discovery:
  mode: default  # Options: none, fast, default (two-pass)

  # Property-based extraction - discover values from JFR event properties
  # Extracts values based on property key names (e.g., "user.name" -> extract username)
  # Supports two modes:
  #   1. Direct field matching: event.userName = "john" (matches field name "userName")
  #   2. Key-value pair matching: event.key = "user.name", event.value = "john"
  property_extractions:
  # Example: Extract usernames from properties like user.name, username, etc.
  # - name: "user_name_property"
  #   description: "Extract usernames from JFR event properties"
  #   key_pattern: "(?i)(user\\.name|username|user_name|user)"  # Regex to match property key
  #   key_property_pattern: "key"          # Property name for key in key-value pairs (default: "key")
  #   value_pattern: ".*"                  # Regex to match value content (default: ".*")
  #   value_property_pattern: "value"      # Property name for value in key-value pairs (default: "value")
  #   event_type_filter: ".*"              # Optional: only process specific event types (regex)
  #   type: USERNAME                       # USERNAME, HOSTNAME, EMAIL_LOCAL_PART, or CUSTOM
  #   case_sensitive: false                # Case sensitivity for discovered values
  #   min_occurrences: 1                   # Minimum occurrences to redact
  #   enabled: true

  # Example with custom key-value property names:
  # - name: "config_hostname"
  #   key_pattern: "server\\.host"
  #   key_property_pattern: "configKey"    # Custom property name for key
  #   value_property_pattern: "configValue"  # Custom property name for value
  #   type: HOSTNAME

  # Example with value pattern filtering:
  # - name: "corporate_emails"
  #   key_pattern: ".*email.*"
  #   value_pattern: ".*@company\\.com"    # Only extract @company.com emails
  #   type: EMAIL_LOCAL_PART

  # Note: Allowlists are handled by discovery_allowlist in strings.patterns
  # The property extractor respects the same allowlist as the pattern type

  # Custom extraction patterns - define your own patterns to discover
  # These are independent from strings.patterns and can extract any type of value
  custom_extractions:

  # Example 1: Extract usernames from SSH connection strings
  # - name: "ssh_usernames"
  #   description: "Extract usernames from SSH commands like 'user@hostname'"
  #   pattern: '([a-zA-Z0-9_-]+)@[a-zA-Z0-9.-]+'  # Captures username before @
  #   capture_group: 1         # Extract group 1 (the username)
  #   type: USERNAME           # Categorize as USERNAME (options: USERNAME, HOSTNAME, EMAIL_LOCAL_PART, CUSTOM)
  #   case_sensitive: false    # Treat "Alice", "alice", "ALICE" as same
  #   min_occurrences: 2       # Only redact if appears 2+ times
  #   allowlist:               # Never redact these usernames
  #     - root
  #     - admin
  #     - git
  #   enabled: true

  # Example 2: Extract build usernames from build logs
  # - name: "build_user"
  #   description: "Username from build info"
  #   pattern: 'built on .* by "([^"]+)"'
  #   capture_group: 1
  #   type: USERNAME
  #   case_sensitive: false
  #   min_occurrences: 1
  #   allowlist:
  #     - jenkins
  #   enabled: true

  # Example 3: Extract hostnames from URLs
  # - name: "url_hostnames"
  #   description: "Extract hostnames from HTTP/HTTPS URLs"
  #   pattern: 'https?://([a-zA-Z0-9.-]+)/'
  #   capture_group: 1
  #   type: HOSTNAME
  #   case_sensitive: false
  #   min_occurrences: 1
  #   allowlist:
  #     - localhost
  #     - example.com
  #   enabled: true

  # Example 4: Extract project codes (custom type)
  # - name: "project_codes"
  #   description: "Extract project identifiers like PROJ-ABC123"
  #   pattern: 'PROJ-([A-Z0-9]+)'
  #   capture_group: 1
  #   type: CUSTOM           # Will be categorized as custom
  #   case_sensitive: true   # Project codes are case-sensitive
  #   min_occurrences: 1
  #   enabled: true

# Property redaction - matches patterns in field names
properties:
  enabled: true
  case_sensitive: false  # If true, patterns are case-sensitive

  # Full match mode: if true, pattern must match entire field name
  # If false (default), pattern can match anywhere in field name
  # Example with pattern "password":
  #   full_match=false: matches "password", "user_password", "myPasswordField"
  #   full_match=true:  matches only "password" (exact match)
  full_match: false

  patterns:  # Regex patterns to match in field names
    - (pass(word|wort|wd)?|pwd)  # Matches: password, passwort, passwd, pwd
    - secret
    - token
    - (api[_-]?)?key       # Matches: key, api_key, api-key, apikey
    - auth
    - credential

    # - myapp_secret
    # - custom_token

# Event removal - completely remove these event types from the recording
events:
  remove_enabled: true
  removed_types:
    - jdk.OSInformation
    - jdk.SystemProcess
    - jdk.InitialEnvironmentVariable
    - jdk.ProcessStart
    # Add additional event types to remove:
    # - jdk.SystemProperty
    # - jdk.NativeLibrary

  # Advanced filtering (similar to jfr scrub command)
  # See: https://docs.oracle.com/en/java/javase/21/docs/specs/man/jfr.html
  # Filters are comma-separated lists and support glob patterns (* and ?)
  filtering:
    # Include only events matching these patterns (if specified, only matching events are kept)
    include_events: []
    # Examples:
    # - jdk.ThreadSleep,jdk.JavaMonitorWait  # Only these specific events
    # - jdk.*                                 # All JDK events
    # - my.app.*                              # All events from my.app package

    # Exclude events matching these patterns
    exclude_events: []
    # Examples:
    # - jdk.GCPhasePause*                    # Exclude all GC phase pause events
    # - jdk.ThreadSleep                      # Exclude thread sleep events

    # Include only events from these categories
    include_categories: []
    # Examples:
    # - Java Application                     # Only application events
    # - Java Virtual Machine                 # Only JVM events

    # Exclude events from these categories
    exclude_categories: []
    # Examples:
    # - Flight Recorder                      # Exclude JFR internal events

    # Include only events from these threads
    include_threads: []
    # Examples:
    # - main                                 # Only main thread
    # - worker-*                             # All worker threads

    # Exclude events from these threads
    exclude_threads: []
    # Examples:
    # - GC Thread*                           # Exclude all GC threads
    # - Service Thread                       # Exclude service thread

# String pattern redaction - redact matching patterns in string fields
strings:
  enabled: true

  # Normally you don't want to redact code artifacts
  redact_in_method_names: false
  redact_in_class_names: false
  redact_in_thread_names: false

  patterns:
    # Home directory paths - discovers usernames from paths
    home_directories:
      enabled: true

      # === Discovery Settings (per-pattern) ===

      # Enable pattern discovery: Extract usernames and redact them everywhere
      # If false, only the full path is redacted (e.g., "/Users/alice" redacted, but not standalone "alice")
      # If true, extracts "alice" and redacts it everywhere in the file
      discovery:
        enabled: true

        # Which regex capture group contains the value to extract (1 = first group)
        capture_group: 1

        # Minimum occurrences before a discovered value is redacted (prevents false positives)
        # Only values appearing at least this many times will be redacted
        min_occurrences: 1

        # Case sensitivity for discovered value matching
        # If false, "Alice", "alice", and "ALICE" are treated as the same value
        case_sensitive: false

        # Allowlist of values that should NEVER be discovered/redacted by this pattern
        # Useful for common/generic usernames
        allowlist:
          - root
          - admin
          - test
          - user
          - guest
          - system
          # Add pattern-specific safe values:
          # - jenkins
          # - builduser

      # Regex patterns for matching (with capture groups for extraction)
      patterns:
        - '/Users/([^/]+)'                    # macOS: /Users/username (group 1 = username)
        - 'C:\\Users\\([a-zA-Z0-9_\-]+)'     # Windows: C:\Users\username (group 1 = username)
        - '/home/([^/]+)'                     # Linux: /home/username (group 1 = username)

    # Email addresses
    emails:
      enabled: true
      patterns:
        - '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

    # UUIDs (often used as identifiers)
    uuids:
      enabled: false  # Set to true if UUIDs are sensitive in your context
      patterns:
        - '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'

    # IP addresses
    ip_addresses:
      enabled: true
      patterns:
        - '\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b'
        - '\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b'

    # SSH host patterns - redact hostnames in SSH connection strings
    # Matches: user@hostname, ssh://hostname, hostname:port
    ssh_hosts:
      enabled: false  # Set to true if SSH hosts are sensitive
      patterns:
        - 'ssh://[a-zA-Z0-9.-]+'                           # ssh://hostname
        - '(?:ssh|sftp)://(?:[^@]+@)?[a-zA-Z0-9.-]+'      # ssh://user@hostname
        - '[a-zA-Z0-9_-]+@[a-zA-Z0-9.-]+(?::[0-9]+)?'     # user@host or user@host:port
        - '(?<=ssh\s)[a-zA-Z0-9_-]+@[a-zA-Z0-9.-]+'       # after "ssh " command

    # Custom patterns - add your own regex patterns here
    custom:
    # Example: AWS access keys (no discovery - just redact the pattern itself)
    # - name: aws_access_keys
    #   patterns:
    #     - 'AKIA[0-9A-Z]{16}'
    #   discovery:
    #     enabled: false  # Only redact "AKIA..." keys, don't extract parts

    # Example: Build IDs with discovery
    # - name: build_ids
    #   patterns:
    #     - 'build-([A-Z0-9]+)-\d+'  # e.g., build-ABC123-001
    #   discovery:
    #     enabled: true           # Extract "ABC123" and redact everywhere
    #     capture_group: 1        # Group 1 = the build code
    #
    #   # Optional: ignore certain values
    #   ignore_exact:
    #     - JENKINS  # Don't redact if the build code is "JENKINS"
    #
    #   # Optional: ignore patterns
    #   ignore:
    #     - 'TEST.*'  # Don't redact build codes starting with TEST

    # Example: JWT tokens (no discovery - just redact full tokens)
    # - name: jwt_tokens
    #   patterns:
    #     - 'eyJ[a-zA-Z0-9_-]+\.eyJ[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+'
    #   discovery:
    #     enabled: false

# Network event redaction - redact addresses/ports in socket events
network:
  enabled: true
  redact_ports: true
  redact_addresses: true
  keep_local_addresses: false  # Set to true to preserve localhost/127.0.0.1
  event_types:
    - jdk.SocketRead
    - jdk.SocketWrite

# Path redaction - control how file paths are redacted
paths:
  enabled: true
  mode: keep_filename  # Options: keep_filename, redact_all, keep_all
  # keep_filename: the directory part becomes ***, the filename is kept
  # redact_all: complete path becomes ***
  # keep_all: path unchanged
  fields:
    - path
    - directory
    - file
    - destination

# General settings
general:
  redaction_text: "***"  # Text to replace redacted values with

  # Partial redaction - show some info while hiding sensitive parts
  # When false: "my_secret_password" -> "***"
  # When true:  "my_secret_password" -> "my***" (shows prefix/suffix)
  # Useful for: debugging (identify which value without exposing it),
  #             compliance (show value format without actual data),
  #             log analysis (distinguish between different redacted values)
  partial_redaction: false

  # Pseudonymization - preserves relationships between values
  # When enabled, the same input value always maps to the same redacted output
  # e.g., "user@example.com" -> "<redacted:a1b2c3>" (consistent across the recording)
  pseudonymization:
    enabled: false  # Set to true to enable pseudonymization

    # Pseudonymization mode:
    # - "hash": Hash-based (stateless, deterministic, default)
    #           No state required, same value always produces same hash
    #           Best for: Most use cases, low memory, deterministic
    # - "counter": Simple counter (stateful, requires hash map)
    #              Maps values to sequential numbers: value1->1, value2->2
    #              Best for: Debugging, smaller output, when you want readable IDs
    # - "realistic": Generates plausible-looking alternatives (stateful)
    #                Replaces sensitive data with realistic alternatives
    #                Examples: "john.doe@company.com" -> "alice.smith@test.com"
    #                          "/home/johndoe" -> "/home/user01"
    #                          "johndoe" -> "user01"
    #                Best for: Creating shareable test data, demos, public bug reports
    mode: "hash"

    format: "redacted"  # Options: "redacted", "hash", "custom"
    # - redacted: <redacted:abc123>
    # - hash: <hash:abc123>
    # - custom: use custom_prefix and custom_suffix
    custom_prefix: "<redacted:"  # Used when format is "custom"
    custom_suffix: ">"
    hash_length: 8  # Length of hash suffix (6-32), only for mode="hash"
    hash_algorithm: "SHA-256"  # Options: SHA-256, SHA-1, MD5, only for mode="hash"

    # Scope of pseudonymization - what types of redacted values to pseudonymize
    scope:
      properties: true      # Property values (passwords, tokens, etc.)
      strings: true         # String pattern matches (emails, IPs, etc.)
      network: true         # Network addresses
      paths: true           # File paths
      ports: true           # Port numbers (always uses counter, mapped to 1000+ range)
      # Example: port 8080 -> 1001, port 443 -> 1002

    # Custom replacements for specific values (highest priority, overrides all modes)
    # Map exact values to specific replacements
    # Useful for replacing known usernames, email addresses, or paths
    replacements:
    # Example username replacements:
    # "johndoe": "alice"
    # "admin": "user01"

    # Example email replacements:
    # "john.doe@company.com": "user@example.com"
    # "admin@internal.net": "contact@test.org"

    # Example path replacements:
    # "/home/johndoe": "/home/testuser"
    # "C:\\Users\\JohnDoe": "C:\\Users\\TestUser"
    # "/Users/johndoe": "/Users/testuser"

    # Pattern-based replacement generators (using RgxGen)
    # Define regex patterns for generating realistic replacements by pattern type
    #
    # Two modes of operation:
    # 1. Redaction mode (pseudonymization disabled):
    #    - Generates a random value from the pattern each time
    #    - Used for simple redaction with ***
    #    - Example: "user42" -> "user73" (random each time)
    #
    # 2. Pseudonymization mode (pseudonymization enabled):
    #    - Generates consistent deterministic mappings
    #    - Same input always produces same output
    #    - Example: "user42" -> "user17" (always the same)
    #    - Warns if the pattern has too few possible values (at least 100 recommended)
    #
    # ============================================================================
    # IMPORTANT: Special placeholders
    # ============================================================================
    #
    # Special placeholders are automatically replaced with realistic data:
    #   {users}  - Realistic user folder names (alice, bob, charlie, diana, etc.)
    #   {emails} - Realistic email addresses (alice.smith@example.com, etc.)
    #   {names}  - Realistic usernames (alice.smith, bob.jones, etc.)
    #
    # These placeholders are replaced with equivalent regex patterns before
    # regex generation, so they work seamlessly with any regex pattern.
    #
    # YAML ESCAPING RULES (for regex special characters):
    # In double-quoted YAML strings, backslash is an escape character
    # (single-quoted strings keep backslashes literally), so in double quotes:
    #   - To match a literal dot (.):     use \\. in YAML (becomes \. in regex)
    #   - To match a literal backslash:   use \\\\ in YAML (becomes \\ in regex)
    #
    # EXAMPLES:
    #   Unix home with placeholder:
    #     "/home/{users}"               → generates "/home/alice"
    #
    #   Windows home with placeholder:
    #     "C:\\\\Users\\\\{users}"      → generates "C:\Users\alice"
    #     Note: \\\\ in YAML becomes \\ in regex (matches single backslash)
    #
    #   Email domain:
    #     "[a-z]+@example\\.com"        → generates "user@example.com"
    #     (\\. becomes \. in regex, matches literal dot)
    #
    #   IP addresses:
    #     "10\\.0\\.[0-9]{1,3}\\.[0-9]{1,3}"  → generates "10.0.123.45"
    #
    #   Mixed path and placeholder:
    #     "/data/{users}/files"         → generates "/data/bob/files"
    #
    #   Multiple placeholders:
    #     "/home/{users} owned by {names}"  → generates "/home/alice owned by bob.smith"
    #
    #   Server logs with pattern and placeholder:
    #     "srv[0-9]{2}/{users}/app\\.log"   → generates "srv42/charlie/app.log"
    #
    # ============================================================================
    pattern_generators:
    # SSH host patterns - generates hostnames matching the regex
    # "ssh_hosts": "host[0-9]{2}\\.example\\.com"

    # IP address patterns - generates IP addresses in specific ranges
    # "ip_addresses": "10\\.0\\.[0-9]{1,3}\\.[0-9]{1,3}"
    # "ipv4_private": "192\\.168\\.[0-9]{1,3}\\.[0-9]{1,3}"

    # Username patterns - generates consistent usernames
    # "usernames": "user[0-9]{3}"
    # "service_accounts": "svc_[a-z]{4}[0-9]{2}"

    # User path patterns with {users} placeholder
    # "unix_home": "/home/{users}"
    # "mac_home": "/Users/{users}"
    # "win_home": "C:\\\\Users\\\\{users}"

    # Temporary file patterns
    # "temp_files": "temp_[a-z0-9]{8}"
    # "session_ids": "[a-f0-9]{32}"

    # Email patterns with placeholder
    # "user_emails": "{emails}"
    # "internal_emails": "[a-z]{5}\\.[a-z]{5}@internal\\.example\\.com"

    # Custom application-specific patterns
    # "app_tokens": "tok_[A-Za-z0-9]{16}"
    # "customer_ids": "CUST[0-9]{8}"

# Usage examples:
#
# Use this custom config:
#   java -jar jfr-redact.jar redact input.jfr output.jfr --config my-config.yaml
#
# Start with a preset and override:
#   java -jar jfr-redact.jar redact input.jfr output.jfr --preset strict --keep-local-addresses
#
# Enable pseudonymization to preserve relationships:
#   java -jar jfr-redact.jar redact input.jfr output.jfr --config my-config.yaml --pseudonymize
#
# Use pseudonymization with custom format:
#   java -jar jfr-redact.jar redact input.jfr output.jfr --pseudonymize --pseudonym-format hash
#
# Test without creating output:
#   java -jar jfr-redact.jar redact input.jfr output.jfr --config my-config.yaml --dry-run --verbose
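
To make the per-pattern `discovery` settings in the template above more concrete, here is a minimal Python sketch of the two-pass idea: a first pass collects the values captured by group 1, filtered by `allowlist` and `min_occurrences`, and a second pass replaces those values wherever they appear. This is only an illustration under stated assumptions, not the tool's actual (Java) implementation; the names `PATTERNS`, `ALLOWLIST`, `discover`, and `redact` are hypothetical, and the sketch replaces only the discovered value rather than the whole matched path.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the home_directories settings in the template.
PATTERNS = [r"/Users/([^/]+)", r"/home/([^/]+)"]
ALLOWLIST = {"root", "admin", "test", "user", "guest", "system"}
MIN_OCCURRENCES = 1
REDACTION_TEXT = "***"


def discover(lines):
    """First pass: count the values captured by group 1 (case-insensitive)."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            for match in re.finditer(pattern, line):
                counts[match.group(1).lower()] += 1
    # Keep values that occur often enough and are not allowlisted.
    return {
        value
        for value, n in counts.items()
        if n >= MIN_OCCURRENCES and value not in ALLOWLIST
    }


def redact(lines, discovered):
    """Second pass: replace every occurrence of a discovered value."""
    if not discovered:
        return list(lines)
    # Longest values first so a shorter value cannot split a longer one.
    ordered = sorted(discovered, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(v) for v in ordered), re.IGNORECASE)
    return [regex.sub(REDACTION_TEXT, line) for line in lines]


log = [
    "Loading /Users/alice/project/app.jar",
    "Thread owner: Alice",
    "Config at /home/root/.config",
]
found = discover(log)
redacted = redact(log, found)
for line in redacted:
    print(line)
```

In this sketch `alice` is discovered from the macOS path and then also redacted in the standalone `Thread owner: Alice` line, while `root` is left alone because it is on the allowlist.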

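The `hash` and `counter` pseudonymization modes described in the template can be pictured with the following Python sketch. It assumes the pseudonym is simply the first `hash_length` hex characters of an unsalted digest wrapped in the configured prefix and suffix; how jfr-redact actually derives its pseudonyms is not stated in this README, so the function and class names here are hypothetical.

```python
import hashlib


def pseudonymize_hash(value, hash_length=8, hash_algorithm="SHA-256",
                      prefix="<redacted:", suffix=">"):
    """Stateless mode: the same input always yields the same pseudonym."""
    # hashlib uses names like "sha256", so normalize "SHA-256" style names.
    name = hash_algorithm.replace("-", "").lower()
    digest = hashlib.new(name, value.encode("utf-8")).hexdigest()
    return f"{prefix}{digest[:hash_length]}{suffix}"


class CounterPseudonymizer:
    """Stateful mode: values are numbered in order of first appearance."""

    def __init__(self, prefix="<redacted:", suffix=">"):
        self._ids = {}
        self._prefix = prefix
        self._suffix = suffix

    def pseudonymize(self, value):
        if value not in self._ids:
            self._ids[value] = len(self._ids) + 1
        return f"{self._prefix}{self._ids[value]}{self._suffix}"


a = pseudonymize_hash("user@example.com")
b = pseudonymize_hash("user@example.com")
c = pseudonymize_hash("other@example.com")

counter = CounterPseudonymizer()
first = counter.pseudonymize("alice")
second = counter.pseudonymize("bob")
again = counter.pseudonymize("alice")
print(a, c, first, second, again)
```

Both variants preserve relationships: equal inputs map to equal outputs. One general caveat with the unsalted hashing assumed in this sketch is that short or guessable values (usernames, ports) can be recovered by hashing candidate values and comparing, so a hash-based pseudonym should not be treated as equivalent to full redaction.
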
Development

To preview changes without modifying files:

./bin/sync-documentation.py --dry-run

To install as a git pre-commit hook (auto-syncs on commit):

./bin/sync-documentation.py --install

Requires: GitHub CLI (gh)

Git Hooks Setup

The bin/sync-documentation.py script keeps documentation in sync and can install a pre-commit hook:

# Install pre-commit hook (runs tests and syncs docs on every commit)
./bin/sync-documentation.py --install

# Manually sync documentation
./bin/sync-documentation.py

# Preview changes without modifying files
./bin/sync-documentation.py --dry-run

The pre-commit hook will:

  1. Run mvn test to ensure all tests pass
  2. Sync version from Version.java to pom.xml
  3. Update README.md with latest configuration examples

To skip the hook temporarily: git commit --no-verify

Deployment

To release a new version to Maven Central and GitHub Releases, run release.py (requires GitHub CLI (gh)).

IDE Support for Configuration Files

The project automatically generates a JSON Schema (config-schema.json) during build, enabling autocomplete and validation for YAML configuration files.

Getting the Schema:

  • Build locally: mvn package && java -jar target/jfr-redact.jar generate-schema config-schema.json
  • Download from CI: Check the Actions tab and download the config-schema artifact from recent builds

VS Code: The schema reference is already included in config files:

# yaml-language-server: $schema=./config-schema.json

You'll get autocomplete and validation automatically when editing config files.

IntelliJ IDEA: The schema reference should work automatically. To configure manually:

  • Go to Settings → Languages & Frameworks → Schemas and DTDs → JSON Schema Mappings
  • Add mapping for *.yaml files to config-schema.json

Support, Feedback, Contributing

This project is open to feature requests/suggestions, bug reports etc. via GitHub issues. Contribution and feedback are encouraged and always welcome.

License

MIT, Copyright 2025 SAP SE or an SAP affiliate company, Johannes Bechberger and contributors
