Skip to content

perf: use Aho-Corasick for efficient redaction#7931

Merged
jdx merged 4 commits intomainfrom
perf/aho-corasick-redactions
Jan 31, 2026
Merged

perf: use Aho-Corasick for efficient redaction#7931
jdx merged 4 commits intomainfrom
perf/aho-corasick-redactions

Conversation

@jdx
Copy link
Owner

@jdx jdx commented Jan 31, 2026

Summary

  • Replace naive O(n*m) redaction with Aho-Corasick automaton that finds all patterns in single pass - O(n + z)
  • Add Redactor struct that caches the compiled automaton
  • Fix logger to redact once instead of separately for file and terminal output (was doing double work when both enabled)

Changes

  • Add aho-corasick as direct dependency (was already transitive via regex)
  • Add Redactor struct to redactions.rs with cached Aho-Corasick automaton
  • Update Config to use Redactor instead of IndexSet<String>
  • Update CmdLineRunner to use Redactor for output redaction
  • Refactor logger to redact once at the top of log() before rendering

Test plan

  • Unit tests for Redactor pass
  • E2E redaction tests pass (e2e/tasks/test_task_redactions, e2e/cli/test_env_redacted_flags)
  • All lints pass

🤖 Generated with Claude Code


Note

Medium Risk
Moderate risk because it changes the core redaction implementation used for logs and command/task output; bugs could lead to missed or incorrect redactions despite being mostly a performance refactor.

Overview
Switches string redaction from repeated str::replace loops to a cached Aho-Corasick automaton via a new Redactor type (with unit tests), and adds aho-corasick as a direct dependency.

Wires the new redactor through Config (replacing the global IndexSet storage), CmdLineRunner output handling, and task command rendering, and refactors Logger::log to redact once and reuse the sanitized message for both terminal and file outputs.

Written by Cursor Bugbot for commit 9287026. This will update automatically on new commits. Configure here.

Replace the naive O(n*m) redaction implementation with Aho-Corasick
automaton that finds all patterns in a single pass - O(n + z) where
n is input length and z is number of matches.

Changes:
- Add Redactor struct with cached Aho-Corasick automaton
- Update Config to use Redactor instead of IndexSet<String>
- Update CmdLineRunner to use Redactor
- Fix logger to redact once instead of separately for file/terminal
- Add aho-corasick as direct dependency (was already transitive via regex)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings January 31, 2026 16:46
@gemini-code-assist
Copy link
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes string redaction by replacing the naive O(n*m) pattern-matching approach with the Aho-Corasick algorithm, achieving O(n + z) complexity. The change introduces a Redactor struct that caches the compiled automaton and refactors the logger to perform redaction once instead of separately for each output destination.

Changes:

  • Introduced Redactor struct with Aho-Corasick automaton for efficient multi-pattern matching
  • Refactored Config to use Redactor instead of IndexSet<String> for redactions
  • Updated logger to redact once before rendering for both file and terminal outputs

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
Cargo.toml Added aho-corasick dependency
src/redactions.rs Added Redactor struct with Aho-Corasick-based redaction and comprehensive tests
src/config/mod.rs Refactored to use Redactor instead of IndexSet<String>, updated redaction methods
src/cmd.rs Updated CmdLineRunner to use Redactor for output redaction
src/logger.rs Refactored to redact once at the top of log() before rendering
src/task/task_executor.rs Updated redact calls to pass string references

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

input = input.replace(redaction, "[redacted]");
}
input
Arc::new(_REDACTOR.lock().unwrap().patterns().clone())
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method creates a new Arc wrapping a clone of the patterns on every call. Since patterns() already returns a reference to an Arc<IndexSet<String>> internally, consider returning Arc::clone(&self.patterns) from within the Redactor to avoid the double allocation and clone.

Suggested change
Arc::new(_REDACTOR.lock().unwrap().patterns().clone())
Arc::clone(_REDACTOR.lock().unwrap().patterns())

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +40
let args = if config::is_loaded() {
Config::get_().redact(&args)
} else {
args
};
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redaction is performed even when both will_log_file and will_log_term would never use the result. Consider moving this redaction inside the conditional blocks after checking will_log_file || will_log_term to avoid unnecessary work when logging is disabled for the current level.

Copilot uses AI. Check for mistakes.
Comment on lines +98 to +103
pub fn redact_string(&self, input: String) -> String {
if self.automaton.is_none() {
return input;
}
self.redact(&input)
}
Copy link

Copilot AI Jan 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redact_string method doesn't actually redact in place as its documentation suggests. It converts to &str, calls redact() which allocates a new String, and discards the original. This provides no performance benefit over redact(). Consider either implementing true in-place redaction using AhoCorasick::try_stream_replace_all or removing this method.

Copilot uses AI. Check for mistakes.
@jdx
Copy link
Owner Author

jdx commented Jan 31, 2026

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Bugbot Autofix is ON, but a Cloud Agent failed to start.

@github-actions
Copy link

github-actions bot commented Jan 31, 2026

Hyperfine Performance

mise x -- echo

Command Mean [ms] Min [ms] Max [ms] Relative
mise-2026.1.12 x -- echo 20.3 ± 0.6 18.9 22.7 1.00
mise x -- echo 20.8 ± 0.8 19.3 23.2 1.03 ± 0.05

mise env

Command Mean [ms] Min [ms] Max [ms] Relative
mise-2026.1.12 env 19.8 ± 0.8 18.5 25.8 1.00
mise env 20.1 ± 0.7 18.6 22.2 1.02 ± 0.05

mise hook-env

Command Mean [ms] Min [ms] Max [ms] Relative
mise-2026.1.12 hook-env 20.4 ± 0.5 19.2 21.8 1.00
mise hook-env 20.7 ± 0.8 19.3 23.4 1.02 ± 0.05

mise ls

Command Mean [ms] Min [ms] Max [ms] Relative
mise-2026.1.12 ls 18.1 ± 0.5 17.0 22.1 1.00
mise ls 18.6 ± 0.7 17.2 20.4 1.02 ± 0.05

xtasks/test/perf

Command mise-2026.1.12 mise Variance
install (cached) 111ms 111ms +0%
ls (cached) 70ms 70ms +0%
bin-paths (cached) 74ms 74ms +0%
task-ls (cached) 534ms 532ms +0%

- Remove unused `redactor()` method from Config
- Remove unused `redact_string()` and `is_empty()` from Redactor
- Add `patterns_arc()` to avoid double allocation in `redactions()`
- Add fallback to naive approach if automaton fails to build

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jdx jdx merged commit 8ebb406 into main Jan 31, 2026
35 checks passed
@jdx jdx deleted the perf/aho-corasick-redactions branch January 31, 2026 17:44
mise-en-dev added a commit that referenced this pull request Feb 1, 2026
### 🚀 Features

- **(edit)** add interactive config editor (`mise edit`) by @jdx in
[#7930](#7930)
- **(lockfile)** graduate lockfiles from experimental by @jdx in
[#7929](#7929)
- **(task)** add support for usage values in task confirm dialog by
@roele in [#7924](#7924)
- **(task)** improve source freshness checking with edge case handling
by @jdx in [#7932](#7932)

### 🐛 Bug Fixes

- **(activate)** preserve ordering of paths appended after mise activate
by @jdx in [#7919](#7919)
- **(install)** sort failed installations for deterministic error output
by @jdx in [#7936](#7936)
- **(lockfile)** preserve URL and prefer sha256 when merging platform
info by @jdx in [#7923](#7923)
- **(lockfile)** add atomic writes and cache invalidation by @jdx in
[#7927](#7927)
- **(templates)** use sha256 for hash filter instead of blake3 by @jdx
in [#7925](#7925)
- **(upgrade)** respect tracked configs when pruning old versions by
@jdx in [#7926](#7926)

### 🚜 Refactor

- **(progress)** migrate from indicatif to clx by @jdx in
[#7928](#7928)

### 📚 Documentation

- improve clarity on uvx and pipx dependencies by @ygormutti in
[#7878](#7878)

### ⚡ Performance

- **(install)** use Kahn's algorithm for dependency scheduling by @jdx
in [#7933](#7933)
- use Aho-Corasick for efficient redaction by @jdx in
[#7931](#7931)

### 🧪 Testing

- remove flaky test_http_version_list test by @jdx in
[#7934](#7934)

### Chore

- use github backend instead of ubi in mise.lock by @jdx in
[#7922](#7922)

### New Contributors

- @ygormutti made their first contribution in
[#7878](#7878)
lucasew pushed a commit to lucasew/CONTRIB-mise that referenced this pull request Feb 18, 2026
## Summary
- Replace naive O(n*m) redaction with Aho-Corasick automaton that finds
all patterns in single pass - O(n + z)
- Add `Redactor` struct that caches the compiled automaton
- Fix logger to redact once instead of separately for file and terminal
output (was doing double work when both enabled)

## Changes
- Add `aho-corasick` as direct dependency (was already transitive via
regex)
- Add `Redactor` struct to `redactions.rs` with cached Aho-Corasick
automaton
- Update `Config` to use `Redactor` instead of `IndexSet<String>`
- Update `CmdLineRunner` to use `Redactor` for output redaction
- Refactor logger to redact once at the top of `log()` before rendering

## Test plan
- [x] Unit tests for `Redactor` pass
- [x] E2E redaction tests pass (`e2e/tasks/test_task_redactions`,
`e2e/cli/test_env_redacted_flags`)
- [x] All lints pass

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Moderate risk because it changes the core redaction implementation
used for logs and command/task output; bugs could lead to missed or
incorrect redactions despite being mostly a performance refactor.
> 
> **Overview**
> Switches string redaction from repeated `str::replace` loops to a
cached Aho-Corasick automaton via a new `Redactor` type (with unit
tests), and adds `aho-corasick` as a direct dependency.
> 
> Wires the new redactor through `Config` (replacing the global
`IndexSet` storage), `CmdLineRunner` output handling, and task command
rendering, and refactors `Logger::log` to redact once and reuse the
sanitized message for both terminal and file outputs.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
9287026. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
lucasew pushed a commit to lucasew/CONTRIB-mise that referenced this pull request Feb 18, 2026
### 🚀 Features

- **(edit)** add interactive config editor (`mise edit`) by @jdx in
[jdx#7930](jdx#7930)
- **(lockfile)** graduate lockfiles from experimental by @jdx in
[jdx#7929](jdx#7929)
- **(task)** add support for usage values in task confirm dialog by
@roele in [jdx#7924](jdx#7924)
- **(task)** improve source freshness checking with edge case handling
by @jdx in [jdx#7932](jdx#7932)

### 🐛 Bug Fixes

- **(activate)** preserve ordering of paths appended after mise activate
by @jdx in [jdx#7919](jdx#7919)
- **(install)** sort failed installations for deterministic error output
by @jdx in [jdx#7936](jdx#7936)
- **(lockfile)** preserve URL and prefer sha256 when merging platform
info by @jdx in [jdx#7923](jdx#7923)
- **(lockfile)** add atomic writes and cache invalidation by @jdx in
[jdx#7927](jdx#7927)
- **(templates)** use sha256 for hash filter instead of blake3 by @jdx
in [jdx#7925](jdx#7925)
- **(upgrade)** respect tracked configs when pruning old versions by
@jdx in [jdx#7926](jdx#7926)

### 🚜 Refactor

- **(progress)** migrate from indicatif to clx by @jdx in
[jdx#7928](jdx#7928)

### 📚 Documentation

- improve clarity on uvx and pipx dependencies by @ygormutti in
[jdx#7878](jdx#7878)

### ⚡ Performance

- **(install)** use Kahn's algorithm for dependency scheduling by @jdx
in [jdx#7933](jdx#7933)
- use Aho-Corasick for efficient redaction by @jdx in
[jdx#7931](jdx#7931)

### 🧪 Testing

- remove flaky test_http_version_list test by @jdx in
[jdx#7934](jdx#7934)

### Chore

- use github backend instead of ubi in mise.lock by @jdx in
[jdx#7922](jdx#7922)

### New Contributors

- @ygormutti made their first contribution in
[jdx#7878](jdx#7878)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants