Comparing changes

…DE.md

Add three sections to project CLAUDE.md, lifted from feedback memories that were duplicating the same rules across multiple files:

- Project communication: full-scope messaging, honest comparisons, design spec purity (3 rules)
- Repo layout: fallow-2/fallow split, .internal/ symlink target, vendored npm/fallow/skills/, gh repo flag, fallow vs fallow check (6 rules)
- Worktree / parallel-agent rules: commit WIP early, verify authors, no push-through-dirty-worktree, combined.rs contention, fmt after cherry-pick, conflict markers, cargo cache, missing commits (7 rules)

Memories archived under ~/.claude/projects/.../memory/_archive/.

…audit base-snapshot skip

- Add per-project token cache at `.fallow/cache/dupes-tokens-v2/`, threshold-gated by `duplicates.minCorpusSizeForTokenCache` (default 5000). Bitcode-encoded with TokenKind round-trip; auto-writes `.gitignore` on save.
- Add k-token shingle prefilter for focused-mode runs (`audit`, `--changed-since` dupes), threshold-gated by `duplicates.minCorpusSizeForShingleFilter` (default 1024). Drops unchanged files whose shingles do not overlap any focused file before suffix array construction.
- Add audit base-snapshot fast path: when every changed file is either a non-behavioral doc or token-equivalent at the base ref, reuse the current run's results as the base snapshot and skip the second worktree analysis. Surfaced via `base_snapshot_skipped` under `--performance`.
- Bump `CACHE_VERSION`; persist TokenKind losslessly via bitcode derive.
- Tighten `is_fallow_cache_artifact` to use `opts.root/.fallow` with canonical fallback so symlinked-tempdir tests stay correct.

Resolves #243 perf direction.
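As a rough illustration of the shingle prefilter idea described above, the sketch below hashes every window of `k` consecutive tokens and keeps an unchanged file only if it shares at least one shingle with a focused file. This is not fallow's code: the `u32` token representation and the function names `shingles` and `prefilter` are assumptions made for the example, and the corpus-size threshold is left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash every window of `k` consecutive tokens into a set of shingles.
/// (Hypothetical helper; tokens are modeled as plain `u32` ids.)
fn shingles(tokens: &[u32], k: usize) -> HashSet<u64> {
    if k == 0 || tokens.len() < k {
        return HashSet::new();
    }
    tokens
        .windows(k)
        .map(|window| {
            let mut hasher = DefaultHasher::new();
            window.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}

/// Return the indices of files worth feeding to the expensive clone search:
/// every focused file, plus any other file sharing a shingle with one.
fn prefilter(files: &[Vec<u32>], focused: &[usize], k: usize) -> Vec<usize> {
    let focused_set: HashSet<usize> = focused.iter().copied().collect();

    // Union of all shingles that occur in focused files.
    let mut focus_shingles: HashSet<u64> = HashSet::new();
    for &idx in focused {
        focus_shingles.extend(shingles(&files[idx], k));
    }

    (0..files.len())
        .filter(|idx| {
            focused_set.contains(idx)
                || shingles(&files[*idx], k)
                    .iter()
                    .any(|s| focus_shingles.contains(s))
        })
        .collect()
}
```

A hash collision can only cause an extra file to be kept, never a relevant file to be dropped, so a filter of this shape errs on the safe side.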

…rvalIndex inserts

`fallow dupes --changed-since` was running a full-corpus suffix-array scan and post-filtering instead of engaging the focused-mode fast path that already existed for `audit`. The standalone CLI hardcoded `changed_files: None` so `find_duplicates_touching_files` was never reached. Resolve `--changed-since` to a concrete file set up front and pass it through; the existing post-filter becomes a no-op safety net.

`IntervalIndex::insert` claimed an "ascending start order per slot" invariant that neither caller satisfied: both `remove_token_subsets` and `remove_line_subsets` process groups in length-/token-DESC order, so per-slot inserts arrived in arbitrary offset order. The single-prev-merge logic fragmented intervals and produced false negatives in `is_covered`, keeping more groups than necessary and bloating the slot vecs. Replace with a coalesce-on-insert that merges every existing interval touching or overlapping the new range.

Also harden `path_to_idx` indexing in `remove_line_subsets` to use `.get()` with a `tracing::error!` skip path, removing an unconditional panic site flagged in the issue (no repro available, but the indexing was unsafe).

Add per-step `tracing::debug!` breakdown inside `build_groups` so future perf work has subsecond-level visibility into where time goes.

Measured on MUI master (16k tokenized files, 3.2M tokens, 639k raw groups):

- `fallow dupes --no-cache`:
  - before: 41.4s total, build_groups 31.5s
  - after: 17.8s total, build_groups 8.1s
  - token_subset_us: 24.5s -> 0.94s (-26x)
- `fallow dupes --changed-since HEAD~5`: before: 42.8s; after: 1.3s (-33x)
- `fallow audit --base HEAD~5` duplication step: before: 0.95s; after: 0.63s
- `fallow dupes --no-cache` on next.js fixture: before: 2.66s; after: 2.20s

Output equivalence: MUI full-corpus dropped from 329,165 to 329,163 duplicated lines (12 groups removed). Inspection confirms the previous fragmentation was masking a small number of legitimate line-level subsets. The 3 extra groups that focused-mode now finds for `--changed-since` match what `audit` already produces and are real clones touching changed files that the old post-filter pass was hiding via cross-corpus subsumption.

Refs #243.
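To make the coalesce-on-insert fix concrete, here is a minimal single-slot sketch that stays correct regardless of insertion order. It is not the actual `IntervalIndex`: the type name `IntervalSet`, the half-open `[start, end)` convention, and the `Vec`-backed storage are assumptions chosen for illustration.

```rust
/// Sorted, non-overlapping, non-touching half-open intervals `[start, end)`.
/// (Hypothetical stand-in for one slot of an interval index.)
#[derive(Debug, Default)]
struct IntervalSet {
    spans: Vec<(usize, usize)>,
}

impl IntervalSet {
    /// Insert `[start, end)`, merging every stored span that overlaps or
    /// touches it. No ordering is required of the caller.
    fn insert(&mut self, mut start: usize, mut end: usize) {
        if start >= end {
            return; // empty range: nothing to record
        }
        // Index of the first span whose end reaches `start` (touching counts).
        let lo = self.spans.partition_point(|&(_, e)| e < start);
        // Index just past the last span that begins at or before `end`.
        let hi = self.spans.partition_point(|&(s, _)| s <= end);
        if lo < hi {
            // Spans lo..hi all touch or overlap the new range: absorb them.
            start = start.min(self.spans[lo].0);
            end = end.max(self.spans[hi - 1].1);
        }
        self.spans.drain(lo..hi);
        self.spans.insert(lo, (start, end));
    }

    /// True if a single stored span fully contains `[start, end)`.
    /// Because spans are coalesced, checking one candidate is sufficient.
    fn is_covered(&self, start: usize, end: usize) -> bool {
        let idx = self.spans.partition_point(|&(s, _)| s <= start);
        idx > 0 && self.spans[idx - 1].1 >= end
    }
}
```

Inserting `(10, 20)`, `(0, 5)`, then `(5, 10)` leaves a single span `(0, 20)`. A merge that only looks at the previously inserted span would leave fragments here, which is the kind of `is_covered` false negative the commit describes.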

…ry points

Extract `rayon::ThreadPoolBuilder::build_global()` into a single `rayon_pool::configure_global_pool(threads)` helper that pins worker stack size to 16 MiB (deep visitor and graph traversals overflow Rust's default 2 MiB spawned-thread stack, which Rayon workers inherit, on large real-world projects). Apply it from both `main.rs::validate_inputs` and `programmatic.rs::AnalysisOptions`, so embedded NAPI consumers get the same thread count and stack size as the CLI rather than inheriting Rayon's defaults.

Includes a stack-probe regression test that recurses 5,000 frames in a worker to assert the pinned stack size holds.
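The stack-probe idea can be sketched without any dependencies by swapping Rayon's pool for a plain `std::thread::Builder` with an explicit stack size. This is an approximation of the test described above, not the project's test: `probe`, `run_on_big_stack`, and the 512-byte per-frame padding are assumptions, and the real helper configures Rayon's global pool rather than a single thread.

```rust
use std::thread;

/// Stack size pinned for the worker, matching the 16 MiB in the commit message.
const WORKER_STACK_BYTES: usize = 16 * 1024 * 1024;

/// Recurse `depth` frames. Each frame holds a small buffer passed through
/// `black_box` so the optimizer cannot collapse the recursion into a loop.
/// Returns `depth` when the recursion completes.
fn probe(depth: u32) -> u32 {
    let pad = std::hint::black_box([0u8; 512]);
    if depth == 0 {
        pad[0] as u32
    } else {
        1 + probe(depth - 1) + pad[511] as u32
    }
}

/// Run the probe on a thread with the pinned stack size and return its result.
/// (Hypothetical helper standing in for "run this inside a pool worker".)
fn run_on_big_stack(depth: u32) -> u32 {
    thread::Builder::new()
        .stack_size(WORKER_STACK_BYTES)
        .spawn(move || probe(depth))
        .expect("failed to spawn worker thread")
        .join()
        .expect("worker thread panicked")
}
```

With Rayon, the equivalent knob is `ThreadPoolBuilder::stack_size`, set before `build_global()`. A regression test of this kind would call `run_on_big_stack(5_000)` and assert that it returns instead of aborting on stack overflow.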

… .turbo, ...)

`fallow dupes` now skips `**/.next/**`, `**/.nuxt/**`, `**/.svelte-kit/**`, `**/.turbo/**`, `**/.parcel-cache/**`, `**/.vite/**`, `**/.cache/**`, `**/out/**`, and `**/storybook-static/**` before tokenization. Authored-looking `lib/`, `legacy/`, and nested `build/` directories stay in scope. Defaults merge with `duplicates.ignore`; set `duplicates.ignoreDefaults: false` to opt out.

Human and markdown output show a one-line skipped-file note; `--explain-skipped` expands to per-pattern counts. JSON, SARIF, CodeClimate, and compact output stay unchanged.

`fallow init` now scaffolds a commented-out `[duplicates]` block with common ignore additions (`lib/`, `legacy/`, `__generated__/`, `generated/`), and the JSON variant now parses as JSONC end-to-end.
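The merge-and-opt-out behavior can be pictured with the simplified sketch below. It models each `**/<dir>/**` pattern as "some directory component of the path equals `<dir>`" instead of doing real glob matching, and the names `DEFAULT_IGNORED_DIRS`, `effective_ignored_dirs`, and `is_skipped` are hypothetical; fallow's actual matcher and config plumbing may differ.

```rust
use std::path::{Component, Path};

/// Directory names behind the default `**/<dir>/**` patterns listed above.
const DEFAULT_IGNORED_DIRS: &[&str] = &[
    ".next", ".nuxt", ".svelte-kit", ".turbo", ".parcel-cache",
    ".vite", ".cache", "out", "storybook-static",
];

/// Merge user-supplied directory names with the defaults.
/// `ignore_defaults` mirrors `duplicates.ignoreDefaults`: `false` opts out.
fn effective_ignored_dirs(user: &[&str], ignore_defaults: bool) -> Vec<String> {
    let mut dirs: Vec<String> = user.iter().map(|d| d.to_string()).collect();
    if ignore_defaults {
        for d in DEFAULT_IGNORED_DIRS {
            if !dirs.iter().any(|existing| existing == d) {
                dirs.push(d.to_string());
            }
        }
    }
    dirs
}

/// True if any directory component of `path` (excluding the file name)
/// is in `ignored_dirs`. A stand-in for glob matching, not a glob engine.
fn is_skipped(path: &Path, ignored_dirs: &[String]) -> bool {
    let comps: Vec<&str> = path
        .components()
        .filter_map(|c| match c {
            Component::Normal(s) => s.to_str(),
            _ => None,
        })
        .collect();
    // Drop the final component so a file named `out` is not mistaken for a dir.
    let dir_comps = &comps[..comps.len().saturating_sub(1)];
    dir_comps
        .iter()
        .any(|d| ignored_dirs.iter().any(|i| i.as_str() == *d))
}
```

Under this model `apps/web/.next/static/chunk.js` is skipped by default, while `packages/ui/lib/button.ts` stays in scope unless the user adds `lib` themselves.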

Commits on May 1, 2026
