audit --base <ref> takes ~34s on a 27k-file monorepo with realistic duplication, barrels and tsconfig paths — what's the achievable floor? #243

@OmerGronich

What happened?

Cold fallow audit --base HEAD~10 on a 27k-file synthetic monorepo (with realistic copy-paste duplication, nested barrel files, and tsconfig path aliases) takes ~34 seconds:

$ time fallow audit --base HEAD~10 --no-cache --performance --root .
[…]
✗ dead code: 172 issues · duplication: 37 clone groups · 10 changed files (33.86s)

real    0m34.0s

The post-#224 v2.55 fix landed and works on the original synthetic fixture; thanks for the quick turnaround. This issue follows up with a heavier fixture that mirrors a few of the cost-shape dimensions a typical production monorepo has.

┌─ Pipeline Performance ─────────────────────────────
│  discover files:      311.0ms  (26816 files)
│  workspaces:           12.8ms  (64 workspaces)
│  plugins:            1719.2ms
│  parse/extract:       743.4ms  (26816 modules)
│  entry points:        268.7ms  (448 entries)
│  resolve imports:    2672.3ms
│  analyze:            7247.3ms
│  TOTAL:             13036.1ms
└───────────────────────────────────────────────────

── Dead Code ──────────────────────────────────────
✗ 40 exports · 64 unused dependencies · 50 unresolved imports · 18 unlisted dependencies (14.10s)

── Duplication ────────────────────────────────────
✗ 78,709 lines (3.9%) duplicated across 1613 files (17.64s)

Pipeline ~13s, post-pipeline Duplication ~17.6s, total ~34s wall-clock. --base HEAD~10 only changed 10 files but the duplication stage still scans all 26,816.
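To make the breakdown explicit, here is a small Node script that redoes the arithmetic on the numbers printed above. It is only a sanity check on the reported figures, not a new measurement. The listed stages sum to about 61 ms less than the reported pipeline TOTAL, and the Dead Code and Duplication sections together leave roughly 2.1 s of the 33.86 s headline unaccounted for. I don't know which work those remainders correspond to.

```javascript
// Stage timings copied from the --performance table above (milliseconds).
const stagesMs = {
  'discover files': 311.0,
  workspaces: 12.8,
  plugins: 1719.2,
  'parse/extract': 743.4,
  'entry points': 268.7,
  'resolve imports': 2672.3,
  analyze: 7247.3,
};
const reportedPipelineMs = 13036.1; // TOTAL row
const sectionsS = { deadCode: 14.10, duplication: 17.64 }; // section timers
const reportedWallS = 33.86; // summary line

const stageSumMs = Object.values(stagesMs).reduce((a, b) => a + b, 0);
const pipelineGapMs = reportedPipelineMs - stageSumMs;
const sectionSumS = sectionsS.deadCode + sectionsS.duplication;
const wallGapS = reportedWallS - sectionSumS;

// Share of the reported pipeline total taken by each stage.
for (const [name, ms] of Object.entries(stagesMs)) {
  const pct = ((100 * ms) / reportedPipelineMs).toFixed(1);
  console.log(`${name.padEnd(16)} ${pct}% of pipeline`);
}
console.log(`stage sum: ${stageSumMs.toFixed(1)} ms, gap to TOTAL: ${pipelineGapMs.toFixed(1)} ms`);
console.log(`section sum: ${sectionSumS.toFixed(2)} s, gap to wall-clock: ${wallGapS.toFixed(2)} s`);
```

Read this way, analyze is a bit over half of the pipeline and Duplication is a bit over half of the wall-clock time, so those two look like the places where a lower floor would have to come from.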

Why this matters

The docs make a series of speed claims in the "sub-second" / "milliseconds" range.

For a fixture this size, ~34s is ~30-100x off the "sub-second" / "milliseconds" framing. What's the realistic floor for this scenario, and what would need to change to get there?

The bigger question I'd value your perspective on is what's realistically achievable here — given fallow's architecture, hardware-bound floors, and the cost shape this fixture exposes. Once that's clear, the how tends to follow. The reproduction below should let you profile and explore it directly.

Reproduction

~10 minutes end-to-end on Apple Silicon. No node_modules needed.

The generator below extends the v1 generator from #224 with three flags that better mirror real-monorepo cost shape: --commit-distribution power-law (a few hot files get many commits, most get one), --barrel-depth N (nested index.ts re-export trees), --tsconfig-paths N (path aliases like @pkg/foo).
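As a rough illustration of what the barrel and alias flags add to resolution work, the sketch below mirrors the generator's directory layout and lists the shortest chain of files a single aliased import passes through before it reaches a symbol. The helper names are mine, and it assumes --barrel-depth is at least --nest-depth (both are 3 in this repro), so every directory level has an index.ts.

```javascript
const NEST_DEPTH = 3; // --nest-depth used in this repro
const pad = (i) => String(i).padStart(3, '0');

// Same layout rule as sub() in the generator: one g<0-4> segment per nesting level.
function subdirSegments(fileIndex) {
  const segs = ['src'];
  let n = fileIndex;
  for (let d = 0; d < NEST_DEPTH; d++) {
    segs.push(`g${n % 5}`);
    n = Math.floor(n / 5);
  }
  return segs;
}

// tsconfig "paths" target for alias i, in the form the generator writes it.
function aliasTarget(i, workspaces, parent) {
  return `./${parent}/pkg-${pad(i % workspaces)}/src/index.ts`;
}

// Every index.ts between the workspace root and mod-<fileIndex>.ts, then the module itself.
function reexportChain(fileIndex) {
  const segs = subdirSegments(fileIndex);
  const chain = [];
  for (let i = 1; i <= segs.length; i++) {
    chain.push(`${segs.slice(0, i).join('/')}/index.ts`);
  }
  chain.push(`${segs.join('/')}/mod-${fileIndex}.ts`);
  return chain;
}

// Aliased imports in the fixture all ask for fn0, which lives in mod-0.ts.
console.log(aliasTarget(7, 64, 'apps'));
console.log(reexportChain(0).join('\n  -> '));
```

Because every barrel uses `export *`, a resolver that does not know where fn0 lives may also have to look into the sibling subtrees at each level, so this chain is a lower bound on the files involved rather than the full cost.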

Step 1 — generator script
````bash
mkdir /tmp/fallow-perf-repro && cd /tmp/fallow-perf-repro
cat > gen-monorepo.cjs <<'GEN_EOF'
#!/usr/bin/env node
const fs=require('node:fs'),path=require('node:path'),cp=require('node:child_process');
function arg(n,d){const i=process.argv.indexOf(n);if(i===-1)return d;const v=process.argv[i+1];return v===undefined?true:v;}
const N_WS=parseInt(arg('--workspaces','20'),10),FILES_PER_WS=parseInt(arg('--files-per-ws','50'),10),ROOT=path.resolve(arg('--root','.')),IMPORTS_PER_FILE=parseInt(arg('--imports-per-file','3'),10),CROSS_WS=parseInt(arg('--cross-ws-imports','0'),10),COMMITS=parseInt(arg('--commits','0'),10),NEST_DEPTH=parseInt(arg('--nest-depth','0'),10),DUPE_BLOCKS=parseInt(arg('--dupe-blocks','0'),10),DUPE_BLOCK_LINES=parseInt(arg('--dupe-block-lines','30'),10),DUPE_VARIANTS=parseInt(arg('--dupe-variants','50'),10);
const COMMIT_DIST=String(arg('--commit-distribution','power-law')),BARREL_DEPTH=parseInt(arg('--barrel-depth','0'),10),TSCONFIG_PATHS=parseInt(arg('--tsconfig-paths','0'),10);
const PLUGINS=String(arg('--plugins','typescript,@nx/workspace,eslint')).split(',').filter(Boolean);
const PV={vite:'5.4.10',vitest:'1.6.1',jest:'29.7.0','@storybook/react':'8.4.0','@nx/workspace':'20.0.0','@angular/core':'18.2.0',next:'14.2.18',playwright:'1.48.0',cypress:'13.15.0',eslint:'9.13.0',tailwindcss:'3.4.14',react:'18.3.1','react-router':'6.27.0',remix:'1.19.3',gatsby:'5.13.7',nuxt:'3.13.2',astro:'4.16.0',rollup:'4.24.0',webpack:'5.95.0',parcel:'2.12.0',typescript:'5.6.3'};
const PC={vite:['vite.config.ts','export default { plugins: [] };\n'],vitest:['vitest.config.ts','export default { test: { globals: true } };\n'],jest:['jest.config.js','module.exports = { testEnvironment: "node" };\n'],next:['next.config.js','module.exports = {};\n'],playwright:['playwright.config.ts','export default { testDir: "./e2e" };\n'],eslint:['.eslintrc.cjs','module.exports = { root: false, rules: {} };\n'],tailwindcss:['tailwind.config.js','module.exports = { content: ["./src/**/*.{ts,tsx}"] };\n'],remix:['remix.config.js','module.exports = { ignoredRouteFiles: ["**/.*"] };\n'],gatsby:['gatsby-config.ts','export default { plugins: [] };\n'],rollup:['rollup.config.js','export default { input: "src/index.ts", output: { file: "dist/index.js" } };\n'],webpack:['webpack.config.js','module.exports = { entry: "./src/index.ts" };\n'],parcel:['.parcelrc','{ "extends": "@parcel/config-default" }\n']};
function mk(p){fs.mkdirSync(p,{recursive:true});}
function w(p,b){mk(path.dirname(p));fs.writeFileSync(p,b);}
function wsPkg(i){const d={};for(const p of PLUGINS) d[p]=PV[p]||'0.0.0';return{name:`@repro/pkg-${String(i).padStart(3,'0')}`,version:'0.0.0',private:true,main:'src/index.ts',types:'src/index.ts',dependencies:d};}
function body(wi,fi){const L=[];for(let k=1;k<=IMPORTS_PER_FILE;k++){const t=(fi+k)%FILES_PER_WS;if(t!==fi) L.push(`import { fn${t} } from "./mod-${t}";`);}for(let k=0;k<CROSS_WS&&N_WS>1;k++){const o=(wi+k+1)%N_WS;L.push(`import { fn0 as cross${k} } from "@repro/pkg-${String(o).padStart(3,'0')}";`);}if(TSCONFIG_PATHS>0&&N_WS>1){const a=(wi+fi)%Math.min(TSCONFIG_PATHS,N_WS);L.push(`import { fn0 as aliased } from "@repro/alias-${String(a).padStart(3,'0')}";`);}L.push('',`export function fn${fi}(input:number):number{`,'  let acc=input;','  if(acc>10) acc+=1; else acc-=1;','  for(let i=0;i<3;i++) acc*=2;','  return acc;','}',`export const CONST_${fi}=${fi};`);return L.join('\n')+'\n';}
function sub(fi){if(NEST_DEPTH<=0) return 'src';const s=['src'];let n=fi;for(let d=0;d<NEST_DEPTH;d++){s.push(`g${n%5}`);n=Math.floor(n/5);}return s.join('/');}
function rootIdx(){const L=[];if(BARREL_DEPTH>0&&NEST_DEPTH>0){for(let g=0;g<5;g++) L.push(`export * from "./g${g}";`);}else{for(let i=0;i<FILES_PER_WS;i++) L.push(`export * from "./mod-${i}";`);}return L.join('\n')+'\n';}
function nestedBarrels(wsRoot,files){if(BARREL_DEPTH<=0||NEST_DEPTH<=0) return [];const dirToFiles=new Map();for(const fp of files){const rel=path.relative(wsRoot,fp);const dir=path.dirname(rel);if(!dirToFiles.has(dir)) dirToFiles.set(dir,[]);dirToFiles.get(dir).push(path.basename(fp,'.ts'));}const allDirs=new Set();for(const dir of dirToFiles.keys()){const parts=dir.split('/');for(let i=1;i<=parts.length;i++){const sub=parts.slice(0,i).join('/');if(sub.startsWith('src')) allDirs.add(sub);}}const barrels=[];for(const dir of allDirs){const depth=dir.split('/').length-1;if(depth===0||depth>BARREL_DEPTH) continue;const lines=[];const mods=dirToFiles.get(dir)||[];for(const m of mods) lines.push(`export * from "./${m}";`);const childDirs=new Set();for(const otherDir of allDirs){if(otherDir.startsWith(dir+'/')&&otherDir.split('/').length===depth+2) childDirs.add(otherDir.split('/')[depth+1]);}for(const cd of childDirs) lines.push(`export * from "./${cd}";`);if(lines.length>0) barrels.push([path.join(wsRoot,dir,'index.ts'),lines.join('\n')+'\n']);}return barrels;}
function dupes(){const V=[];for(let v=0;v<DUPE_VARIANTS;v++){const L=[`// dupe-${v}`,`export class DupeBuilder${v}{`,`  private readonly tag='variant-${v}';`,'  private parts:string[]=[];','  add(p:string):this{this.parts.push(p);return this;}','  build():string{let acc=this.tag+":";for(const p of this.parts){if(p.length===0) continue;acc+=" | "+p.toLowerCase().trim();}return acc;}','  reset():this{this.parts.length=0;return this;}',`  static def${v}():DupeBuilder${v}{const b=new DupeBuilder${v}();b.add('a-${v}').add('b-${v}').add('c-${v}');return b;}`,'}'];while(L.length<DUPE_BLOCK_LINES-1) L.push(`// pad-${L.length}`);L.push('');V.push(L.join('\n')+'\n');}return V;}
function pickCommitTarget(c,total){if(COMMIT_DIST==='flat') return c%total;if(COMMIT_DIST==='hot-files'){const hot=Math.max(1,Math.floor(total*0.05));if((c*13)%10<8) return ((c*31)^(c*17))%hot;return hot+(c%(total-hot));}let s=(c+1)*2654435761>>>0;s^=s<<13;s>>>=0;s^=s>>>17;s^=s<<5;s>>>=0;const r=(s&0xffffffff)/0x100000000;return Math.floor(total*(1-r*r*r))%total;}
const WSP=String(arg('--workspace-parent','packages')),WP=String(arg('--workspace-pattern',`${WSP}/*`));
console.log(`[gen v2] ws=${N_WS} files/ws=${FILES_PER_WS} plugins=${PLUGINS.length} pat=${WP} nest=${NEST_DEPTH} barrel-depth=${BARREL_DEPTH} ts-paths=${TSCONFIG_PATHS} commit-dist=${COMMIT_DIST} dupes=${DUPE_BLOCKS}@${DUPE_BLOCK_LINES}L×${DUPE_VARIANTS}var commits=${COMMITS}`);
mk(ROOT);
w(path.join(ROOT,'package.json'),JSON.stringify({name:'fallow-perf-repro',version:'0.0.0',private:true,workspaces:[WP],devDependencies:{typescript:'5.0.0'}},null,2)+'\n');
const tsPaths={};if(TSCONFIG_PATHS>0){for(let i=0;i<TSCONFIG_PATHS;i++){const wi=i%N_WS;tsPaths[`@repro/alias-${String(i).padStart(3,'0')}`]=[`./${WSP.replace(/\/.*/,'')}/pkg-${String(wi).padStart(3,'0')}/src/index.ts`];tsPaths[`@repro/alias-${String(i).padStart(3,'0')}/*`]=[`./${WSP.replace(/\/.*/,'')}/pkg-${String(wi).padStart(3,'0')}/src/*`];}}
w(path.join(ROOT,'tsconfig.json'),JSON.stringify({compilerOptions:{target:'ES2022',module:'ESNext',moduleResolution:'Bundler',strict:false,baseUrl:'.',paths:tsPaths}},null,2)+'\n');
const all=[];
for(let wi=0;wi<N_WS;wi++){const r=path.join(ROOT,WSP.replace(/\/.*/,''),`pkg-${String(wi).padStart(3,'0')}`);w(path.join(r,'package.json'),JSON.stringify(wsPkg(wi),null,2)+'\n');w(path.join(r,'src','index.ts'),rootIdx());const wsFiles=[];for(let f=0;f<FILES_PER_WS;f++){const fp=path.join(r,sub(f),`mod-${f}.ts`);w(fp,body(wi,f));all.push(fp);wsFiles.push(fp);}for(const [bp,bc] of nestedBarrels(r,wsFiles)) w(bp,bc);for(const p of PLUGINS){const c=PC[p];if(c) w(path.join(r,c[0]),c[1]);}}
if(DUPE_BLOCKS>0){const V=dupes();console.log(`[gen v2] inject ${DUPE_BLOCKS} dupes...`);for(let i=0;i<DUPE_BLOCKS;i++){fs.appendFileSync(all[(i*7919)%all.length],`\n// dup-${i}\nexport namespace dupeNs${i} {\n${V[i%DUPE_VARIANTS]}}\n`);}}
if(COMMITS>0){console.log(`[gen v2] git init + ${COMMITS} commits (${COMMIT_DIST})...`);cp.execSync('git init -q',{cwd:ROOT,stdio:'inherit'});cp.execSync('git config user.email perf@example.com',{cwd:ROOT});cp.execSync('git config user.name "Perf Reproducer"',{cwd:ROOT});cp.execSync('git add -A',{cwd:ROOT,stdio:'inherit'});cp.execSync('git commit -q -m "initial"',{cwd:ROOT});for(let c=1;c<=COMMITS;c++){fs.appendFileSync(all[pickCommitTarget(c,all.length)],`// touch ${c}\n`);cp.execSync(`git -c user.email=perf${c%5}@example.com -c user.name="A${c%5}" commit -q -am "touch ${c}"`,{cwd:ROOT});if(c%500===0) console.log(`[gen v2]   ${c}/${COMMITS}`);}}
console.log(`[gen v2] done.`);
GEN_EOF
# Step 2 — generate (~10 min)
node gen-monorepo.cjs \
  --workspaces 64 --files-per-ws 256 \
  --plugins 'next,gatsby,remix,vite,vitest,webpack,parcel,rollup' \
  --imports-per-file 5 --cross-ws-imports 5 \
  --workspace-pattern 'apps/*' --workspace-parent apps --nest-depth 3 \
  --barrel-depth 3 \
  --tsconfig-paths 64 \
  --dupe-blocks 32000 --dupe-block-lines 50 --dupe-variants 400 \
  --commits 5036 \
  --commit-distribution power-law \
  --root .

# Step 3 — measure
fallow audit --base HEAD~10 --no-cache --performance --root .
````

Expected on Apple Silicon M3 Max, fallow 2.56.0: ~34 s wall-clock, ~13 s pipeline, ~17.6 s post-pipeline Duplication. The repro generates 16,384 mod files, 9,984 index.ts barrels (155 nested plus one root per workspace), and 448 JS/TS plugin config files, which adds up to the 26,816 files fallow reports.

What this fixture still doesn't fully model

The --commit-distribution power-law flag skews commits toward a few hot files, but the resulting density on this run was low: avg ~1.2 commits/file (max 213, ~50 files with ≥10 commits). Real production monorepos commonly average 5–50 commits per file, roughly 4–40x higher, which I'd expect to push git churn cost toward 5–10 s rather than the ~1 s observable here. --commit-distribution hot-files is also available in the generator if you want a sharper concentration. Worth flagging in case the audit perf path picks up git churn cost for CRAP-score weighting on heavier histories.
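For anyone who wants to inspect the commit density without generating the repo, the sketch below replays the generator's power-law target selection in memory for the flags used in Step 2. The function body is copied from pickCommitTarget; the surrounding bookkeeping and names are mine. One detail worth knowing when reading the output: the `&` in the generator yields a signed 32-bit value, so r falls in [-0.5, 0.5) rather than [0, 1), and touches land only in the first and last eighth of the file list.

```javascript
// Copy of the default (power-law) branch of pickCommitTarget in gen-monorepo.cjs.
function powerLawTarget(c, total) {
  let s = ((c + 1) * 2654435761) >>> 0;
  s ^= s << 13; s >>>= 0;
  s ^= s >>> 17;
  s ^= s << 5; s >>>= 0;
  // `&` returns a signed 32-bit integer, so r is in [-0.5, 0.5), not [0, 1).
  const r = (s & 0xffffffff) / 0x100000000;
  return Math.floor(total * (1 - r * r * r)) % total;
}

const MOD_FILES = 64 * 256; // --workspaces x --files-per-ws
const COMMITS = 5036;       // --commits

// Count how many "touch" commits each mod file receives.
const touches = new Array(MOD_FILES).fill(0);
for (let c = 1; c <= COMMITS; c++) {
  touches[powerLawTarget(c, MOD_FILES)] += 1;
}

// Every file is also part of the initial commit, hence the +1.
const commitsPerFile = touches.map((t) => t + 1);
const maxCommits = Math.max(...commitsPerFile);
const hotFiles = commitsPerFile.filter((n) => n >= 10).length;
const untouched = touches.filter((t) => t === 0).length;

console.log({ maxCommits, hotFiles, untouched });
```

Raising --commits, or switching to --commit-distribution hot-files, is the straightforward way to get closer to a production-like history if churn cost turns out to matter.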

Expected behavior

Open question — that's why I'm filing this as a bug rather than a feature request with a specific solution. The docs frame fallow as sub-second / millisecond-scale, and --base <ref> looks designed for diff-scoped runs, so my naïve expectation is that a 10-file diff on a 27k-file repo should be much closer to a few seconds than 34s. But I don't know what the realistic floor is on this fixture given the architecture.

A few specific questions you'd be better placed than me to answer:

  1. What's the realistic cold floor for audit --base HEAD~10 on a fixture this size, with 10 changed files? 3-4 s? 10 s? 20 s?
  2. Is the Duplication section's full-corpus tokenisation on every run intentional, or is there room to scope it when --base is set without losing detection completeness?
  3. Does audit benefit from --base semantically beyond post-filtering, or is the diff just used to filter the report at the end?
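To make question 2 concrete, this is roughly the shape I have in mind. It is a hypothetical sketch, not fallow's implementation: the function names, the line-window fingerprinting, and the idea of a reusable index are all my assumptions. Only the changed files are re-tokenised, and they are looked up against fingerprints for the rest of the corpus. On a cold --no-cache run the index would still have to be built from every file, so this would mainly help warm runs.

```javascript
const crypto = require('node:crypto');

// Hash every window of `size` consecutive non-empty, trimmed lines.
function fingerprints(source, size = 5) {
  const lines = source.split('\n').map((l) => l.trim()).filter((l) => l.length > 0);
  const out = [];
  for (let i = 0; i + size <= lines.length; i++) {
    const window = lines.slice(i, i + size).join('\n');
    out.push(crypto.createHash('sha1').update(window).digest('hex'));
  }
  return out;
}

// Full-corpus index: fingerprint -> set of file names (assumed to be cacheable between runs).
function buildIndex(files, size = 5) {
  const index = new Map();
  for (const [name, source] of Object.entries(files)) {
    for (const h of fingerprints(source, size)) {
      if (!index.has(h)) index.set(h, new Set());
      index.get(h).add(name);
    }
  }
  return index;
}

// Diff-scoped query: re-tokenise only the changed files and report which other files share a window.
function clonesTouching(changed, files, index, size = 5) {
  const hits = new Map();
  for (const name of changed) {
    for (const h of fingerprints(files[name], size)) {
      for (const other of index.get(h) || []) {
        if (other === name) continue;
        if (!hits.has(name)) hits.set(name, new Set());
        hits.get(name).add(other);
      }
    }
  }
  return hits;
}
```

With something like this, a clone between a changed file and an unchanged one is still found, which is the completeness property I'd want to keep; clones purely between unchanged files would be skipped, which seems acceptable for a --base run.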

Happy to provide more profiling data, run experimental builds, or tighten the repro if it helps.

Fallow version

2.56.0

Operating system

macOS

Configuration

No `.fallowrc.json` or `fallow.toml` is required for the reproduction.
