What happened?
On a large multi-package TypeScript monorepo, fallow health --hotspots --score returns a health_score in the B grade band while only 4 of 11 penalty dimensions track reality. The other 7 are mathematically incapable of firing at this scale:
| Dimension | Cap | State |
|---|---|---|
| dead_files | 15 | ✅ honest |
| dead_exports | 15 | ✅ honest |
| complexity | 20 | ✅ honest |
| duplication | 10 | ✅ honest |
| p90_complexity | 10 | ⚫ silent (p90_cyc well below the > 10 trigger) |
| maintainability | 15 | ⚫ silent (MI_avg well above the < 70 trigger) |
| hotspots | 10 | ⚫ silent (max ranked score reaches a fraction of the 50.0 filter) |
| unit_size | 10 | ⚫ silent (very_high_risk % below the ≥ 5 % floor) |
| coupling | 5 | ⚫ silent (p95_fan_in well below the > 30 trigger) |
| unused_deps | 10 | 🔴 saturated (actual count an order of magnitude over the cap) |
| circular_deps | 10 | 🔴 saturated (actual count well over an order of magnitude over) |
| total | 130 | score lands in the B band |
One pattern explains all 7 broken dimensions: scale-blind aggregations + low absolute caps.
- The 5 silent dimensions aggregate per-function/per-file metrics with mean / p90 / fixed-percentage operators, then trigger on a fixed threshold tuned for small/medium projects. At scale, the long tail is mathematically swallowed by the bulk of trivial code (most TS files are tiny utility/barrel/model files; most functions are 1-CC getters and lambdas), so the aggregation never crosses the floor:
  - Tens of thousands of functions live above p90, but p90 itself sits well below > 10.
  - A meaningful absolute number of files has MI < 70, but they're a tiny fraction of the total, so the mean is near 100.
  - Thousands of functions exceed 60 LOC, but they're below the 5 % floor of the function-count denominator.
  - Thousands of files are ranked as hotspots, but the within-project max-norm formula at compute_hotspot_score ((churn/max_churn) × (density/max_density) × 100) is structurally bounded — see §"Related" below.
  - p95_fan_in lands in the single digits because the bottom 95 % of files are barely imported; the actually-coupled barrels live above p99.
- The 2 saturated dimensions use min(count, 10) on per-repo counts. Reasonable for a single-package project; a no-op in any workspace where N packages multiply the count linearly. The formula treats n=11 and n=1000 identically.
Net: ~38 % of the penalty budget (50/130 pts) is silently zero, ~15 % (20/130) is pinned at the cap regardless of magnitude. A codebase with thousands of fat functions, hundreds of cycles, and hundreds of unused deps reads as B / mostly healthy.
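The dilution is easy to sketch numerically. The snippet below is not fallow's code; the 95 / 4 / 1 population split is an assumed shape for illustration, not measured data:

```rust
// Sketch: a fixed p90 trigger goes silent at scale. Assumed population shape:
// 95 % trivial (CC 1), 4 % moderate (CC 5), 1 % complex (CC 25) —
// i.e. 2,000 genuinely complex functions hiding in 200,000.
fn make_population(n: usize) -> Vec<u32> {
    let mut cc = Vec::with_capacity(n);
    cc.extend(std::iter::repeat(1).take(n * 95 / 100));
    cc.extend(std::iter::repeat(5).take(n * 4 / 100));
    cc.extend(std::iter::repeat(25).take(n / 100));
    cc.sort_unstable();
    cc
}

fn p90(sorted: &[u32]) -> u32 {
    sorted[(sorted.len() as f64 * 0.90) as usize]
}

fn main() {
    let cc = make_population(200_000);
    let above_20 = cc.iter().filter(|&&c| c > 20).count();
    // Mirrors the clamp(p90_cyclomatic − 10, 0, 10) penalty described above.
    let penalty = (p90(&cc) as f64 - 10.0).clamp(0.0, 10.0);
    println!("functions with CC > 20: {above_20}");   // 2000
    println!("p90_cyclomatic: {}", p90(&cc));         // 1 — never crosses the > 10 trigger
    println!("penalty: {penalty}");                   // 0
    // A density of the tail survives at any scale:
    println!("tail per 1k fns: {}", above_20 * 1000 / cc.len()); // 10
}
```

The same mechanics apply to the mean-MI, fixed-percent, and p95 aggregators: the trivial bulk sets the statistic, and the tail never moves it.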
Per-dimension evidence
- p90_complexity — vital_signs.rs:319: clamp(p90_cyclomatic − 10, 0, 10). At large function-population sizes, the bulk are trivial; complex functions live above p99. A p99_cyclomatic (same trigger) or functions_with_cc_above_20 / 1k_functions would survive.
- maintainability — vital_signs.rs:323-325: min((70 − MI_avg).max(0) × 0.5, 15). Over 98 % of files have MI ≥ 70, dragging the mean above the trigger. The actionable signal is the small absolute count with MI < 70 — invisible to a mean. maintainability_p10 or count(MI < 70) would survive.
- hotspots — vital_signs.rs:331-340 + scores.rs:4:
  - Penalty: min(hotspot_count / total_files × 200, 10) where hotspot_count = files with score ≥ HOTSPOT_SCORE_THRESHOLD (= 50.0).
  - Score: (weighted_commits / max_weighted) × (complexity_density / max_density) × 100.
  - Max-norm caps every score at 1.0 × 1.0 × 100 = 100 only if a single file is both max-churned and max-density. In practice the top-churned file has moderate density and vice-versa, so the product is structurally bounded well below 50.0. Top-ranked hotspot reaches less than half of the threshold → hotspot_count = 0 even though thousands of files are ranked. Either expose the threshold or count "top N % of the within-project ranking".
- unit_size — vital_signs.rs:359-365: min((very_high_risk_pct − 5).max(0) × 0.5, 10), very_high_risk = % of functions > 60 LOC. A substantial absolute inventory of functions over 60 LOC stays invisible because it's a small fraction of a large function-count denominator. Lower the floor (~1 %) or switch to functions_over_60_loc / 1k_functions.
- coupling — vital_signs.rs:368-373: min((p95_fan_in − 30).max(0) × 0.25, 5). Fan-in is heavy-tailed. p95 is in the single digits because the bottom 95 % is barely imported — not because there are no hubs. p99_fan_in (same trigger) or the already-computed coupling_high_pct (vital_signs.rs:285) would work.
- unused_deps & circular_deps (saturated) — vital_signs.rs:343-356: min(count, 10) for both. unused_dep_count exceeds the cap by an order of magnitude; circular_dep_count by well over an order of magnitude. Counts grow ~linearly with workspace package count; the cap was reasonable for a single-package project but is a no-op in any monorepo. Recommended replacement: per-1k-files density.
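The saturation is purely mechanical. A sketch (the 6-cycles-per-package figure is an assumption for illustration):

```rust
// Sketch: min(count, 10) on a per-repo count is a no-op once a workspace
// multiplies the count linearly with package count.
fn capped(count: u32) -> f64 {
    (count as f64).min(10.0)
}

fn main() {
    for n_pkgs in [1u32, 11, 80, 1_000] {
        let cycles = 6 * n_pkgs; // illustrative: ~6 intra-package cycles per package
        println!("{n_pkgs:>5} pkgs -> {cycles:>5} cycles -> penalty {}", capped(cycles));
    }
    // A single-package project with 6 cycles is still differentiated...
    assert_eq!(capped(6), 6.0);
    // ...but 11 packages and 1,000 packages are indistinguishable.
    assert_eq!(capped(66), capped(6_000));
}
```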
Recommended fix: scale-invariant aggregations as the new default
A metric should ask "what fraction of your code is bad?" — not "are you big enough to dilute the bad code below a threshold?"
| Dimension | Current scale-blind aggregator | Scale-invariant replacement |
|---|---|---|
| complexity | avg_cyclomatic (mean over all functions) | count(cc ≥ critical) / 1k functions |
| p90_complexity | p90_cyclomatic > 10 | (subsumed by complexity tail metric — drop) |
| maintainability | mean(MI) < 70 | % of files with MI < 70 |
| hotspots | count(score ≥ 50) / total_files × 200 | top 1 % of within-project hotspot ranking / total_files × 200 |
| unit_size | % of functions > 60 LOC, trigger > 5 % | count(functions > 60 LOC) / 1k functions |
| coupling | p95_fan_in − 30 | coupling_high_pct (already computed) |
| unused_deps | min(count, 10) | count / 1k files × 0.5, cap 25 |
| circular_deps | min(count, 10) | count / 1k files × 0.5, cap 25 |
Every replacement is scale-invariant by construction — bigger codebases neither get a leniency dividend nor a size penalty. A 1K-file project and a 100K-file project with the same density of bad code score identically.
A small project (e.g. 1K files, single-digit unused deps, 1-2 cycles) sees a small improvement under the new densities, not a regression — density-based aggregators are simultaneously small-project-friendly and monorepo-honest.
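Both claims check out arithmetically. A sketch using the proposed density formula (count / 1k files × 0.5, cap 25 — this issue's suggestion, not shipped behavior):

```rust
// Sketch: the proposed density aggregator vs today's cap.
fn current(count: u32) -> f64 {
    (count as f64).min(10.0) // today's min(count, 10)
}

fn proposed(count: u32, total_files: u32) -> f64 {
    // count / 1k files × 0.5, cap 25 (this issue's proposal)
    (count as f64 * 1000.0 / total_files as f64 * 0.5).min(25.0)
}

fn main() {
    // Small project (1K files): mild counts get a *smaller* penalty than today.
    println!("7 unused deps: {} -> {}", current(7), proposed(7, 1_000)); // 7 -> 3.5
    println!("2 cycles:      {} -> {}", current(2), proposed(2, 1_000)); // 2 -> 1
    // Scale invariance: same density of bad code, identical penalty at any size.
    assert_eq!(proposed(7, 1_000), proposed(700, 100_000));
    // And magnitude finally matters where the old cap collapsed it:
    assert!(proposed(450, 25_000) > proposed(45, 25_000));
}
```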
Fallback ask (if changing defaults is too invasive)
If shipping these as new defaults moves every existing user's grade, the minimum useful change is to expose the scale-invariant primitives as new vital_signs fields alongside the existing scale-blind ones — so dashboards and CI gates can compute honest scores externally:
```
vital_signs.functions_above_critical_cc_per_k  // replaces avg_cyc + p90_cyc
vital_signs.functions_above_60_loc_per_k       // replaces unit_size very_high_risk
vital_signs.maintainability_pct_below_70       // replaces maintainability_avg
vital_signs.hotspots_top_pct_count             // replaces hotspot_count
vital_signs.unused_deps_per_k_files            // replaces saturated unused_dep_count
vital_signs.circular_deps_per_k_files          // replaces saturated circular_dep_count
```
Doesn't move any grade, lets large monorepos compute honest scores externally. Strictly worse than fixing the defaults (fallow's own health_score would still report B when the data says D), but the smallest useful change.
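For concreteness, here is a hypothetical sketch of an external gate built on those fields. None of the fields exist today; the weights and caps mirror this issue's suggested density formulas and are illustrative only:

```rust
// Hypothetical external CI gate over the proposed scale-invariant fields.
// Field names are from the proposal above; weights/caps are illustrative.
struct ScaleInvariantSigns {
    functions_above_critical_cc_per_k: f64,
    functions_above_60_loc_per_k: f64,
    maintainability_pct_below_70: f64,
    unused_deps_per_k_files: f64,
    circular_deps_per_k_files: f64,
}

fn external_penalty(s: &ScaleInvariantSigns) -> f64 {
    (s.functions_above_critical_cc_per_k).min(20.0)
        + (s.functions_above_60_loc_per_k).min(10.0)
        + (s.maintainability_pct_below_70 * 0.5).min(15.0)
        + (s.unused_deps_per_k_files * 0.5).min(25.0)
        + (s.circular_deps_per_k_files * 0.5).min(25.0)
}

fn main() {
    // Rough monorepo-shaped inputs: the dense tail today's score cannot see.
    let s = ScaleInvariantSigns {
        functions_above_critical_cc_per_k: 10.0,
        functions_above_60_loc_per_k: 12.0,
        maintainability_pct_below_70: 2.0,
        unused_deps_per_k_files: 7.2,    // e.g. 180 deps / 25k files
        circular_deps_per_k_files: 18.0, // e.g. 450 cycles / 25k files
    };
    let penalty = external_penalty(&s);
    println!("external penalty: {penalty}");
    assert!(penalty > 30.0); // fails a gate that today's B grade would sail through
}
```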
Configurability audit (none of this is tunable today)
HealthConfig: the only score-relevant knob is health.ignore (denominator filter). All seven broken-dimension constants are hardcoded: HOTSPOT_SCORE_THRESHOLD = 50.0 (scores.rs:4), MI_DENSITY_MIN_LINES = 50.0 (scores.rs:24), the penalty constants in vital_signs.rs:295-404, the aggregators in vital_signs.rs:69-288, and HALF_LIFE_DAYS = 90 (crates/core/src/churn.rs).
HealthConfig.maxCyclomatic / maxCognitive / maxCrap only affect finding emission, not the score — confirmed in compute_health_score, which never reads them. CLI flags --since / --min-commits widen the hotspot window but don't affect HOTSPOT_SCORE_THRESHOLD or the max-norm. No .fallowrc.json or CLI combination can move this score from B to its honest grade — scale-blindness lives in source-level constants.
Related: upstream signal defects
Two broken dimensions have defects in the upstream signal, not just in how the score consumes them. Even with compute_health_score() fixed, these will remain silent until the upstream signal is also addressed. Happy to file as companion issues.
- Hotspot scoring algorithm has a structural ceiling well below the threshold. compute_hotspot_score returns (weighted_commits / max_weighted) × (complexity_density / max_density) × 100. To reach 100 (or even 50), one file must be both max-churned and max-density. In real codebases the top-churned file has moderate density and vice-versa, so the product is structurally bounded well below the HOTSPOT_SCORE_THRESHOLD = 50.0 filter at vital_signs.rs:131. On any sufficiently large repo, top-ranked hotspots reach only a fraction of 50 → hotspot_count is always 0. A percentile-based filter ("files in the top 1 % of the within-project hotspot ranking") would survive max-norm compression.
- MI per-file formula's small-file dampening pushes most files to MI ≥ 70. compute_maintainability_index is 100 − density × 30 × dampening − dead_ratio × 20 − min(ln1p(fan_out) × 4, 15) where dampening = min(lines / MI_DENSITY_MIN_LINES, 1.0). Files under 50 LOC (barrels, models, utility) get density damped toward 0, pinning their MI near 100 regardless of internal complexity. Result: well over 98 % of scored files end up with MI ≥ 70 on any TS-heavy codebase. Fixing the score-formula aggregator alone helps but per-file MI is still inflated.
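Both defects can be sketched numerically. All inputs below are invented; only the two formulas are the ones quoted above (max-norm hotspot score; per-file MI with MI_DENSITY_MIN_LINES = 50):

```rust
// Max-norm hotspot score as quoted above.
fn hotspot_score(churn: f64, max_churn: f64, density: f64, max_density: f64) -> f64 {
    (churn / max_churn) * (density / max_density) * 100.0
}

// Per-file MI as quoted above, with MI_DENSITY_MIN_LINES = 50.
fn mi(lines: f64, density: f64, dead_ratio: f64, fan_out: f64) -> f64 {
    let dampening = (lines / 50.0).min(1.0);
    100.0 - density * 30.0 * dampening - dead_ratio * 20.0 - (fan_out.ln_1p() * 4.0).min(15.0)
}

fn main() {
    // 1. Structural ceiling: invented (churn, density) pairs where the top-churned
    //    file has moderate density and vice-versa — no product comes near 50.
    let files = [(900.0, 3.0), (700.0, 5.0), (120.0, 22.0), (60.0, 30.0), (500.0, 8.0)];
    let (max_c, max_d) = (900.0, 30.0);
    let top = files.iter()
        .map(|&(c, d)| hotspot_score(c, max_c, d, max_d))
        .fold(0.0, f64::max);
    println!("top hotspot score: {top:.1}");
    assert!(top < 50.0); // → hotspot_count = 0 despite every file being ranked

    // 2. Small-file dampening: identical density, opposite sides of the MI < 70 trigger.
    assert!(mi(10.0, 1.2, 0.0, 0.0) >= 70.0); // 10-line barrel: dampened, "healthy"
    assert!(mi(200.0, 1.2, 0.0, 0.0) < 70.0); // 200-line file: full penalty
}
```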
Why this matters
scores.rs:26-49 describes health_score as a comprehensible 0–100 summary suitable for dashboards and CI gates. With 5/11 dimensions silently 0 and 2/11 saturated, the score is structurally unable to communicate "really, really bad" for any sufficiently large project. The underlying data is excellent; the problem is in how the score formula aggregates it.
Reproduction
The bug is deterministic in the formula — given inputs in the shape produced by any large TS monorepo, compute_health_score() returns a B-band score with five 0.0 penalties and two saturated 10.0 penalties. No real codebase required.
Easiest: drop a unit test into fallow's own test suite
Following the existing pattern in vital_signs.rs:1135+ (health_score_perfect, etc.):
```rust
#[test]
fn health_score_silent_and_saturated_at_monorepo_scale() {
    // Inputs in the shape produced by any large multi-package TS monorepo.
    // Small perturbations don't change the qualitative result.
    let total_files: usize = 25_000;
    let vs = VitalSigns {
        // honest dimensions
        dead_file_pct: Some(4.0),
        dead_export_pct: Some(9.0),
        avg_cyclomatic: 2.3,
        duplication_pct: Some(6.0),
        // silent dimensions — every value is "long-tail-hidden"
        p90_cyclomatic: 4,
        maintainability_avg: Some(91.0), // mean dominated by small files
        hotspot_count: Some(0),          // none cross HOTSPOT_SCORE_THRESHOLD = 50
        unit_size_profile: Some(RiskProfile { very_high_risk: 2.3, ..Default::default() }),
        p95_fan_in: Some(7),
        // saturated dimensions — counts grow with workspace package count
        unused_dep_count: Some(180),
        circular_dep_count: Some(450),
        ..Default::default()
    };
    let score = compute_health_score(&vs, total_files);
    let p = &score.penalties;
    // 4 honest dimensions
    assert!(p.dead_files.unwrap() > 0.0 && p.dead_files.unwrap() < 5.0);
    assert!(p.dead_exports.unwrap() > 0.0 && p.dead_exports.unwrap() < 5.0);
    assert!(p.complexity > 0.0 && p.complexity < 10.0);
    assert!(p.duplication.unwrap() > 0.0 && p.duplication.unwrap() < 5.0);
    // 5 silent dimensions
    assert_eq!(p.p90_complexity, 0.0);
    assert_eq!(p.maintainability.unwrap(), 0.0);
    assert_eq!(p.hotspots.unwrap(), 0.0);
    assert_eq!(p.unit_size.unwrap(), 0.0);
    assert_eq!(p.coupling.unwrap(), 0.0);
    // 2 saturated dimensions
    assert_eq!(p.unused_deps.unwrap(), 10.0);
    assert_eq!(p.circular_deps.unwrap(), 10.0);
    assert_eq!(score.grade, "B");
}
```
Self-contained, runs in milliseconds. Same test with the recommended scale-invariant aggregators should drop the score by roughly one and a half letter grades (B → D).
End-to-end: synthetic monorepo generator
The script below produces a fully synthetic TS workspace whose vital_signs reproduces the broken-dimension pattern end-to-end. Defaults generate ~21K files in ~3.5 min (mostly git churn); --commits-per-fat-file=2 runs in under a minute with the same pattern. Smaller --packages / --files-per-pkg produce the partial pattern (3-4 silent dimensions).
```shell
node generate-monorepo.mjs ./repro   # defaults: 80 pkgs × 250 files
cd ./repro && fallow health --hotspots --score --format json --quiet \
  | jq '(.health.health_score | {score, grade, penalties}), .health.vital_signs'
```
Expected at defaults: score in the C band (~65), 5 of 11 penalties at 0.0 (p90_complexity, maintainability, unit_size, coupling, plus dead_files / dead_exports since synthetic data has no deads), 2 saturated at 10.0 (unused_deps, circular_deps). Bumping --fat-fns-per-pkg past 5 silences hotspots and lifts the score into B.
generate-monorepo.mjs:
```javascript
#!/usr/bin/env node
// Reproduces fallow's health_score scale-blindness pattern (5 silent + 2 saturated).
// Usage: node generate-monorepo.mjs <out-dir> [--packages=80] [--files-per-pkg=250]
//        [--fat-fns-per-pkg=5] [--cycles-per-pkg=6] [--unused-deps-per-pkg=3]
//        [--commits-per-fat-file=8]
import { mkdirSync, writeFileSync, existsSync, rmSync } from 'node:fs';
import { execSync } from 'node:child_process';
import { join } from 'node:path';

const args = Object.fromEntries(process.argv.slice(2).filter(a => a.startsWith('--'))
  .map(a => { const [k, v] = a.replace(/^--/, '').split('='); return [k, v ?? true]; }));
const outDir = process.argv.find((a, i) => i > 1 && !a.startsWith('--')) ?? './repro';
const PACKAGES = Number(args.packages ?? 80);
const FILES_PER_PKG = Number(args['files-per-pkg'] ?? 250);
const FAT_FNS_PER_PKG = Number(args['fat-fns-per-pkg'] ?? 5);
const CYCLES_PER_PKG = Number(args['cycles-per-pkg'] ?? 6);
const UNUSED_DEPS_PER_PKG = Number(args['unused-deps-per-pkg'] ?? 3);
const COMMITS_PER_FAT = Number(args['commits-per-fat-file'] ?? 8);

if (existsSync(outDir)) rmSync(outDir, { recursive: true, force: true });
mkdirSync(outDir, { recursive: true });
writeFileSync(join(outDir, 'package.json'), JSON.stringify({
  name: 'fallow-repro', private: true,
  workspaces: Array.from({ length: PACKAGES }, (_, i) => `packages/pkg-${i}`),
}, null, 2));
writeFileSync(join(outDir, 'tsconfig.json'), JSON.stringify({
  compilerOptions: { target: 'ES2022', module: 'ESNext', moduleResolution: 'bundler', strict: true, skipLibCheck: true },
}, null, 2));

// Trivial file = 1 trivial fn (1-CC). Drives p90_cyc mean, very_high_risk %, MI mean.
const trivial = (p, i) =>
  `// pkg-${p} v${i}\nexport function get_v${i}_${p}(): number { return ${i} + ${p}; }\nexport const v${i}_${p} = ${i * (p + 1)};\n`;

// Fat file = 1 nested-switch fn → high CC, > 60 LOC.
const fat = (p, idx) => {
  const branches = Array.from({ length: 12 }, (_, b) => `    case ${b}: { switch (mode) {
      case 'a': return ${b} * 2 + ${p}; case 'b': return ${b} + 1 - ${idx};
      case 'c': return ${b} - 1 * ${p}; case 'd': return ${b} ** 2 + ${idx};
      default: return ${b} + ${p}; } }`).join('\n');
  return `export function fatFn_${p}_${idx}(input: number, mode: 'a'|'b'|'c'|'d'): number {
  switch (input) {\n${branches}\n    default: return input + ${p};\n  }\n}\n`;
};

// Cycle pair = two intra-package files importing each other. One pair = one cycle.
const cycA = (p, i) => `import { b_${p}_${i} } from './cycle-${i}-b';\nexport const a_${p}_${i} = b_${p}_${i} + ${p};\n`;
const cycB = (p, i) => `import { a_${p}_${i} } from './cycle-${i}-a';\nexport const b_${p}_${i} = a_${p}_${i} + ${i};\n`;

const barrel = (p) => {
  const L = [];
  for (let i = 0; i < FILES_PER_PKG; i++) L.push(`export * from './v${i}';`);
  for (let f = 0; f < FAT_FNS_PER_PKG; f++) L.push(`export * from './fat-${f}';`);
  for (let c = 0; c < CYCLES_PER_PKG; c++) { L.push(`export * from './cycle-${c}-a';`); L.push(`export * from './cycle-${c}-b';`); }
  return L.join('\n') + '\n';
};

// Pool of public packages claimed as deps but never imported.
const POOL = ['lodash', 'rxjs', 'date-fns', 'uuid', 'chalk', 'yargs', 'minimist', 'zod', 'axios', 'commander'];

for (let p = 0; p < PACKAGES; p++) {
  const dir = join(outDir, 'packages', `pkg-${p}`);
  mkdirSync(join(dir, 'src'), { recursive: true });
  const devDeps = {};
  for (let u = 0; u < UNUSED_DEPS_PER_PKG; u++) devDeps[POOL[(p + u) % POOL.length]] = '*';
  writeFileSync(join(dir, 'package.json'), JSON.stringify({
    name: `pkg-${p}`, version: '0.0.0', main: './src/barrel.ts', types: './src/barrel.ts',
    devDependencies: devDeps,
  }, null, 2));
  for (let f = 0; f < FILES_PER_PKG; f++) writeFileSync(join(dir, 'src', `v${f}.ts`), trivial(p, f));
  for (let f = 0; f < FAT_FNS_PER_PKG; f++) writeFileSync(join(dir, 'src', `fat-${f}.ts`), fat(p, f));
  for (let c = 0; c < CYCLES_PER_PKG; c++) {
    writeFileSync(join(dir, 'src', `cycle-${c}-a.ts`), cycA(p, c));
    writeFileSync(join(dir, 'src', `cycle-${c}-b.ts`), cycB(p, c));
  }
  writeFileSync(join(dir, 'src', 'barrel.ts'), barrel(p));
}

// Hotspots need git history: commit-burst on each fat file.
const sh = (cmd) => execSync(cmd, { cwd: outDir, stdio: ['ignore', 'ignore', 'inherit'] });
sh('git init -q -b main && git config user.email s@s && git config user.name s && git add . && git -c commit.gpgsign=false commit -q -m init');
let date = new Date('2024-01-01T00:00:00Z').getTime();
for (let p = 0; p < PACKAGES; p++) for (let f = 0; f < FAT_FNS_PER_PKG; f++) {
  const path = `packages/pkg-${p}/src/fat-${f}.ts`;
  for (let c = 0; c < COMMITS_PER_FAT; c++) {
    sh(`printf '\\n// tweak ${c}\\n' >> "${path}"`);
    sh(`git -c commit.gpgsign=false -c user.email=s@s -c user.name=s commit -q --allow-empty-message --date="${new Date(date).toISOString()}" -am tweak`);
    date += 6 * 60 * 60 * 1000 + Math.floor(Math.random() * 6 * 60 * 60 * 1000);
  }
}
console.log(`Done. cd ${outDir} && fallow health --hotspots --score --format json --quiet | jq '.health.health_score'`);
```
Knob → score-formula-input mapping:
| Knob | Drives |
|---|---|
| --packages | Workspace package count → unused_deps & circular_deps saturation |
| --files-per-pkg | Total file count → silences unit_size, maintainability, hotspots |
| --fat-fns-per-pkg | Fat-function tail (invisible to mean / p90 / fixed-percent) |
| --cycles-per-pkg | Intra-package cycle count → circular_dep_count |
| --unused-deps-per-pkg | unused_dep_count per package |
| --commits-per-fat-file | Hotspot churn distribution |
Optional: against a real codebase
```shell
cd <large-ts-monorepo>
git fetch --unshallow   # so hotspots have a real distribution (default --since 6m)
fallow health --hotspots --score --format json --quiet \
  | jq '.health.health_score.penalties, .health.vital_signs'
```
Look for: dimensions reporting 0.0 in .penalties paired with non-zero "bulk" inputs in .vital_signs (p90_cyclomatic > 0, maintainability_avg > 0, unit_size_profile.very_high_risk > 0, p95_fan_in > 0), plus unused_deps / circular_deps pegged at 10.
Expected behavior
The formula behaves exactly as written; that is precisely the case for changing it: at large scale the calibration produces structurally false signal — not "wrong by a few points" but "5 of 11 dimensions cannot fire under any input distribution this codebase shape will produce", and "11 vs 1,000 unused deps score identically".
health_score should:
- Fire on a codebase containing thousands of fat functions, a measurable absolute count of files with MI < 70, and thousands of ranked hotspots.
- Differentiate small vs medium vs large vs catastrophic dep / cycle counts rather than collapsing them all to 10 pts.
- Produce different letter grades for repos with order-of-magnitude differences in bad-code volume.
Fallow version
fallow 2.62.0
Operating system
macOS
Configuration
default