This is a follow-up to #22225. The merged changes already greatly improve formatting time.
But profiles reveal that formatting is still blocked by config discovery: all formatting threads idle for ~1s before doing any work.
All format worker threads show the same stack during the pre-scan phase:
# overview of work of the first "block" before real formatting starts
rayon_core::registry::WorkerThread::wait_until_cold
→ par_bridge::IterParallelProducer::fold_with
→ _pthread_mutex_firstfit_lock_slow
→ _pthread_mutex_firstfit_lock_wait
→ __psynch_mutexwait ← 96% of samples
par_bridge() holds its internal mutex while blocking on the empty mpsc::Receiver. Since the channel is empty (no files sent yet — pre-scan is still running), all 12 workers pile up waiting for that same mutex. 96% of their time is spent in __psynch_mutexwait.
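The contention pattern can be reproduced with the standard library alone. The following is a simplified model, not rayon's actual implementation: `drain_with_shared_lock` and its parameters are hypothetical names, and the sleep stands in for the pre-scan. Each worker takes the shared mutex and then blocks in `recv()`, so while the channel is empty one worker sleeps inside `recv()` and the others sleep on the mutex.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Simplified model (hypothetical, not rayon code): `workers` threads share one
/// receiver behind a mutex. Returns how many items each worker handled.
pub fn drain_with_shared_lock(
    workers: usize,
    items: usize,
    producer_delay: Duration,
) -> Vec<usize> {
    let (tx, rx) = channel::<usize>();
    let rx: Arc<Mutex<Receiver<usize>>> = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut handled = 0;
                loop {
                    // The guard lives for the whole blocking `recv()` call.
                    // While the channel is empty, one worker waits in `recv()`
                    // and every other worker waits on the mutex.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(_) => handled += 1,
                        // Sender dropped and channel drained: stop.
                        Err(_) => break,
                    }
                }
                handled
            })
        })
        .collect();

    // Stand-in for the pre-scan: nothing is sent until it has finished.
    thread::sleep(producer_delay);
    for i in 0..items {
        tx.send(i).unwrap();
    }
    drop(tx);

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Profiling a call such as `drain_with_shared_lock(12, 1000, Duration::from_secs(1))` should show the workers parked on the lock for the duration of the delay, which mirrors the shape of the profile above.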
The root cause is architectural. The current architecture looks like this:
- Phase 1: Pre-scan (parallel walk, dir I/O only) → builds scope map
- Phase 2: Main walk (parallel) → sends files to channel
- Phase 3: Format workers (rayon, par_bridge) → consume channel
These phases run strictly in sequence: phase 2 cannot start until phase 1 has completed, and phase 3 cannot start until phase 2 is done. Phases 1 and 2 also read the same directories, so every directory is read twice. That duplicate work is avoidable.
The real fix is architectural: replace the three phases with a single parallel walk. Since config files cannot affect files higher up in the tree, we know that all matching files in the current directory can be formatted as soon as that directory has been read. Something like this:
- Thread reads all entries in foo/bar/
- Scan entry names for config filenames: no extra syscalls, names come free from readdir
- No config found → stream all files in foo/bar/ immediately using parent scope, recurse into subdirs
- Config found → load it (one read), use it for all files in foo/bar/, recurse into subdirs with new scope
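The steps above can be sketched roughly as follows. This is a minimal, sequential illustration using only the standard library, not the proposed implementation: `Scope`, `walk`, and the `.fmtrc.json` filename are hypothetical stand-ins, config loading is reduced to recording the config path, and a real version would hand each subdirectory to the thread pool instead of recursing inline.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::mpsc::Sender;
use std::sync::Arc;

/// Hypothetical config filenames; the real list depends on the tool.
const CONFIG_NAMES: &[&str] = &[".fmtrc.json"];

/// Hypothetical stand-in for a resolved config scope.
#[derive(Debug)]
pub struct Scope {
    pub config_path: Option<PathBuf>,
}

/// Walks `dir` once and sends every file together with the scope that applies to it.
pub fn walk(
    dir: &Path,
    parent: Arc<Scope>,
    tx: &Sender<(PathBuf, Arc<Scope>)>,
) -> io::Result<()> {
    let mut files = Vec::new();
    let mut subdirs = Vec::new();
    let mut config = None;

    // Single directory read. Entries are collected first so that a config
    // file is noticed before any file of this directory is streamed.
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // `file_type()` usually needs no extra syscall on common platforms.
        if entry.file_type()?.is_dir() {
            subdirs.push(path);
        } else if entry
            .file_name()
            .to_str()
            .map_or(false, |name| CONFIG_NAMES.contains(&name))
        {
            config = Some(path);
        } else {
            files.push(path);
        }
    }

    // A config in this directory replaces the inherited scope for the whole
    // subtree. A real implementation would read and parse the file here.
    let scope = match config {
        Some(path) => Arc::new(Scope { config_path: Some(path) }),
        None => parent,
    };

    for file in files {
        // A send error only means the consumer has hung up; ignore it.
        let _ = tx.send((file, Arc::clone(&scope)));
    }
    for sub in subdirs {
        walk(&sub, Arc::clone(&scope), tx)?;
    }
    Ok(())
}
```

Because files are sent as soon as their own directory has been read, consumers on the other end of the channel can start formatting while deeper directories are still being walked, and no directory is read twice.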