MRG, ENH: Minor speedup of permutation tests #7784
larsoner
left a comment
Speedups are always helpful! Just a couple of minor comments
mne/stats/cluster_level.py
len_c = len(c)
scores[c] += h * (len_c ** e_power)
# turn sums into ndarray after running
sums = np.concatenate(sums) if sums else np.empty(0)
sums is 1D so it's simpler just to do sums = np.array(sums) as np.array([]) is the same as np.empty(0)
Because sums is a list of np.array, sums = np.array(sums) breaks other functions.
I would change it to sums = np.concatenate(sums) if sums else np.array([]).
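To illustrate the point above, here is a minimal sketch of the two conversions. The helper name finalize_sums and the sample values are hypothetical, not taken from mne/stats/cluster_level.py. It assumes sums is a list of 1D arrays, as described in the reply.

```python
import numpy as np


def finalize_sums(sums):
    """Flatten a list of 1D arrays into one 1D array (hypothetical helper)."""
    # np.concatenate raises ValueError on an empty list, hence the fallback.
    return np.concatenate(sums) if sums else np.array([])


# Hypothetical cluster sums, collected as one 1D array per step.
sums = [np.array([1.5, 2.0]), np.array([3.0])]
flat = finalize_sums(sums)
empty = finalize_sums([])
print(flat.shape, empty.shape)

# np.array on equal-length pieces stacks them into 2D instead of flattening.
stacked = np.array([np.array([1.0, 2.0]), np.array([3.0, 4.0])])
print(stacked.shape)
```

With pieces of unequal length, np.array(sums) does not yield a flat array either: depending on the NumPy version it raises an error or builds an object array, which is presumably what breaks downstream code.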
Current results
12-15% speedup
n_perm=100, n_space=100, n_time=100
370 ms ± 4.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
n_perm=1000, n_space=100, n_time=100
3.39 s ± 39.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
n_perm=1000, n_space=300, n_time=300
5.11 s ± 39.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
n_perm=1000, n_space=1000, n_time=1000
14.764962434768677
The performance is better than I thought 🥳
Thanks @yh-luo!
(FYI, we are not notified of title changes, so thanks for the ping to review -- it probably would have taken me a while to notice otherwise)
Add benchmark scripts and feasibility documentation for GPU-accelerating the spatio-temporal cluster-based permutation test, which is the #1 computational bottleneck for MNE researchers doing source-space analyses. The connected-component labeling step in _get_components() consumes ~97% of permutation test runtime.

This adds:
- gpu_accel/benchmark_cluster_cpu.py: CPU baseline benchmark using the MNE sample dataset (fsaverage ico-5, ~20K vertices)
- gpu_accel/patch_cupy_poc.py: CuPy proof-of-concept that monkey-patches _get_components with GPU connected_components (NVIDIA)
- gpu_accel/FEASIBILITY.md: Full analysis of GPU CCL algorithms, hardware requirements, and a three-phase plan (CuPy PoC → wgpu+Rust → fused pipeline)
- CLAUDE.md: Development guide with uv setup instructions and GPU work context

Related: mne-tools#5439, mne-tools#12609, mne-tools#7784, mne-tools#8095, mne-tools#13175

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Minor refactoring to speed up permutation tests. This PR brings a speedup of about 1-2% (about 5% in the best case).
Testing
Setup
n_perm=100, n_space=100, n_time=100
5.2% speedup
n_perm=1000, n_space=100, n_time=100
2.6% speedup
n_perm=1000, n_space=300, n_time=300
1.7% speedup
n_perm=1000, n_space=1000, n_time=1000
1.5% speedup
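For anyone wanting to reproduce numbers like these, a rough sketch of a timing harness follows. The PR does not state how the percentages were computed, so the definition used here (runtime reduction relative to the baseline) is an assumption. The baseline and candidate functions are hypothetical stand-ins for a permutation test run before and after the change.

```python
import statistics
import timeit


def mean_time(func, repeat=7, number=1):
    """Mean wall-clock seconds per call over `repeat` runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return statistics.mean(runs) / number


def percent_speedup(t_baseline, t_new):
    """Runtime reduction relative to the baseline, in percent (assumed definition)."""
    return 100.0 * (t_baseline - t_new) / t_baseline


# Hypothetical stand-ins for the old and new code paths.
def baseline():
    sum(i * i for i in range(20000))


def candidate():
    sum(i * i for i in range(19000))


t_old = mean_time(baseline)
t_new = mean_time(candidate)
print(f"{percent_speedup(t_old, t_new):.1f}% speedup")
```

Swapping the stand-ins for calls at each n_perm / n_space / n_time setting would give one percentage per row above. Run-to-run noise can be on the order of the 1-2% effect, so several repeats are advisable.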