Description
I noticed that gzipping runs after every BIDS2Legacy run and suspected it could be faster, since the original was written in Python. Indeed, the Rust rewrite beats the Python version on every metric:
- 2.5x faster on a single thread
- Orders of magnitude faster when multithreaded, depending on the system
- Slower than native find + grep + gzip on Unix when single-threaded, because it validates that each file path still exists before compressing it. I'd argue this potentially makes it safer than find + grep + gzip when multiple ExploreASL workers simultaneously access the same NIfTI files, which may no longer exist by the time one of the workers is done gzipping.
- Faster than find + grep + gzip on Unix when --num-threads is 3 or greater.
- Executable size is ~9x smaller (~1.1 MB vs. ~10 MB for the Python implementation)
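For reference, the existence check described above can be illustrated in plain MATLAB. This is a minimal, single-threaded sketch: `gzipExistingNiftis` is a hypothetical helper, not part of ExploreASL or SuperGZip, and it does not reproduce the Rust implementation. It only shows the idea of re-checking each path right before compressing it and tolerating files that vanish in between (e.g. because another worker already handled them).

```matlab
function gzipExistingNiftis(rootDir)
% Hypothetical helper: gzip every .nii file under rootDir (recursively),
% skipping files that no longer exist by the time we reach them.
niiFiles = dir(fullfile(rootDir, '**', '*.nii'));   % recursive listing (R2016b+)
for iFile = 1:numel(niiFiles)
    niiPath = fullfile(niiFiles(iFile).folder, niiFiles(iFile).name);
    if ~isfile(niiPath)    % another worker may have gzipped/removed it already
        continue;
    end
    try
        gzip(niiPath);     % writes niiPath.gz in the same folder
        delete(niiPath);   % MATLAB's gzip keeps the original, unlike Unix gzip
    catch ME               % the file can still vanish between the check and gzip
        warning('Skipping %s: %s', niiPath, ME.message);
    end
end
end
```

The check-then-compress pattern still has a small race window, which is why the `try`/`catch` is kept rather than relying on `isfile` alone.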
Releases for all major desktop platforms are available here: https://github.com/MauricePasternak/SuperGZip/releases/tag/v0.2.0
Tasks
- Make adjustments to xASL_adm_GzipAllFiles.m
- Test on macOS:
  HENK: with 8 cores on an Intel MacBook Pro, zipping /ExploreASL/External/TestDataSet takes 2.6 s, unzipping 0.5 s.
  Using find | grep, zipping takes 2.7 s.
  When I clone /ExploreASL/External/TestDataSet/rawdata 8 times, unzipping takes 2.7 s and zipping 7.4 s.
  Using find | grep, zipping takes 2.9 s (but it skipped folder names with spaces).
  After removing the spaces from the folder names:
  Using find | grep, zipping takes 21.6 s.
  So for macOS: a 3-fold increase in performance and more flexibility with folder names. The only downside is that we have yet another macOS application that needs approval when it is first run (but that's fine, I would say).
- Test on Windows server (28 cores; 32 files):
  SuperGZip zipping: 16.2 s
  Windows MATLAB method: 81.5 s
  So a 5-fold speedup (probably because of parallelization). Advantage: we no longer need WSL for fast zipping, which simplifies the code.
- Test on old Linux server (28 cores): gives the following error. Can you include these libraries in the compilation?
  ```
  /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
  /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
  /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
  /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.18' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
  /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
  /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
  ```
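Assuming the macOS and Windows speedups quoted in the tests are simply the baseline zipping time divided by the SuperGZip zipping time (for macOS, the 8x cloned dataset with spaces removed from folder names), they work out as:

```math
\text{speedup} = \frac{t_{\text{baseline}}}{t_{\text{SuperGZip}}}, \qquad
\text{macOS: } \frac{21.6\ \text{s}}{7.4\ \text{s}} \approx 2.9, \qquad
\text{Windows: } \frac{81.5\ \text{s}}{16.2\ \text{s}} \approx 5.0
```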
The only disadvantage of SuperGZip (which find | grep shares) is that there is no progress tracker. @MauricePasternak, would it be possible to print progress to the screen? Or only when we select the non-verbose mode?
Minor spelling mistakes:
Deompressing -> Decompressing
Also, our imaging IT specialist Paul Groot asks whether SuperGZip is faster than pigz.
How to test
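```matlab
% ROOT, PathToSuperGzip and numCores must already be defined
glob_pat = ['"' ROOT '/**/*.nii"'];  % quoted recursive glob for all NIfTIs under ROOT
command = [PathToSuperGzip ' gzip ' glob_pat ' -n ' num2str(numCores) ' --verbose'];
[exit_code, system_result] = system(command);
if exit_code ~= 0, error('SuperGZip failed (exit code %d): %s', exit_code, system_result); end
```
Release notes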
Required: Updated SuperGzip to use the newer Rust implementation.