SuperGzip Rust implementation update #1285

@MauricePasternak

Description

Noticed that gzipping runs after every BIDS2Legacy run. I figured it could be faster, since the original was written in Python. Indeed, the Rust rewrite is better on almost every metric:

  • 2.5x faster on a single thread
  • Orders of magnitude faster when multithreaded, depending on the system
  • Slower than native find + grep + gzip on Unix on a single thread, because it validates that each filepath still exists. I'd argue this potentially makes it safer than find + grep + gzip if multiple ExploreASL workers are simultaneously trying to access similar NIfTI files that may or may not exist by the time one of the workers is done gzipping.
  • Faster than find + grep + gzip on Unix when --num-threads is 3 or greater.
  • Executable size is ~9x smaller (~1.1 MB vs. ~10 MB for the Python implementation)
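
For reference, the kind of find-based baseline compared against above can be sketched roughly as follows. This is a minimal sketch under assumptions, not the exact command used in the benchmarks (which isn't shown in this issue): the function name gzip_all_nii is made up, it filters with find -name instead of piping through grep, and it relies on -print0 and xargs -0 -P, which GNU and BSD (macOS) tools support but POSIX does not require. The [ -f ] test mimics the "does the file still exist" validation, although a small race window remains between the check and the gzip call.

```shell
# Hypothetical helper: gzip every .nii file under a root folder.
#   $1 = root folder, $2 = number of parallel gzip processes (default 3)
gzip_all_nii() {
  root="$1"
  threads="${2:-3}"
  # NUL-separated paths keep folder names with spaces intact.
  find "$root" -type f -name '*.nii' -print0 |
    xargs -0 -P "$threads" -n 1 sh -c '
      # Skip files that another worker already gzipped or removed.
      if [ -f "$1" ]; then gzip -f "$1"; fi
    ' sh
}
```

Called as gzip_all_nii /path/to/rawdata 3, this would leave a .nii.gz next to each former .nii file.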

All major desktop releases available here: https://github.com/MauricePasternak/SuperGZip/releases/tag/v0.2.0

Tasks

  • Make adjustments to xASL_adm_GzipAllFiles.m

  • Test on macOS:
    HENK: with 8 cores on an Intel MacBook Pro, zipping /ExploreASL/External/TestDataSet takes 2.6 s and unzipping takes 0.5 s.
    Zipping with find | grep takes 2.7 s.
    When I clone /ExploreASL/External/TestDataSet/rawdata 8 times, unzipping takes 2.7 s and zipping takes 7.4 s.
    Zipping with find | grep takes 2.9 s (but it skipped folder names with spaces).
    After removing the spaces from the folder names:
    Zipping with find | grep takes 21.6 s.

      So for macOS: a 3-fold increase in performance and more flexible handling of folder names. The only downside is that we have yet
      another Mac application that needs approval when it is first run (but that's fine, I would say).
    
  • Test on Windows server (28 cores; 32 files)
    Zipping with SuperGzip: 16.2 s
    Windows Matlab method: 81.5 s
    So a 5-fold increase (probably because of parallelization)

      Advantage: we don't need WSL for fast zipping, which simplifies the code.
    
  • Test on old Linux server (28 cores): gives an error, can you include these libraries in the compilation?

/scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
/scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
/scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
/scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.18' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
/scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
/scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /scratch/hjmutsaerts/ExploreASL/External/SuperGZip/super-gzip_linux)
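
To narrow down the Linux error, it may help to see the newest glibc version the binary asks for and compare it with what the server provides. A small sketch, assuming the loader errors look like the ones above; the function name highest_required_glibc is made up, and the musl build in the comments is only one possible fix that assumes the project builds cleanly for that target:

```shell
# Print the highest GLIBC version mentioned in loader error text read from stdin.
highest_required_glibc() {
  grep -o 'GLIBC_[0-9][0-9.]*' |
    sed 's/^GLIBC_//' |
    sort -u -t. -k1,1n -k2,2n -k3,3n |   # numeric sort per version component
    tail -n 1
}

# Usage (commented out; needs the binary and a glibc-based system):
#   ./super-gzip_linux 2>&1 | highest_required_glibc
#   ldd --version | head -n 1      # glibc version the server provides
#
# One possible fix: a statically linked build that does not depend on the
# server's glibc at all.
#   rustup target add x86_64-unknown-linux-musl
#   cargo build --release --target x86_64-unknown-linux-musl
```

For the errors pasted above this reports 2.34, so the released binary needs glibc 2.34 or newer, which the old server does not have.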

The only disadvantage of SuperGzip (which find | grep shares) is that there is no progress tracker. @MauricePasternak, would it be possible to print progress to the screen? Or only when we select the non-verbose mode?

Minor spelling mistakes:
Deompressing -> Decompressing

Also: our imaging IT specialist Paul Groot asks if SuperGzip is faster than pigz?

How to test

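% Wrap the glob in double quotes so it reaches SuperGzip as a single argument
glob_pat = ['"' ROOT '/**/*.nii"'];
% Gzip all matching NIfTI files using numCores threads
command = [PathToSuperGzip ' gzip ' glob_pat ' -n ' num2str(numCores) ' --verbose'];
[exit_code, system_result] = system(command);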
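
To run the same test outside MATLAB, a rough shell equivalent could look like this. The function name run_super_gzip is made up; the gzip subcommand and the -n and --verbose flags are copied from the MATLAB command above.

```shell
# Hypothetical wrapper around the SuperGzip executable.
#   $1 = path to the executable, $2 = root folder, $3 = number of threads
run_super_gzip() {
  bin="$1"
  root="$2"
  threads="$3"
  # The glob stays inside double quotes so the shell does not expand it;
  # SuperGzip receives the pattern itself, as in the MATLAB call.
  "$bin" gzip "$root/**/*.nii" -n "$threads" --verbose
}
```

For example: run_super_gzip ./super-gzip_linux /path/to/rawdata 8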

Release notes

Required: Updated SuperGzip to use the newer Rust implementation.

Metadata

Labels

feature: New feature, enhancement or request
