limit NUM_THREADS=1 for numpy/scipy.linalg to save CPU usage #620

Merged
yunjunz merged 2 commits into insarlab:main from yunjunz:num_threads on Jul 29, 2021

Conversation

@yunjunz yunjunz commented Jul 29, 2021

Description of proposed changes

Testing shows that running numpy/scipy with OMP_NUM_THREADS > 1 does little to reduce computing time but uses significantly more CPU. The fastest configuration is Dask with multiple workers plus OMP_NUM_THREADS = 1. This PR therefore sets all the relevant *_NUM_THREADS environment variables to 1 in ifgram_inversion.py and dem_error.py before running the big matrix inversion, and rolls them back to their original values afterward.
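The set-then-roll-back pattern described above can be sketched as follows. This is a minimal illustration, not the code added in this PR: the helper names (`set_thread_env`, `restore_thread_env`, `limited_threads`) and the list of environment variables are assumptions chosen for the example.

```python
import os
from contextlib import contextmanager

# Environment variables commonly read by OpenMP/BLAS backends.
# ASSUMPTION: this list is illustrative; the PR's actual list may differ.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
)


def set_thread_env(num_threads=1, names=THREAD_ENV_VARS):
    """Set every variable to num_threads; return the original values.

    Variables that were not set beforehand are recorded as None.
    (Hypothetical helper, named for this example only.)
    """
    original = {name: os.environ.get(name) for name in names}
    for name in names:
        os.environ[name] = str(num_threads)
    return original


def restore_thread_env(original):
    """Roll back to the values saved by set_thread_env()."""
    for name, value in original.items():
        if value is None:
            # The variable did not exist before, so remove it again.
            os.environ.pop(name, None)
        else:
            os.environ[name] = value


@contextmanager
def limited_threads(num_threads=1):
    """Limit the thread settings inside a with-block, restoring them on exit."""
    original = set_thread_env(num_threads)
    try:
        yield
    finally:
        restore_thread_env(original)
```

Wrapping the heavy computation in `with limited_threads(1): ...` restores the original settings even if the computation raises an exception, which a plain set/roll-back pair at the beginning and end of a script does not guarantee.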

Here are the test results for `ifgram_inversion.py inputs/ifgramStack.h5 -t smallbaselineApp.cfg` on two machines:

Dataset: SanFranSenDT42 version 1.x, patch 1 only (505 x 510 x 1021)

Machine 1: Mac (6 Intel i7 CPUs/cores at 2.6 GHz)

| dask (workers) | OMP_NUM_THREADS | Time used (sec) | CPU usage |
|----------------|-----------------|-----------------|-----------|
| no    (0)      |        4        |       850       | 1 x 300%  |
| no    (0)      |        1        |       930       | 1 x 100%  |
| local (4)      |        4        |       580       | 4 x 250%  |
| local (4)      |        1        |       420       | 4 x 100%  |

Machine 2: Linux local cluster (16 Intel E5 CPUs/cores at 2.4 GHz)

| dask (workers) | OMP_NUM_THREADS | Time used (sec) | CPU usage |
|----------------|-----------------|-----------------|-----------|
| no    (0)      |        4        |      1400       | 1 x 400%  |
| no    (0)      |        1        |      1250       | 1 x 100%  |
| local (4)      |        4        |       750       | 4 x 320%  |
| local (4)      |        1        |       500       | 4 x 100%  |
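To make the CPU cost in the tables above easier to compare, the sketch below converts each row into approximate core-seconds. The formula (wall time x number of processes x per-process CPU usage) is an assumption made for this illustration, and the CPU usage figures in the tables appear to be rough readings, so the results are estimates only.

```python
# (machine, dask workers, OMP_NUM_THREADS) -> (time in sec, processes, CPU % per process)
# Values are copied from the two tables above.
runs = {
    ("mac", 0, 4): (850, 1, 300),
    ("mac", 0, 1): (930, 1, 100),
    ("mac", 4, 4): (580, 4, 250),
    ("mac", 4, 1): (420, 4, 100),
    ("linux", 0, 4): (1400, 1, 400),
    ("linux", 0, 1): (1250, 1, 100),
    ("linux", 4, 4): (750, 4, 320),
    ("linux", 4, 1): (500, 4, 100),
}


def core_seconds(time_sec, n_proc, cpu_percent):
    """Approximate CPU cost: wall time x processes x per-process usage.

    ASSUMPTION: usage is constant over the whole run.
    """
    return time_sec * n_proc * cpu_percent / 100


cost = {key: core_seconds(*values) for key, values in runs.items()}

for (machine, workers, threads), value in cost.items():
    time_sec = runs[(machine, workers, threads)][0]
    print(f"{machine:5s} workers={workers} threads={threads}: "
          f"{time_sec:5d} s wall, ~{value:6.0f} core-seconds")
```

Under this estimate, with 4 Dask workers, going from 4 threads to 1 thread cuts the cost from about 5800 to 1680 core-seconds on Machine 1 and from about 9600 to 2000 on Machine 2, while the wall time also drops.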

@falkamelung, @Ovec8hkin could you help confirm whether this PR gives a slight speedup and a significant CPU saving on HPC?
Update: I am merging this now, but it would still be useful if you could confirm on HPC.

Reminders

  • Pass Codacy code review (green)
  • Pass Circle CI / local test (green)
  • If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration.
  • If adding new functionality, add a detailed description to the documentation and/or an example.

yunjunz added 2 commits July 29, 2021 13:17
+ utils.readfile.read_gdal_vrt(): support reading metadata from file without SRS info by setting default X/Y_UNIT in degrees
+ defaults.smallbaselineApp.cfg: add comments on SNAPHU as the only unwrapper that provides connected components info.
+ dask: use explicit cluster name instead of non-local to avoid confusion in the smallbaselineApp.cfg and dask.md files.
+ view: printout multilook-num even if not calculated from auto_*(); plot water/shadowMask in gray by default
Testing shows that numpy/scipy with OMP_NUM_THREADS > 1 does not help much on the computing time but uses significantly more CPU. The true fast way is "Dask with multiple workers + OMP_NUM_THREADS = 1". Thus, I am setting all the relevant NUM_THREADS env variables to 1, in ifgram_inversion.py and dem_error.py, before running the big matrix inversion and roll back to the original values afterwards.

+ objects/cluster:
   - add set_num_threads() to save original num of threads info and set all relevant env var to the given value
   - add roll_back_num_threads() to roll back to the original setting

+ ifgram_inversion/dem_error: call the above two functions at the beginning and the end of the process, to ignore the *_NUM_THREADS setting during the numpy/scipy matrix computation.
@yunjunz yunjunz merged commit 6161d95 into insarlab:main Jul 29, 2021
@yunjunz yunjunz deleted the num_threads branch July 29, 2021 20:32
@yunjunz yunjunz added this to the Big Data milestone Aug 15, 2021