ENH: Allow toggling madvise hugepage and fix default by seberg · Pull Request #15769 · numpy/numpy

seberg · 2020-03-17T18:18:50Z

By default this disables madvise hugepage on kernels before 4.6, since
we expect that these typically see large performance regressions when
using hugepages due to slow defragmentation code presumably fixed by:

torvalds/linux@7cf91a9
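
As a rough illustration of the default described above, the kernel check could look something like the following. This is only a sketch, not the code in this PR: the function names `parse_kernel_version` and `default_use_hugepage` are hypothetical, it assumes a Linux-style release string such as the one returned by `platform.release()`, and falling back to "disabled" when the string cannot be parsed is an assumption made here.

```python
import platform


def parse_kernel_version(release):
    """Return (major, minor) from a release string such as '4.4.0-170-generic'.

    Raises ValueError if the first two dot-separated fields are not integers.
    """
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def default_use_hugepage(release):
    """Hypothetical default: enable madvise hugepage only on kernels >= 4.6."""
    try:
        return parse_kernel_version(release) >= (4, 6)
    except ValueError:
        # Assumption in this sketch: be conservative if the version is unreadable.
        return False
```

Calling `default_use_hugepage(platform.release())` would then give the default for the running system; comparing tuples avoids the pitfalls of comparing version strings directly.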

This adds support to set the behaviour at startup time through the
NUMPY_MADVISE_HUGEPAGE environment variable.
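
Because the variable is read at startup, it has to be in the environment before the interpreter that imports NumPy is launched. A minimal sketch of doing that from Python, using only the standard library, might look like this. The helper names `env_with_hugepage` and `run_with_hugepage_setting` are hypothetical and exist only for this example.

```python
import os
import subprocess
import sys


def env_with_hugepage(enabled, base_env=None):
    """Return a copy of the environment with NUMPY_MADVISE_HUGEPAGE set to 1 or 0."""
    env = dict(os.environ if base_env is None else base_env)
    env["NUMPY_MADVISE_HUGEPAGE"] = "1" if enabled else "0"
    return env


def run_with_hugepage_setting(code, enabled):
    """Run `code` in a fresh Python process so the setting is present at startup."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env_with_hugepage(enabled),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For instance, `run_with_hugepage_setting("import numpy", False)` would start a child process that imports NumPy with the variable set to 0. Setting `os.environ` in the current process after NumPy has already been imported would presumably have no effect, since the value is only consulted at startup.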

Fixes gh-15545

Still needs some documentation somewhere, probably. I am not sure where, though: perhaps a new page listing all environment variables used at compile or startup time? Relaxed strides, the experimental array function protocol, this one, ...?


mattip · 2020-03-17T18:22:53Z

Maybe a page about "Global State" that also mentions threadpoolctl, kind of like in the Python documentation.

These are options that are typically controlled through environment variables at startup or compile time.

rossbar

This seems like a nice feature - I mostly did a once-over on the docs.

Just for kicks, I tried this on my system (kernel v5.5.9) and did indeed see that performance was generally better when NumPy's use of hugepages was enabled. Using the original examples from #15545:

>>> import numpy as np; print(np.use_hugepage)
1
>>> n = int(1e9)
>>> %timeit np.zeros(n)
5.52 µs ± 75.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit np.random.rand(n)
5.87 s ± 76.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.linspace(0, 100, n)
2.49 s ± 50.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.exp(np.zeros(n))
7.05 s ± 152 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And with NUMPY_MADVISE_HUGEPAGE=0:

>>> import numpy as np; print(np.use_hugepage)
0
>>> n = int(1e9)
>>> %timeit np.zeros(n)
7.86 µs ± 30.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit np.random.rand(n)
6.97 s ± 77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.linspace(0, 100, n)
3.78 s ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.exp(np.zeros(n))
8.09 s ± 68.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
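
To put the two runs side by side, the mean timings quoted above can be turned into per-benchmark ratios. The snippet below only does arithmetic on the numbers already posted in this comment; the variable names are made up for the example, and the run-to-run spread reported by `%timeit` is ignored.

```python
# Mean timings in seconds copied from the two runs above:
# (hugepages enabled, NUMPY_MADVISE_HUGEPAGE=0)
timings = {
    "np.zeros(n)": (5.52e-6, 7.86e-6),
    "np.random.rand(n)": (5.87, 6.97),
    "np.linspace(0, 100, n)": (2.49, 3.78),
    "np.exp(np.zeros(n))": (7.05, 8.09),
}

# Ratio > 1 means the run with hugepages disabled was slower.
slowdown = {name: off / on for name, (on, off) in timings.items()}

for name, ratio in slowdown.items():
    print(f"{name}: {ratio:.2f}x slower with hugepages disabled")
```

On these numbers every benchmark is slower with hugepages disabled, by a factor between roughly 1.15 and 1.52. This reflects a single machine on kernel v5.5.9, so it should not be read as a general result.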