ENH: Allow toggling madvise hugepage and fix default#15769
ENH: Allow toggling madvise hugepage and fix default#15769mattip merged 5 commits intonumpy:masterfrom
Conversation
By default this disables madvise hugepage on kernels before 4.6, since we expect that these typically see large performance regressions when using hugepages due to slow defragementation code presumably fixed by: torvalds/linux@7cf91a9 This adds support to set the behaviour at startup time through the ``NUMPY_MADVISE_HUGEPAGE`` environment variable. Fixes numpygh-15545
529e281 to
2d6edb3
Compare
|
Maybe a page about "Global State", and mention |
These are options that are controlled typically through environment variable at startup or compile time.
rossbar
left a comment
There was a problem hiding this comment.
This seems like a nice feature - I mostly did a once-over on the docs.
Just for kicks, I tried this on my system (kernel v5.5.9) and did indeed see that performance was generally better with NumPy's use of hugepages was enabled. Using the original examples from #15545:
>>> import numpy as np; print(np.use_hugepage)
1
>>> n = int(1e9)
>>> %timeit np.zeros(n)
5.52 µs ± 75.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit np.random.rand(n)
5.87 s ± 76.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.linspace(0, 100, n)
2.49 s ± 50.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.exp(np.zeros(n))
7.05 s ± 152 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)And with NUMPY_MADVISE_HUGEPAGE=0:
>>> import numpy as np; print(np.use_hugepage)
0
>>> n = int(1e9)
>>> %timeit np.zeros(n)
7.86 µs ± 30.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit np.random.rand(n)
6.97 s ± 77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.linspace(0, 100, n)
3.78 s ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.exp(np.zeros(n))
8.09 s ± 68.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)| you can experience a significant speedup when transparent | ||
| hugepage is enabled. |
There was a problem hiding this comment.
Do you want to include an external link to linux docs on transparent hugepages? If so, you could consider this: https://www.kernel.org/doc/html/latest/vm/transhuge.html
There was a problem hiding this comment.
good idea. EDIT: Although this link is better: https://www.kernel.org/doc/html/latest/admin-guide/mm/transhuge.html
| The array function protocol which allows array-like objects to | ||
| hook into the NumPy API is currently enabled by default. | ||
| It can be disabled using:: |
There was a problem hiding this comment.
Perhaps mention that the feature was introduced in v1.17 and is enabled by default
| This flag is checked at import time. | ||
|
|
||
|
|
||
| Debugging |
There was a problem hiding this comment.
Same suggestion as above - maybe "Debugging-related Options" to be consistent
|
Meh, I confirmed that kernel 4.8 is fixed, then I went down a wrong turn and lost my progress on testing 4.6 as well :(. |
Co-Authored-By: Ross Barnowski <rossbar@berkeley.edu>
3de26ef to
1400cea
Compare
|
Not sure if it makes sense, but we could consider backporting this, although maybe it is too high risk? Since it tries to at least mitigate a performance regression. |
| } | ||
| _madvise_hugepage = enabled; | ||
| if (was_enabled) { | ||
| Py_RETURN_TRUE; |
There was a problem hiding this comment.
probably doesnt matter, since our code doesnt check the return val, but will this always return Py_RETURN_TRUE ?
There was a problem hiding this comment.
Not sure I follow, this function sets the behaviour and returns the current state, either True or False in the line below?
There was a problem hiding this comment.
but was_enabled was set to _madvise_hugepage value which is set to 1 above, it doesnt seem to change. isnt was_enabled always 1 ?
There was a problem hiding this comment.
Maybe I have to add a test to proof that it does, I admit. was_enabled is a copy of the global static _madvise_hugepage, which is modified with _madvise_hugepage = enabled. So if enabled is 0, on the next call was_enabled will be 0.
There was a problem hiding this comment.
okay thanks for explaining, i guess i misunderstood this function completely 😁 . i was expecting it to return true for platforms where _set_madvise_hugepage works (i.e. it is able to use this env variable and false where it doesnt, didnt realize was_enabled was for the next function call).
There was a problem hiding this comment.
A comment before the function with an explanation may help prevent future confusion.
| kernel_version = os.uname().release.split(".")[:2] | ||
| kernel_version = tuple(int(v) for v in kernel_version) | ||
| if kernel_version < (4, 6): | ||
| use_hugepage = 0 |
There was a problem hiding this comment.
I dont know if this will help here, but is it worth putting a message, saying that if using version less than 4.6 and you notice issues with large arrays try setting NUMPY_MADVISE_HUGEPAGE=1. If we do this we can also backport and it will be safer since there is a message shown to the user to fix the issue.
There was a problem hiding this comment.
Well, there seemed to be some consensus around not trying to guess what is probably right for most people and instead adding into an FAQ style tip page somewhere? I actually like guessing if it helps 90% of the people, since I think very few will find the setting or even know they are getting terrible performance...
There was a problem hiding this comment.
I think guessing is fine. Maybe the new troubleshooting page would be a good page to reference this setting.
There was a problem hiding this comment.
Ahh, I see you have already added a global_state page. That seems fine.
| {"_add_newdoc_ufunc", (PyCFunction)add_newdoc_ufunc, | ||
| METH_VARARGS, NULL}, | ||
| {"_set_madvise_hugepage", (PyCFunction)_set_madvise_hugepage, | ||
| METH_O, "Toggle and return madvise hugepage (no OS support check)."}, |
There was a problem hiding this comment.
maybe extend this docstring, even though it is a private function:
_set_madvise_hugepage(tf: bool) -> bool
Set or unset use of ``madvise (2)`` MADV_HUGEPAGE support when allocating
the array data. Returns the previously set value. See `global_state` for more
information.
9ffd839 to
828e761
Compare
828e761 to
3395802
Compare
|
close/reopen to restart builds |
|
Merging, the failing s390x job is unrelated. |
|
Thanks @seberg |
|
|
||
| On Linux NumPy has previously added support for madavise | ||
| hugepages which can improve performance for very large arrays. | ||
| Unfortunately, on older Kernel versions this led to peformance |
There was a problem hiding this comment.
I realise I'm late to the party since this has already been merged, but "peformance" -> "performance".
|
hi i have probleme with kernel name ovh "4.19-ovh-xxxx-std-ipv6-64" : kernel_version = tuple(int(v) for v in kernel_version) |
|
@Megaland90 You are looking for gh-16679 which will be part of the upcoming 1.19.1. |
|
Thanks @mattip :D |
|
@Megaland90 thanks for the report. For future reference, opening a new issue rather than adding comments to a PR helps keep things more organized (although I just added two more comments myself :)) |
By default this disables madvise hugepage on kernels before 4.6, since
we expect that these typically see large performance regressions when
using hugepages due to slow defragementation code presumably fixed by:
torvalds/linux@7cf91a9
This adds support to set the behaviour at startup time through the
NUMPY_MADVISE_HUGEPAGEenvironment variable.Fixes gh-15545
Still needs some documentation somewhere probably... I am not sure where though, a new site listing all environment variables used during compile or startup time? Relaxed strides, experimental array function protocol, this one, ...?