[core] Make preloading Jemalloc configurable for worker #47243
edoakes merged 1 commit into ray-project:master
Conversation
rkooo567
left a comment
can you add a new section in this page to explain how to enable jemalloc for workers? https://docs.ray.io/en/master/ray-core/miscellaneous.html#tuning-ray-settings
Please add your inference example and benchmark as well.
@rkooo567 Sure, I'll add a doc for this feature. However, why not add the description in the memory-profiling doc?
@hongchaodeng I have added the example in the description, and I think the previously pasted image describes the benchmark well.
I assume it is more of a performance feature (one that optimizes memory usage). Does enabling jemalloc allow you to do mem profiling as well?
I see, enabling Jemalloc for workers could also benefit memory profiling; I will mention this in the tuning and profiling page.
We also need to enable jemalloc for the worker processes because some C++ code is wrapped into Python workers, and jemalloc is very effective at optimizing the memory usage of that C++ code.
@Myasuka this makes sense to me, is this still something you want to merge? |
Sure, could you help review and merge this PR? I will update it this week.
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public Slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
@rkooo567 since https://docs.ray.io/en/master/ray-core/miscellaneous.html#tuning-ray-settings has been removed, I added the docs in https://docs.ray.io/en/master/ray-contribute/profiling.html#memory-profiling instead. @dayshah please also review this update.
dstrodtman
left a comment
The RAY_LD_PRELOAD description was not parsable. Can you review/iterate on my edit to make sure it's technically correct?
Docs should never contain future tense ("will") or passive voice (a "be" verb followed by a past-tense verb). Thanks!
dayshah
left a comment
generally lgtm, can you address the doc comments
Also, I would prefer if we rename the RAY_LD_PRELOAD env variable to something like RAY_LD_PRELOAD_ON_WORKERS, which gets set to 0 by default. This makes it easier for a user to understand without looking through code and docs.
@dayshah Just wondering, is there any reason why we're using 0/1 here instead of true/false? If this is a user-facing concept and the values are equivalent, boolean values are friendlier to humans.
I think it's because 0 and 1 usually translate to boolean values in C++, and the environment variables used in C++ are set using 0 or 1 (see e.g. ray/src/ray/common/ray_config_def.h, line 107 in 255f7a9). I don't see a reason not to use true/false here though, since we're checking against the number.
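To illustrate the 0/1 vs true/false point, here is a minimal sketch (the helper name `env_bool` is hypothetical, not Ray's actual parsing code) of how a flag could accept both the C++-style numeric convention and the human-friendly boolean one:

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Accept both C++-style "0"/"1" and human-friendly "true"/"false"."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes")

os.environ["RAY_LD_PRELOAD_ON_WORKERS"] = "true"
print(env_bool("RAY_LD_PRELOAD_ON_WORKERS"))  # → True
```

This keeps backwards compatibility with `0`/`1` while letting users write `true`/`false`.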
I have updated this PR, please take a look @dayshah @dstrodtman |
https://buildkite.com/ray-project/premerge/builds/41233
force-pushed from 817c168 to 905ac3a
@dayshah updated and all tests pass.
python/ray/_private/services.py
why is "RAY_LD_PRELOAD" removed here?
I agree that RAY_LD_PRELOAD_ON_WORKERS looks better; if we still keep the undocumented RAY_LD_PRELOAD, it would make the code look a bit weird.
Did RAY_LD_PRELOAD only apply to workers previously? I think that might be my gap in understanding. If that's the case, then this change LGTM. I was assuming that RAY_LD_PRELOAD also applied to system-level processes.
YES, RAY_LD_PRELOAD was first introduced in https://github.com/ray-project/ray/pull/39446/files and was only used within that PR.
We can set `RAY_LD_PRELOAD_ON_WORKERS` to `true`, with `RAY_JEMALLOC_LIB_PATH` and `RAY_JEMALLOC_PROFILE` provided, to also preload jemalloc for workers. This is a fix after ray-project#39446. Signed-off-by: Yun Tang <myasuka@live.com>
Why are these changes needed?
The PR #39446 disabled preloading Jemalloc for workers entirely. However, Jemalloc is still useful in some cases, so we make it configurable: the user can disable it by setting the env RAY_LD_PRELOAD to 0.
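A minimal sketch of how preloading could be enabled, assuming the env var names used in this PR (`RAY_LD_PRELOAD_ON_WORKERS`, `RAY_JEMALLOC_LIB_PATH`, `RAY_JEMALLOC_PROFILE`); the library path and jemalloc profile options below are illustrative, not prescribed by the PR:

```shell
# Illustrative only: the jemalloc path and MALLOC_CONF-style options
# depend on your system and profiling needs.
export RAY_LD_PRELOAD_ON_WORKERS=true
export RAY_JEMALLOC_LIB_PATH=/usr/lib/x86_64-linux-gnu/libjemalloc.so
export RAY_JEMALLOC_PROFILE=prof:true,lg_prof_interval:33,lg_prof_sample:17
ray start --head
```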
The batch inference example code uses a TF model to run inference on batches of numpy ndarrays.
I ran an inference test with limited memory, and we can see the OOM count decrease from 900+ to 700.

Related issue number
Closes #47242
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- For a new API method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.