[core] Make preloading Jemalloc configurable for worker by Myasuka · Pull Request #47243 · ray-project/ray

Myasuka · 2024-08-21T09:28:25Z

Why are these changes needed?

PR #39446 disabled preloading Jemalloc for workers entirely. However, Jemalloc is still useful in some cases, so we could make preloading configurable when the user sets the env variable RAY_LD_PRELOAD to 0.

The batch inference example code below uses a TF model to run inference on batches of NumPy ndarrays.

ds = ray.data.read_tfrecords(xxx)
(
    ds.map_batches(BatchPredictor)
    .map_batches(BatchPostProcessor)
    .write_parquet(path=output_path)
)

I ran an inference test with limited memory, and the OOM count decreased from 900+ to 700.

Related issue number

Closes #47242

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Myasuka · 2024-08-22T12:18:08Z

@fishbone @rkooo567 could you please review this PR?

rkooo567

can you add a new section in this page to explain how to enable jemalloc for workers? https://docs.ray.io/en/master/ray-core/miscellaneous.html#tuning-ray-settings

hongchaodeng · 2024-08-28T19:39:43Z

Please add your inference example and benchmark as well.

Myasuka · 2024-08-29T12:17:20Z

> can you add a new section in this page to explain how to enable jemalloc for workers? https://docs.ray.io/en/master/ray-core/miscellaneous.html#tuning-ray-settings

@rkooo567 Sure, I'll add a doc for this feature. However, why not add the description to the memory-profiling doc?

Myasuka · 2024-08-29T12:34:07Z

> Please add your inference example and benchmark as well.

@hongchaodeng I have added the example to the description, and I think the previously pasted image describes the benchmark well.

rkooo567 · 2024-08-29T16:14:49Z

> @rkooo567 Sure, I'll add a doc for this feature. However, why not add the description to the memory-profiling doc?

I assume it is more of a performance feature (one that optimizes memory usage). Does enabling jemalloc also allow you to do memory profiling?

Myasuka · 2024-09-05T13:42:06Z

> > @rkooo567 Sure, I'll add a doc for this feature. However, why not add the description to the memory-profiling doc?
>
> I assume it is more of a performance feature (one that optimizes memory usage). Does enabling jemalloc also allow you to do memory profiling?

I see. Enabling Jemalloc for workers could also benefit memory profiling, so I will mention this in the tuning and profiling pages.

MissiontoMars · 2024-10-30T07:24:08Z

We also need to enable jemalloc for the worker processes because some C++ code is wrapped into Python workers, and jemalloc is very effective at optimizing the memory usage of that C++ code.

dayshah · 2025-05-06T20:56:48Z

@Myasuka this makes sense to me, is this still something you want to merge?

Myasuka · 2025-05-12T09:12:32Z

> @Myasuka this makes sense to me, is this still something you want to merge?

Sure, could you help review and merge this PR? I will update this PR this week.

github-actions · 2025-06-02T00:34:25Z

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

Myasuka · 2025-06-02T18:23:35Z

@rkooo567 since https://docs.ray.io/en/master/ray-core/miscellaneous.html#tuning-ray-settings has been removed, I have added the docs to https://docs.ray.io/en/master/ray-contribute/profiling.html#memory-profiling instead.

@dayshah Please also review this update.

doc/source/ray-contribute/profiling.rst

dstrodtman

RAY_LD_PRELOAD description was not parsable. Can you review/iterate on my edit to make sure it's technically correct?

Docs should never contain future tense ("will") or passive voice (be verb with a past tense verb). Thanks!

dayshah

generally lgtm, can you address the doc comments

Also, I would prefer if we rename the RAY_LD_PRELOAD env variable to something like RAY_LD_PRELOAD_ON_WORKERS, which gets set to 0 by default. This makes it easier for a user to understand without looking through code and docs.

dstrodtman · 2025-06-02T18:48:04Z

@dayshah Just wondering, any reason why we're using 0/1 here instead of true/false? If this is a user-facing concept and the values are equivalent, the boolean operators are more friendly to humans.

dayshah · 2025-06-02T19:11:51Z

> @dayshah Just wondering, any reason why we're using 0/1 here instead of true/false? If this is a user-facing concept and the values are equivalent, the boolean operators are more friendly to humans.

I think it's because 0 and 1 usually translate to boolean values in C++, and the environment variables used in C++ are set using 0 or 1, like these:

ray/src/ray/common/ray_config_def.h

Line 107 in 255f7a9

RAY_CONFIG(bool, report_actor_placement_resources, true)

I don't see a reason to not use true/false here though, since we're checking against the number.
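
For illustration only, a flag parser that accepts both spellings (0/1 and true/false) could look like the sketch below. This is not Ray's actual parsing code: `env_flag` and the accepted value sets are hypothetical names chosen for this example, and the `environ` parameter exists only so the helper can be exercised without touching the real process environment.

```python
import os

# Hypothetical sets of accepted spellings; not taken from the Ray codebase.
_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off", ""}


def env_flag(name, default=False, environ=None):
    """Read a boolean flag from the environment, accepting 0/1 and true/false."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        # Unset variable: fall back to the caller's default.
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    # Reject unknown spellings instead of silently treating them as false.
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")


# Both spellings resolve to the same boolean.
print(env_flag("RAY_LD_PRELOAD_ON_WORKERS", environ={"RAY_LD_PRELOAD_ON_WORKERS": "true"}))
print(env_flag("RAY_LD_PRELOAD_ON_WORKERS", environ={"RAY_LD_PRELOAD_ON_WORKERS": "1"}))
```

Raising on unknown values is a design choice of this sketch; a real implementation might prefer to log a warning and use the default.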

Myasuka · 2025-06-03T14:47:49Z

> generally lgtm, can you address the doc comments
>
> Also, I would prefer if we rename the RAY_LD_PRELOAD env variable to something like RAY_LD_PRELOAD_ON_WORKERS, which gets set to 0 by default. This makes it easier for a user to understand without looking through code and docs.

I think RAY_LD_PRELOAD_ON_WORKERS is more readable, and we could set it to false by default to align with the previous behavior.

Myasuka · 2025-06-03T15:42:40Z

I have updated this PR, please take a look @dayshah @dstrodtman

dayshah · 2025-06-03T16:49:59Z

https://buildkite.com/ray-project/premerge/builds/41233
some lint failures

Myasuka · 2025-06-05T08:15:18Z

> https://buildkite.com/ray-project/premerge/builds/41233 some lint failures

@dayshah updated, and all tests pass.

dayshah

ty! lgtm @edoakes @jjyao to merge

doc/source/ray-contribute/profiling.rst

python/ray/tests/test_advanced_4.py

edoakes · 2025-06-05T16:15:10Z

python/ray/_private/services.py

why is "RAY_LD_PRELOAD" removed here?

Myasuka · 2025-06-08

I agree that RAY_LD_PRELOAD_ON_WORKERS looks better. If we still kept the undocumented RAY_LD_PRELOAD, it would make the code look a bit weird.

edoakes · 2025-06-09

> generally lgtm, can you address the doc comments
>
> Also, I would prefer if we rename the RAY_LD_PRELOAD env variable to something like RAY_LD_PRELOAD_ON_WORKERS, which gets set to 0 by default. This makes it easier for a user to understand without looking through code and docs.

Did RAY_LD_PRELOAD only apply to workers previously? I think that might be my gap in understanding. If that's the case, then this change LGTM. I was assuming that RAY_LD_PRELOAD also applied to system-level processes.

Myasuka · 2025-06-09

Yes, RAY_LD_PRELOAD was first introduced in https://github.com/ray-project/ray/pull/39446/files and was only used within that PR.

We can set `RAY_LD_PRELOAD_ON_WORKERS` to `true`, with `RAY_JEMALLOC_LIB_PATH` and `RAY_JEMALLOC_PROFILE` provided, to also preload jemalloc for workers. This is a fix after ray-project#39446.

Signed-off-by: Yun Tang <myasuka@live.com>
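
As a rough sketch of what the commit message describes, the three variables could be assembled into an environment for the process that starts Ray. The variable names and the `true` value come from the commit message above. Everything else is an assumption for illustration: the helper name `jemalloc_worker_env`, the library path, the component names, and the idea that `RAY_JEMALLOC_PROFILE` takes a comma-separated list of process types.

```python
import os


def jemalloc_worker_env(lib_path, profile_targets, base_env=None):
    """Return a copy of the environment with the jemalloc preload variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["RAY_JEMALLOC_LIB_PATH"] = lib_path
    # Assumption: a comma-separated list of Ray process types.
    env["RAY_JEMALLOC_PROFILE"] = ",".join(profile_targets)
    # Opt in to preloading jemalloc for worker processes as well.
    env["RAY_LD_PRELOAD_ON_WORKERS"] = "true"
    return env


env = jemalloc_worker_env(
    lib_path="/path/to/libjemalloc.so",  # hypothetical path
    profile_targets=["raylet", "worker"],  # assumed component names
    base_env={},  # empty base only to keep the printed output short
)
for key in sorted(env):
    print(f"{key}={env[key]}")
```

In practice the helper would be called with `base_env=None` so that `PATH` and other variables are inherited, and the resulting dict would be passed as `env=` to whatever launches Ray, for example `subprocess.run(["ray", "start", "--head"], env=env)`.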

anyscalesam added the triage and core labels Aug 26, 2024

hongchaodeng self-assigned this Aug 26, 2024

hongchaodeng self-requested a review August 26, 2024 18:43

rkooo567 self-assigned this Aug 26, 2024

rkooo567 added the go label Aug 28, 2024

rkooo567 reviewed Aug 28, 2024

jjyao added the P1 label and removed the triage label Sep 16, 2024

hainesmichaelc added the community-contribution label Apr 4, 2025

jjyao assigned dayshah and unassigned hongchaodeng and rkooo567 May 6, 2025

github-actions bot added the stale label Jun 2, 2025

Myasuka force-pushed the jemalloc-fix branch from fe89553 to afa18de June 2, 2025 18:19

Myasuka requested a review from a team as a code owner June 2, 2025 18:19

Myasuka force-pushed the jemalloc-fix branch from afa18de to 5d8b1f6 June 2, 2025 18:25

dstrodtman reviewed Jun 2, 2025

doc/source/ray-contribute/profiling.rst (outdated, resolved)

dstrodtman reviewed Jun 2, 2025

dayshah reviewed Jun 2, 2025

dayshah removed the stale label Jun 2, 2025

Myasuka force-pushed the jemalloc-fix branch from 5d8b1f6 to 390ade1 June 3, 2025 15:41

Myasuka force-pushed the jemalloc-fix branch 2 times, most recently from 817c168 to 905ac3a June 4, 2025 16:05

dayshah approved these changes Jun 5, 2025

doc/source/ray-contribute/profiling.rst (outdated, resolved)

edoakes reviewed Jun 5, 2025

Myasuka force-pushed the jemalloc-fix branch from 221c084 to edfc0ce June 8, 2025 17:08

Myasuka force-pushed the jemalloc-fix branch from 2206add to 87b00b8 June 9, 2025 16:11

edoakes approved these changes Jun 9, 2025

edoakes enabled auto-merge (squash) June 9, 2025 16:31

edoakes merged commit 78e7c5a into ray-project:master Jun 9, 2025
6 checks passed

dayshah mentioned this pull request Aug 26, 2025

[Core] Actor RSS does not drop even if all object refs are released #53261

Open

