[Core] (Resource Isolation 1/n) Unify resource isolation config construction into a single step#59372
Conversation
There was a problem hiding this comment.
Code Review
This pull request successfully unifies the construction of ResourceIsolationConfig into a single, more robust step. The refactoring of object store memory calculation into a utility function is a good improvement. The changes are well-implemented and the tests are updated accordingly. I have a couple of minor suggestions to improve the code further.
python/ray/tests/resource_isolation/test_resource_isolation_integration.py
Outdated
Show resolved
Hide resolved
python/ray/tests/resource_isolation/test_resource_isolation_integration.py
Outdated
Show resolved
Hide resolved
22bf1c1 to
0b94939
Compare
israbbani
left a comment
There was a problem hiding this comment.
Changes look good. A few comments around code organization. Let's clean up the get_configured_object_store_memory method in a follow up.
Signed-off-by: davik <davik@anyscale.com>
Signed-off-by: davik <davik@anyscale.com>
Signed-off-by: davik <davik@anyscale.com>
Signed-off-by: davik <davik@anyscale.com>
Signed-off-by: davik <davik@anyscale.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com>
…s to make sure it is only called once Signed-off-by: davik <davik@anyscale.com>
39ed8d3 to
84682b1
Compare
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
israbbani
left a comment
There was a problem hiding this comment.
LGTM. Thanks for cleaning this up.
We've still got work to do here so can you convert your JIRA into an issue and link the issue to relevant parts of the codebase with TODOs so we don't lose context?
|
Issue to track follow up tech debts exposed by this PR: #60683 |
Signed-off-by: davik <davik@anyscale.com>
Signed-off-by: davik <davik@anyscale.com>
Signed-off-by: davik <davik@anyscale.com>
Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com>
python/ray/_private/worker.py
Outdated
|
|
||
| resource_isolation_config = ResourceIsolationConfig( | ||
| enable_resource_isolation=enable_resource_isolation, | ||
| cgroup_path=_cgroup_path, |
There was a problem hiding this comment.
Undefined variable _cgroup_path causes NameError
High Severity
The code references _cgroup_path but this variable is never defined. The function parameter is cgroup_path (without underscore, defined at line 1435), but the refactored code incorrectly uses _cgroup_path. This will cause a NameError at runtime when ray.init() is called on the code path that starts a new local cluster.
Signed-off-by: davik <davik@anyscale.com>
|
@edoakes Could you help me merge this when you get the chance? Thanks! |
…ruction into a single step (ray-project#59372) ## Description The existing `resource_isolation_config` is constructed in two steps. 1. The class is first instantiated using the constructor. 2. The construction is complete when `add_object_store_memory` is called. This is undesirable, as forgetting to complete the second step in creating the `resource_isolation_config` can result in critical system misbehavior. This PR addresses the [TODO](https://github.com/ray-project/ray/blob/b13361a8045632a7c2308ce606b0487f096fd927/python/ray/_private/resource_isolation_config.py#L54-L56) to unify the construction of resource isolation config. ## Related issues ray-project#57653 ## Additional information N/A --------- Signed-off-by: davik <davik@anyscale.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com> Co-authored-by: davik <davik@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: tiennguyentony <46289799+tiennguyentony@users.noreply.github.com>
…ruction into a single step (ray-project#59372) ## Description The existing `resource_isolation_config` is constructed in two steps. 1. The class is first instantiated using the constructor. 2. The construction is complete when `add_object_store_memory` is called. This is undesirable, as forgetting to complete the second step in creating the `resource_isolation_config` can result in critical system misbehavior. This PR addresses the [TODO](https://github.com/ray-project/ray/blob/b13361a8045632a7c2308ce606b0487f096fd927/python/ray/_private/resource_isolation_config.py#L54-L56) to unify the construction of resource isolation config. ## Related issues ray-project#57653 ## Additional information N/A --------- Signed-off-by: davik <davik@anyscale.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com> Co-authored-by: davik <davik@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: tiennguyentony <46289799+tiennguyentony@users.noreply.github.com>
…ruction into a single step (ray-project#59372) ## Description The existing `resource_isolation_config` is constructed in two steps. 1. The class is first instantiated using the constructor. 2. The construction is complete when `add_object_store_memory` is called. This is undesirable, as forgetting to complete the second step in creating the `resource_isolation_config` can result in critical system misbehavior. This PR addresses the [TODO](https://github.com/ray-project/ray/blob/b13361a8045632a7c2308ce606b0487f096fd927/python/ray/_private/resource_isolation_config.py#L54-L56) to unify the construction of resource isolation config. ## Related issues ray-project#57653 ## Additional information N/A --------- Signed-off-by: davik <davik@anyscale.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com> Co-authored-by: davik <davik@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ruction into a single step (#59372) ## Description The existing `resource_isolation_config` is constructed in two steps. 1. The class is first instantiated using the constructor. 2. The construction is complete when `add_object_store_memory` is called. This is undesirable, as forgetting to complete the second step in creating the `resource_isolation_config` can result in critical system misbehavior. This PR addresses the [TODO](https://github.com/ray-project/ray/blob/b13361a8045632a7c2308ce606b0487f096fd927/python/ray/_private/resource_isolation_config.py#L54-L56) to unify the construction of resource isolation config. ## Related issues #57653 ## Additional information N/A --------- Signed-off-by: davik <davik@anyscale.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com> Co-authored-by: davik <davik@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
…ruction into a single step (#59372) ## Description The existing `resource_isolation_config` is constructed in two steps. 1. The class is first instantiated using the constructor. 2. The construction is complete when `add_object_store_memory` is called. This is undesirable, as forgetting to complete the second step in creating the `resource_isolation_config` can result in critical system misbehavior. This PR addresses the [TODO](https://github.com/ray-project/ray/blob/b13361a8045632a7c2308ce606b0487f096fd927/python/ray/_private/resource_isolation_config.py#L54-L56) to unify the construction of resource isolation config. ## Related issues #57653 ## Additional information N/A --------- Signed-off-by: davik <davik@anyscale.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com> Co-authored-by: davik <davik@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ruction into a single step (ray-project#59372) ## Description The existing `resource_isolation_config` is constructed in two steps. 1. The class is first instantiated using the constructor. 2. The construction is complete when `add_object_store_memory` is called. This is undesirable, as forgetting to complete the second step in creating the `resource_isolation_config` can result in critical system misbehavior. This PR addresses the [TODO](https://github.com/ray-project/ray/blob/b13361a8045632a7c2308ce606b0487f096fd927/python/ray/_private/resource_isolation_config.py#L54-L56) to unify the construction of resource isolation config. ## Related issues ray-project#57653 ## Additional information N/A --------- Signed-off-by: davik <davik@anyscale.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com> Co-authored-by: davik <davik@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ruction into a single step (ray-project#59372) ## Description The existing `resource_isolation_config` is constructed in two steps. 1. The class is first instantiated using the constructor. 2. The construction is complete when `add_object_store_memory` is called. This is undesirable, as forgetting to complete the second step in creating the `resource_isolation_config` can result in critical system misbehavior. This PR addresses the [TODO](https://github.com/ray-project/ray/blob/b13361a8045632a7c2308ce606b0487f096fd927/python/ray/_private/resource_isolation_config.py#L54-L56) to unify the construction of resource isolation config. ## Related issues ray-project#57653 ## Additional information N/A --------- Signed-off-by: davik <davik@anyscale.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com> Co-authored-by: davik <davik@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Adel Nour <ans9868@nyu.edu>
…ruction into a single step (ray-project#59372) ## Description The existing `resource_isolation_config` is constructed in two steps. 1. The class is first instantiated using the constructor. 2. The construction is complete when `add_object_store_memory` is called. This is undesirable, as forgetting to complete the second step in creating the `resource_isolation_config` can result in critical system misbehavior. This PR addresses the [TODO](https://github.com/ray-project/ray/blob/b13361a8045632a7c2308ce606b0487f096fd927/python/ray/_private/resource_isolation_config.py#L54-L56) to unify the construction of resource isolation config. ## Related issues ray-project#57653 ## Additional information N/A --------- Signed-off-by: davik <davik@anyscale.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com> Co-authored-by: davik <davik@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ruction into a single step (ray-project#59372) ## Description The existing `resource_isolation_config` is constructed in two steps. 1. The class is first instantiated using the constructor. 2. The construction is complete when `add_object_store_memory` is called. This is undesirable, as forgetting to complete the second step in creating the `resource_isolation_config` can result in critical system misbehavior. This PR addresses the [TODO](https://github.com/ray-project/ray/blob/b13361a8045632a7c2308ce606b0487f096fd927/python/ray/_private/resource_isolation_config.py#L54-L56) to unify the construction of resource isolation config. ## Related issues ray-project#57653 ## Additional information N/A --------- Signed-off-by: davik <davik@anyscale.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com> Co-authored-by: davik <davik@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
…ruction into a single step (ray-project#59372) ## Description The existing `resource_isolation_config` is constructed in two steps. 1. The class is first instantiated using the constructor. 2. The construction is complete when `add_object_store_memory` is called. This is undesirable, as forgetting to complete the second step in creating the `resource_isolation_config` can result in critical system misbehavior. This PR addresses the [TODO](https://github.com/ray-project/ray/blob/b13361a8045632a7c2308ce606b0487f096fd927/python/ray/_private/resource_isolation_config.py#L54-L56) to unify the construction of resource isolation config. ## Related issues ray-project#57653 ## Additional information N/A --------- Signed-off-by: davik <davik@anyscale.com> Signed-off-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com> Co-authored-by: davik <davik@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: peterxcli <peterxcli@gmail.com>


Description
The existing
resource_isolation_configis constructed in two steps.add_object_store_memoryis called.This is undesirable, as forgetting to complete the second step in creating the
resource_isolation_configcan result in critical system misbehavior.This PR addresses the TODO to unify the construction of resource isolation config.
Related issues
#57653
Additional information
N/A