[core] (cgroups 9/n) end-to-end integration of cgroups with ray start. #56352
Merged
Conversation
to perform cgroup operations.
instead of clone for older kernel headers < 5.7 (which is what we have in CI)
fix CI.
Signed-off-by: irabbani <irabbani@anyscale.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Contributor (Author)
I can merge/promote fixtures to the main conftest and use them. I don't have a good idea for how to prevent these tests from running as part of normal unit tests without marking them as manual. I'll leave a TODO in #54703 and see if I can come up with anything better.
instance_type: medium
commands:
  - RAYCI_DISABLE_TEST_DB=1 bazel run //ci/ray_ci:test_in_docker -- //:all //src/ray/common/cgroup2/tests/... core --build-type clang --cache-test-results
  - bazel run //ci/ray_ci:test_in_docker -- //:all //python/ray/tests/resource_isolation:test_resource_isolation_integration //python/ray/tests/resource_isolation:test_resource_isolation_config core --privileged --cache-test-results
Contributor
did we accidentally remove the `RAYCI_DISABLE_TEST_DB=1`?
Collaborator
oh yeah that should get added back
edoakes
added a commit
that referenced
this pull request
Sep 17, 2025
…r to move processes into system cgroup (#56446) This PR stacks on #56352. For more details about the resource isolation project, see #54703. This PR adds the following functions to move a process into the system cgroup:
* CgroupManagerInterface::AddProcessToSystemCgroup
* CgroupDriverInterface::AddProcessToCgroup

I've also added integration tests for SysFsCgroupDriver and unit tests for CgroupManager.

Here is how these APIs will be used. In the next PR, the raylet will:
* be passed a list of pids of system processes that are started before the raylet starts and need to be moved into the system cgroup (e.g. gcs_server)
* call CgroupManagerInterface::AddProcessToSystemCgroup for each of these pids to move them into the system cgroup.
---------
Signed-off-by: Ibrahim Rabbani <irabbani@anyscale.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
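On cgroup v2, moving a process into a cgroup ultimately comes down to writing its pid into the target cgroup's `cgroup.procs` file. A minimal Python sketch of what an operation like `AddProcessToCgroup` does at the filesystem level (the helper name is hypothetical; the actual implementation is C++ in SysFsCgroupDriver):

```python
import os


def add_process_to_cgroup(cgroup_path: str, pid: int) -> None:
    """Move `pid` into the cgroup v2 at `cgroup_path` by writing the pid
    to the cgroup's cgroup.procs file. Requires write permission on the
    destination cgroup; raises OSError (e.g. ESRCH, EACCES) on failure."""
    procs_file = os.path.join(cgroup_path, "cgroup.procs")
    with open(procs_file, "w") as f:
        f.write(str(pid))
```

For example, `add_process_to_cgroup("/sys/fs/cgroup/ray_node_<node_id>/system/leaf", os.getpid())` would move the calling process into the system leaf cgroup, assuming the hierarchy exists and the caller has the required permissions.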
zma2
pushed a commit
to zma2/ray
that referenced
this pull request
Sep 23, 2025
ray-project#56352) This PR stacks on ray-project#56297. For more details about the resource isolation project, see ray-project#54703. This PR wires the resource isolation config (introduced in ray-project#51865) from the ray cli (ray start) and the ray sdk (ray.init) into the raylet. Notable changes include:
1. A separate python test target for running test_resource_isolation related unit and integration tests. This unifies all cgroup related tests under one buildkite target and removes the need for `--except-tags cgroup` everywhere else.
2. Modification to the cgroup hierarchy. This was an oversight on my part. The "no internal processes" constraint says that a non-root cgroup can either have controllers enabled or have processes. Therefore, the new hierarchy looks like:
```
// base_cgroup_path (e.g. /sys/fs/cgroup)
//          |
//   ray_node_<node_id>
//     |          |
//  system   application
//     |          |
//   leaf        leaf
```
where the leaf nodes contain all processes and the system/application cgroups apply cpu.weight and memory.min constraints.
3. CgroupManager now has a move ctor/assignment operator that allows ownership and lifecycle to be managed by the NodeManager.
4. CgroupManager is now owned by NodeManager.
5. An end-to-end integration test in `python/ray/tests/resource_isolation/test_resource_isolation_integration.py`.
6. Moved all previous integration tests from test_ray_init and test_cli into test_resource_isolation_integration.py. These tests have TODOs to finish them once the rest of the cgroup features are implemented.
---------
Signed-off-by: irabbani <israbbani@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Zhiqiang Ma <zhiqiang.ma@intel.com>
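In cgroupfs, creating a directory creates a cgroup, so the hierarchy above can be sketched in a few lines. This is an illustrative Python sketch (the helper name is hypothetical; the real setup lives in the C++ CgroupManager), showing why only the leaf cgroups hold processes while the intermediate system/application cgroups are free to enable controllers:

```python
import os


def create_ray_cgroup_hierarchy(base_cgroup_path: str, node_id: str) -> dict:
    """Create ray_node_<node_id>/{system,application}/leaf under the base
    cgroup path. Processes go only into the leaf cgroups, so the
    system/application cgroups can enable controllers (cpu.weight,
    memory.min) without violating the 'no internal processes' rule."""
    leaves = {}
    node_cgroup = os.path.join(base_cgroup_path, f"ray_node_{node_id}")
    for group in ("system", "application"):
        leaf = os.path.join(node_cgroup, group, "leaf")
        os.makedirs(leaf, exist_ok=True)  # mkdir in cgroupfs creates a cgroup
        leaves[group] = leaf
    return leaves
```

Against a real cgroup v2 mount this would be called with `base_cgroup_path="/sys/fs/cgroup"` and would additionally need to write the enabled controllers into each intermediate cgroup's `cgroup.subtree_control`, which the sketch omits.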
edoakes
added a commit
that referenced
this pull request
Sep 23, 2025
…n startup (#56522) This PR stacks on #56352. For more details about the resource isolation project, see #54703. This PR makes the raylet move the system processes into the system cgroup on startup if resource isolation is enabled. It introduces the following:
* A new raylet cli arg `--system-pids`, a comma-separated string of pids of system processes that are started before the raylet. As of today, it contains:
  * On the head node: gcs_server, dashboard_api_server, ray client server, monitor (autoscaler).
  * On every node (including head): process subreaper, log monitor.
* End-to-end integration tests for resource isolation with the Ray SDK (`ray.init`) and the Ray CLI (`ray start`).

There are a few rough edges (I've added a comment on the PR where relevant):
1. The construction of ResourceIsolationConfig is spread across multiple call-sites (create the object, add the object store memory, add the system pids). The big positive of doing it this way was failing fast on invalid user input (in scripts.py and worker.py). I think it needs to have at least two components: the user input (cgroup_path, system_reserved_memory, ...) and the derived input (system_pids, total_system_reserved_memory).
2. How to determine which processes should be moved? Right now I'm using `self.all_processes` in `node.py`. It _should_ contain all processes started so far, but there's no guarantee.
3. How intrusive should the integration test be? Should we count the number of pids inside the system cgroup? (This was answered in #56549.)
4. How should a user set up multiple nodes on the same VM? I haven't written an integration test for it yet because there are multiple options for how to set this up.
---------
Signed-off-by: irabbani <israbbani@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
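Since `--system-pids` is described as a plain comma-separated pid string, parsing it is straightforward. A minimal sketch (the function name is hypothetical, not the raylet's actual parsing code):

```python
def parse_system_pids(arg: str) -> list:
    """Parse a --system-pids value like "101,102,103" into a list of ints.
    An empty string yields an empty list; whitespace around pids is tolerated.
    Raises ValueError on a malformed entry rather than silently skipping it."""
    if not arg:
        return []
    return [int(p.strip()) for p in arg.split(",")]
```

For example, `parse_system_pids("101,102,103")` yields `[101, 102, 103]`, and each pid would then be passed to CgroupManagerInterface::AddProcessToSystemCgroup as described above.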
ZacAttack
pushed a commit
to ZacAttack/ray
that referenced
this pull request
Sep 24, 2025
ray-project#56352) This PR stacks on ray-project#56297. For more details about the resource isolation project see ray-project#54703. This PR wires the resource isolation config (introduced here ray-project#51865) from ray cli (ray start) and the ray sdk (ray.init) into the raylet. Notable changes include: 1. A separate python test target for running test_resource_isolation related unit and integration tests. This unifies all cgroup related tests under one buildkite target and removes the need for `--except-tags cgroup` everywhere else. 2. Modification to the cgroup hierarchy. This was an oversight on my part. The "no internal processes" constraint says that a non-root cgroup can either have controllers enabled or have processes. Therefore, the new hierarchy looks like: ``` // base_cgroup_path (e.g. /sys/fs/cgroup) // | // ray_node_<node_id> // | | // system application // | | // leaf leaf // ``` where the leaf nodes contain all processes and the system/application cgroups apply cpu.weight and memory.min constraints. 3. CgroupManager now has a move ctor/assignment operator that allows ownership and lifecycle to be managed by the NodeManager. 4. CgroupManager is now owned by NodeManager. 5. An end-to-end integration test in `python/ray/tests/resource_isolation/test_resource_isolation_integration.py`. 6. Moved all previous integration tests from test_ray_init and test_cli into test_resource_isolation_integration.py. These tests have TODOs to finish them up once the rest of cgroup features are implemented. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: zac <zac@anyscale.com>
ZacAttack
pushed a commit
to ZacAttack/ray
that referenced
this pull request
Sep 24, 2025
…r to move processes into system cgroup (ray-project#56446) This PR stacks on ray-project#56352 . For more details about the resource isolation project see ray-project#54703. This PR the following functions to move a process into the system cgroup: * CgroupManagerInterface::AddProcessToSystemCgroup * CgroupDriverInterface::AddProcessToCgroup I've also added integration tests for SysFsCgroupDriver and unit tests for CgroupManager. Let me explain how these APIs will be used. In the next PR, the raylet will * be passed a list of pids of system processes that are started before the raylet starts and need to be moved into the system cgroup (e.g. gcs_server) * call CgroupManagerInterface::AddProcessToSystemCgroup for each of these pids to move them into the system cgroup. --------- Signed-off-by: Ibrahim Rabbani <irabbani@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: zac <zac@anyscale.com>
ZacAttack
pushed a commit
to ZacAttack/ray
that referenced
this pull request
Sep 24, 2025
…n startup (ray-project#56522) This PR stacks on ray-project#56352 . For more details about the resource isolation project see ray-project#54703. This PR the makes the raylet move the system processes into the system cgroup on startup if resource isolation is enabled. It introduces the following * A new raylet cli arg `--system-pids` which is a comma-separated string of pids of system processes that are started before the raylet. As of today, it contains * On the head node: gcs_server, dashboard_api_server, ray client server, monitor (autoscaler) * On every node (including head): process subreaper, log monitor. * End-to-end integration tests for resource isolation with the Ray SDK (`ray.init`) and the Ray CLI (`ray --start`) There are a few rough edges (I've added a comment on the PR where relevant): 1. The construction of ResourceIsolationConfig is spread across multiple call-sites (create the object, add the object store memory, add the system pids). The big positive of doing it this way was to fail fast on invalid user input (in scripts.py and worker.py). I think it needs to have at least two components: the user input (cgroup_path, system_reserved_memory, ...) and the derived input (system_pids, total_system_reserved_memory). 2. How to determine which processes should be moved? Right now I'm using `self.all_processes` in `node.py`. It _should_ contain all processes started so far, but there's no guarantee. 3. How intrusive should the integration test be? Should we count the number of pids inside the system cgroup? (This was answered in ray-project#56549) 4. How should a user setup multiple nodes on the same VM? I haven't written an integration test for it yet because there are multiple options for how to set this up. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: zac <zac@anyscale.com>
elliot-barn
pushed a commit
that referenced
this pull request
Sep 24, 2025
#56352) This PR stacks on #56297. For more details about the resource isolation project see #54703. This PR wires the resource isolation config (introduced here #51865) from ray cli (ray start) and the ray sdk (ray.init) into the raylet. Notable changes include: 1. A separate python test target for running test_resource_isolation related unit and integration tests. This unifies all cgroup related tests under one buildkite target and removes the need for `--except-tags cgroup` everywhere else. 2. Modification to the cgroup hierarchy. This was an oversight on my part. The "no internal processes" constraint says that a non-root cgroup can either have controllers enabled or have processes. Therefore, the new hierarchy looks like: ``` // base_cgroup_path (e.g. /sys/fs/cgroup) // | // ray_node_<node_id> // | | // system application // | | // leaf leaf // ``` where the leaf nodes contain all processes and the system/application cgroups apply cpu.weight and memory.min constraints. 3. CgroupManager now has a move ctor/assignment operator that allows ownership and lifecycle to be managed by the NodeManager. 4. CgroupManager is now owned by NodeManager. 5. An end-to-end integration test in `python/ray/tests/resource_isolation/test_resource_isolation_integration.py`. 6. Moved all previous integration tests from test_ray_init and test_cli into test_resource_isolation_integration.py. These tests have TODOs to finish them up once the rest of cgroup features are implemented. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
elliot-barn
pushed a commit
that referenced
this pull request
Sep 24, 2025
…r to move processes into system cgroup (#56446) This PR stacks on #56352 . For more details about the resource isolation project see #54703. This PR the following functions to move a process into the system cgroup: * CgroupManagerInterface::AddProcessToSystemCgroup * CgroupDriverInterface::AddProcessToCgroup I've also added integration tests for SysFsCgroupDriver and unit tests for CgroupManager. Let me explain how these APIs will be used. In the next PR, the raylet will * be passed a list of pids of system processes that are started before the raylet starts and need to be moved into the system cgroup (e.g. gcs_server) * call CgroupManagerInterface::AddProcessToSystemCgroup for each of these pids to move them into the system cgroup. --------- Signed-off-by: Ibrahim Rabbani <irabbani@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
elliot-barn
pushed a commit
that referenced
this pull request
Sep 24, 2025
…n startup (#56522) This PR stacks on #56352 . For more details about the resource isolation project see #54703. This PR the makes the raylet move the system processes into the system cgroup on startup if resource isolation is enabled. It introduces the following * A new raylet cli arg `--system-pids` which is a comma-separated string of pids of system processes that are started before the raylet. As of today, it contains * On the head node: gcs_server, dashboard_api_server, ray client server, monitor (autoscaler) * On every node (including head): process subreaper, log monitor. * End-to-end integration tests for resource isolation with the Ray SDK (`ray.init`) and the Ray CLI (`ray --start`) There are a few rough edges (I've added a comment on the PR where relevant): 1. The construction of ResourceIsolationConfig is spread across multiple call-sites (create the object, add the object store memory, add the system pids). The big positive of doing it this way was to fail fast on invalid user input (in scripts.py and worker.py). I think it needs to have at least two components: the user input (cgroup_path, system_reserved_memory, ...) and the derived input (system_pids, total_system_reserved_memory). 2. How to determine which processes should be moved? Right now I'm using `self.all_processes` in `node.py`. It _should_ contain all processes started so far, but there's no guarantee. 3. How intrusive should the integration test be? Should we count the number of pids inside the system cgroup? (This was answered in #56549) 4. How should a user setup multiple nodes on the same VM? I haven't written an integration test for it yet because there are multiple options for how to set this up. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
marcostephan
pushed a commit
to marcostephan/ray
that referenced
this pull request
Sep 24, 2025
ray-project#56352) This PR stacks on ray-project#56297. For more details about the resource isolation project see ray-project#54703. This PR wires the resource isolation config (introduced here ray-project#51865) from ray cli (ray start) and the ray sdk (ray.init) into the raylet. Notable changes include: 1. A separate python test target for running test_resource_isolation related unit and integration tests. This unifies all cgroup related tests under one buildkite target and removes the need for `--except-tags cgroup` everywhere else. 2. Modification to the cgroup hierarchy. This was an oversight on my part. The "no internal processes" constraint says that a non-root cgroup can either have controllers enabled or have processes. Therefore, the new hierarchy looks like: ``` // base_cgroup_path (e.g. /sys/fs/cgroup) // | // ray_node_<node_id> // | | // system application // | | // leaf leaf // ``` where the leaf nodes contain all processes and the system/application cgroups apply cpu.weight and memory.min constraints. 3. CgroupManager now has a move ctor/assignment operator that allows ownership and lifecycle to be managed by the NodeManager. 4. CgroupManager is now owned by NodeManager. 5. An end-to-end integration test in `python/ray/tests/resource_isolation/test_resource_isolation_integration.py`. 6. Moved all previous integration tests from test_ray_init and test_cli into test_resource_isolation_integration.py. These tests have TODOs to finish them up once the rest of cgroup features are implemented. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Marco Stephan <marco@magic.dev>
marcostephan
pushed a commit
to marcostephan/ray
that referenced
this pull request
Sep 24, 2025
…r to move processes into system cgroup (ray-project#56446) This PR stacks on ray-project#56352 . For more details about the resource isolation project see ray-project#54703. This PR the following functions to move a process into the system cgroup: * CgroupManagerInterface::AddProcessToSystemCgroup * CgroupDriverInterface::AddProcessToCgroup I've also added integration tests for SysFsCgroupDriver and unit tests for CgroupManager. Let me explain how these APIs will be used. In the next PR, the raylet will * be passed a list of pids of system processes that are started before the raylet starts and need to be moved into the system cgroup (e.g. gcs_server) * call CgroupManagerInterface::AddProcessToSystemCgroup for each of these pids to move them into the system cgroup. --------- Signed-off-by: Ibrahim Rabbani <irabbani@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Marco Stephan <marco@magic.dev>
marcostephan
pushed a commit
to marcostephan/ray
that referenced
this pull request
Sep 24, 2025
…n startup (ray-project#56522) This PR stacks on ray-project#56352 . For more details about the resource isolation project see ray-project#54703. This PR the makes the raylet move the system processes into the system cgroup on startup if resource isolation is enabled. It introduces the following * A new raylet cli arg `--system-pids` which is a comma-separated string of pids of system processes that are started before the raylet. As of today, it contains * On the head node: gcs_server, dashboard_api_server, ray client server, monitor (autoscaler) * On every node (including head): process subreaper, log monitor. * End-to-end integration tests for resource isolation with the Ray SDK (`ray.init`) and the Ray CLI (`ray --start`) There are a few rough edges (I've added a comment on the PR where relevant): 1. The construction of ResourceIsolationConfig is spread across multiple call-sites (create the object, add the object store memory, add the system pids). The big positive of doing it this way was to fail fast on invalid user input (in scripts.py and worker.py). I think it needs to have at least two components: the user input (cgroup_path, system_reserved_memory, ...) and the derived input (system_pids, total_system_reserved_memory). 2. How to determine which processes should be moved? Right now I'm using `self.all_processes` in `node.py`. It _should_ contain all processes started so far, but there's no guarantee. 3. How intrusive should the integration test be? Should we count the number of pids inside the system cgroup? (This was answered in ray-project#56549) 4. How should a user setup multiple nodes on the same VM? I haven't written an integration test for it yet because there are multiple options for how to set this up. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Marco Stephan <marco@magic.dev>
elliot-barn
pushed a commit
that referenced
this pull request
Sep 27, 2025
#56352) This PR stacks on #56297. For more details about the resource isolation project see #54703. This PR wires the resource isolation config (introduced here #51865) from ray cli (ray start) and the ray sdk (ray.init) into the raylet. Notable changes include: 1. A separate python test target for running test_resource_isolation related unit and integration tests. This unifies all cgroup related tests under one buildkite target and removes the need for `--except-tags cgroup` everywhere else. 2. Modification to the cgroup hierarchy. This was an oversight on my part. The "no internal processes" constraint says that a non-root cgroup can either have controllers enabled or have processes. Therefore, the new hierarchy looks like: ``` // base_cgroup_path (e.g. /sys/fs/cgroup) // | // ray_node_<node_id> // | | // system application // | | // leaf leaf // ``` where the leaf nodes contain all processes and the system/application cgroups apply cpu.weight and memory.min constraints. 3. CgroupManager now has a move ctor/assignment operator that allows ownership and lifecycle to be managed by the NodeManager. 4. CgroupManager is now owned by NodeManager. 5. An end-to-end integration test in `python/ray/tests/resource_isolation/test_resource_isolation_integration.py`. 6. Moved all previous integration tests from test_ray_init and test_cli into test_resource_isolation_integration.py. These tests have TODOs to finish them up once the rest of cgroup features are implemented. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
elliot-barn
pushed a commit
that referenced
this pull request
Sep 27, 2025
…r to move processes into system cgroup (#56446) This PR stacks on #56352 . For more details about the resource isolation project see #54703. This PR the following functions to move a process into the system cgroup: * CgroupManagerInterface::AddProcessToSystemCgroup * CgroupDriverInterface::AddProcessToCgroup I've also added integration tests for SysFsCgroupDriver and unit tests for CgroupManager. Let me explain how these APIs will be used. In the next PR, the raylet will * be passed a list of pids of system processes that are started before the raylet starts and need to be moved into the system cgroup (e.g. gcs_server) * call CgroupManagerInterface::AddProcessToSystemCgroup for each of these pids to move them into the system cgroup. --------- Signed-off-by: Ibrahim Rabbani <irabbani@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
elliot-barn
pushed a commit
that referenced
this pull request
Sep 27, 2025
…n startup (#56522) This PR stacks on #56352 . For more details about the resource isolation project see #54703. This PR the makes the raylet move the system processes into the system cgroup on startup if resource isolation is enabled. It introduces the following * A new raylet cli arg `--system-pids` which is a comma-separated string of pids of system processes that are started before the raylet. As of today, it contains * On the head node: gcs_server, dashboard_api_server, ray client server, monitor (autoscaler) * On every node (including head): process subreaper, log monitor. * End-to-end integration tests for resource isolation with the Ray SDK (`ray.init`) and the Ray CLI (`ray --start`) There are a few rough edges (I've added a comment on the PR where relevant): 1. The construction of ResourceIsolationConfig is spread across multiple call-sites (create the object, add the object store memory, add the system pids). The big positive of doing it this way was to fail fast on invalid user input (in scripts.py and worker.py). I think it needs to have at least two components: the user input (cgroup_path, system_reserved_memory, ...) and the derived input (system_pids, total_system_reserved_memory). 2. How to determine which processes should be moved? Right now I'm using `self.all_processes` in `node.py`. It _should_ contain all processes started so far, but there's no guarantee. 3. How intrusive should the integration test be? Should we count the number of pids inside the system cgroup? (This was answered in #56549) 4. How should a user setup multiple nodes on the same VM? I haven't written an integration test for it yet because there are multiple options for how to set this up. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
dstrodtman
pushed a commit
to dstrodtman/ray
that referenced
this pull request
Oct 6, 2025
ray-project#56352) This PR stacks on ray-project#56297. For more details about the resource isolation project see ray-project#54703. This PR wires the resource isolation config (introduced here ray-project#51865) from ray cli (ray start) and the ray sdk (ray.init) into the raylet. Notable changes include: 1. A separate python test target for running test_resource_isolation related unit and integration tests. This unifies all cgroup related tests under one buildkite target and removes the need for `--except-tags cgroup` everywhere else. 2. Modification to the cgroup hierarchy. This was an oversight on my part. The "no internal processes" constraint says that a non-root cgroup can either have controllers enabled or have processes. Therefore, the new hierarchy looks like: ``` // base_cgroup_path (e.g. /sys/fs/cgroup) // | // ray_node_<node_id> // | | // system application // | | // leaf leaf // ``` where the leaf nodes contain all processes and the system/application cgroups apply cpu.weight and memory.min constraints. 3. CgroupManager now has a move ctor/assignment operator that allows ownership and lifecycle to be managed by the NodeManager. 4. CgroupManager is now owned by NodeManager. 5. An end-to-end integration test in `python/ray/tests/resource_isolation/test_resource_isolation_integration.py`. 6. Moved all previous integration tests from test_ray_init and test_cli into test_resource_isolation_integration.py. These tests have TODOs to finish them up once the rest of cgroup features are implemented. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
dstrodtman
pushed a commit
to dstrodtman/ray
that referenced
this pull request
Oct 6, 2025
…r to move processes into system cgroup (ray-project#56446) This PR stacks on ray-project#56352 . For more details about the resource isolation project see ray-project#54703. This PR the following functions to move a process into the system cgroup: * CgroupManagerInterface::AddProcessToSystemCgroup * CgroupDriverInterface::AddProcessToCgroup I've also added integration tests for SysFsCgroupDriver and unit tests for CgroupManager. Let me explain how these APIs will be used. In the next PR, the raylet will * be passed a list of pids of system processes that are started before the raylet starts and need to be moved into the system cgroup (e.g. gcs_server) * call CgroupManagerInterface::AddProcessToSystemCgroup for each of these pids to move them into the system cgroup. --------- Signed-off-by: Ibrahim Rabbani <irabbani@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
dstrodtman
pushed a commit
to dstrodtman/ray
that referenced
this pull request
Oct 6, 2025
…n startup (ray-project#56522) This PR stacks on ray-project#56352 . For more details about the resource isolation project see ray-project#54703. This PR the makes the raylet move the system processes into the system cgroup on startup if resource isolation is enabled. It introduces the following * A new raylet cli arg `--system-pids` which is a comma-separated string of pids of system processes that are started before the raylet. As of today, it contains * On the head node: gcs_server, dashboard_api_server, ray client server, monitor (autoscaler) * On every node (including head): process subreaper, log monitor. * End-to-end integration tests for resource isolation with the Ray SDK (`ray.init`) and the Ray CLI (`ray --start`) There are a few rough edges (I've added a comment on the PR where relevant): 1. The construction of ResourceIsolationConfig is spread across multiple call-sites (create the object, add the object store memory, add the system pids). The big positive of doing it this way was to fail fast on invalid user input (in scripts.py and worker.py). I think it needs to have at least two components: the user input (cgroup_path, system_reserved_memory, ...) and the derived input (system_pids, total_system_reserved_memory). 2. How to determine which processes should be moved? Right now I'm using `self.all_processes` in `node.py`. It _should_ contain all processes started so far, but there's no guarantee. 3. How intrusive should the integration test be? Should we count the number of pids inside the system cgroup? (This was answered in ray-project#56549) 4. How should a user setup multiple nodes on the same VM? I haven't written an integration test for it yet because there are multiple options for how to set this up. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
justinyeh1995
pushed a commit
to justinyeh1995/ray
that referenced
this pull request
Oct 20, 2025
ray-project#56352) This PR stacks on ray-project#56297. For more details about the resource isolation project see ray-project#54703. This PR wires the resource isolation config (introduced here ray-project#51865) from ray cli (ray start) and the ray sdk (ray.init) into the raylet. Notable changes include: 1. A separate python test target for running test_resource_isolation related unit and integration tests. This unifies all cgroup related tests under one buildkite target and removes the need for `--except-tags cgroup` everywhere else. 2. Modification to the cgroup hierarchy. This was an oversight on my part. The "no internal processes" constraint says that a non-root cgroup can either have controllers enabled or have processes. Therefore, the new hierarchy looks like: ``` // base_cgroup_path (e.g. /sys/fs/cgroup) // | // ray_node_<node_id> // | | // system application // | | // leaf leaf // ``` where the leaf nodes contain all processes and the system/application cgroups apply cpu.weight and memory.min constraints. 3. CgroupManager now has a move ctor/assignment operator that allows ownership and lifecycle to be managed by the NodeManager. 4. CgroupManager is now owned by NodeManager. 5. An end-to-end integration test in `python/ray/tests/resource_isolation/test_resource_isolation_integration.py`. 6. Moved all previous integration tests from test_ray_init and test_cli into test_resource_isolation_integration.py. These tests have TODOs to finish them up once the rest of cgroup features are implemented. --------- Signed-off-by: irabbani <israbbani@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
justinyeh1995
pushed a commit
to justinyeh1995/ray
that referenced
this pull request
Oct 20, 2025
…r to move processes into system cgroup (ray-project#56446) This PR stacks on ray-project#56352. For more details about the resource isolation project see ray-project#54703. This PR adds the following functions to move a process into the system cgroup:
* CgroupManagerInterface::AddProcessToSystemCgroup
* CgroupDriverInterface::AddProcessToCgroup

I've also added integration tests for SysFsCgroupDriver and unit tests for CgroupManager. Let me explain how these APIs will be used. In the next PR, the raylet will:
* be passed a list of pids of system processes that are started before the raylet starts and need to be moved into the system cgroup (e.g. gcs_server)
* call CgroupManagerInterface::AddProcessToSystemCgroup for each of these pids to move them into the system cgroup.
---------
Signed-off-by: Ibrahim Rabbani <irabbani@anyscale.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
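In cgroup v2, moving a process into a cgroup comes down to writing its pid into the target cgroup's `cgroup.procs` file, which is presumably what the C++ `AddProcessToCgroup` does under the hood via the sysfs driver. A minimal Python sketch of that mechanism (hypothetical helper, not Ray's implementation; takes an explicit directory rather than assuming `/sys/fs/cgroup`):

```python
import os


def add_process_to_cgroup(cgroup_dir: str, pid: int) -> None:
    """Move `pid` into the cgroup rooted at `cgroup_dir`.

    Writing a pid to a cgroup's cgroup.procs file is the cgroup v2
    migration mechanism; the kernel moves the whole process (all its
    threads) into that cgroup.
    """
    procs_file = os.path.join(cgroup_dir, "cgroup.procs")
    with open(procs_file, "a") as f:
        f.write(f"{pid}\n")
```

Against a real cgroupfs this requires write permission on the destination's `cgroup.procs` (typically root, or proper delegation).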
landscapepainter
pushed a commit
to landscapepainter/ray
that referenced
this pull request
Nov 17, 2025
Future-Outlier
pushed a commit
to Future-Outlier/ray
that referenced
this pull request
Dec 7, 2025
This PR stacks on #56297.
For more details about the resource isolation project see #54703.
This PR wires the resource isolation config (introduced in #51865) from the ray cli (`ray start`) and the ray sdk (`ray.init`) into the raylet. Notable changes include:
1. A separate python test target for running test_resource_isolation related unit and integration tests. This unifies all cgroup related tests under one buildkite target and removes the need for `--except-tags cgroup` everywhere else.
2. Modification to the cgroup hierarchy. This was an oversight on my part. The "no internal processes" constraint says that a non-root cgroup can either have controllers enabled or have processes. Therefore, the new hierarchy looks like:
```
base_cgroup_path (e.g. /sys/fs/cgroup)
              |
      ray_node_<node_id>
       |             |
    system      application
       |             |
     leaf          leaf
```
where the leaf nodes contain all processes and the system/application cgroups apply cpu.weight and memory.min constraints.
3. CgroupManager now has a move ctor/assignment operator that allows ownership and lifecycle to be managed by the NodeManager.
4. CgroupManager is now owned by NodeManager.
5. An end-to-end integration test in `python/ray/tests/resource_isolation/test_resource_isolation_integration.py`.
6. Moved all previous integration tests from test_ray_init and test_cli into test_resource_isolation_integration.py. These tests have TODOs to finish them up once the rest of cgroup features are implemented.
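The hierarchy described in the PR can be sketched as directory creation under the cgroupfs mount: one `ray_node_<node_id>` cgroup with `system` and `application` subtrees, where only the `leaf` cgroups hold pids, satisfying the "no internal processes" constraint. An illustrative sketch (function name and return shape are my own, not Ray's API; the base path is a parameter rather than `/sys/fs/cgroup` so it can run anywhere):

```python
import os


def create_ray_cgroup_hierarchy(base: str, node_id: str) -> dict:
    """Create the ray_node_<node_id> cgroup tree under `base`.

    Layout (per the "no internal processes" constraint):
        base/ray_node_<id>/system/leaf        <- processes live here
        base/ray_node_<id>/application/leaf   <- processes live here
    The system/application parents hold the resource constraints;
    real code would also enable controllers via cgroup.subtree_control
    and write cpu.weight / memory.min on the parents.
    """
    node = os.path.join(base, f"ray_node_{node_id}")
    leaves = {}
    for name in ("system", "application"):
        leaf = os.path.join(node, name, "leaf")
        os.makedirs(leaf, exist_ok=True)
        leaves[name] = leaf
    return leaves
```

On a real system, `mkdir` inside a cgroup v2 mount is exactly how child cgroups are created, so the same shape applies there (with root privileges or delegated access).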