
[core] [docs] (cgroups 24/n) Adding public docs for the Resource Isolation #60183

Merged
edoakes merged 19 commits into master from irabbani/ri-docs-1
Feb 4, 2026

@israbbani (Contributor) commented Jan 15, 2026

For more information about the Resource Isolation project see #54703.

Adding public documentation for how to enable and use Resource Isolation for process isolation between system and user processes.

Signed-off-by: irabbani <irabbani@anyscale.com>
@israbbani added the docs and core labels Jan 15, 2026
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds comprehensive documentation for the new Resource Isolation feature using cgroup v2. The documentation is well-structured and covers requirements, usage with containers and on bare metal, API references, and troubleshooting. I've found a couple of minor inconsistencies in the code examples which I've pointed out in the review comments. Once these are addressed, this will be a great addition to the Ray documentation.

irabbani added 2 commits January 15, 2026 19:58
Signed-off-by: irabbani <irabbani@anyscale.com>
up
Signed-off-by: irabbani <irabbani@anyscale.com>
@israbbani changed the title from "[core] [docs] Adding public docs for the Resource Isolation." to "[core] [docs] (cgroups 2/n) Adding public docs for the Resource Isolation" Jan 15, 2026
@israbbani changed the title from "[core] [docs] (cgroups 2/n) Adding public docs for the Resource Isolation" to "[core] [docs] (cgroups 20/n) Adding public docs for the Resource Isolation" Jan 15, 2026
@israbbani israbbani marked this pull request as ready for review January 15, 2026 21:56
@israbbani israbbani requested a review from a team as a code owner January 15, 2026 21:56
1. System critical processes internal to Ray which are critical to node health
2. User processes that are executing remote tasks and actors

Without resource isolation, user processes can starve system processes of CPU and memory leading to node failure. Node failure can cause instability in your workload and in extreme cases lead to job failure.
Review comment (Contributor):

Not as extreme as we'd hope lol

@israbbani israbbani requested a review from a team January 15, 2026 22:32
@Kunchd (Contributor) left a comment

Thanks for adding the usage docs for resource isolation! This should be very helpful for everyone.

I left a couple of little details, but the doc looks good overall.

Ibrahim Rabbani and others added 4 commits January 15, 2026 17:44
Co-authored-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com>
Signed-off-by: Ibrahim Rabbani <israbbani@gmail.com>
Co-authored-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com>
Signed-off-by: Ibrahim Rabbani <israbbani@gmail.com>
Signed-off-by: irabbani <irabbani@anyscale.com>
@Kunchd Kunchd self-requested a review January 16, 2026 01:51
@Kunchd (Contributor) left a comment

Looks good!

Ibrahim Rabbani added 3 commits January 16, 2026 12:54
@edoakes (Collaborator) left a comment

Looks great! Just a few minor comments.

We should also consider discoverability. Many times, people will not be directly searching for "resource isolation" but rather trying to solve their stability/OOM problems. So we may want to drop an example of WorkerDiedError / OOM kill errors here for SEO... and probably audit other pages to see if this should be cross-linked from anywhere. For example: https://docs.ray.io/en/latest/cluster/vms/user-guides/large-cluster-best-practices.html

Signed-off-by: irabbani <israbbani@gmail.com>
@israbbani (Contributor, Author) commented, quoting @edoakes:

> Looks great! Just a few minor comments.
>
> We should also consider discoverability. Many times, people will not be directly searching for "resource isolation" but rather trying to solve their stability/OOM problems. So we may want to drop an example of WorkerDiedError / OOM kill errors here for SEO... and probably audit other pages to see if this should be cross-linked from anywhere. For example: https://docs.ray.io/en/latest/cluster/vms/user-guides/large-cluster-best-practices.html

I addressed some of this. I'll update all of our docs to x-link for SEO in a follow-up.

@edoakes edoakes enabled auto-merge (squash) February 4, 2026 01:18
edoakes pushed a commit that referenced this pull request Feb 4, 2026
… ray.init(...) (#60726)

Follow up from #60183. 

When not running inside privileged containers, the user will have to
specify a `cgroup_path`. It makes sense for this to be a part of the
public API for `ray.init(...)`.

Things I'm changing:
1. Promoting `cgroup_path` to a public API parameter for `ray.init`.
2. Updating tests to use that parameter.
3. Running all cgroup tests on CI for all C++ and Python changes.

---------

Signed-off-by: irabbani <israbbani@gmail.com>
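
For readers following the `cgroup_path` change described in the commit message above, here is a minimal sketch of how the parameter might be passed to `ray.init(...)`. Only `cgroup_path` is named in this thread; the `enable_resource_isolation` flag, both helper names, and the example path are assumptions made for illustration, so check the Resource Isolation docs for the exact parameters.

```python
from typing import Any, Dict, Optional


def resource_isolation_kwargs(cgroup_path: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for ray.init(...) with resource isolation enabled.

    Assumption: `enable_resource_isolation` is the flag that turns the feature on.
    Only `cgroup_path` is named in the commit message above.
    """
    kwargs: Dict[str, Any] = {"enable_resource_isolation": True}
    if cgroup_path is not None:
        # Per the commit message, a cgroup path has to be specified when Ray
        # is not running inside a privileged container.
        kwargs["cgroup_path"] = cgroup_path
    return kwargs


def init_ray_with_isolation(cgroup_path: Optional[str] = None):
    """Start Ray with the kwargs above (hypothetical helper).

    Ray is imported lazily so the kwargs helper can be used without Ray installed.
    """
    import ray

    return ray.init(**resource_isolation_kwargs(cgroup_path))
```

A call such as `init_ray_with_isolation("/sys/fs/cgroup/ray")` would then forward the path to `ray.init`; the path shown is hypothetical and must point at a cgroup that the Ray process is allowed to manage.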
@github-actions github-actions bot disabled auto-merge February 4, 2026 16:50
@israbbani (Contributor, Author) commented:

@edoakes plz merge. Unrelated CI tests were failing so I had to update branch.

@edoakes edoakes merged commit e9ab540 into master Feb 4, 2026
6 checks passed
@edoakes edoakes deleted the irabbani/ri-docs-1 branch February 4, 2026 19:03
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 6, 2026
… ray.init(...) (ray-project#60726)

tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
… ray.init(...) (ray-project#60726)

tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
…ation (ray-project#60183)

For more information about the Resource Isolation project see
ray-project#54703.

Adding public documentation for how to enable and use Resource Isolation
for process isolation between system and user processes.

---------

Signed-off-by: irabbani <irabbani@anyscale.com>
Signed-off-by: Ibrahim Rabbani <israbbani@gmail.com>
Signed-off-by: irabbani <israbbani@gmail.com>
Co-authored-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: tiennguyentony <46289799+tiennguyentony@users.noreply.github.com>
tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
… ray.init(...) (ray-project#60726)


tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
…ation (ray-project#60183)


tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
… ray.init(...) (ray-project#60726)


tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
…ation (ray-project#60183)


tiennguyentony pushed a commit to tiennguyentony/ray that referenced this pull request Feb 7, 2026
… ray.init(...) (ray-project#60726)

Sparks0219 pushed a commit to Sparks0219/ray that referenced this pull request Feb 9, 2026
… ray.init(...) (ray-project#60726)

elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
… ray.init(...) (#60726)

elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
…ation (#60183)

elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
… ray.init(...) (#60726)

elliot-barn pushed a commit that referenced this pull request Feb 9, 2026
…ation (#60183)

ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
… ray.init(...) (ray-project#60726)

ans9868 pushed a commit to ans9868/ray that referenced this pull request Feb 18, 2026
…ation (ray-project#60183)

Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
… ray.init(...) (ray-project#60726)

Aydin-ab pushed a commit to kunling-anyscale/ray that referenced this pull request Feb 20, 2026
…ation (ray-project#60183)

peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
… ray.init(...) (ray-project#60726)

peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…ation (ray-project#60183)

peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
… ray.init(...) (ray-project#60726)

peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…ation (ray-project#60183)


Labels: core, docs, go


4 participants