[core] (cgroups 19/n) Allow fractions when getting the number of CPUs to calculate weights #57800
… cpus available on the machine. This will prevent us from rounding down when running in a container that has cpu.max set.

Signed-off-by: irabbani <israbbani@gmail.com>
Tested on Anyscale w/ a 2-core machine. Works with default parameters now. From the logs:
edoakes left a comment:
add comment in followup
      """
-     available_system_cpus = utils.get_num_cpus()
+     available_system_cpus = utils.get_num_cpus(truncate=False)
should leave a comment for why we don't truncate
) For more details about the resource isolation project see #54703. This PR moves the driver into the workers cgroup when it registers with the NodeManager. Also updates the tests to reflect this. This now includes changes from #57800.

Signed-off-by: irabbani <israbbani@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
This PR stacks on #57776.
For more details about the resource isolation project see #54703.
When Ray calculates the number of CPUs available on the machine, it checks whether it is running in a container. However, it truncates the number of CPUs, so a fractional limit (for example, one set through cpu.max) is rounded down.
In this PR,
- … `DEFAULT_MIN_SYSTEM_RESERVED_CPU_CORES`, then raise a ValueError. Previously, this was < `DEFAULT_MIN_SYSTEM_RESERVED_CPU_CORES`.
- `ray._private.utils.get_num_cpus` truncates only if an optional parameter is set to True.
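
To illustrate why truncation matters in a container, here is a minimal sketch of turning a cgroup v2 `cpu.max` value into a CPU count. It is not Ray's implementation: the function name `cpus_from_cpu_max` and its `host_cpus` argument are hypothetical, and only the `truncate` flag mirrors the parameter shown in the diff. The sketch assumes the standard `cpu.max` format of `"<quota> <period>"` in microseconds, with `max` meaning no limit.

```python
import math
from typing import Union


def cpus_from_cpu_max(
    cpu_max: str, host_cpus: int, *, truncate: bool
) -> Union[int, float]:
    """Return the CPU count implied by the contents of a cgroup v2 cpu.max file.

    Hypothetical helper for illustration only.
    cpu_max looks like "150000 100000" (quota and period in microseconds)
    or "max 100000" (no quota).
    """
    quota, period = cpu_max.split()
    if quota == "max":
        # No quota is set, so the container can use every host CPU.
        return host_cpus
    # The quota/period ratio is the (possibly fractional) CPU allowance,
    # capped at the number of CPUs the host actually has.
    cpus = min(int(quota) / int(period), host_cpus)
    # Truncating rounds a fractional allowance down to a whole number.
    return math.floor(cpus) if truncate else cpus
```

With a quota of 1.5 CPUs, `truncate=True` yields 1 while `truncate=False` keeps 1.5. With a quota of 0.5 CPUs, truncation yields 0, which is the kind of rounding down the PR avoids when calculating weights.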