[Core] Add fallback strategy scheduling logic (#56369)

edoakes merged 36 commits into ray-project:master
Conversation
The Python changes are in a separate PR, #56374, which can be merged first. This PR will then contain only the C++ scheduling logic changes. cc: @MengjinYan
@ryanaoleary could you rebase?
This PR contains only the Python changes from #56369, adding `fallback_strategy` as an option to the remote decorator for tasks and actors. A fallback strategy consists of a list of dicts of decorator options. Each dict of options is evaluated as a unit, and the first dict whose constraints can be satisfied is used for scheduling. With this PR, the only supported option is `label_selector`. Example using `fallback_strategy` to schedule on different instance types:

```
@ray.remote(
    label_selector={"instance_type": "m5.16xlarge"},
    fallback_strategy=[
        # Fall back to a selector for an "m5.large" instance type if
        # "m5.16xlarge" cannot be satisfied.
        {"label_selector": {"instance_type": "m5.large"}},
        # Finally, fall back to an empty set of labels (no constraints)
        # if neither desired m5 type can be satisfied.
        {"label_selector": {}},
    ],
)
class A:
    pass
```

In the example above, the top-level `label_selector` is tried first. The scheduler then iterates through each dict in `fallback_strategy` and attempts to schedule using the label selector specified there (first `{"instance_type": "m5.large"}`, then the empty set). The first `label_selector` that can be satisfied is used.

#51564

---------

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Co-authored-by: Mengjin Yan <mengjinyan3@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
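The first-satisfied selection described above can be sketched in plain Python. This is a simplified simulation, not Ray's actual scheduler (the real logic lives in the C++ cluster resource scheduler); the node representation and both helper functions are hypothetical:

```python
# Minimal simulation of first-satisfied fallback selection.
# Each "node" is just a dict of labels; a selector is satisfied if
# every key/value pair in it matches some node.

def selector_satisfied(selector, nodes):
    """Return True if any node matches every label in the selector."""
    return any(
        all(node.get(k) == v for k, v in selector.items())
        for node in nodes
    )

def pick_selector(label_selector, fallback_strategy, nodes):
    """Try the primary selector, then each fallback dict in order."""
    candidates = [label_selector] + [
        opts["label_selector"] for opts in fallback_strategy
    ]
    for selector in candidates:
        if selector_satisfied(selector, nodes):
            return selector
    return None  # infeasible: no selector matched any node

nodes = [{"instance_type": "m5.large"}, {"instance_type": "c5.xlarge"}]
chosen = pick_selector(
    {"instance_type": "m5.16xlarge"},
    [{"label_selector": {"instance_type": "m5.large"}},
     {"label_selector": {}}],
    nodes,
)
print(chosen)  # the "m5.large" fallback wins
```

Note that the empty selector `{}` is satisfied by any node, which is why it works as a final "no constraints" fallback.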
cc: @MengjinYan I rebased and fixed all the merge conflicts.
Add tests and fix scheduling logic; remove cgroup change; fix merge.

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Oh, this is because we sort the values inside
I think we actually want the sorting, since the values are stored in an unordered set, but I fixed the tests to account for this in bdf0ef8.
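A quick illustration of why the tests should compare against sorted values (a generic Python sketch of the issue; in Ray the selector values live in a C++ unordered set, so their iteration order is not stable):

```python
# Values stored in a set have no guaranteed iteration order, so a test
# should compare against a sorted (canonical) form rather than assuming
# a fixed ordering of the raw collection.
values = {"m5.16xlarge", "m5.large", "c5.xlarge"}

# Fragile: depends on hash/iteration order.
# assert list(values) == ["m5.16xlarge", "m5.large", "c5.xlarge"]

# Robust: canonicalize before comparing.
assert sorted(values) == ["c5.xlarge", "m5.16xlarge", "m5.large"]
```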
MengjinYan left a comment:
Thanks! Just one minor followup question on the ToProto function.
Yeah, I think we should, since we're writing to the label selector field in a larger proto and this will avoid copies. Done in
edoakes left a comment:
Looks great. Only nits, which can be addressed in follow-up PRs.
```
def test_fallback_strategy(cluster_with_labeled_nodes):
    # Create a RayCluster with labelled nodes.
```

Suggested change:

```
-    # Create a RayCluster with labelled nodes.
+    # Create a RayCluster with labeled nodes.
```
```
# Assert that the actor was scheduled on the expected node.
assert ray.get(label_selector_actor.get_node_id.remote(), timeout=5) == gpu_node
```

The timeout is a little tight (CI can be slow), would loosen it.
```
# Assert that the actor was scheduled on the expected node.
assert ray.get(label_selector_actor.get_node_id.remote(), timeout=5) in {
    node_1,
    node_2,
    node_3,
}
```

Consider making the test more deterministic by scheduling 3 actors in parallel, each of which occupies all CPUs on one node, and asserting that all 3 nodes are occupied by one of the actors.
```
std::vector<std::reference_wrapper<const LabelSelector>> label_selectors;
label_selectors.push_back(std::cref(lease_spec.GetLabelSelector()));
```

I've never seen `reference_wrapper` and `std::cref` before -- is this just a special way to have a vector of const refs and avoid copying into the vector?

Yeah, exactly. `std::vector<const LabelSelector&>` wouldn't compile, so I found those helpers. I only wanted to store the references to avoid copying unnecessarily into one list when we already had the original objects.
```
// Use the label selector from the highest-priority fallback that was feasible.
// There must be at least one feasible node and selector.
```

Suggested change:

```
-// There must be at least one feasible node and selector.
+// There must be at least one feasible node and selector, else we would have returned early above.
```
Nice, thank you! I'll fix the typo/comment and update the tests in a follow-up PR.
This PR also updates the cluster resource scheduler logic to account for the list of `LabelSelector`s specified by the `fallback_strategy`, falling back to each fallback `LabelSelector` in order until one is satisfied when selecting the best node. We support fallback selectors by considering them in the cluster resource scheduler in order, using the existing label selector logic in `IsFeasible` and `IsAvailable` and returning the first valid node returned by `GetBestSchedulableNode`.

ray-project#51564

---------

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Mengjin Yan <mengjinyan3@gmail.com>
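The scheduler-side behavior described above can be approximated in plain Python. This is a hedged sketch, not Ray's implementation: the node records and the `matches`/`get_best_schedulable_node` helpers are hypothetical stand-ins for the C++ `IsFeasible`/`IsAvailable`/`GetBestSchedulableNode` logic:

```python
# Simplified model of in-order fallback in the cluster resource scheduler.
# Each node has labels plus available CPUs; a selector is feasible on a
# node if its labels match, and schedulable if CPUs are also available.

def matches(selector, node):
    """Return True if the node's labels satisfy every selector constraint."""
    return all(node["labels"].get(k) == v for k, v in selector.items())

def get_best_schedulable_node(nodes, selectors, num_cpus):
    """Try each label selector in priority order; return the first node
    that satisfies the selector and has the requested CPUs available."""
    for selector in selectors:
        for name, node in nodes.items():
            if matches(selector, node) and node["available_cpus"] >= num_cpus:
                return name, selector
    return None, None  # no feasible node for any selector

nodes = {
    "node-1": {"labels": {"instance_type": "m5.large"}, "available_cpus": 4},
    "node-2": {"labels": {"instance_type": "c5.xlarge"}, "available_cpus": 8},
}
selectors = [
    {"instance_type": "m5.16xlarge"},  # primary selector: infeasible here
    {"instance_type": "m5.large"},     # first fallback: satisfied
    {},                                # final fallback: no constraints
]
node, used = get_best_schedulable_node(nodes, selectors, num_cpus=2)
print(node, used)  # node-1 is chosen via the first fallback selector
```

The key property, matching the PR description, is that lower-priority fallbacks are consulted only after every higher-priority selector has failed.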
Why are these changes needed?

This PR also updates the cluster resource scheduler logic to account for the list of `LabelSelector`s specified by the `fallback_strategy`, falling back to each fallback `LabelSelector` in order until one is satisfied when selecting the best node. We support fallback selectors by considering them in the cluster resource scheduler in order, using the existing label selector logic in `IsFeasible` and `IsAvailable` and returning the first valid node returned by `GetBestSchedulableNode`.

Related issue number

#51564

Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.