[Autoscaler][V2] Consider bundle_label_selector in Ray V2 Autoscaler#56826
edoakes merged 11 commits into ray-project:master from
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
cc: @MengjinYan as discussed offline
MengjinYan left a comment:
Thanks for the followup!!
MengjinYan left a comment:
Just minor and nit comments.
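```python
    return len(status.active_nodes) == expected_nodes


# Wait until the autoscaler reports the expected number of active nodes.
wait_for_condition(all_nodes_launched, timeout=30)
# Then wait for the process running the tasks to exit, so every task has finished.
proc.wait(timeout=30)
```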
Curious why we need to wait for the tasks to finish? Just want to make sure there is no unnecessary waiting in the test cases.
I think it's necessary; otherwise you can run into a race condition where the tasks/nodes have "launched" but `task["node_id"]` is still `None`. When I removed the `proc.wait`, this test became flaky with this issue, but with the wait it passes consistently.
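To illustrate the race described above, here is a minimal sketch of a polling helper in the spirit of `wait_for_condition`, plus a predicate that only succeeds once every task record has a non-`None` `node_id`. The names `poll_until` and `all_tasks_placed` and the task-dict shape are hypothetical, not Ray's test utilities; unlike a helper that raises on timeout, this one returns a bool so the caller can assert on it.

```python
import time
from typing import Callable, Dict, List, Optional


def poll_until(
    predicate: Callable[[], bool], timeout: float = 30.0, interval: float = 0.1
) -> bool:
    """Hypothetical polling helper: re-check `predicate` until it holds.

    Evaluates the predicate at least once, then keeps retrying every
    `interval` seconds. Returns True as soon as it holds, or False once
    `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def all_tasks_placed(tasks: List[Dict[str, Optional[str]]]) -> bool:
    # Counting launched nodes is not enough: a task record can exist before
    # its node assignment is filled in, so require a node_id on every task.
    return all(task.get("node_id") is not None for task in tasks)
```

Checking `all_tasks_placed` (or waiting for the task-running process to exit, as the test does) closes the window in which nodes are up but `node_id` has not been recorded yet.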
From testing with this PR I realize now that
We should probably also add it to the
Manual testing process:
- RayCluster CR
- Setup
- Task test
- Actor test
- Placement Group Test
(ray-project#56826)

This PR adds support for parsing the `GangResourceRequest.bundle_selectors.resource_requests` field for gang resource requests in the V2 Autoscaler. This proto field replaces the deprecated `GangResourceRequest.resource_requests` ([definition](https://github.com/ray-project/ray/blob/3408fe94a687e0ed03f6861ab8f9e8708a68763a/src/ray/protobuf/autoscaler.proto#L85)) in order to support repeated selectors for a fallback strategy. This change is required for autoscaling to work with the `bundle_label_selector` placement group option.

This PR also adds an e2e test case for scaling up a placement group with `bundle_label_selector` specified. This test verifies that the v2 scheduler scales up nodes satisfying the given label constraints, preferring node types with the required `labels` over node types with sufficient resources but lacking those labels.

ray-project#51564

---------

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Co-authored-by: Mengjin Yan <mengjinyan3@gmail.com>
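A minimal sketch of the parsing change, assuming simplified Python dataclasses in place of the generated protobuf classes. The class and field shapes here only loosely mirror `autoscaler.proto`, and `label_selector` is flattened to a plain dict for illustration. It also assumes that only the first bundle selector is used when the new field is populated (later selectors being reserved for fallback), and that the deprecated `resource_requests` field is read only when `bundle_selectors` is empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceRequest:
    # Simplified stand-in for the proto message (assumption, not the real class).
    resources_bundle: Dict[str, float]
    label_selector: Dict[str, str] = field(default_factory=dict)


@dataclass
class BundleSelector:
    # One complete set of bundles for the gang.
    resource_requests: List[ResourceRequest]


@dataclass
class GangResourceRequest:
    # Deprecated flat list of bundles.
    resource_requests: List[ResourceRequest] = field(default_factory=list)
    # Repeated selectors; entries after the first are assumed to be fallbacks.
    bundle_selectors: List[BundleSelector] = field(default_factory=list)


def requests_for_gang(gang: GangResourceRequest) -> List[ResourceRequest]:
    """Return the bundles the autoscaler should try to place (sketch).

    Prefers the new `bundle_selectors` field and only falls back to the
    deprecated `resource_requests` field when no selectors are present.
    """
    if gang.bundle_selectors:
        return list(gang.bundle_selectors[0].resource_requests)
    return list(gang.resource_requests)
```

Reading the new field first keeps older requests working while letting label selectors attached to each bundle reach the scheduler.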
Why are these changes needed?
This PR adds support for parsing the `GangResourceRequest.bundle_selectors.resource_requests` field for gang resource requests in the V2 Autoscaler. This proto field replaces the deprecated `GangResourceRequest.resource_requests` ([definition](https://github.com/ray-project/ray/blob/3408fe94a687e0ed03f6861ab8f9e8708a68763a/src/ray/protobuf/autoscaler.proto#L85)) in order to support repeated selectors for a fallback strategy. This change is required for autoscaling to work with the `bundle_label_selector` placement group option.

This PR also adds an e2e test case for scaling up a placement group with `bundle_label_selector` specified. This test verifies that the v2 scheduler scales up nodes satisfying the given label constraints, preferring node types with the required `labels` over node types with sufficient resources but lacking those labels.

Related issue number
Contributes to #51564
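The preference the e2e test checks (a node type with the required labels wins over a larger node type that lacks them) can be illustrated with a small, self-contained sketch. The node-type dictionaries, the `zone` label, and the function names are hypothetical; only equality selectors are handled, and a real autoscaler may also rank eligible node types rather than returning the first one.

```python
from typing import Any, Dict, Optional


def fits(bundle: Dict[str, float], resources: Dict[str, float]) -> bool:
    # Every requested resource must be available in sufficient quantity.
    return all(resources.get(name, 0.0) >= amount for name, amount in bundle.items())


def labels_match(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    # Equality-only matching, to keep the sketch small.
    return all(labels.get(key) == value for key, value in selector.items())


def pick_node_type(
    bundle: Dict[str, float],
    selector: Dict[str, str],
    node_types: Dict[str, Dict[str, Any]],
) -> Optional[str]:
    """Return the first node type that satisfies both resources and labels.

    A node type with plenty of resources but the wrong labels is skipped.
    """
    for name, config in node_types.items():
        if fits(bundle, config["resources"]) and labels_match(selector, config["labels"]):
            return name
    return None


# Hypothetical node types: a large unlabeled one and a smaller labeled one.
NODE_TYPES = {
    "large-unlabeled": {"resources": {"CPU": 64.0}, "labels": {}},
    "small-labeled": {"resources": {"CPU": 4.0}, "labels": {"zone": "us-east"}},
}
```

With a `{"zone": "us-east"}` selector, the smaller labeled type is chosen even though the unlabeled type has far more CPU; without a selector, the first type that fits wins.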
Checks
- `git commit -s`) in this PR.
- `scripts/format.sh` to lint the changes in this PR.
- method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.

Note
Adds and validates e2e scheduling tests for label-based placement of tasks, actors, and placement groups (bundle_label_selector), including a helper for selector matching.
- `test_task_scheduled_on_node_with_label_selector`: submits tasks with different `label_selector`s via `.options(...)`, waits for autoscaled nodes, and verifies that each task runs on a node whose `Labels` satisfy the selector.
- `test_actor_scheduled_on_node_with_label_selector`: submits actors with `label_selector`s, verifies that the expected node types scale up, and checks that each actor is placed on a node with matching labels.
- `test_pg_scheduled_on_node_with_bundle_label_selector`: creates a placement group with a per-bundle `bundle_label_selector`, ensures the correct node types scale up, and validates bundle-to-node label matching via `placement_group_table`.
- `_verify_node_labels_for_selector(...)`: asserts that node `Labels` satisfy `in(...)`, `!in(...)`, equality, and negation forms.
- `list_actors` and `placement_group_table` are used for the new validations.

Written by Cursor Bugbot for commit 48cbf73.
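The body of `_verify_node_labels_for_selector(...)` is not shown here. The following is a standalone sketch of how the four selector forms listed above (equality, negation, `in(...)`, `!in(...)`) could be evaluated against a node's labels, plus a bundle-to-node check. The function names are hypothetical. Two further assumptions: a missing label fails the positive forms and passes the negative ones, and the placement-group table is a dict whose `bundles_to_node_id` entry maps bundle index to node ID.

```python
from typing import Dict, List, Optional, Set


def _parse_in_values(expr: str) -> Set[str]:
    # "in(a, b)" or "!in(a, b)" -> {"a", "b"}
    inner = expr[expr.index("(") + 1 : expr.rindex(")")]
    return {value.strip() for value in inner.split(",") if value.strip()}


def value_matches(expr: str, actual: Optional[str]) -> bool:
    """Evaluate one selector value against a node's label value (sketch)."""
    if expr.startswith("!in("):
        # Non-membership: a missing label is treated as "not in the set".
        return actual is None or actual not in _parse_in_values(expr)
    if expr.startswith("in("):
        # Membership: the label must exist and be one of the listed values.
        return actual is not None and actual in _parse_in_values(expr)
    if expr.startswith("!"):
        # Negation: anything other than the given value, including missing.
        return actual != expr[1:]
    # Plain equality.
    return actual == expr


def node_labels_satisfy(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    # Every key in the selector must be satisfied by the node's labels.
    return all(value_matches(expr, labels.get(key)) for key, expr in selector.items())


def verify_pg_bundles(
    pg_table: Dict,
    bundle_selectors: List[Dict[str, str]],
    node_labels_by_id: Dict[str, Dict[str, str]],
) -> bool:
    # Assumed shape: pg_table["bundles_to_node_id"] maps bundle index -> node id.
    bundle_to_node = pg_table["bundles_to_node_id"]
    return all(
        node_labels_satisfy(selector, node_labels_by_id[bundle_to_node[index]])
        for index, selector in enumerate(bundle_selectors)
    )
```

Checking `!in(` before the bare `!` prefix matters, since both start with `!`.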