[Autoscaler] Add bundle_label_selector to request_resources sdk#54843
jjyao merged 15 commits into ray-project:master
cc: @MengjinYan |
|
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public Slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
|
Tons of merge conflicts @ryanaoleary |
83d8bcc to a0199b0
|
Rebased and re-tested, should all be fixed now cc: @edoakes @MengjinYan |
|
ping @MengjinYan |
|
You need to rebase now. |
912f181 to a88a6a2
7d0e84f to 933c0a0
|
cc: @MengjinYan fixed merge conflicts and other comments |
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Bug: Autoscaler v2 Missing Label Selectors
When autoscaler v2 is enabled, the request_resources function calls request_cluster_resources without passing the bundle_label_selectors. This means the v2 autoscaler won't receive label selector information, potentially causing a TypeError or incorrect resource provisioning.
python/ray/autoscaler/_private/commands.py#L234-L236
ray/python/ray/autoscaler/_private/commands.py
Lines 234 to 236 in 11ef4ac
Invalid comment, the label selectors are part of |
|
@jjyao Ping to merge |
| Each bundle is a dict of resource name to resource quantity, e.g: | ||
| [{"CPU": 1}, {"GPU": 1}]. | ||
| to_request: A list of resource requests to request the cluster to have. | ||
| Each resource request is a tuple of resources and a label_selector |
Each resource request is a tuple
It's a dict not tuple?
I think it was previously a dict, but we changed it to be a tuple of two dicts (resources and labels) here:
ray/python/ray/autoscaler/v2/sdk.py
Line 17 in c6b8c9f
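To make the shape discussed in this thread concrete, here is a minimal sketch of a request represented as a tuple of two dicts. The `ResourceRequest` class and the `normalize_request` helper below are illustrative stand-ins written for this example, not the definitions in sdk.py, and the assumption that a legacy entry is a bare resource dict is mine. The `resources` and `label_selector` keys follow the format named in the PR summary; the label key and value are made up.

```python
from typing import Any, Dict, NamedTuple


class ResourceRequest(NamedTuple):
    """Illustrative stand-in: a tuple of two dicts (resources, label selector)."""

    resources: Dict[str, float]
    label_selector: Dict[str, str]


def normalize_request(entry: Dict[str, Any]) -> ResourceRequest:
    """Hypothetical helper that accepts either entry shape.

    Assumption: a legacy entry is a bare resource dict such as {"CPU": 1},
    and a new-style entry is {"resources": {...}, "label_selector": {...}}.
    """
    if "resources" in entry:
        return ResourceRequest(
            dict(entry["resources"]), dict(entry.get("label_selector") or {})
        )
    # Legacy shape: the whole dict is the resource map, with no selector.
    return ResourceRequest(dict(entry), {})


legacy = normalize_request({"CPU": 1})
labeled = normalize_request(
    {"resources": {"GPU": 1}, "label_selector": {"accelerator-type": "A100"}}
)
```

Because a NamedTuple is still a tuple, callers can unpack it positionally while the input on the wire remains a dict, which may be why both descriptions came up in the review.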
…project#54843) Signed-off-by: Ryan O'Leary <ryanaoleary@google.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Mengjin Yan <mengjinyan3@gmail.com>
Why are these changes needed?
This change adds a bundle_label_selector argument to the request_resources sdk command for the v2 Ray autoscaler. This command is used by several Ray libraries. The bundle_label_selector is a list parallel to the bundles of resource shapes specified by the user; each selector is applied to the bundle at the same index. These label selectors are passed to the repeated LabelSelector label_selectors field in the ResourceRequest message that gets built by RequestClusterResourceConstraint.
This change depends on #53578.
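The parallel-list relationship can be sketched in plain Python without importing Ray. The helper name `pair_bundles_with_selectors`, its error message, and the label key and value are hypothetical and chosen only for illustration; the real SDK may validate and store the pairs differently.

```python
from typing import Dict, List, Optional, Tuple

ResourceShape = Dict[str, float]
LabelSelector = Dict[str, str]


def pair_bundles_with_selectors(
    bundles: List[ResourceShape],
    bundle_label_selectors: Optional[List[LabelSelector]] = None,
) -> List[Tuple[ResourceShape, LabelSelector]]:
    """Hypothetical helper: pair each bundle with the selector at its index."""
    if bundle_label_selectors is None:
        # No selectors given: every bundle gets an empty selector.
        bundle_label_selectors = [{} for _ in bundles]
    if len(bundle_label_selectors) != len(bundles):
        raise ValueError(
            "bundle_label_selectors must have one entry per bundle: "
            f"got {len(bundle_label_selectors)} selectors for {len(bundles)} bundles"
        )
    return list(zip(bundles, bundle_label_selectors))


pairs = pair_bundles_with_selectors(
    [{"CPU": 1}, {"GPU": 1}],
    [{}, {"accelerator-type": "A100"}],  # illustrative label key and value
)
```

Here the first bundle carries no constraint and the second is restricted by its selector, which is the per-bundle behavior the description refers to.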
Related issue number
Contributes to #51564
Checks
git commit -s) in this PR.
scripts/format.sh to lint the changes in this PR.
method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
Note
Adds per-bundle label selectors to resource requests and wires them through Python SDK, autoscaler v2, Cython GCS client, and GCS RPC, with tests.
python/ray/autoscaler/sdk/sdk.py: Extend request_resources with bundle_label_selectors; validate types/length and label syntax; update docs/examples.
python/ray/autoscaler/_private/commands.py: Emit requests as {"resources": ..., "label_selector": ...}; support selectors per bundle; forward to v2 SDK.
python/ray/autoscaler/v2/sdk.py: Accept per-bundle label_selector, normalize legacy format, aggregate by (resources, selector), and forward selectors to GCS.
python/ray/includes/common.pxd, python/ray/includes/gcs_client.pxi: Add label_selectors param to request_cluster_resource_constraint.
src/ray/gcs_rpc_client/accessor.{h,cc}: Method signature updated to take selectors; build proto LabelSelector per bundle.
src/ray/common/scheduling/label_selector.{h,cc}: Add generic ctor from map and keep ToProto; hashing/equality unchanged.
python/ray/autoscaler/v2/tests/test_sdk.py: New test verifying per-bundle label selectors and operators; imports LabelSelectorOperator.
python/ray/tests/test_multi_node_2.py: Adjust expected resource_requests format to include resources and empty label_selector.
Written by Cursor Bugbot for commit 4409f1d. This will update automatically on new commits.
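The "aggregate by (resources, selector)" step in the summary can be illustrated with a small standalone sketch. This is one possible way to count identical requests, assuming each request is a pair of plain dicts; `aggregate_requests` and the label key are hypothetical names, and the actual v2 SDK may build its keys differently.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Request = Tuple[Dict[str, float], Dict[str, str]]


def aggregate_requests(requests: Iterable[Request]) -> Counter:
    """Hypothetical helper: count identical (resources, selector) pairs."""
    counts: Counter = Counter()
    for resources, selector in requests:
        # Dicts are unhashable, so freeze each one into a hashable key.
        key = (frozenset(resources.items()), frozenset(selector.items()))
        counts[key] += 1
    return counts


counts = aggregate_requests(
    [
        ({"CPU": 1}, {}),
        ({"CPU": 1}, {}),
        ({"CPU": 1}, {"market-type": "spot"}),  # illustrative label
    ]
)
```

Keying on both dicts keeps two requests with the same resource shape but different selectors distinct, so only exact duplicates are merged into a single count.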