[Core] Update V2 Autoscaler to support scheduling using Node labels and LabelSelector API#53578
Conversation
…pute_score Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
|
@MengjinYan I ended up implementing the logic to pass labels from GCS to the autoscaler and consider labels in the autoscaler when scheduling in the same PR because it made it easier to test the change. |
can-anyscale
left a comment
There was a problem hiding this comment.
some tests are failing, could be related @ryanaoleary
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
|
@rueian Can you take a look at the PR as well from autoscaler's perspective to see if we missed anything? Thanks! |
There was a problem hiding this comment.
Nit: I think we should also change the _sort_resource_request logic in _try_schedule to add labels to the sorting mechanism, as there can resource requests with exact same resource requirements but different label selectors
There was a problem hiding this comment.
Sounds good to me, done in 8cb1816. I had it sort by the length of the constraints of the first selector, similar to how we sort based on the length of the placement constraints.
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
| // raylet knows about. | ||
| int64 backlog_size = 4; | ||
| // The label selector constraints for this Resource shape on a node. | ||
| repeated LabelSelector label_selectors = 5; |
There was a problem hiding this comment.
There is already a repeated LabelSelectorConstraint field in the LabelSelector. Why do we need repeated here?
There was a problem hiding this comment.
The LabelSelectorConstraints are the conditions associated with one label selector with AND semantics (i.e. we require a node to have labels accelerator-type="TPU" and market-type="SPOT" if those two respective constraints are specified). The repeated label selector is so that we can support fallback with label selectors (i.e. the user specifies two separate label selectors with their own constraint(s), and we fallback to the second one if the first is unsatisfiable).
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
dayshah
left a comment
There was a problem hiding this comment.
generally just minor nits, not really blockers
src/ray/gcs/gcs_server/test/gcs_autoscaler_state_manager_test.cc
Outdated
Show resolved
Hide resolved
|
|
||
| namespace std { | ||
| template <> | ||
| struct hash<ray::LabelSelector> { |
There was a problem hiding this comment.
are we using std sets/maps with this somewhere instead of absl ones. Would prefer sticking to absl and just defining the absl hash functions
There was a problem hiding this comment.
We are using std::hash in task_spec.h here for the scheduling class descriptor. I think since we're using absl::flat_hash_map in that class I can just go ahead and switch it all to use absl hash functions instead.
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
…nd LabelSelector API (ray-project#53578) Signed-off-by: Ryan O'Leary <ryanaoleary@google.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com> Co-authored-by: Rueian <rueiancsie@gmail.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Krishna Kalyan <krishnakalyan3@gmail.com>
…nd LabelSelector API (ray-project#53578) Signed-off-by: Ryan O'Leary <ryanaoleary@google.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com> Co-authored-by: Rueian <rueiancsie@gmail.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: avigyabb <avigyabb@stanford.edu>
…nd LabelSelector API (ray-project#53578) Signed-off-by: Ryan O'Leary <ryanaoleary@google.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com> Co-authored-by: Rueian <rueiancsie@gmail.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: avigyabb <avigyabb@stanford.edu>
…nd LabelSelector API (#53578) Signed-off-by: Ryan O'Leary <ryanaoleary@google.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com> Co-authored-by: Rueian <rueiancsie@gmail.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Kamil Kaczmarek <kamil@anyscale.com>
…nd LabelSelector API (ray-project#53578) Signed-off-by: Ryan O'Leary <ryanaoleary@google.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com> Co-authored-by: Rueian <rueiancsie@gmail.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
…nd LabelSelector API (ray-project#53578) Signed-off-by: Ryan O'Leary <ryanaoleary@google.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com> Co-authored-by: Rueian <rueiancsie@gmail.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
Why are these changes needed?
This PR updates the GCS state manager to append the labels set on a Node and retrieved from GCS to the
dynamic_node_labelson the NodeState passed to the autoscaler. This PR also updates the V2 autoscaler to consider labels when calling_compute_scoreto schedule aResourceRequeston a Node, assigning a higher score to a node type with labels that satisfy a LabelSelector.Related issue number
#51564
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.