
[Data] Add autoscaler metrics to Data Dashboard #60342

@bveeramani

Description


Task

  • Add three new charts to the Ray Data dashboard, one each for CPU, GPU, and object store memory, titled "Cluster utilization % ({resource})"
  • (Bonus) On each chart, add a dotted horizontal line at the value of DEFAULT_CLUSTER_SCALING_UP_UTIL_THRESHOLD
  • The three charts should go in a new row named "Cluster autoscaler"
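The sketch below models what the new row would show. It is a minimal illustration that assumes utilization is computed as used / total capacity for each resource, with the threshold line drawn at the same percentage scale. ChartSpec, build_autoscaler_row, utilization_pct, at_or_above_threshold, and the threshold value 0.75 are hypothetical placeholders, not Ray APIs. The real DEFAULT_CLUSTER_SCALING_UP_UTIL_THRESHOLD value should be read from Ray Data's autoscaler code.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical placeholder value (an assumption); the real threshold is
# defined by DEFAULT_CLUSTER_SCALING_UP_UTIL_THRESHOLD in Ray Data.
SCALING_UP_UTIL_THRESHOLD = 0.75

RESOURCES = ("CPU", "GPU", "object store memory")


@dataclass
class ChartSpec:
    """Hypothetical description of one dashboard chart (not a Ray API)."""
    title: str
    row: str
    threshold_pct: float  # y-value of the dotted reference line, in percent


def build_autoscaler_row(threshold: float = SCALING_UP_UTIL_THRESHOLD) -> List[ChartSpec]:
    """One chart per resource, all placed in the "Cluster autoscaler" row."""
    return [
        ChartSpec(
            title=f"Cluster utilization % ({resource})",
            row="Cluster autoscaler",
            threshold_pct=threshold * 100,
        )
        for resource in RESOURCES
    ]


def utilization_pct(used: Dict[str, float], total: Dict[str, float]) -> Dict[str, float]:
    """Percent of cluster capacity in use for each resource."""
    pct = {}
    for resource in RESOURCES:
        capacity = total.get(resource, 0.0)
        # A cluster with no capacity for a resource (e.g. no GPUs) reports 0%.
        pct[resource] = 100.0 * used.get(resource, 0.0) / capacity if capacity > 0 else 0.0
    return pct


def at_or_above_threshold(
    pct: Dict[str, float], threshold: float = SCALING_UP_UTIL_THRESHOLD
) -> Dict[str, bool]:
    """Which resources sit on or above the dotted threshold line."""
    return {resource: value >= threshold * 100 for resource, value in pct.items()}


if __name__ == "__main__":
    for chart in build_autoscaler_row():
        print(chart)
    usage = utilization_pct(
        used={"CPU": 6, "GPU": 0, "object store memory": 2e9},
        total={"CPU": 8, "GPU": 0, "object store memory": 8e9},
    )
    print(usage, at_or_above_threshold(usage))
```

Treating a resource with zero capacity as 0% keeps the GPU chart well-defined on CPU-only clusters instead of dividing by zero; whether the real dashboard should hide that chart instead is an open design choice.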

Motivation
User request:

I am thinking there is very limited visibility into the autoscaling decisions, currently have to look through DEBUG logs FWICT. Adding some visibility in terms of metrics and events would be nice, and promoting key action logs to INFO would be my high-level suggestion

Metadata


Assignees

Labels

P1 (Issue that should be fixed within a few weeks), data (Ray Data-related issues)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
