Add "ray status" debug tool for autoscaler. #9091
Conversation
Can one of the admins verify this patch?
```python
def status(address):
    """Print cluster status, including autoscaling info."""
    if not address:
        address = services.find_redis_address_or_die()
    logger.info("Connecting to Ray instance at {}.".format(address))
    ray.init(address=address)
    print(debug_status())
```
Hey, wouldn't this be more useful if it were used like:
ray status [cluster_yaml], similar to ray dashboard cluster.yaml?
There are a number of Ray commands that are intended to run on the currently active cluster, such as ray memory. This is consistent with those.
Looks very useful, can you post the printed output?
Test FAILed.
The output looks like this:
```python
try:
    exec_cluster(config_file, "ray stop", False, False, False, False,
                 False, override_cluster_name, None, False)
except Exception:
    logger.exception("Ignoring error attempting a clean shutdown.")
```
@ijrsvt Otherwise teardown fails if Ray is misconfigured.
Why are these changes needed?
This adds a "ray status" cluster debug tool for the autoscaler. It can be used to inspect the current autoscaling state directly, instead of digging through cluster logs.
It also includes some minor improvements to autoscaler logging.
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.