[Ray autoscaler v2] Can't scale up when using autoscaler v2 #46473
Closed
Labels: P2 (important issue, but not time-critical), bug (something that is supposed to be working but isn't), core (issues that should be addressed in Ray Core), core-autoscaler (autoscaler-related issues)
Description
What happened + What you expected to happen
In the same environment, with only the autoscaler version changed, I submit 8000 tasks at once. With autoscaler v1 the cluster scales up and the job runs normally, but with v2 it always gets stuck and never scales up.
Also, what recent progress has been made on autoscaler v2? It seems not to have been updated for a long time.
Versions / Dependencies
Ray 2.23.0
Kuberay 1.1.1
Reproduction script
```python
import random
import time

import ray


@ray.remote(max_retries=5, num_cpus=8)
def inside_ray_task():
    # Keep the task running for a random 120-600 seconds; while it runs,
    # the task reserves 8 logical CPUs.
    sleep_time = random.randint(120, 600)
    start_time = time.perf_counter()
    while time.perf_counter() - start_time < sleep_time:
        time.sleep(0.001)


@ray.remote(max_retries=0)
def outside_ray_task():
    # Submit all 8000 inner tasks at once from inside a task, then wait for them.
    future_list = [inside_ray_task.remote() for _ in range(8000)]
    ray.get(future_list)


if __name__ == '__main__':
    # Connect to the cluster through Ray Client.
    ray.init("ray://localhost:10001")
    ray.get(outside_ray_task.remote())
```
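To make the size of this burst concrete, here is a small sketch that only does the arithmetic on the demand the script creates: 8000 inner tasks at `num_cpus=8` each, plus 1 CPU for `outside_ray_task` (Ray tasks reserve 1 CPU by default). The worker size (`cpus_per_worker=32`) and replica cap (`max_workers=10`) used below are hypothetical placeholders, not values from the reporter's RayCluster; substitute your own. The helper names are likewise illustrative only.

```python
import math

# Values taken from the reproduction script.
NUM_TASKS = 8000      # number of inside_ray_task.remote() calls
CPUS_PER_TASK = 8     # @ray.remote(num_cpus=8)
OUTER_TASK_CPUS = 1   # outside_ray_task uses Ray's default of 1 CPU


def cpu_demand(num_tasks: int = NUM_TASKS, cpus_per_task: int = CPUS_PER_TASK) -> int:
    """Total logical CPUs requested if every task were scheduled at once."""
    return num_tasks * cpus_per_task + OUTER_TASK_CPUS


def workers_needed(total_cpus: int, cpus_per_worker: int, max_workers: int) -> int:
    """Worker nodes needed to fit the whole demand, capped at max_workers.

    cpus_per_worker and max_workers are hypothetical cluster settings.
    """
    uncapped = math.ceil(total_cpus / cpus_per_worker)
    return min(uncapped, max_workers)


def tasks_in_parallel(cpus_per_worker: int, workers: int, cpus_per_task: int = CPUS_PER_TASK) -> int:
    """Inner tasks that can run concurrently on the workers.

    Each task must fit on a single node; this ignores the head node and
    the CPU held by outside_ray_task.
    """
    return (cpus_per_worker // cpus_per_task) * workers


if __name__ == "__main__":
    demand = cpu_demand()
    print(demand)  # 64001
    workers = workers_needed(demand, cpus_per_worker=32, max_workers=10)
    print(workers)  # 10
    print(tasks_in_parallel(32, workers))  # 40
```

Whatever the real node sizes are, the pending demand is far larger than one node, so an autoscaler that is working (as v1 reportedly is here) would be expected to keep adding workers up to its configured limit.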
Issue Severity
High: It blocks me from completing my task.
