[Ray autoscaler v2] Can't scale up when using autoscaler v2 #46473

@yx367563

What happened + What you expected to happen

In the same environment, with the autoscaler version (v1 vs. v2) as the only change, a single submission of 8000 tasks works normally under v1, but under v2 it always gets stuck and the cluster never scales up.
Also, what recent progress has been made on autoscaler v2? It seems it has not been updated for a long time.

Versions / Dependencies

Ray 2.23.0
KubeRay 1.1.1

Reproduction script

import random
import time

import ray


# Each inner task reserves 8 CPUs and stays busy for a random 120 to 600 seconds.
@ray.remote(max_retries=5, num_cpus=8)
def inside_ray_task():
    sleep_time = random.randint(120, 600)

    start_time = time.perf_counter()
    # Poll in 1 ms steps until the chosen duration has elapsed.
    while time.perf_counter() - start_time < sleep_time:
        time.sleep(0.001)


# The outer task submits all 8000 inner tasks at once and waits for them.
@ray.remote(max_retries=0)
def outside_ray_task():
    future_list = [inside_ray_task.remote() for _ in range(8000)]
    ray.get(future_list)


if __name__ == '__main__':
    # Connect via Ray Client.
    ray.init("ray://localhost:10001")
    ray.get(outside_ray_task.remote())
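
To confirm whether the cluster scales up at all while the script above is running, something like the following could be run from a second client. This is only a rough sketch: the helper names (`cpus_demanded`, `sample_total_cpus`, `has_scaled_up`, `watch_ray_cluster`) are made up for illustration, and the sample count and interval are arbitrary assumptions. The only Ray call it relies on is `ray.cluster_resources()`.

```python
import time
from typing import Callable, Dict, List


def cpus_demanded(num_tasks: int, cpus_per_task: int) -> int:
    """Total CPUs needed to run every task concurrently."""
    return num_tasks * cpus_per_task


def sample_total_cpus(
    get_resources: Callable[[], Dict[str, float]],
    samples: int,
    interval_s: float,
) -> List[float]:
    """Read the cluster-wide CPU total `samples` times, `interval_s` apart."""
    readings = []
    for i in range(samples):
        readings.append(float(get_resources().get("CPU", 0.0)))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


def has_scaled_up(readings: List[float]) -> bool:
    """True if the last reading shows more CPUs than the first."""
    return len(readings) >= 2 and readings[-1] > readings[0]


def watch_ray_cluster(samples: int = 10, interval_s: float = 30.0) -> bool:
    """Hypothetical helper: poll a live cluster and report whether it grew."""
    import ray  # imported lazily so the pure helpers work without Ray installed

    ray.init("ray://localhost:10001")
    readings = sample_total_cpus(ray.cluster_resources, samples, interval_s)
    print("CPU totals over time:", readings)
    return has_scaled_up(readings)
```

For the script above, `cpus_demanded(8000, 8)` gives 64000 CPUs of pending demand. If `watch_ray_cluster()` returns `False` under v2 but `True` under v1 with the same setup, that would match the behavior described here.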

Issue Severity

High: It blocks me from completing my task.

Labels

P2: Important issue, but not time-critical
bug: Something that is supposed to be working; but isn't
core: Issues that should be addressed in Ray Core
core-autoscaler: autoscaler related issues
