[Autoscaler] Unmanaged node off-by-one error #11430

@wuisawesome

What is the problem?

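Consider a cluster with utilization_fraction: 1.0 and one unmanaged node.

target_num_workers() == 1, because the autoscaler receives load metrics from the unmanaged node.

But num_workers = self.workers() + num_pending, and self.workers() already filters out the unmanaged node. The unmanaged node is therefore counted in the target but not in the current worker count, so the two values are off by one.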
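To make the mismatch concrete, here is a minimal toy model of the two counts. It is a sketch under stated assumptions, not the actual autoscaler code: the Node class, its fields, and both helper functions are hypothetical stand-ins, the head node is ignored, and the target is assumed to be derived from the number of nodes that report load divided by utilization_fraction.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node (not a Ray class)."""
    node_id: str
    managed: bool  # True if the autoscaler launched this node
    busy: bool     # True if the node reports load in the metrics


def target_num_workers(nodes, utilization_fraction):
    # Assumption: load metrics arrive from every node, managed or not,
    # so the unmanaged node contributes to the target.
    used = sum(1 for n in nodes if n.busy)
    return math.ceil(used / utilization_fraction)


def num_workers(nodes, num_pending):
    # Mirrors num_workers = self.workers() + num_pending, where the
    # workers() stand-in filters out unmanaged nodes.
    managed_workers = [n for n in nodes if n.managed]
    return len(managed_workers) + num_pending


# The scenario from the report: utilization_fraction 1.0, one unmanaged node.
nodes = [Node("unmanaged-0", managed=False, busy=True)]

target = target_num_workers(nodes, utilization_fraction=1.0)
current = num_workers(nodes, num_pending=0)
gap = target - current

print(f"target={target} current={current} gap={gap}")
```

In this toy model the unmanaged node raises the target to 1 while the current count stays at 0, so any logic that compares the two sees a gap of one worker even though no managed worker is missing.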

Ray version and other system information (Python version, TensorFlow version, OS):

Reproduction (REQUIRED)

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Labels

P1: Issue that should be fixed within a few weeks
bug: Something that is supposed to be working; but isn't