[Kubernetes][Operator][Autoscaler] Operator fails to autoscale when running on OpenShift Kubernetes #13569

@DmitriGekhtman

What is the problem?

commands.get_or_create_head_node is on the K8s operator's code path.
This method tries to copy a cluster config to the Ray head node, which is unnecessary for the operator.
The copy fails when the operator runs on OpenShift, which prevents the operator from working.
See #13567 (comment).

The solution is to modify commands.get_or_create_head_node so that the operator can bypass the failing rsync command.
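
A minimal sketch of what such a bypass could look like. This is not Ray's actual code: the function name `get_or_create_head_node_sketch`, the `skip_config_sync` flag, the `rsync_up` callback, and both file paths are hypothetical and only illustrate the idea of gating the config copy behind a parameter that the operator sets.

```python
from typing import Callable, Dict, List


def get_or_create_head_node_sketch(
    config: Dict,
    rsync_up: Callable[[str, str], None],
    skip_config_sync: bool = False,  # hypothetical flag, not Ray's real API
) -> List[str]:
    """Toy stand-in for head node setup; returns the steps it performed."""
    steps = ["launch_head_node"]
    if not skip_config_sync:
        # CLI path: copy the cluster config to the head node.
        # Both paths are illustrative placeholders.
        rsync_up("/tmp/cluster_config.yaml", "~/cluster_config.yaml")
        steps.append("sync_config")
    steps.append("run_setup_commands")
    return steps


def failing_rsync(source: str, target: str) -> None:
    """Simulates an environment where the copy is not permitted."""
    raise RuntimeError(f"rsync {source} -> {target} failed")


# Operator path: the copy is skipped, so the failing rsync is never invoked.
operator_steps = get_or_create_head_node_sketch(
    {}, failing_rsync, skip_config_sync=True
)
print(operator_steps)
```

With the flag left at its default, the same call would raise from `failing_rsync`, which mirrors the failure described above. Defaulting the flag to `False` keeps the existing CLI behavior unchanged and makes the bypass opt-in for the operator only.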


Metadata

Labels

  • P2: Important issue, but not time-critical
  • bug: Something that is supposed to be working; but isn't
  • infra: autoscaler, ray client, kuberay, related issues
