[Kubernetes] Unit test for cluster launch and teardown using K8s Operator#13437

Merged
edoakes merged 6 commits into ray-project:master from DmitriGekhtman:dmitri/k8s-operator-example-test
Jan 21, 2021

@DmitriGekhtman (Contributor) commented Jan 14, 2021

Why are these changes needed?

Adds a unit test for cluster launch and teardown using the K8s Operator.

Also updates the size of the K8s tests in tests/BUILD from small to medium.
If I understand correctly, the size determines the test timeout, and these tests could take longer than a minute.
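For reference, Bazel derives a default `timeout` from each test's `size`: small maps to short (60 s), medium to moderate (300 s), large to long (900 s), and enormous to eternal (3600 s). The sketch below encodes that mapping; `smallest_size_for` is a hypothetical helper for illustration only, not part of Bazel or Ray.

```python
# Default Bazel timeouts (seconds) implied by each test `size`,
# listed from smallest to largest (dicts keep insertion order).
BAZEL_SIZE_TIMEOUTS = {
    "small": 60,       # timeout = "short"
    "medium": 300,     # timeout = "moderate"
    "large": 900,      # timeout = "long"
    "enormous": 3600,  # timeout = "eternal"
}


def smallest_size_for(expected_seconds: float) -> str:
    """Return the smallest size whose default timeout exceeds the runtime.

    Hypothetical helper; Bazel itself provides no such function.
    """
    for size, timeout in BAZEL_SIZE_TIMEOUTS.items():
        if expected_seconds < timeout:
            return size
    raise ValueError(f"No default timeout covers {expected_seconds} s")


# A launch/teardown test that may run about two minutes needs "medium".
print(smallest_size_for(120))  # prints medium
```

An explicit `timeout` attribute on the test target would override these defaults, but bumping `size` keeps the resource hint and the timeout consistent.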

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

The test passes locally on minikube with Kubernetes server versions v1.19.0 and v1.20.1.

DmitriGekhtman force-pushed the dmitri/k8s-operator-example-test branch from b709a3e to 297677b on January 19, 2021 20:42
```python
"| grep ^example-cluster: | tail -n 100"
log_tail = subprocess.check_output(cmd, shell=True).decode()
return ("head-node" in log_tail) and ("worker-nodes" in log_tail)
```

DmitriGekhtman (Contributor, Author) commented:

Decided to simplify this check a bit to avoid running into issues if the monitor log format changes.
This check, plus the pod-count checks, should be sufficient.
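Both checks can be sketched as pure functions over command output, so the logic is testable without a cluster. Assumptions: `log_tail_shows_cluster` reproduces the `grep ^example-cluster: | tail -n 100` filtering in Python on a string, instead of running the elided shell command through `subprocess.check_output`; `count_pods` parses the output of `kubectl get pods -o name` (one `pod/<name>` per line). Both function names and the pod-name prefixes in the usage lines are hypothetical.

```python
def log_tail_shows_cluster(log_text: str, max_lines: int = 100) -> bool:
    """Mimic `grep ^example-cluster: | tail -n 100`, then check node types."""
    # Keep only lines starting with the cluster-name prefix (like grep ^...).
    matching = [
        line for line in log_text.splitlines()
        if line.startswith("example-cluster:")
    ]
    # Keep only the last `max_lines` matching lines (like tail -n 100).
    log_tail = "\n".join(matching[-max_lines:])
    return ("head-node" in log_tail) and ("worker-nodes" in log_tail)


def count_pods(kubectl_output: str, prefix: str) -> int:
    """Count pods listed by `kubectl get pods -o name` whose name has `prefix`."""
    count = 0
    for line in kubectl_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "pod/<pod-name>"; drop the resource type.
        name = line.split("/", 1)[-1]
        if name.startswith(prefix):
            count += 1
    return count


# Hypothetical pod names, for illustration only.
pods = (
    "pod/example-cluster-ray-head-abc12\n"
    "pod/example-cluster-ray-worker-x1\n"
    "pod/example-cluster-ray-worker-x2\n"
    "pod/ray-operator-pod\n"
)
print(count_pods(pods, "example-cluster-ray-worker"))  # prints 2
```

Matching only on the substrings `head-node` and `worker-nodes` keeps the log check loose, which is the point of the simplification: the pod counts carry the precise expectations.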

```python
# Retry 60 times with 1 second delay between attempts.
def f_with_retries(*args, **kwargs):
    for _ in range(60):
        if f(*args, **kwargs):
```
DmitriGekhtman (Contributor, Author) commented:

Extended the timeout from 30 retries to 60 so that the test passes consistently on minikube.
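The excerpt above shows only the start of the retry wrapper. A minimal self-contained sketch of a decorator with that behavior follows, assuming the wrapped check returns a truthy value on success and that the wrapper should raise once attempts run out (the excerpt does not show what the original does on failure). The names `retry_until_true`, `num_attempts`, and `delay_s` are hypothetical.

```python
import functools
import time


def retry_until_true(num_attempts: int = 60, delay_s: float = 1.0):
    """Retry a boolean check until it returns True or attempts run out."""

    def decorator(f):
        @functools.wraps(f)
        def f_with_retries(*args, **kwargs):
            for _ in range(num_attempts):
                if f(*args, **kwargs):
                    return True
                # Wait before the next attempt.
                time.sleep(delay_s)
            raise RuntimeError(
                f"{f.__name__} did not succeed after {num_attempts} attempts"
            )

        return f_with_retries

    return decorator


calls = {"n": 0}


@retry_until_true(num_attempts=5, delay_s=0.01)
def third_time_lucky():
    calls["n"] += 1
    return calls["n"] >= 3


print(third_time_lucky())  # prints True
```

With the defaults, a check that never succeeds fails after about 60 seconds of sleeping plus the time spent in the checks themselves, which is one reason the tests can exceed the 60-second budget of a small Bazel test.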

AmeerHajAli added the tests-ok label (The tagger certifies test failures are unrelated and assumes personal liability.) on Jan 20, 2021

@AmeerHajAli (Contributor) commented:

@edoakes, can you please merge this?

edoakes merged commit 87ca102 into ray-project:master on Jan 21, 2021
fishbone pushed a commit to fishbone/ray that referenced this pull request on Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request on Feb 16, 2021
AmeerHajAli added this to the Serverless Autoscaling milestone on Apr 4, 2021