Skip to content

[kubernetes][operator][hotfix] Dictionary fix#13663

Merged
edoakes merged 1 commit intoray-project:masterfrom
DmitriGekhtman:k8s-operator-dict-fix
Jan 25, 2021
Merged

[kubernetes][operator][hotfix] Dictionary fix#13663
edoakes merged 1 commit intoray-project:masterfrom
DmitriGekhtman:k8s-operator-dict-fix

Conversation

@DmitriGekhtman
Copy link
Copy Markdown
Contributor

@DmitriGekhtman DmitriGekhtman commented Jan 24, 2021

Why are these changes needed?

#13623 causes a key error when launching the operator --
Error comes from the fact that the head pod's config doesn't have the optional minWorkers.
(Don't know how I missed this before -- probably screwed up tests when checking #13623.)

This PR fixes the problem by iterating only over the keys belong to both of the relevant dictionaries.
A key error from the translate method is no longer possible.

Related issue number

Closes #13667 .

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

I'm pretty sure I ran the test right this time --
built an image incorporating the changes,
ran a container based on the image and checked that the code changes were present in the ray installed there, ran the operator unit test,
checked that the operator pod was using the right image.

@jacobhjkim
Copy link
Copy Markdown

I assume this PR will resolve #13667

@jacobhjkim
Copy link
Copy Markdown

jacobhjkim commented Jan 24, 2021

Instead of the operator panicking if there is a Python exception, shouldn't the operator update status of cluster.ray.io/v1 to FAILED or sth? (I am not sure if this is the right place to discuss this issue.)

@DmitriGekhtman
Copy link
Copy Markdown
Contributor Author

@jacobhjkim

I assume this PR will resolve #13667

Yes -- updated this PR's description to reflect that.

Instead of the operator panicking if there is a Python exception, shouldn't the operator update status of cluster.ray.io/v1 to FAILED or sth? (I am not sure if this is the right place to discuss this issue.)

You're right, thanks. I'll open an issue for that.

@edoakes edoakes merged commit 7920911 into ray-project:master Jan 25, 2021
@DmitriGekhtman DmitriGekhtman deleted the k8s-operator-dict-fix branch January 25, 2021 19:19
fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[autoscaler][kubernetes] Example Cluster Config Returns KeyError

3 participants