Skip to content

[autoscaler][kubernetes] Add ability to not copy cluster config to head node when calling create_or_update_head_node.#13720

Merged
ericl merged 7 commits intoray-project:masterfrom
DmitriGekhtman:no-monitor-no-bootstrap-config
Feb 4, 2021
Merged

[autoscaler][kubernetes] Add ability to not copy cluster config to head node when calling create_or_update_head_node.#13720
ericl merged 7 commits intoray-project:masterfrom
DmitriGekhtman:no-monitor-no-bootstrap-config

Conversation

@DmitriGekhtman
Copy link
Copy Markdown
Contributor

@DmitriGekhtman DmitriGekhtman commented Jan 26, 2021

Why are these changes needed?

In situations where we don't run the monitor on the head node, there's no point in copying the cluster launching config onto the head node.
The copying operation has lead to issues with the K8s operator (#13569).

This PR makes the following changes:

  • Adds a no_monitor_on_head_node boolean argument to commands.create_or_update_or_cluster.
    When set to True, the cluster launching config is not copied to the head node.

  • Updates the K8s operator is to use the new flag.

  • Updates the K8s operator example configs to use the --no-monitor flag (Add ability to not start Monitor when calling ray start #13505) in the head ray start commands.

  • To clean up the code a bit, I hid the config-copying logic behind a function call.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.

  • I've included any doc changes needed for https://docs.ray.io/en/master/.

  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/

  • Testing Strategy

    • Unit tests
    • Release tests
    • This PR is not tested :(

    Tested manually by running the unit tests for K8s operator and Ray on K8s cluster launcher.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add type annotation and function level description?
Any idea for a better name since we already have provider.prepare_for_head_node?

@DmitriGekhtman
Copy link
Copy Markdown
Contributor Author

Hmm, yyeah, I guess these changes are useful only for the Kubernetes operator.

Copy link
Copy Markdown
Contributor

@dmatch01 dmatch01 Jan 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to clarify the expected logic for this line in the K8s Operator use case where I assume no_monitor_on_head=true:

The warning will not logged if "autoscaling-config" is not inray_start_cmd. Is that the correct/intended interpretation?

In addition, all the use cases where no_monitor_on_head=false:

The warning will be logged whether or not "autoscaling-config" is inray_start_cmd?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no_monitor_on_head is true, the if condition becomes
not (A or True) = False,
so the warning isn’t logged.

The readability here could be improved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. And I see you improved read ability as well in the PR update. Thank you!

@AmeerHajAli AmeerHajAli added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 27, 2021
@DmitriGekhtman DmitriGekhtman removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 29, 2021
@ericl ericl removed their assignment Jan 30, 2021
@AmeerHajAli
Copy link
Copy Markdown
Contributor

This looks good to me. Can we add a test that verifies it solves the issue?

@AmeerHajAli AmeerHajAli added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 30, 2021
@DmitriGekhtman DmitriGekhtman force-pushed the no-monitor-no-bootstrap-config branch from 8f2a9b0 to ba6375f Compare February 1, 2021 02:36
@DmitriGekhtman DmitriGekhtman removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 1, 2021
@DmitriGekhtman
Copy link
Copy Markdown
Contributor Author

This looks good to me. Can we add a test that verifies it solves the issue?

Added a test which confirms that no files are mounted during the operator's Ray cluster creation.

@DmitriGekhtman DmitriGekhtman added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Feb 1, 2021
@AmeerHajAli
Copy link
Copy Markdown
Contributor

The test you added fails for any file mounts. Does that mean that the operator users should have nothing in their file_mounts config? Why not throw an error if they explicitly tried to mount?

@DmitriGekhtman
Copy link
Copy Markdown
Contributor Author

DmitriGekhtman commented Feb 1, 2021 via email

@DmitriGekhtman DmitriGekhtman removed the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Feb 1, 2021
@DmitriGekhtman
Copy link
Copy Markdown
Contributor Author

Will wait for tests to finish.

@DmitriGekhtman DmitriGekhtman added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Feb 1, 2021
@DmitriGekhtman
Copy link
Copy Markdown
Contributor Author

DmitriGekhtman commented Feb 1, 2021

Screen Shot 2021-02-01 at 12 33 37 PM

@AmeerHajAli Tests look good. Awaiting your final review.

@DmitriGekhtman
Copy link
Copy Markdown
Contributor Author

Added a comment describing the additional flag.

Copy link
Copy Markdown
Contributor

@AmeerHajAli AmeerHajAli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. left a minor comment.

@AmeerHajAli
Copy link
Copy Markdown
Contributor

AmeerHajAli commented Feb 2, 2021

@ericl , this looks good to me. Can you please merge?

@dmatch01
Copy link
Copy Markdown
Contributor

dmatch01 commented Feb 4, 2021

@ericl Just wanted to give a gentle nudge for the merge. Would like to continue with my testing once this merge is available. Thank you! cc: @AmeerHajAli @DmitriGekhtman

@ericl ericl merged commit db59736 into ray-project:master Feb 4, 2021
@dmatch01
Copy link
Copy Markdown
Contributor

dmatch01 commented Feb 4, 2021

@ericl Thank you!

fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021
…ad node when calling `create_or_update_head_node`. (ray-project#13720)

* Add option to skip bootstrapping head node autoscaling config

* don't close remote config before copying

* Type

* Type hints etc.

* test

* Test CR to config conversion

* comment
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021
…ig to head node when calling `create_or_update_head_node`. (ray-project#13720)"

This reverts commit 729883a.
@AmeerHajAli AmeerHajAli added this to the Serverless Autoscaling milestone Apr 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests-ok The tagger certifies test failures are unrelated and assumes personal liability.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants