[autoscaler][kubernetes] Add ability to not copy cluster config to head node when calling create_or_update_head_node.#13720
Conversation
There was a problem hiding this comment.
can you add type annotation and function level description?
Any idea for a better name since we already have provider.prepare_for_head_node?
|
Hmm, yyeah, I guess these changes are useful only for the Kubernetes operator. |
There was a problem hiding this comment.
Just wanted to clarify the expected logic for this line in the K8s Operator use case where I assume no_monitor_on_head=true:
The warning will not logged if "autoscaling-config" is not inray_start_cmd. Is that the correct/intended interpretation?
In addition, all the use cases where no_monitor_on_head=false:
The warning will be logged whether or not "autoscaling-config" is inray_start_cmd?
There was a problem hiding this comment.
If no_monitor_on_head is true, the if condition becomes
not (A or True) = False,
so the warning isn’t logged.
The readability here could be improved.
There was a problem hiding this comment.
Got it. And I see you improved read ability as well in the PR update. Thank you!
|
This looks good to me. Can we add a test that verifies it solves the issue? |
8f2a9b0 to
ba6375f
Compare
Added a test which confirms that no files are mounted during the operator's Ray cluster creation. |
|
The test you added fails for any file mounts. Does that mean that the operator users should have nothing in their file_mounts config? Why not throw an error if they explicitly tried to mount? |
|
Good question -- there's actually no user-interface for specifying file
mounts for the operator.
Users configure and apply a Kubernetes custom resource, which is internally
translated to an autoscaling config.
There's no file mounts field for the custom resource and the internal
autoscaling config has no file mounts.
This does suggest an improvement to the test: instead of reading in an
autoscaling config, read in a cluster custom resource and translate it to
an autoscaling config as the operator does.
This has the bonus that it tests the function that does the translation.
Will go ahead and make this change in the test.
…On Mon, Feb 1, 2021 at 12:51 AM Ameer Haj Ali ***@***.***> wrote:
The test you added fails for any file mounts. Does that mean that the
operator users should have nothing in their file_mounts config? Why not
throw an error if they explicitly tried to mount?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#13720 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/APAQTKZCCMENS76BTCNCG3TS4ZTQZANCNFSM4WUHYIDA>
.
|
|
Will wait for tests to finish. |
| @AmeerHajAli Tests look good. Awaiting your final review. |
|
Added a comment describing the additional flag. |
AmeerHajAli
left a comment
There was a problem hiding this comment.
LGTM. left a minor comment.
|
@ericl , this looks good to me. Can you please merge? |
|
@ericl Just wanted to give a gentle nudge for the merge. Would like to continue with my testing once this merge is available. Thank you! cc: @AmeerHajAli @DmitriGekhtman |
|
@ericl Thank you! |
…ad node when calling `create_or_update_head_node`. (ray-project#13720) * Add option to skip bootstrapping head node autoscaling config * don't close remote config before copying * Type * Type hints etc. * test * Test CR to config conversion * comment
…ig to head node when calling `create_or_update_head_node`. (ray-project#13720)" This reverts commit 729883a.

Why are these changes needed?
In situations where we don't run the monitor on the head node, there's no point in copying the cluster launching config onto the head node.
The copying operation has lead to issues with the K8s operator (#13569).
This PR makes the following changes:
Adds a
no_monitor_on_head_nodeboolean argument tocommands.create_or_update_or_cluster.When set to True, the cluster launching config is not copied to the head node.
Updates the K8s operator is to use the new flag.
Updates the K8s operator example configs to use the
--no-monitorflag (Add ability to not start Monitor when callingray start#13505) in the head ray start commands.To clean up the code a bit, I hid the config-copying logic behind a function call.
Related issue number
Checks
I've run
scripts/format.shto lint the changes in this PR.I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
Tested manually by running the unit tests for K8s operator and Ray on K8s cluster launcher.