Skip to content

Expose internal config parameters for starting Ray#3246

Merged
richardliaw merged 16 commits intoray-project:masterfrom
richardliaw:config_updating
Nov 8, 2018
Merged

Expose internal config parameters for starting Ray#3246
richardliaw merged 16 commits intoray-project:masterfrom
richardliaw:config_updating

Conversation

@richardliaw
Copy link
Copy Markdown
Contributor

@richardliaw richardliaw commented Nov 6, 2018

What do these changes do?

This PR exposes the CL option for using a config file. This is important for certain tests (i.e., FT tests that removing nodes) to run quickly.

Note that this is bad practice and should be replaced with GFLAGS or some equivalent as soon as possible.

#3239 depends on this.

TODO:

  • Add documentation to method arguments before merging.
  • Add test to verify this works?

Related issue number

@richardliaw richardliaw self-assigned this Nov 6, 2018
@richardliaw richardliaw requested review from ericl and pcmoritz November 6, 2018 02:26
@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9108/
Test FAILed.

@robertnishihara
Copy link
Copy Markdown
Collaborator

In this PR, can we bring back all of the tests that were skipped in #3217?

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9144/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9146/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9149/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9154/
Test FAILed.

@richardliaw
Copy link
Copy Markdown
Contributor Author

jenkins retest this please

@robertnishihara
Copy link
Copy Markdown
Collaborator

FYI, @stephanie-wang this PR reintroduces the stress tests that were skipped in #3217.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9161/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9165/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9160/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9167/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9168/
Test PASSed.


cluster.remove_node(worker)
cluster.wait_for_nodes(retries=10)
assert ray.global_state.cluster_resources()["CPU"] == 2
Copy link
Copy Markdown
Collaborator

@robertnishihara robertnishihara Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test looks like it could be flaky because of the reliance on timing..

Not this line in particular, just the whole test.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the test runs on Jenkins which I think has a lot less timing issues? Also, the timing leeway is on the order of seconds.

Is there a better way to guarantee node removal other than refreshing the client table?

Alternatively, we could see if this is an issue and if we ever see it again, we could use flaky (https://github.com/box/flaky)

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9176/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9190/
Test FAILed.

@robertnishihara
Copy link
Copy Markdown
Collaborator

jenkins, retest this please

@richardliaw richardliaw changed the title Expose Config File loading for starting Ray Expose internal config parameters for starting Ray Nov 8, 2018
@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/9195/
Test PASSed.

@richardliaw richardliaw merged commit 0bab8ed into ray-project:master Nov 8, 2018
@richardliaw richardliaw deleted the config_updating branch November 8, 2018 05:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants