Skip to content

[core] Store Internal Config in GCS#8921

Merged
edoakes merged 40 commits intoray-project:masterfrom
ijrsvt:internal_config
Jul 8, 2020
Merged

[core] Store Internal Config in GCS#8921
edoakes merged 40 commits intoray-project:masterfrom
ijrsvt:internal_config

Conversation

@ijrsvt
Copy link
Copy Markdown
Contributor

@ijrsvt ijrsvt commented Jun 12, 2020

Why are these changes needed?

This stores a single Internal Config for the entire cluster in GCS. This removes the need for specifying the config on every subsequent node that is added.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27096/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Can one of the admins verify this patch?

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27097/
Test FAILed.

@ijrsvt ijrsvt requested a review from edoakes June 12, 2020 23:20
@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27098/
Test FAILed.

@rkooo567 rkooo567 requested a review from ffbin June 12, 2020 23:30
@rkooo567 rkooo567 self-assigned this Jun 12, 2020
@rkooo567
Copy link
Copy Markdown
Contributor

rkooo567 commented Jun 12, 2020

@ijrsvt Can you write some integration tests?

@rkooo567 rkooo567 removed their assignment Jun 12, 2020
logger.error(
"Internal Config parameters can only be set on the head "
"node. NOTE, this include lru_evict.")
self._config == str(dict())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just raising errors?

Copy link
Copy Markdown
Collaborator

@edoakes edoakes Jun 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should just be raising an error here.

Also, lru_evict isn't an internal config parameter from the user's perspective so we shouldn't mention it in the error message. The user should just know they can't pass internal_config=XXX when starting a non-head node.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But lru_evict does modify internal_config parameters--how should that be handled?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just have a separate check for lru_evict flag - if it's set and not head raise an error that says lru_evict can only be set on head node

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added it to the massive check block in worker.py

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27099/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27100/
Test FAILed.

@simon-mo
Copy link
Copy Markdown
Contributor

Can you add some description to this PR

}
};

class GcsInternalConfigTable : public GcsTable<UniqueID, StoredConfig> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about create a kV table? Both key and value type are string. There are other scenarios that need to use it, such as saving GCS server address. The GcsKVTable can use StoreClient directly.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a good idea but not sure it should be a blocker for this PR

Copy link
Copy Markdown
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Comments are basically all about style, nothing too major.

logger.error(
"Internal Config parameters can only be set on the head "
"node. NOTE, this include lru_evict.")
self._config == str(dict())
Copy link
Copy Markdown
Collaborator

@edoakes edoakes Jun 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should just be raising an error here.

Also, lru_evict isn't an internal config parameter from the user's perspective so we shouldn't mention it in the error message. The user should just know they can't pass internal_config=XXX when starting a non-head node.

ijrsvt and others added 4 commits June 15, 2020 10:53
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@ijrsvt
Copy link
Copy Markdown
Contributor Author

ijrsvt commented Jun 29, 2020

@rkooo567 I actually just added to test_multi_node_2.py::test_internal_config. Unless you want me to actually test that it is in GCS?

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27688/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27721/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27736/
Test FAILed.

@ijrsvt
Copy link
Copy Markdown
Contributor Author

ijrsvt commented Jul 1, 2020

Thanks @kisuke95 for all the help!

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27776/
Test FAILed.

@kisuke95
Copy link
Copy Markdown
Member

kisuke95 commented Jul 1, 2020

@ijrsvt My pleasure.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27801/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27796/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27817/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27915/
Test FAILed.

@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27994/
Test FAILed.

@edoakes edoakes merged commit 9172f8c into ray-project:master Jul 8, 2020
@ijrsvt ijrsvt deleted the internal_config branch August 27, 2020 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants