Skip to content

[GCS] Support gcs client subscribe idempotent#9153

Closed
ffbin wants to merge 12 commits intoray-project:masterfrom
antgroup:dev_gcs_client_sub_idempotent
Closed

[GCS] Support gcs client subscribe idempotent#9153
ffbin wants to merge 12 commits intoray-project:masterfrom
antgroup:dev_gcs_client_sub_idempotent

Conversation

@ffbin
Copy link
Copy Markdown
Contributor

@ffbin ffbin commented Jun 26, 2020

Why are these changes needed?

Design document:
https://docs.google.com/document/d/1Cuqxlw53abEZPVYVF-pUWpbnwrFEgVTel_sQGwt8DmU/edit#heading=h.s3ogmk8m7308
This PR implements GCS client subscribe idempotent.
Node table and object location table are special in that they contain several records, so they are sorted by timestamp in GCS. The processing logic of this part is relatively complex. In order to merge this PR faster, I plan to split PR into two.
The first pr: #9424

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

@AmplabJenkins
Copy link
Copy Markdown

Can one of the admins verify this patch?

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27588/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27590/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27615/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27633/
Test PASSed.

@ffbin ffbin force-pushed the dev_gcs_client_sub_idempotent branch from 2483dfd to c8735ce Compare June 28, 2020 09:53
@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27637/
Test PASSed.

@ffbin ffbin changed the title [WIP]Support gcs client sub idempotent Support gcs client subscribe idempotent Jun 30, 2020
@rkooo567 rkooo567 changed the title Support gcs client subscribe idempotent [GCS] Support gcs client subscribe idempotent Jun 30, 2020
@rkooo567 rkooo567 requested a review from stephanie-wang June 30, 2020 18:42
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls add some comment to explain why need two different filters.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to use [&node_id]?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto &object_list = item.second

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add RAY_CHECK()?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAY_CHECK(done)?

Copy link
Copy Markdown
Contributor

@wumuzi520 wumuzi520 Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return rhs.second > lhs.second ? true : (rhs.second == lhs.second ? rhs.first.Hex() > lhs.first.Hex() : false);

@ffbin ffbin force-pushed the dev_gcs_client_sub_idempotent branch from c8735ce to 3a1e5af Compare July 2, 2020 08:47
@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27841/
Test PASSed.

@AmplabJenkins
Copy link
Copy Markdown

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/27849/
Test PASSed.

@rkooo567 rkooo567 self-assigned this Jul 9, 2020
Copy link
Copy Markdown
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design doc says The premise of our processing is that the messages obtained by subscribe and fetch are ordered.. How do we guarantee this now?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean the caller should always hold locks whenever it uses gcs client?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean the caller should always hold locks whenever it uses gcs client?

GCS client and the main thread use the same event loop, so they are actually called by a single thread.

@ffbin
Copy link
Copy Markdown
Contributor Author

ffbin commented Jul 12, 2020

The design doc says The premise of our processing is that the messages obtained by subscribe and fetch are ordered.. How do we guarantee this now?

Except for node table and object location table, the other tables are sorted by timestamp. Node table and object location table are special in that they contain several records, so they are sorted by timestamp in GCS. The processing logic of this part is relatively complex. In order to merge this PR faster, I plan to split PR into two.

@ffbin ffbin force-pushed the dev_gcs_client_sub_idempotent branch from c01c5a4 to 9b95cd8 Compare July 12, 2020 06:45
@AmplabJenkins
Copy link
Copy Markdown

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/28211/
Test FAILed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants