
rgw: dynamic resharding modifies existing bucket instance #38657

Closed
cbodley wants to merge 38 commits into ceph:master from cbodley:wip-rgw-reshard-instance

@cbodley cbodley commented Dec 18, 2020

builds on #35175

  • refactors the reshard logic into separate functions to better compose the fault injection and error handling
  • adds cleanup of old-style reshards that were in progress
  • refactors the test coverage into a test_bucket_reshard() function that's parameterized on the fault to inject
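
The idea of a reshard test parameterized on the fault to inject can be sketched as below. This is a minimal toy model, not the code from this PR: `FaultInjector`, `InjectedFault`, `RESHARD_STEPS`, the step names, the dict-based bucket, and the signature of `test_bucket_reshard()` are all assumptions made for illustration. It only shows why splitting the reshard into separate steps makes it easy to inject a failure at each one and check that the bucket is left as it was.

```python
# Hypothetical sketch: none of these names or signatures come from the Ceph tree.
class InjectedFault(Exception):
    """Raised by the injector when the configured step is reached."""


class FaultInjector:
    def __init__(self, fail_at=None):
        self.fail_at = fail_at  # name of the step to fail at, or None for no fault

    def check(self, step):
        if step == self.fail_at:
            raise InjectedFault(step)


# Assumed step names, chosen only for this illustration.
RESHARD_STEPS = ("set_target_layout", "copy_entries", "commit_target_layout")


def reshard(bucket, num_shards, fault):
    """Toy reshard: every step consults the injector before doing its work."""
    previous = dict(bucket)  # snapshot used to roll back on failure
    try:
        fault.check("set_target_layout")
        bucket["target_shards"] = num_shards
        fault.check("copy_entries")
        fault.check("commit_target_layout")
        bucket["num_shards"] = num_shards
        bucket.pop("target_shards")
    except InjectedFault:
        bucket.clear()
        bucket.update(previous)  # error handling: revert to the old state
        return False
    return True


def test_bucket_reshard(fail_at=None):
    """One test body, parameterized on where the fault is injected."""
    bucket = {"name": "b", "num_shards": 11}
    ok = reshard(bucket, 23, FaultInjector(fail_at))
    if fail_at is None:
        assert ok and bucket["num_shards"] == 23
    else:
        assert not ok and bucket == {"name": "b", "num_shards": 11}
    return ok


# Run the fault-free case plus one case per injection point.
results = {step: test_bucket_reshard(step) for step in (None,) + RESHARD_STEPS}
```

Adding a new injection point in this model means adding one step name and one `check()` call; the test body itself does not change.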

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

cbodley commented Dec 18, 2020

@cbodley cbodley force-pushed the wip-rgw-reshard-instance branch from e2890fb to b31e054 Compare December 18, 2020 21:09

cbodley commented Jan 13, 2021

jenkins test make check


cbodley commented Jan 15, 2021

jenkins test make check


cbodley commented Jan 18, 2021

jenkins test make check

Shilpa Jagannath and others added 18 commits January 18, 2021 14:44
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
…s a parameter.

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
…s_rgw_reshard_status.

Remove unused parameters in cls_rgw_bucket_instance_entry.
Other minor cleanup fixes.

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
…) to be

able to read the updated layout.

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Manjarabad Jagannath <smanjara@redhat.com>
…ard_reshard()

     - remove unused 'instance_id' from cls_rgw_reshard_entry
     - other minor fixes

Signed-off-by: Shilpa Manjarabad Jagannath <smanjara@redhat.com>
…tions.

     - call init_index() on target layout during reshard process.
       Takes const rgw::bucket_index_layout_generation&

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
     - fix bi_get() to get objects after being resharded

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
…stance_id and

new_instance_id fields back for proper decoding.

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
     - function update_bucket() handles updating bucket state

Signed-off-by: Shilpa Jagannath <smanjara@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>

cbodley commented Jan 18, 2021

rebased. hoping the unrelated make check failures go away

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
the rgw_bucket overload of BucketShard::init() has to look up the bucket
info. use the RGWBucketInfo overload when we have one

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
these are only used for testing, not administration

Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley cbodley force-pushed the wip-rgw-reshard-instance branch from c282264 to ffaa749 Compare January 18, 2021 22:09

cbodley commented Jan 18, 2021

oh wow, that's an obscure failure. i guess something in 'make check' was linting test_rgw_reshard.py?

flake8 run-test: commands[0] | flake8 --select=F,E9 --exclude=venv,.tox
./workunits/rgw/test_rgw_reshard.py:9:1: F401 'pprint.pprint' imported but unused
./workunits/rgw/test_rgw_reshard.py:10:1: F401 're' imported but unused

pushed a fix

addresses test timeout and warning message:

[WARNING] /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest-death-test.cc:1121:: Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test detected 3 threads. See https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#death-tests-and-threads for more explanation and suggested solutions, especially if this is the last message you see before your test times out.

Signed-off-by: Casey Bodley <cbodley@redhat.com>

cbodley commented Jan 19, 2021

215/215 Test #110: unittest_fault_injector ...................***Timeout 3600.01 sec

pushed a fix for the death tests


cbodley commented Jan 19, 2021

jenkins test api

ldout(store->ctx(), 0) << "ERROR: " << __func__ << " failed to clear "
"target index layout in bucket info: " << cpp_strerror(ret) << dendl;

bucket_info.layout = std::move(prev); // restore in-memory layout
Review comment (Contributor):

after restoring the in-memory layout, I think we have to commit it.

Reply (Contributor Author):

this block is handling the error from put_bucket_instance_info() above, where we failed to commit the updated bucket_info to rados. so it's just reverting to what we had before
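
The point of that reply can be illustrated with a small model. This is a hedged sketch, not the RGW code: `ToyStore`, `CommitError`, `clear_target_layout()`, and the dict-based `bucket_info` are assumptions, and `put_bucket_instance_info()` here is only a Python stand-in that borrows the name mentioned above. It shows that when the commit itself fails, the store still holds the previous layout, so restoring the in-memory copy is enough to bring the two back in agreement.

```python
import copy


class CommitError(Exception):
    """Stands in for a non-zero return code from the commit call."""


class ToyStore:
    """Hypothetical in-memory stand-in for the backing store."""

    def __init__(self, info, fail=False):
        self.persisted = copy.deepcopy(info)  # what is "on disk"
        self.fail = fail

    def put_bucket_instance_info(self, info):
        if self.fail:
            raise CommitError("simulated write failure")
        self.persisted = copy.deepcopy(info)


def clear_target_layout(store, bucket_info):
    prev = copy.deepcopy(bucket_info["layout"])   # remember the old layout
    bucket_info["layout"]["target_index"] = None  # in-memory change only
    try:
        store.put_bucket_instance_info(bucket_info)
    except CommitError:
        # The write never landed, so the store still holds `prev`;
        # restoring the in-memory layout needs no second commit.
        bucket_info["layout"] = prev
        return False
    return True


info = {"layout": {"current_index": 11, "target_index": 23}}
store = ToyStore(info, fail=True)
committed = clear_target_layout(store, info)
in_sync = info == store.persisted  # memory and store agree after the rollback
```

In the failing case `committed` is false and `in_sync` is true, which is the situation the reply describes: the block only reverts to what was there before.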

@smanjara

everything else looks great!

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved


cbodley commented Feb 5, 2021

this merged into #39002

@cbodley cbodley closed this Feb 5, 2021
@mattbenjamin

yay!
