rgw/cloud-restore [PART2] : Add Restore support from Glacier/Tape cloud endpoints#62713
Conversation
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
src/rgw/driver/rados/rgw_sal_rados.h
Outdated
    int num_objs;
    librados::IoCtx& ioctx;
    using centries = std::vector<ceph::buffer::list>;
    ceph::containers::tiny_vector<LazyFIFO> fifos;
@adamemerson I used LazyFIFO (from rgw_log_backing.h) to store restore entries, similar to datalog. After rebase I noticed that you updated LazyFIFO to use neorados. Do you suggest I update this RadosRestore to neorados too, or do I need to create a new wrapper for the FIFO class?
force-pushed from 1616429 to b5b5c7d
force-pushed from b5b5c7d to 985d81a
jenkins test make check arm64
jenkins test signed
@mattbenjamin @adamemerson @dang @thotz .. This PR is now ready and contains all the commits merged downstream so far, along with one additional commit (the top commit [rgw/restore: Update to neorados FIFO routines]) which handles the changes needed to use the new neorados/FIFO routines upstream. Please review.
    for (auto i=0; i < num_objs; i++) {
      std::unique_ptr<rgw::cls::fifo::FIFO> fifo_tmp;
      ret = rgw::cls::fifo::FIFO::create(dpp, ioctx, obj_names[i], &fifo_tmp, y);
    std::unique_ptr<fifo::FIFO> fifo_tmp;
ret is no longer used in this function
cleaned this up.
    return 1;
    }
    return 1;
For better readability, the return 1 should be moved into the if condition.
But we cannot return true until all the shards are processed and found empty.
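The exchange above is about control flow when scanning FIFO shards: the function may only report "all empty" after every shard has been checked, while it can bail out early as soon as any shard still has entries. A minimal, hypothetical sketch of that pattern (invented names; not the actual RGW code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one FIFO shard's pending restore entries.
using Shard = std::vector<std::string>;

// Returns 1 only when every shard is empty; returns 0 as soon as any
// shard still holds entries. Moving `return 1` inside the loop would be
// wrong: emptiness can only be declared after all shards are scanned.
int all_shards_empty(const std::vector<Shard>& shards) {
  for (const auto& shard : shards) {
    if (!shard.empty()) {
      return 0;  // at least one pending entry; not done yet
    }
  }
  return 1;  // every shard scanned and found empty
}
```

This is why the `return 1` cannot simply be folded into the if: the early return and the final return have different meanings.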
src/rgw/rgw_restore.cc
Outdated
    done:
      lock.unlock();
    return 0;
are we ignoring ret values intentionally here?
thanks, addressed it.
thotz left a comment:
Some minor suggestions regarding the handling of ret values in various places.
force-pushed from a6b1efd to c4f1305
force-pushed from 342130f to aad3024
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
force-pushed from aad3024 to 22e8eff
@adamemerson @thotz .. I addressed the review comments and rebase conflicts; kindly review.
force-pushed from 22e8eff to 6a8631d
jenkins test docs
force-pushed from 6a8631d to d4dbb6f
jenkins test make check
…cier/Tape endpoint

Restoration of objects from certain cloud services (like Glacier/Tape) could take a significant amount of time (even days). Hence store the state of such restore requests and periodically process them.

Brief summary of changes:
* Refactored existing restore code to consolidate and move all restore processing into the rgw_restore* file/class.
* RGWRestore class is defined to manage the restoration of objects.
* Lastly, for SAL_RADOS, FIFO is used to store and read restore entries.

Currently, this PR handles storing state of restore requests sent to the cloud-glacier tier-type which need async processing. The changes are tested with AWS Glacier Flexible Retrieval with tier_type Expedited and Standard.

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Reviewed-by: Adam Emerson <aemerson@redhat.com>
Reviewed-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
In case adding a restore entry to FIFO fails, reset the `restore_status` of that object to "RestoreFailed" so that the restore can be retried by the end S3 user.

Reviewed-by: Adam Emerson <aemerson@redhat.com>
Reviewed-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
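The commit above describes a fallback path: queue the restore request, and if the FIFO push fails, mark the object RestoreFailed so the S3 user sees the failure and can retry. An illustrative sketch under that assumption (all types and names here are hypothetical stand-ins, not the RGW implementation):

```cpp
#include <deque>
#include <string>

// Hypothetical restore-status values mirroring S3 restore semantics.
enum class RestoreStatus { None, RestoreAlreadyInProgress, RestoreFailed };

struct Object {
  std::string name;
  RestoreStatus restore_status = RestoreStatus::None;
};

// Stand-in for the FIFO queue of pending restore entries.
struct RestoreFifo {
  std::deque<std::string> entries;
  bool fail_pushes = false;  // simulate a push failure for illustration
  bool push(const std::string& e) {
    if (fail_pushes) return false;
    entries.push_back(e);
    return true;
  }
};

// Queue a restore request; on failure, reset the object's status to
// RestoreFailed so the end S3 user can issue the request again.
bool queue_restore(RestoreFifo& fifo, Object& obj) {
  obj.restore_status = RestoreStatus::RestoreAlreadyInProgress;
  if (!fifo.push(obj.name)) {
    obj.restore_status = RestoreStatus::RestoreFailed;
    return false;
  }
  return true;
}
```

The key design point is that a queueing failure is surfaced as a retryable state rather than leaving the object stuck "in progress".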
In addition, added some more debug statements and performed code cleanup.

Reviewed-by: Adam Emerson <aemerson@redhat.com>
Reviewed-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Reviewed-by: Adam Emerson <aemerson@redhat.com>
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Use the new neorados/FIFO routines to store restore state.

Note: the old librados ioctx is still retained as it is needed by RestoreRadosSerializer.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
force-pushed from d4dbb6f to a981b4c
    void encode(bufferlist& bl) const {
      ENCODE_START(16, 1, bl);
      ENCODE_START(17, 1, bl);
@cbodley @adamemerson .. while trying to backport these changes to tentacle, I observed that the ENCODE/DECODE version will be 16 for restore_pool in that branch (same as in the downstream 8.1 codebase), as opposed to 17 in main. Would this cause any issue during a later upgrade from tentacle to the next upcoming release?
To resolve it, can we include restore_pool along with dedup_pool in the same ENCODE version, i.e. 16, here in this PR?
thanks @soumyakoduri
upstream squid is on the v15 encoding and has up to group_pool: https://github.com/ceph/ceph/blob/squid/src/rgw/driver/rados/rgw_zone.h#L156-L184
i see that #64264 is pending backport to tentacle, adding v16 for dedup_pool (cc @benhanokh)
does 8.1 add dedup_pool for v16 too? if not, we'd probably need tentacle to include a v16 with only restore_pool to match 8.1, and bump the dedup_pool change to v17 so we could support upgrade from 8.1 to 9
thanks @cbodley ...
yes.. 8.1 doesn't have dedup_pool; it's just restore_pool in v16.
So shall I reverse the versions (i.e., v16 for restore_pool and v17 for dedup_pool) in main, and backport the changes in the same order to tentacle?
#64360 is the backport of this PR to tentacle..
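The thread above hinges on version-gated decoding: each field is only read when the buffer's struct version is at least the version that introduced it, so a v16 encoder (restore_pool only, matching 8.1) and a v17 encoder (restore_pool plus dedup_pool) can interoperate. A deliberately simplified, self-contained sketch of that idea follows; it is not the real Ceph ENCODE_START/DECODE_START machinery, and the field-to-version assignment shown assumes the swap proposed in the discussion:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified wire format: element 0 is the version, then fields in
// introduction order. Real Ceph encoding uses binary buffers and
// compat versions; this only illustrates the version gating.
struct ZoneParams {
  std::string restore_pool;  // assumed introduced at v16 (8.1 layout)
  std::string dedup_pool;    // assumed bumped to v17 after the swap
};

std::vector<std::string> encode_v(const ZoneParams& p, uint8_t v) {
  std::vector<std::string> buf{std::to_string(v)};
  if (v >= 16) buf.push_back(p.restore_pool);
  if (v >= 17) buf.push_back(p.dedup_pool);
  return buf;
}

ZoneParams decode(const std::vector<std::string>& buf) {
  ZoneParams p;
  int struct_v = std::stoi(buf[0]);
  size_t i = 1;
  if (struct_v >= 16) p.restore_pool = buf[i++];  // present since v16
  if (struct_v >= 17) p.dedup_pool = buf[i++];    // only in v17+
  return p;
}
```

A v17-aware decoder can still read a v16 buffer (dedup_pool simply stays empty), which is why the field order and version assignment must agree across branches for upgrades to work.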
jenkins make check arm64
jenkins test make check arm64
@soumyakoduri @adamemerson @thotz -- This PR seems to be causing testing failures.
@ivancich, as you mentioned on Slack, these tests were run before this PR was merged, so this PR hasn't caused any regressions. Moreover, teuthology tests had passed when run with this PR's changes (#62713 (comment)). I am unsure at the moment why all the restore tests failed in the log link you pasted above. I have requested Anuchaitra to rerun the tests and see if they fail consistently. Possibly another PR in that testing branch is causing the failures.
This PR is a continuation of #61745. With this update, we now support restoring objects from Glacier-like cloud endpoints that have longer restore times — for example, AWS Glacier Flexible Retrieval with restore type: Standard.
Summary of changes:
* Refactored existing restore code to consolidate and move all restore processing into the rgw_restore* file/class.
* RGWRestore class is defined to manage the restoration of objects.
* For SAL_RADOS, FIFO is used to store and read restore requests state.

Currently, this PR handles storing state of restore requests sent to the cloud-glacier tier-type which need async processing. The changes are tested with AWS Glacier Flexible Retrieval with tier_type Expedited and Standard.
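Since Glacier-like restores can take days, the flow described above amounts to recording each request and having a background pass periodically re-check its status until the cloud endpoint finishes. A minimal sketch of that retry loop, with invented names (the actual implementation lives in the RGWRestore class backed by FIFO):

```cpp
#include <deque>
#include <functional>
#include <string>

// Pending restore entries; a stand-in for the FIFO used by SAL_RADOS.
using RestoreQueue = std::deque<std::string>;

// One periodic pass: poll each pending entry exactly once. Completed
// restores are dropped; unfinished ones are re-queued for the next pass.
// `check_done` stands in for querying the cloud endpoint's restore status.
void process_pending(RestoreQueue& q,
                     const std::function<bool(const std::string&)>& check_done) {
  const size_t n = q.size();  // snapshot so re-queued entries wait a pass
  for (size_t i = 0; i < n; ++i) {
    std::string entry = q.front();
    q.pop_front();
    if (!check_done(entry)) {
      q.push_back(entry);  // still restoring (could take days); retry later
    }
  }
}
```

Snapshotting the queue length before the loop keeps one invocation from re-polling an entry it just re-queued.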
TODO: (pending items)
Fixes: https://tracker.ceph.com/issues/70628
Signed-off-by: Soumya Koduri skoduri@redhat.com