rgw/cloudtier: initial MVP of the feature RestoreObject from cloud#59311
rgw/cloudtier: initial MVP of the feature RestoreObject from cloud#59311soumyakoduri merged 3 commits intoceph:mainfrom
Conversation
8644849 to
e49ecc5
Compare
dang
left a comment
There was a problem hiding this comment.
Looks great! A few tiny nits.
Thanks Dan. Addressed your comments and also fixed a bug in ObjectExpirer code. |
|
jenkins test make check |
|
jenkins test make check arm64 |
939b59e to
78c529a
Compare
| // Read ACLOwner maybe ? | ||
| } | ||
|
|
||
| ldpp_dout(tier_ctx.dpp, 20) << __func__ << "(): Sucessfully fetched object from cloud bucket:" << dest_bucket << ", object: " << target_obj_name << dendl; |
There was a problem hiding this comment.
| ldpp_dout(tier_ctx.dpp, 20) << __func__ << "(): Sucessfully fetched object from cloud bucket:" << dest_bucket << ", object: " << target_obj_name << dendl; | |
| ldpp_dout(tier_ctx.dpp, 20) << __func__ << "(): Successfully fetched object from cloud bucket:" << dest_bucket << ", object: " << target_obj_name << dendl; |
| ret = processor.complete(accounted_size, etag, &mtime, set_mtime, | ||
| attrs, rgw::cksum::no_cksum, delete_at , nullptr, nullptr, nullptr, | ||
| (rgw_zone_set *)&zone_set, &canceled, rctx, log_op ? rgw::sal::FLAG_LOG_OP : 0); | ||
| if (ret < 0) { |
| ldpp_dout(dpp, 5) << "restore is not completed yet, please retry again" << dendl; | ||
| s->err.message = "restore is not completed yet"; | ||
| } | ||
| return op_ret; |
There was a problem hiding this comment.
| return op_ret; | |
| } else { // CloudRestored..return success | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
| } else { // CloudRestored..return success | |
| return 0; | |
| } | |
| return op_ret; |
| } else { | ||
| ldpp_dout(this, 20) << " manifest not found" << dendl; | ||
| } | ||
| ldpp_dout(this, 20) << "completed restore" << dendl; |
There was a problem hiding this comment.
most of the errors are alredy handled in the handle_cloud_object_tier, it will reductant IMO
There was a problem hiding this comment.
Yeah, I guess only handle_cloudtier is needed there then.
| if (ret < 0) { | ||
| ldpp_dout(this, 5) << "RGWLC::process_expire_objects: failed, " | ||
| << " worker ix: " << worker->ix << dendl; | ||
| } |
There was a problem hiding this comment.
We should not be failing LC process for any failure with expiring temp objects here.
|
|
||
| if (attr_iter == attrs.end() || restore_status != rgw::sal::RGWRestoreStatus::None) { | ||
| s->err.http_ret = 202; //Accepted | ||
| dump_header(s, "x-amz-restore", rgw_bl_str(restore_status)); |
There was a problem hiding this comment.
here the header can be empty right, is it right to provide empty values for header?
There was a problem hiding this comment.
https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html#API_HeadObject_ResponseElements
x-amz-restore
If the object is an archived object (an object whose storage class is GLACIER), the response includes this header if either the archive restoration is in progress (see RestoreObject or an archive copy is already restored.
If an archive copy is already restored, the header value indicates when Amazon S3 is scheduled to delete the object copy. For example:
x-amz-restore: ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"
If the object restoration is in progress, the header returns the value ongoing-request="true".
There was a problem hiding this comment.
expiry-date should be set for only temporary copies.
There was a problem hiding this comment.
Okay got it, so in case of read_through at the moment it will be temporary objects only.
| } | ||
|
|
||
| int RadosObject::restore_obj_from_cloud(Bucket* bucket, | ||
| rgw::sal::PlacementTier* tier, |
There was a problem hiding this comment.
yesterday while fixing tier config issue, I was wondering whether we need both tierconf and Placementtier is required for this function
There was a problem hiding this comment.
right..I was wondering the same.. now that tier_config is updated with latest Zonegroup tier info, we can avoid this second argument.
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
d162331 to
8852379
Compare
|
jenkins test windows |
|
jenkins test make check arm64 |
|
jenkins test windows |
|
@mattbenjamin @dang @cbodley .. all the review comments so far have been addressed and code is rebased to latest main. Please review and let us know if this PR can be merged. Thanks! |
|
could we maybe squash some of these, keeping summarizing the squashed commits in the commit mesg? |
8852379 to
54873d9
Compare
|
jenkins test api |
Done. Please review! |
a9707f6 to
acdf239
Compare
|
jenkins test api |
|
jekins test make check arm64 |
|
teuthology results: http://pulpito.front.sepia.ceph.com/soumyakoduri-2024-09-27_07:57:10-rgw-wip-skoduri-cloud-restore-distro-default-smithi One test (rgw:multisite) failed which seems unrelated to this PR. I see that test fail without these changes too. |
acdf239 to
a1a0ead
Compare
|
jenkins retest this please |
|
jenkins test api |
|
jenkins test make check arm64 |
1 similar comment
|
jenkins test make check arm64 |
|
@mattbenjamin , do you have any further comments on this PR? If not can we merge this PR? There are couple of other WIP issues to be reviewed & fixed which are blocked and waiting for this PR to be merged. If there are n't any major comments, we can open another PR to address them if needed. |
a1a0ead to
ca665a4
Compare
yes and its already tested by QE in the downstream. |
1)Add functionality to restore cloud-transitioned objects on demand. Current commit has below - * Given <bucket,object>, fetch the object from the cloud endpoint. * if days provided and > 0, the restore is marked temporary with expiry date. * Without <days>, it is marked as permanent restore. 2)Use ObjectExpirer/delete_at attr to delete temp objects For temporarily restored objects, set delete_at attr to the expiration time. This will add those objects to ObjectExpirer list. Use LC worker thread to scan that list and delete expired objects. By delete here, it means to delete restored object data and reset HEAD object as Cloud-transitioned object as it was before restore. In addition below changes are done - * If temporary, object is still marked RGWObj::CloudTiered and mtime is set same as transition time. * If permanent, object is marked RGWObj::Main and mtime is set to restore time (now()). * rgw_restore_debug_interval option added to set configure restore Days (similar to rgw_lc_debug_interval) There is an issue with ObjectExpirer code where in if an object is added to ObjectExpirer list and is re-written, it is not deleted from the expirer list and hence the new object may get deleted. Fixed the same and also addressed minor review comments. 3)Design doc added 4) ObjCategory should be set to CloudTiered only for cloud-transitioned objects and temporarily restored objects. Permanent copies are to be treated as regular objects. Signed-off-by: Soumya Koduri <skoduri@redhat.com>
What are all supported : * It allows read-through for cloud-tiered objects via restore_obj_from_cloud * New tier config options user need to set allow_read_through to true and read_through_restore_days more than 1 for this feature to work, also objects with retain_head_object will be available for this feature. * First get request will fail with restoring in progress error, objects are downloaded asynchronously. * The objects restore are temporary. * Tested `aws s3api get-object`, `aws s3api head-object` and `aws s3 cp` In addition send timeout errors for first readthrough request Also addressed lint warning and other cleanup(review comments) Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
* For first and repititive request 202 Accepted will be corresponding response code. * For CloudRestored status 200 OK will be corresponding response code. * For conflicting requests 409 Conflict corresponding response code. Also Fixed storage class update while listing objects. Earlier while restoring object temporarily list-objects (s3api) and radosgw-admin bucket list didn't have updated storage class. With this fixed it now has the cloudtier storage class. Signed-off-by: shreyanshjain7174 <ssanchet@redhat.com>
ca665a4 to
1b66bc5
Compare
mattbenjamin
left a comment
There was a problem hiding this comment.
much appreciated cleanup!
|
jenkins test make check |
|
jenkins test make check arm64 |
This PR covers initial PoC for the feature to restore objects from the cloud-tier on demand. Details below -
Feature doc: https://docs.google.com/document/d/1rzLJAzHK6cLuzJswgoplgugNOFCKi8NrjPOGfgVrIFg/edit
Updated to -
src/doc/rgw/cloud-restore.mdin this PRFixes: https://tracker.ceph.com/issues/67679
Restore CLI parsing
Added restore cli parameters and xml parsing for
restore-objectcommandCurrent commit has basic cli checks with their corresponding response, cli parameters like Days.
Response elements:
For first and repetitive request 202 Accepted will be corresponding response code.
For CloudRestored status 200 OK will be corresponding response code.
For conflicting requests 409 Conflict corresponding response code.
TODO:
x-amz-restoreheader in the responseGET/RESTORE : calling cloud restore APIs to fetch the object from the remote endpoint.
What are all supported :
It allows read-through for cloud-tiered objects
New tier config options user need to set allow_read_through to true and read_through_restore_days more than 1 for this feature to work, also objects with retain_head_object will be available for this feature.
First get request will fail with restoring in progress error, objects are downloaded asynchronously.
The objects restored are temporary.
TODO :
x-amz-restoreheaderCloudTier module:
<bucket,object,cloudtier-storage-class>&optional<days>, fetch the object from the cloud endpoint and copy it to STANDARD storage-class<days>provided, anda) > 0, the restore is marked temporary with expiry date, but
b) = 0, restore is not done and an error is returned
<days>, it is considered as permanent restore.x-amz-storage-class.This will add those objects to ObjectExpirer list. Use LC worker thread to scan that list and delete expired objects. By delete here, it means to delete restored object data and reset HEAD object as Cloud-transitioned object as it was before restore.
In addition below changes are done -
transition time.
TODO:
Not addressed in this PR:
Signed-off-by: Shreyansh Sancheti ssancheti@ibm.com
Signed-off-by: Jiffin Tony Thottan thottanjiffin@gmail.com
Signed-off-by: Soumya Koduri skoduri@redhat.com
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e