[rgw][tentacle] backport of cloud-restore related PRs #65830
ivancich merged 6 commits into ceph:tentacle
thanks @soumyakoduri - ideally we would use those test cases to validate this backport. can i ask you to qa this manually against an s3-tests branch with those prs on top of ceph-tentacle? on a related note, we're discussing moving s3-tests into the ceph repo itself so we don't have to mess with branches. that way, backport prs will automatically include their test cases. you can follow progress in #65724
thanks @cbodley, sure. I will run the tests and update.
jenkins test api
While testing this PR, I ran into intermittent teuthology results after applying these new commits: http://pulpito.front.sepia.ceph.com/soumyakoduri-2025-10-13_13:43:23-rgw:cloud-transition-wip-skoduri-tentacle-distro-default-smithi
As per the AWS spec (https://docs.aws.amazon.com/AmazonS3/latest/API/API_RestoreObject.html), if a `restore-object` request is re-issued on an already restored copy, the server needs to update the restoration period relative to the current time. These changes handle that case.
Note: this applies only to temporarily restored copies.
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 9fa3433)
In order to resume IN_PROGRESS restore operations after RGW service restarts, store the entries of the objects being restored from the `cloud-s3` tier persistently. This is already being done for the `cloud-s3-glacier` tier and now the same will be applied to the `cloud-s3` tier too. With this change, when `restore-object` is performed on any object, it will be marked RESTORE_ALREADY_IN_PROGRESS and added to a restore FIFO queue. This queue is later processed by the Restore worker thread, which will try to fetch the objects from Cloud or Glacier/Tape S3 services. Hence all restore operations are now handled asynchronously (for both the `cloud-s3` and `cloud-s3-glacier` tiers).
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 71882f6)
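As a rough illustration of why persisting the queue lets restores resume after a restart, here is a small Python sketch. Everything in it is an assumption made for the example: `RestoreJournal` and its JSON-lines file are hypothetical stand-ins, not the FIFO that RGW actually uses.

```python
import json
import os
import tempfile
from collections import deque


class RestoreJournal:
    """Hypothetical file-backed FIFO of pending restore entries."""

    def __init__(self, path):
        self.path = path
        self.entries = deque()
        # On startup, reload whatever was pending before the restart.
        if os.path.exists(path):
            with open(path) as f:
                self.entries.extend(json.loads(line) for line in f if line.strip())

    def _flush(self):
        # Rewrite the whole file; fine for a sketch, not for production.
        with open(self.path, "w") as f:
            for entry in self.entries:
                f.write(json.dumps(entry) + "\n")

    def push(self, bucket, key):
        self.entries.append({"bucket": bucket, "key": key})
        self._flush()

    def pop(self):
        entry = self.entries.popleft()
        self._flush()
        return entry


path = os.path.join(tempfile.mkdtemp(), "restore.journal")
journal = RestoreJournal(path)
journal.push("b1", "obj-a")
journal.push("b1", "obj-b")

# Simulate a service restart: a fresh instance sees the same pending entries.
reloaded = RestoreJournal(path)
pending = [e["key"] for e in reloaded.entries]
```

The request path only enqueues; a separate worker would call `pop()` and do the actual fetch, which is what makes the operation asynchronous from the client's point of view.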
Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit 90b962c)
While adding the restore entry to the FIFO, mark its status as `None` so that the restore thread knows that the entry is being processed for the first time. In case the restore is still in progress and the entry needs to be re-added to the queue, its status will then be marked `InProgress`.
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 1eb0623)
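The status handling described here can be sketched in Python as follows. This is an illustration under assumptions, not the RGW implementation: the `Status` enum, the `process` function and the `is_done` callback are hypothetical names, and the "start"/"poll" labels only mark which branch was taken.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    NONE = "None"               # entry not yet seen by the worker
    IN_PROGRESS = "InProgress"  # entry re-queued while restore is ongoing


def process(queue, is_done):
    """One worker pass: visit each queued entry once and re-add
    the ones whose restore has not finished yet."""
    log = []
    for _ in range(len(queue)):
        key, status = queue.popleft()
        first_time = status is Status.NONE
        log.append((key, "start" if first_time else "poll"))
        if not is_done(key):
            queue.append((key, Status.IN_PROGRESS))
    return log


queue = deque([("obj-a", Status.NONE), ("obj-b", Status.NONE)])
done = {"obj-a"}
first_pass = process(queue, lambda k: k in done)

done.add("obj-b")
second_pass = process(queue, lambda k: k in done)
```

Here `obj-b` is not finished on the first pass, so it goes back on the queue as `InProgress` and is only polled, not started again, on the second pass.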
This includes:
* fixing the `rgw_cloudtier.py` qa script
* enabling `debug_rgw_restore` for the cloud-transition suite
* adding a few debug statements
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 0f98740)
Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit d67a713)
8486369 to 3b2aa48
jenkins test make check
@ivancich @anrao19 the failures seen in QE testing are because of missing commits in s3-tests (mentioned in this PR description). I ran the restore tests privately using those commits and updated the results above. If no other failures are seen, I request that this PR be merged. I can then cherry-pick the s3-tests fixes for the tentacle branch.
@soumyakoduri Are your teuthology results from October 18th sufficient? I see you added the needs-qa label on November 7th. Given the mismatch of this PR with current s3 tests, and the subtleties there, I think you're in the best position to evaluate whether this passes QA. Do you consider this having passed QA?
@ivancich yes, I can vouch for the restore tests. With this PR and the corresponding s3-tests changes, I see the restore tests run successfully and consistently.
@soumyakoduri : Can you verify that two QA failures are not related to this PR? [NOTE: Eric removed one failure that is in the tentacle baseline.] They're both from this run: https://pulpito.ceph.com/anuchaithra-2025-11-17_05:03:27-rgw-wip-anrao4-testing-2025-11-14-1154-tentacle-distro-default-smithi/ And they are part of this QA tracker: https://tracker.ceph.com/issues/73809 Thanks!
yes, I confirm that these two failures were present earlier too and will be fixed once this PR and the s3-tests commits mentioned in this PR description are merged.
This PR contains backports of the PRs below:
#64804
#64933
#65926
Fixes: https://tracker.ceph.com/issues/73408
Fixes: https://tracker.ceph.com/issues/73409
Note: Once this PR is merged, the relevant s3-tests changes from ceph/s3-tests#680, ceph/s3-tests#686 and ceph/s3-tests#701 also need to be backported to the ceph-tentacle branch.