
[rgw][tentacle] backport of cloud-restore related PRs #65830

Merged
ivancich merged 6 commits into ceph:tentacle from soumyakoduri:wip-skoduri-tentacle on Nov 19, 2025


@soumyakoduri (Contributor) commented Oct 8, 2025

This PR contains backports of the following PRs:

#64804
#64933
#65926

Fixes: https://tracker.ceph.com/issues/73408
Fixes: https://tracker.ceph.com/issues/73409

Note: Once this PR is merged, the relevant s3-tests changes from ceph/s3-tests#680, ceph/s3-tests#686 and ceph/s3-tests#701 also need to be backported to the ceph-tentacle branch.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@cbodley (Contributor) commented Oct 8, 2025

Note: Once this PR is merged, relevant s3-tests changes from ceph/s3-tests#680 and ceph/s3-tests#686 also need to be backported to ceph-tentacle branch.

Thanks @soumyakoduri, ideally we would use those test cases to validate this backport. Can I ask you to QA this manually against an s3-tests branch with those PRs on top of ceph-tentacle?

On a related note, we're discussing moving s3-tests into the ceph repo itself so we don't have to mess with branches. That way, backport PRs will automatically include their test cases. You can follow progress in #65724.

@soumyakoduri (Contributor, Author)


Thanks @cbodley. Sure, I will run the tests and update.

@soumyakoduri (Contributor, Author)

jenkins test api

@soumyakoduri (Contributor, Author)

While testing this PR, I ran into an intermittent `test_read_through` test case failure. I have submitted #65926 and ceph/s3-tests#701 to fix it.

Teuthology results after applying these new commits: http://pulpito.front.sepia.ceph.com/soumyakoduri-2025-10-13_13:43:23-rgw:cloud-transition-wip-skoduri-tentacle-distro-default-smithi

As per the AWS spec (https://docs.aws.amazon.com/AmazonS3/latest/API/API_RestoreObject.html),
if a `restore-object` request is re-issued on an already restored copy, the server needs to
update the restoration period relative to the current time. These changes handle that.

Note: this applies only to temporarily restored copies.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 9fa3433)
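A minimal Python sketch of the expiry rule described in this commit, assuming a simplified, hypothetical `RestoredCopy` record and `reissue_restore()` helper (neither is RGW code). It only shows that a re-issued request resets the restoration period from the current time instead of extending the old expiry; AWS-side rounding and RGW's actual storage of restore state are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RestoredCopy:
    """Hypothetical record for a restored copy (not an RGW data structure)."""
    temporary: bool
    expiry: Optional[datetime]


def reissue_restore(copy: RestoredCopy, days: int, now: datetime) -> RestoredCopy:
    """Apply a repeated restore-object request to an already restored copy."""
    if not copy.temporary:
        # The commit only covers temporary copies; others are left untouched.
        return copy
    # The new period is counted from the current time, not from the old expiry.
    copy.expiry = now + timedelta(days=days)
    return copy
```

For example, a copy restored on day 0 with `Days=3` would expire on day 3; re-issuing the request on day 2 with `Days=5` moves the expiry to day 7 (day 2 + 5), not day 8 (day 3 + 5).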
To resume IN_PROGRESS restore operations after RGW service
restarts, persistently store the entries of the objects being restored from
the `cloud-s3` tier. This is already done for the `cloud-s3-glacier`
tier, and the same is now applied to the `cloud-s3` tier too.

With this change, when `restore-object` is performed on any object,
it is marked RESTORE_ALREADY_IN_PROGRESS and added to a restore FIFO queue.
This queue is later processed by the restore worker thread, which tries to
fetch the objects from the cloud or Glacier/Tape S3 services. Hence all
restore operations are now handled asynchronously (for both the `cloud-s3`
and `cloud-s3-glacier` tiers).

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 71882f6)
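The request/worker split described in this commit can be sketched in Python as follows. This is an illustration under stated assumptions, not RGW's implementation: `restore_fifo` is an in-memory `queue.Queue` standing in for the persistent restore FIFO (so persistence across restarts is not modeled), and `restore_object()`, `fetch_from_tier()`, `restore_worker()` and the `RESTORED` status are hypothetical names. Only `RESTORE_ALREADY_IN_PROGRESS` comes from the commit message.

```python
import queue
import threading

# Status name taken from the commit message; RESTORED is a hypothetical final state.
RESTORE_ALREADY_IN_PROGRESS = "RESTORE_ALREADY_IN_PROGRESS"
RESTORED = "RESTORED"

status = {}                   # object name -> restore status (illustrative only)
restore_fifo = queue.Queue()  # stands in for RGW's persistent restore FIFO


def fetch_from_tier(name: str) -> None:
    """Hypothetical placeholder for fetching the object from the cloud or Glacier/Tape tier."""


def restore_object(name: str) -> str:
    """Request path: mark the object, enqueue it, and return without waiting."""
    status[name] = RESTORE_ALREADY_IN_PROGRESS
    restore_fifo.put(name)
    return RESTORE_ALREADY_IN_PROGRESS


def restore_worker() -> None:
    """Worker thread: drain the FIFO and perform the slow fetches."""
    while True:
        name = restore_fifo.get()
        if name is None:  # sentinel used by this sketch to stop the worker
            restore_fifo.task_done()
            return
        fetch_from_tier(name)
        status[name] = RESTORED
        restore_fifo.task_done()


worker = threading.Thread(target=restore_worker, daemon=True)
worker.start()
```

The request path returns as soon as the entry is queued, which is what makes every restore asynchronous; the slow fetch from the remote tier happens only on the worker thread.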
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 90b962c)
When adding the restore entry to the FIFO, mark its status as `None`
so that the restore thread knows the entry is being processed for
the first time. In case the restore is still in progress and the entry
needs to be re-added to the queue, its status is then marked
`InProgress`.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 1eb0623)
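A minimal Python sketch of the `None`/`InProgress` status handling in this commit, assuming a hypothetical `RestoreEntry` type, a `deque` in place of the restore FIFO, and a caller-supplied `is_done` check in place of querying the remote tier. It shows how the worker can tell a first-time entry from one it re-queued on an earlier pass; none of these names are RGW's.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class RestoreStatus(Enum):
    # Values mirror the statuses named in the commit message; the enum is hypothetical.
    NONE = "None"
    IN_PROGRESS = "InProgress"


@dataclass
class RestoreEntry:
    obj: str
    status: RestoreStatus = RestoreStatus.NONE  # new entries start as None


def process_once(fifo: deque, is_done: Callable[[str], bool], log: List[str]) -> None:
    """Process every entry currently in the FIFO exactly once."""
    for _ in range(len(fifo)):
        entry = fifo.popleft()
        if entry.status is RestoreStatus.NONE:
            log.append(f"{entry.obj}: first attempt, start restore")
        else:
            log.append(f"{entry.obj}: restore already started, check progress")
        if not is_done(entry.obj):
            # Still in progress: re-add to the queue, now marked InProgress.
            entry.status = RestoreStatus.IN_PROGRESS
            fifo.append(entry)
```

Only entries that are still pending go back on the queue, and they carry `InProgress` so the next pass does not treat them as new restore requests.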
This includes:
* fixing the `rgw_cloudtier.py` qa script
* enabling `debug_rgw_restore` for the cloud-transition suite
* adding a few debug statements

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit 0f98740)
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
(cherry picked from commit d67a713)
@soumyakoduri (Contributor, Author)

jenkins test make check

@soumyakoduri (Contributor, Author) commented Nov 14, 2025

@ivancich @anrao19 The failures seen in QE testing are caused by missing commits in s3-tests (mentioned in this PR description).

I ran the restore tests privately using those commits and updated the results above. If no other failures are seen, I request that this PR be merged. I can then cherry-pick the s3-tests fixes to the tentacle branch.

@ivancich (Member)

@soumyakoduri Are your teuthology results from October 18th sufficient? I see you added the needs-qa label on November 7th. Given the mismatch of this PR with current s3 tests, and the subtleties there, I think you're in the best position to evaluate whether this passes QA. Do you consider this having passed QA?

@soumyakoduri (Contributor, Author)

@ivancich Yes, I can vouch for the restore tests. With this PR and the corresponding s3-tests changes, the restore tests pass consistently.
But since this is a release branch, I added the needs-qa label to confirm that these changes cause no regressions in any other tests or test suites.

@ivancich (Member) commented Nov 18, 2025

@soumyakoduri: Can you verify that the two QA failures are not related to this PR?

[NOTE: Eric removed one failure that is in the tentacle baseline.]

They're both from this run: https://pulpito.ceph.com/anuchaithra-2025-11-17_05:03:27-rgw-wip-anrao4-testing-2025-11-14-1154-tentacle-distro-default-smithi/

And they are part of this QA tracker: https://tracker.ceph.com/issues/73809

Thanks!

@soumyakoduri (Contributor, Author)

@soumyakoduri : Can you verify that two QA failures are not related to this PR.

* https://pulpito.ceph.com/anuchaithra-2025-11-17_05:03:27-rgw-wip-anrao4-testing-2025-11-14-1154-tentacle-distro-default-smithi/8606175

Yes, I confirm that these two failures were present earlier too and will be fixed once this PR and the s3-tests commits mentioned in the PR description are merged.

@ivancich merged commit af76f47 into ceph:tentacle on Nov 19, 2025
12 checks passed

5 participants