rgw/cloud: Handle RGWRESTStreamS3PutObj initialization failures by soumyakoduri · Pull Request #56657 · ceph/ceph

soumyakoduri · 2024-04-03T10:42:10Z

With the recent code added to handle connection errors (commit e200499bb3c5703862b92a4d7fb534d98601f1bf), RGWRESTStreamS3PutObj initialization can fail if any request to the cloud endpoint failed within the CONN_STATUS_EXPIRE_SECS period.

This fix handles such errors and aborts the transition/sync requests, which can be retried later by the LC/Sync worker threads.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Fixes: https://tracker.ceph.com/issues/65251
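
As an illustration of the behaviour described above, here is a minimal sketch of a caller that checks the result of request initialization and aborts the transition so that a worker can retry it later. It is not the code from this PR: `Endpoint`, `init_put_request`, `transition_object` and `Outcome` are hypothetical stand-ins, the 2-second window is assumed from the discussion in this thread, and `-EIO` is used only as an example error code.

```cpp
#include <cerrno>
#include <chrono>
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT the real RGW classes.
using Clock = std::chrono::steady_clock;

// Assumed to mirror CONN_STATUS_EXPIRE_SECS (2 seconds per the discussion).
constexpr std::chrono::seconds kConnStatusExpire{2};

struct Endpoint {
  std::string url;
  // Time of the most recent failed request, if any.
  std::optional<Clock::time_point> last_failure;

  // An endpoint is usable once its last failure is older than the window.
  bool usable(Clock::time_point now) const {
    return !last_failure || now - *last_failure >= kConnStatusExpire;
  }
};

// Returns 0 on success or a negative errno-style code.
// The choice of -EIO here is an assumption made for the example.
int init_put_request(const Endpoint& ep, Clock::time_point now) {
  if (!ep.usable(now)) {
    return -EIO;  // no endpoint currently available
  }
  return 0;
}

enum class Outcome { Transitioned, RetryLater };

// Check the init result instead of assuming it succeeded; on failure,
// abort this attempt and leave the retry to a later worker pass.
Outcome transition_object(const Endpoint& ep, Clock::time_point now) {
  int r = init_put_request(ep, now);
  if (r < 0) {
    return Outcome::RetryLater;
  }
  // ... the object data would be streamed to the endpoint here ...
  return Outcome::Transitioned;
}
```

In this sketch an attempt made one second after a recorded failure is deferred, while an attempt made two seconds after it goes through.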




soumyakoduri · 2024-04-03T11:09:11Z

@cbodley, following up on #53320 (comment): these changes prevent the crash. But I am wondering whether failures that should ideally be ignored (https://github.com/ceph/ceph/blob/main/src/rgw/driver/rados/rgw_lc_tier.cc#L1282) will now cause the transition to fail repeatedly if the error gets mapped to EIO.

Can we make the CONN_STATUS_EXPIRE_SECS check conditional, so that it is skipped for cloud modules that deal with a single endpoint per connection?

cbodley

looks correct to handle the case where we have no available endpoints

but i wouldn't expect to see any connection errors from the rgw/cloud-transition suite since we're not stopping/restarting any rgws there. i'm guessing that we still want to figure out which http error is getting mapped to EIO to cause this in the first place

(edit: normal/expected http errors shouldn't cause cloud transitions to fail)

soumyakoduri · 2024-04-03T17:56:19Z

> but i wouldn't expect to see any connection errors from the rgw/cloud-transition suite since we're not stopping/restarting any rgws there. i'm guessing that we still want to figure out which http error is getting mapped to EIO to cause this in the first place

I am not sure, as this issue is not consistently reproducible in teuthology runs. But in my test environment I have seen the crash with op_ret=-125, which I think may be due to ECANCELED errors (version mismatch) while trying to recreate the target bucket.

> (edit: normal/expected http errors shouldn't cause cloud transitions to fail)

My concern is this: suppose the cloud endpoint returns EIO when we issue the HEAD request for the object (to check whether the object is already present: https://github.com/ceph/ceph/blob/main/src/rgw/driver/rados/rgw_lc_tier.cc#L1282). Since that check was added as an optimization, we ideally ignore any error from that op and proceed to transition the object. But now the endpoint status may fail that transition if it is tried within 2 sec, and this could happen repeatedly, right?
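
To make the scenario concrete, here is a rough sketch of the interaction being asked about, together with the conditional check suggested earlier in this thread. It only models the question and is not what was merged: `CloudConn`, `record_failure`, `pick_endpoint`, `transition` and the `skip_status_for_single_endpoint` flag are all hypothetical names, and whether a failed HEAD actually marks the endpoint this way in RGW is exactly the open point.

```cpp
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Assumed 2-second window, as mentioned in the comment above.
constexpr std::chrono::seconds kStatusExpire{2};

// Hypothetical connection model; names are illustrative only.
struct CloudConn {
  struct Endpoint {
    std::string url;
    std::optional<Clock::time_point> last_failure;
  };

  std::vector<Endpoint> endpoints;
  // Models the proposed opt-out for connections with a single endpoint.
  bool skip_status_for_single_endpoint = false;

  void record_failure(std::size_t i, Clock::time_point now) {
    endpoints[i].last_failure = now;
  }

  // Returns the index of the first endpoint without a recent failure,
  // or -EIO (an example code) if none is available.
  int pick_endpoint(Clock::time_point now) const {
    if (skip_status_for_single_endpoint && endpoints.size() == 1) {
      return 0;  // nothing to fail over to, so ignore the status window
    }
    for (std::size_t i = 0; i < endpoints.size(); ++i) {
      const auto& failed_at = endpoints[i].last_failure;
      if (!failed_at || now - *failed_at >= kStatusExpire) {
        return static_cast<int>(i);
      }
    }
    return -EIO;
  }
};

// The HEAD probe is only an optimization, so its error is ignored by the
// caller, yet in this model it still marks the endpoint as recently failed.
int transition(CloudConn& conn, bool head_fails, Clock::time_point now) {
  if (head_fails) {
    conn.record_failure(0, now);
  }
  int idx = conn.pick_endpoint(now);
  if (idx < 0) {
    return idx;  // the PUT cannot be initialized inside the window
  }
  return 0;  // the PUT to endpoints[idx] would proceed here
}
```

With the flag left off, a failed HEAD immediately followed by the PUT fails in this model; with the flag on and a single endpoint, the PUT proceeds.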

soumyakoduri · 2024-04-05T14:24:49Z

http://pulpito.front.sepia.ceph.com/soumyakoduri-2024-04-03_19:08:57-rgw:cloud-transition-wip-skoduri-cloud-trans-distro-default-smithi/

http://pulpito.front.sepia.ceph.com/soumyakoduri-2024-04-03_19:09:28-rgw-wip-skoduri-cloud-trans-distro-default-smithi

soumyakoduri · 2024-04-05T14:26:55Z

jenkins test make check arm64

soumyakoduri · 2024-04-05T16:37:04Z

@cbodley .. can we merge this PR? the teuthology test and make check arm64 failures seem unrelated to this change.

soumyakoduri · 2024-04-05T16:42:21Z

Thanks!

soumyakoduri requested a review from a team as a code owner April 3, 2024 10:42

soumyakoduri requested a review from cbodley April 3, 2024 10:42

github-actions bot added the rgw label Apr 3, 2024

cbodley approved these changes Apr 3, 2024


cbodley merged commit 6f58861 into ceph:main Apr 5, 2024

soumyakoduri deleted the wip-skoduri-cloud-trans branch March 6, 2026 09:12
