Skip to content

rgw: Cleaned up fix for 57562#49179

Closed
adamemerson wants to merge 9 commits intoceph:mainfrom
adamemerson:wip-57562-redux
Closed

rgw: Cleaned up fix for 57562#49179
adamemerson wants to merge 9 commits intoceph:mainfrom
adamemerson:wip-57562-redux

Conversation

@adamemerson
Copy link
Contributor

@adamemerson adamemerson commented Dec 1, 2022

This is a more thoroughly cleaned up fix for https://tracker.ceph.com/issues/57562

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@github-actions
Copy link

github-actions bot commented Dec 9, 2022

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Copy link
Contributor

@kchheda3 kchheda3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wondering now do we even need the back-off changes?
coz the version will now not change for same update ?

assert(r >= 0);
}

void FIFO::maybe_backoff(const DoutPrefixProvider* dpp, std::uint64_t tid,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function can be stanalone, doesn't use any variables from Fifo class !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses get_backoff().

@adamemerson
Copy link
Contributor Author

Also wondering now do we even need the back-off changes? coz the version will now not change for same update ?

We might not, that's why I asked if you could test without it first. But I figured I should write it up just in case we need it for a fallback.

Fix logic error where disjunction was used instead of conjunction.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Part tags make part creation and setting the head non-idempotent,
leading to issues where racing RGWs may get confused about the correct
tag for a part. (Or worse, potentially have the metadata header hold
different value for a part than the part's header.)

Consistently only requires that all nodes agree on the number.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Except for the version type, where only versions with the same tag are
comparable.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Since we no longer use tags, journal entries are just an operation and
a part number. If an entry being added is already in the journal, skip
it.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Only increment the version if we make an actual change.

Return whether we have changed something or not so the OSD side can
skip writing if there's no change.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
We don't really need the overhead and complexity of a multimap.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
First, make `_prepare_new_head` take the new head part number, so two
calls racing from the same push will attempt to create the same head.

Also remove the neorados FIFO since it doesn't have all the bug fixes
in the legacy version and will be rewritten in terms of `async_compose`
anyway.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
A racing client may delete the part we're trying to push to. Use
`assert_exists()` to check, and re-read metadata if we receive
-ENOENT.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
To break livelocks between two clients, increase the number of
retries, but pause a random number (up to 100) of milliseconds every
ten retries.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>

if (update.max_push_part_num()) {
if (update.max_push_part_num() &&
(max_push_part_num != *update.max_push_part_num())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a thought instead of != check we could have
*update.max_push_part_num() > max_push_part_num ??

@kchheda3
Copy link
Contributor

kchheda3 commented Jan 5, 2023

Also wondering now do we even need the back-off changes? coz the version will now not change for same update ?

We might not, that's why I asked if you could test without it first. But I figured I should write it up just in case we need it for a fallback.

We have finished our testing without the back-off changes and everything is working fine w.r.t object count/mismatch.
so you can get rid of the backoff changes.

@adamemerson
Copy link
Contributor Author

Moving this to wip-57562

@adamemerson adamemerson closed this Jan 9, 2023
@adamemerson adamemerson deleted the wip-57562-redux branch January 10, 2023 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants