Skip to content

rgw: Fix race condition on FIFO client on new head creation#48632

Merged
cbodley merged 8 commits intoceph:mainfrom
adamemerson:wip-57562
Jan 18, 2023
Merged

rgw: Fix race condition on FIFO client on new head creation#48632
cbodley merged 8 commits intoceph:mainfrom
adamemerson:wip-57562

Conversation

@adamemerson
Copy link
Contributor

@adamemerson adamemerson commented Oct 26, 2022

A thorough addressing of races and possible inconsistencies in FIFO code.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson aemerson@redhat.com

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@adamemerson adamemerson force-pushed the wip-57562 branch 2 times, most recently from f7734aa to 287b627 Compare November 1, 2022 19:14
@jzhu116-bloomberg
Copy link
Contributor

https://github.com/adamemerson/ceph/blob/wip-57562/src/rgw/cls_fifo_legacy.cc#L691
Should it be "if (new_tail > info.tail_part_num)"?

https://github.com/adamemerson/ceph/blob/wip-57562/src/rgw/cls_fifo_legacy.cc#L696
Do we need to add "!head_part_num" into this condition, or it always comes together with another condition?

@adamemerson adamemerson force-pushed the wip-57562 branch 2 times, most recently from 792c27a to 74848f8 Compare November 14, 2022 20:51
@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link

github-actions bot commented Dec 9, 2022

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Fix logic error where disjunction was used instead of conjunction.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Part tags make part creation and setting the head non-idempotent,
leading to issues where racing RGWs may get confused about the correct
tag for a part. (Or worse, potentially have the metadata header hold
different value for a part than the part's header.)

Consistently only requires that all nodes agree on the number.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Except for the version type, where only versions with the same tag are
comparable.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Since we no longer use tags, journal entries are just an operation and
a part number. If an entry being added is already in the journal, skip
it.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Only increment the version if we make an actual change.

Return whether we have changed something or not so the OSD side can
skip writing if there's no change.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
@adamemerson adamemerson requested review from kchheda3 and mattbenjamin and removed request for kchheda3 January 9, 2023 19:33
@adamemerson
Copy link
Contributor Author

Someone please review this. It's ready to go in and be backported.

Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't feel expert enough to approve, but lgtm

Copy link
Contributor

@kchheda3 kchheda3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Adam

We don't really need the overhead and complexity of a multimap.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
First, make `_prepare_new_head` take the new head part number, so two
calls racing from the same push will attempt to create the same head.

Also remove the neorados FIFO since it doesn't have all the bug fixes
in the legacy version and will be rewritten in terms of `async_compose`
anyway.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
A racing client may delete the part we're trying to push to. Use
`assert_exists()` to check, and re-read metadata if we receive
-ENOENT.

Fixes: https://tracker.ceph.com/issues/57562
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
@adamemerson
Copy link
Contributor Author

@cbodley cbodley merged commit 02948ff into ceph:main Jan 18, 2023
@adamemerson adamemerson deleted the wip-57562 branch January 20, 2023 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants