Skip to content

cls/rgw: rgw_dir_suggest_changes detects race with completion#45345

Merged
cbodley merged 2 commits intoceph:masterfrom
cbodley:wip-cls-rgw-suggest-version
Mar 18, 2022
Merged

cls/rgw: rgw_dir_suggest_changes detects race with completion#45345
cbodley merged 2 commits intoceph:masterfrom
cbodley:wip-cls-rgw-suggest-version

Conversation

@cbodley
Copy link
Contributor

@cbodley cbodley commented Mar 10, 2022

if bucket listing races with a pending index transaction, its suggested removal may be mistakenly applied if that index transaction completes before the osd receives this suggestion

in rgw_dir_suggest_changes(), the sole condition for applying a suggested change is that the cur_disk.pending_map is empty. this is true after rgw_bucket_complete_op()

on index completion, rgw_bucket_dir_entry::index_ver is updated to match the new value of rgw_bucket_dir_header::ver. because most of struct rgw_bucket_dir_entry makes the round trip through bucket listing -> dir_suggest, we have access to the index_ver of the suggested entry. by comparing this against the stored entry, we can ignore any suggestions that were sent before the most recent completion

Fixes: https://tracker.ceph.com/issues/54528

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mirabile dictu

}

CLS_LOG(20, "cur_disk.pending_map.empty()=%d op=%d cur_disk.exists=%d cur_change.pending_map.size()=%d cur_change.exists=%d",
CLS_LOG(20, "cur_disk.pending_map.empty()=%d op=%d cur_disk.exists=%d "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's amazing this step was missed before (or maybe got removed?); it seems clearly part of the design

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the dir_suggest code path might not have been tested much

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this will merge before my instrumentation PR, which it should. Then I can convert this to the instrumented version.

(int)cur_disk.index_ver, cur_change.exists,
(int)cur_change.index_ver);

if (cur_change.index_ver < cur_disk.index_ver) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@github-actions github-actions bot added the tests label Mar 10, 2022
@cbodley
Copy link
Contributor Author

cbodley commented Mar 10, 2022

added a simple test case that passes with this fix:

[ RUN      ] cls_rgw.index_suggest_complete
[       OK ] cls_rgw.index_suggest_complete (6 ms)

..and fails against master:

[ RUN      ] cls_rgw.index_suggest_complete
/home/cbodley/ceph/src/test/cls_rgw/test_cls_rgw.cc:430: Failure
Expected equality of these values:
  1
  entries.size()
    Which is: 0
[  FAILED  ] cls_rgw.index_suggest_complete (7 ms)

cbodley added 2 commits March 10, 2022 17:04
if bucket listing races with a pending index transaction, its suggested
removal may be mistakenly applied if that index transaction completes
before the osd receives this suggestion

in `rgw_dir_suggest_changes()`, the sole condition for applying a
suggested change is that the `cur_disk.pending_map` is empty. this is
true after rgw_bucket_complete_op()

on index completion, `rgw_bucket_dir_entry::index_ver` is updated to match
the new value of `rgw_bucket_dir_header::ver`. because most of `struct
rgw_bucket_dir_entry` makes the round trip through bucket listing ->
dir_suggest, we have access to the index_ver of the suggested entry. by
comparing this against the stored entry, we can ignore any suggestions
that were sent before the most recent completion

Fixes: https://tracker.ceph.com/issues/54528

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley cbodley force-pushed the wip-cls-rgw-suggest-version branch from e756e91 to a350888 Compare March 10, 2022 22:04
@yehudasa
Copy link
Member

@cbodley the fact that we get dir_suggest happening on listing before completion is called and is effective makes me think we have an issue with the time delta calculation, as it really should only happen after a very long period of time (hours). I remember seeing such a bug relatively recently.

@mattbenjamin
Copy link
Contributor

mattbenjamin commented Mar 10, 2022

@cbodley the fact that we get dir_suggest happening on listing before completion is called and is effective makes me think we have an issue with the time delta calculation, as it really should only happen after a very long period of time (hours). I remember seeing such a bug relatively recently.

The hard-coded timeout has been 120s, and we just made it configurable. I assumed it would be some value in hours, as well. But also, that's the "tag timeout" and from what I understand, this path isn't even covered by it?

@cbodley
Copy link
Contributor Author

cbodley commented Mar 10, 2022

@cbodley the fact that we get dir_suggest happening on listing before completion is called and is effective makes me think we have an issue with the time delta calculation, as it really should only happen after a very long period of time (hours).

are you talking about timing with respect to the tag timeout? we were also expecting that to prevent these racing suggestions from being applied to the index

but the tag timeout only applies to pending transactions in pending_map - in this case, we get the call to rgw_bucket_complete_op() first, which clears its pending entry. and once pending_map is empty, rgw_dir_suggest_changes() will apply any changes it gets

@yehudasa
Copy link
Member

@cbodley the fact that we get dir_suggest happening on listing before completion is called and is effective makes me think we have an issue with the time delta calculation, as it really should only happen after a very long period of time (hours).

are you talking about timing with respect to the tag timeout? we were also expecting that to prevent these racing suggestions from being applied to the index

but the tag timeout only applies to pending transactions in pending_map - in this case, we get the call to rgw_bucket_complete_op() first, which clears its pending entry. and once pending_map is empty, rgw_dir_suggest_changes() will apply any changes it gets

Ah, I see. Yeah, makes sense.

And apparently I changed the tag timeout to 120 seconds from 24 hours here: 99efdc6.

@cbodley
Copy link
Contributor Author

cbodley commented Mar 10, 2022

And apparently I changed the tag timeout to 120 seconds from 24 hours here: 99efdc6.

oh cool. there were some conversations recently about raising that default, but it's good to see your explanation for why it should stay relatively small - at least in multisite

@mattbenjamin
Copy link
Contributor

mattbenjamin commented Mar 10, 2022

And apparently I changed the tag timeout to 120 seconds from 24 hours here: 99efdc6.

oh cool. there were some conversations recently about raising that default, but it's good to see your explanation for why it should stay relatively small - at least in multisite

I don't feel convinced. 120s is just 2 * the most common client timeout for S3 requests. It seems completely unreasonable to have a timeout based mechanism that changes the apparent result of operations acked to clients on the basis of such a timeout.

Copy link
Contributor

@mkogan1 mkogan1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, tested on the reproducer environment and the BZ no longer reproduces
(rhbz2042585 -- Client uploaded S3 object not listed but can be downloaded)

@mkogan1
Copy link
Contributor

mkogan1 commented Mar 13, 2022

jenkins test make check

@cbodley
Copy link
Contributor Author

cbodley commented Mar 17, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants