cls/rgw: rgw_dir_suggest_changes detects race with completion#45345
cls/rgw: rgw_dir_suggest_changes detects race with completion#45345cbodley merged 2 commits intoceph:masterfrom
Conversation
| } | ||
|
|
||
| CLS_LOG(20, "cur_disk.pending_map.empty()=%d op=%d cur_disk.exists=%d cur_change.pending_map.size()=%d cur_change.exists=%d", | ||
| CLS_LOG(20, "cur_disk.pending_map.empty()=%d op=%d cur_disk.exists=%d " |
There was a problem hiding this comment.
it's amazing this step was missed before (or maybe got removed?); it seems clearly part of the design
There was a problem hiding this comment.
I think the dir_suggest code path might not have been tested much
There was a problem hiding this comment.
I'm guessing this will merge before my instrumentation PR, which it should. Then I can convert this to the instrumented version.
| (int)cur_disk.index_ver, cur_change.exists, | ||
| (int)cur_change.index_ver); | ||
|
|
||
| if (cur_change.index_ver < cur_disk.index_ver) { |
|
added a simple test case that passes with this fix: ..and fails against master: |
if bucket listing races with a pending index transaction, its suggested removal may be mistakenly applied if that index transaction completes before the osd receives this suggestion in `rgw_dir_suggest_changes()`, the sole condition for applying a suggested change is that the `cur_disk.pending_map` is empty. this is true after rgw_bucket_complete_op() on index completion, `rgw_bucket_dir_entry::index_ver` is updated to match the new value of `rgw_bucket_dir_header::ver`. because most of `struct rgw_bucket_dir_entry` makes the round trip through bucket listing -> dir_suggest, we have access to the index_ver of the suggested entry. by comparing this against the stored entry, we can ignore any suggestions that were sent before the most recent completion Fixes: https://tracker.ceph.com/issues/54528 Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
e756e91 to
a350888
Compare
|
@cbodley the fact that we get dir_suggest happening on listing before completion is called and is effective makes me think we have an issue with the time delta calculation, as it really should only happen after a very long period of time (hours). I remember seeing such a bug relatively recently. |
The hard-coded timeout has been 120s, and we just made it configurable. I assumed it would be some value in hours, as well. But also, that's the "tag timeout" and from what I understand, this path isn't even covered by it? |
are you talking about timing with respect to the tag timeout? we were also expecting that to prevent these racing suggestions from being applied to the index but the tag timeout only applies to pending transactions in |
Ah, I see. Yeah, makes sense. And apparently I changed the tag timeout to 120 seconds from 24 hours here: 99efdc6. |
oh cool. there were some conversations recently about raising that default, but it's good to see your explanation for why it should stay relatively small - at least in multisite |
I don't feel convinced. 120s is just 2 * the most common client timeout for S3 requests. It seems completely unreasonable to have a timeout based mechanism that changes the apparent result of operations acked to clients on the basis of such a timeout. |
mkogan1
left a comment
There was a problem hiding this comment.
ACK, tested on the reproducer environment and the BZ no longer reproduces
(rhbz2042585 -- Client uploaded S3 object not listed but can be downloaded)
|
jenkins test make check |
if bucket listing races with a pending index transaction, its suggested removal may be mistakenly applied if that index transaction completes before the osd receives this suggestion
in
rgw_dir_suggest_changes(), the sole condition for applying a suggested change is that thecur_disk.pending_mapis empty. this is true after rgw_bucket_complete_op()on index completion,
rgw_bucket_dir_entry::index_veris updated to match the new value ofrgw_bucket_dir_header::ver. because most ofstruct rgw_bucket_dir_entrymakes the round trip through bucket listing -> dir_suggest, we have access to the index_ver of the suggested entry. by comparing this against the stored entry, we can ignore any suggestions that were sent before the most recent completionFixes: https://tracker.ceph.com/issues/54528
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows