osd: only call stat/getattrs once per object during deep-scrub#56995
osd: only call stat/getattrs once per object during deep-scrub#56995
Conversation
We call back into be_scan_list repeatedly for the same object in deep scrub if the object is too large/has too many omap entries to be done in a single call. This causes us to wastefully call ObjectStore::getattrs and stat multiple times. This has been a long standing, but mostly harmless bug (aside from the extra metadata traffic). However, 1a4d3f0, between reef and squid, switched ScrubMap::object::attrs to a map<string, bufferlist> from a map<string, bufferptr>. This should have been harmless, except that the ObjectStore::getattrs overload for that type uses aset[i->first].append(i->second); rather than aset.emplace(i.first.c_str(), i.second); to populate the map. Combined with this bug, that means that if we need 7 calls into be_scan_list to deep scrub an object, we'll end up with attributes concatenated 7 times each. This didn't cause visible problems with squid/main testing since all of the replicas would end up doing the same thing and they'd still decode ok, but it did become visible during reef->squid upgrade testing as the older osds sent maps with the correct contents. The next commit will fix ObjectStore::getattrs to clear the map. Fixes: https://tracker.ceph.com/issues/65185 Signed-off-by: Samuel Just <sjust@redhat.com>
Passing in a non-empty map would otherwise exhibit quite unexpected behavior. For the bufferptr overload, any preexisting entries would not be overwritten due to how std::map::emplace behaves. For the bufferlist overload, it would result in appending to any pre-existing entries. The prior commit cleans up one such inadvertent caller which resulted in the below bug. Fixes: https://tracker.ceph.com/issues/65185 Signed-off-by: Samuel Just <sjust@redhat.com>
|
Waiting for test branches to build. wip-sjust-squid-testing-2024-04-18 |
| store->getattrs( | ||
| int r = 0; | ||
| ScrubMap::object &o = map.objects[poid]; | ||
| if (!pos.metadata_done) { |
There was a problem hiding this comment.
ACK. This leads to calling to calling the getattrs() once.
| return -EINPROGRESS; | ||
|
|
||
| if (pos.deep) { | ||
| r = be_deep_scrub(poid, map, pos, o); |
There was a problem hiding this comment.
ACK for decoupling the error handling of getattrs() and be_deep_scrub() .
| r = -ENOENT; | ||
| goto out; | ||
| } | ||
| aset.clear(); |
There was a problem hiding this comment.
I'm not sure this BlueStore-related commit is actually part of the fix specifically for https://tracker.ceph.com/issues/65185. In my understanding this bug is tackled by the first commit in the PR.
Although I agree BlueStore::getattrs() should clear the aset and this change is very desired to get into main right now, I'm more concerned about backporting it to squid immediately. I'm worried we might have a caller depending on the misbehavior of this method.
Is there a business in treating this wide, BlueStore patch separately (e.g. dedicated tracker)?
There was a problem hiding this comment.
I'd be truly shocked if there were any users doing this on purpose. I think the balance of probability is that if there actually were any users that passed in non-empty maps, we'd want this behavior anyway.
There was a problem hiding this comment.
I don't want to leave this behavior in the code base. It's quite dangerous and I'm not really worried about backporting it tbh.
|
This PR is under test in https://tracker.ceph.com/issues/65592. |
|
jenkins test windows |
|
https://pulpito.ceph.com/sjust-2024-04-19_02:04:26-upgrade:reef-x:stress-split-wip-sjust-squid-testing-2024-04-18-distro-default-smithi/ doesn't show the scrub error, but all of the tests failed anyway. Is anyone looking into these other upgrade test failures? @rzarzynski |
|
The problem behind the first dead-with- |
|
Grepped specifically for this kind of error in the two other ones: Everything else in red because a package problem: CC: @athanatos. |
|
I think we can merge this now. As Sam is on PTO, I'll wait a bit for possible objections (@rzarzynski ?), then probably merge |
|
@ronen-fr: https://tracker.ceph.com/issues/65592 tracks the QA. @yuriw, @ljflores: ping. |
Fixes: https://tracker.ceph.com/issues/65185
See bug for details.
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e