osd/PG.cc: handle removal of pgmeta object #40993
Conversation
In 7f04700, we made the pg removal code much more efficient. But it started marking the pgmeta object as an unexpected onode, which in reality is expected to be removed after all the other objects. This behavior is very easily reproducible in a vstart cluster:

ceph osd pool create test 1 1
rados -p test bench 10 write --no-cleanup
ceph osd pool delete test test --yes-i-really-really-mean-it

Before this patch, "do_delete_work additional unexpected onode list (new onodes has appeared since PG removal started[#2:00000000::::head#]" is seen in the OSD logs. After this patch, "do_delete_work removing pgmeta object #2:00000000::::head#" is seen.

Related to: https://tracker.ceph.com/issues/50466
Signed-off-by: Neha Ojha <nojha@redhat.com>
@neha-ojha this will suppress the warning, but it does not solve the performance cost of the full collection list just above. Now that we understand the extra leftover object, can we just delete it directly instead of listing the entire collection?
I think we're keeping the pgmeta object around so we can still open up the pg and continue if the osd crashes at this point - similar to not removing a directory until all files in it are gone |
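The behavior discussed above can be sketched as a small simulation. This is not the actual PG.cc code; `Onode` and `classify_leftovers` are hypothetical names standing in for `ghobject_t` and the post-deletion collection scan, to illustrate how the patch distinguishes the expected pgmeta leftover from genuinely unexpected onodes:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for a ghobject_t seen by the final collection scan.
struct Onode {
  std::string oid;
  bool is_pgmeta = false;
};

// Sketch of the post-patch logic: after the bulk deletion, a final scan may
// still see the pgmeta object. Instead of flagging it as an unexpected
// leftover, recognize it and queue it for removal last.
std::vector<std::string> classify_leftovers(const std::vector<Onode>& olist,
                                            std::vector<std::string>& to_remove) {
  std::vector<std::string> unexpected;
  for (const auto& o : olist) {
    if (o.is_pgmeta) {
      // Expected: pgmeta is kept until every other object is gone, so the
      // PG can be reopened and deletion resumed if the OSD crashes mid-way.
      to_remove.push_back(o.oid);
    } else {
      // Genuinely new onode that appeared since PG removal started.
      unexpected.push_back(o.oid);
    }
  }
  return unexpected;
}
```

With this split, only the `unexpected` list would warrant the `dout(0)` warning, while the pgmeta object is simply removed.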
    dout(0) << __func__ << " additional unexpected onode list"
            << " (new onodes has appeared since PG removal started"
            << olist << dendl;
    for (auto& oid : olist) {
nit: I'd suggest dropping the `if (!olist.empty())` check at line 2671.
In that case, do we still need the entire block from line 2660 to 2682? AFAIU the unexpected leftover onode is now understood and is cleaned up elsewhere, so we can drop this expensive collection_list from the beginning.
I think @dvanders is talking about the latency spike and the possibility of being marked down by the mon. Log is on the tracker.
@dvanders @k0ste I will verify this and address it in a follow-up PR. |