
osd/PG.cc: handle removal of pgmeta object #40993

Merged
tchaikov merged 1 commit into ceph:master from neha-ojha:wip-50466 on May 8, 2021

Conversation

@neha-ojha (Member)

In 7f04700, we made the pg removal code
much more efficient. But it started marking the pgmeta object as an unexpected
onode, which in reality is expected to be removed after all the other objects.

This behavior is very easily reproducible in a vstart cluster:

ceph osd pool create test 1 1
rados -p test bench 10 write --no-cleanup
ceph osd pool delete test test --yes-i-really-really-mean-it

Before this patch:

"do_delete_work additional unexpected onode list (new onodes has appeared
since PG removal started[#2:00000000::::head#]" seen in the OSD logs.

After this patch:

"do_delete_work removing pgmeta object #2:00000000::::head#" is seen.

Related to: https://tracker.ceph.com/issues/50466
Signed-off-by: Neha Ojha <nojha@redhat.com>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@github-actions github-actions bot added the core label Apr 22, 2021
@neha-ojha neha-ojha requested review from ifed01 and jdurgin April 22, 2021 18:06
@dvanders (Contributor)

@neha-ojha this will suppress the warning, but not solve the performance cost of the full collection list just above. Now that we understand the extra leftover object, can we just delete it directly instead of listing the entire collection?

@jdurgin (Member) commented Apr 22, 2021

> @neha-ojha this will suppress the warning, but not solve the performance cost of the full collection list just above. Now that we understand the extra leftover object, can we just delete it directly instead of listing the entire collection?

I think we're keeping the pgmeta object around so we can still open up the pg and continue if the osd crashes at this point - similar to not removing a directory until all files in it are gone

@xiexingguo (Member) left a comment

👍

dout(0) << __func__ << " additional unexpected onode list"
        << " (new onodes has appeared since PG removal started"
        << olist << dendl;
for (auto& oid : olist) {
@tchaikov (Contributor) commented Apr 23, 2021

nit: I'd suggest dropping the if (!olist.empty()) check at line 2671.

Created #41233 to address this comment.

@dvanders (Contributor)

> @neha-ojha this will suppress the warning, but not solve the performance cost of the full collection list just above. Now that we understand the extra leftover object, can we just delete it directly instead of listing the entire collection?

> I think we're keeping the pgmeta object around so we can still open up the pg and continue if the osd crashes at this point - similar to not removing a directory until all files in it are gone

In that case, do we still need the entire block from line 2660 to 2682? AFAIU the unexpected leftover onode is now understood, is cleaned up elsewhere, so we can drop this expensive collection_list from the beginning.

@k0ste (Contributor) commented Apr 23, 2021

> @neha-ojha this will suppress the warning, but not solve the performance cost of the full collection list just above. Now that we understand the extra leftover object, can we just delete it directly instead of listing the entire collection?

I think @dvanders is referring to the latency spike and the possibility of the OSD being marked down by the mon. The log is on the tracker.

@tchaikov tchaikov merged commit a46db0c into ceph:master May 8, 2021
@neha-ojha neha-ojha deleted the wip-50466 branch May 10, 2021 20:46
@neha-ojha (Member, Author)

> @neha-ojha this will suppress the warning, but not solve the performance cost of the full collection list just above. Now that we understand the extra leftover object, can we just delete it directly instead of listing the entire collection?

> I think we're keeping the pgmeta object around so we can still open up the pg and continue if the osd crashes at this point - similar to not removing a directory until all files in it are gone

> In that case, do we still need the entire block from line 2660 to 2682? AFAIU the unexpected leftover onode is now understood, is cleaned up elsewhere, so we can drop this expensive collection_list from the beginning.

@dvanders @k0ste I will verify this and address it in a follow-up PR.
