Skip to content

os/bluestore: set rocksdb iterator bounds for Bluestore::_collection_list()#49438

Merged
yuriw merged 1 commit intoceph:mainfrom
cfsnyder:wip-58274-bluestore-collection-list-bounds
May 3, 2024
Merged

os/bluestore: set rocksdb iterator bounds for Bluestore::_collection_list()#49438
yuriw merged 1 commit intoceph:mainfrom
cfsnyder:wip-58274-bluestore-collection-list-bounds

Conversation

@cfsnyder
Copy link
Contributor

@cfsnyder cfsnyder commented Dec 14, 2022

When Bluestore::_collection_list() is called during PG removal, rocksdb often has to iterate over many deletion tombstones from recently deleted onode keys. This change bounds the rocksdb iteration when possible, to avoid high latencies caused by iteration over many contiguous deletion tombstones.

Fixes: https://tracker.ceph.com/issues/58274
Signed-off-by: Cory Snyder csnyder@iland.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@cfsnyder cfsnyder requested a review from a team as a code owner December 14, 2022 19:40
@cfsnyder cfsnyder requested review from aclamk and ifed01 December 14, 2022 19:47
@cfsnyder
Copy link
Contributor Author

jenkins test api

1 similar comment
@cfsnyder
Copy link
Contributor Author

jenkins test api

Copy link
Contributor

@ifed01 ifed01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the patch. It's nice to have it as a code cleanup even if it doesn't bring much improvement to DB lookup performance.
The patch breaks the following test cases from ceph_test_objectstore though:
[ FAILED ] ObjectStore/StoreTest.List_0xfffffff_Hash_Test_in_meta/2, where GetParam() = "bluestore"
[ FAILED ] ObjectStore/StoreTest.List_0xfffffff_Hash_Test_in_PG/2, where GetParam() = "bluestore"
[ FAILED ] ObjectStore/StoreTest.HashCollisionSorting/2, where GetParam() = "bluestore"

dout(30) << __func__ << " pend " << pend << dendl;
continue;
while (true) {
if (!it->valid()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be replace with
while(it->valid()) {
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, will update that!

@cfsnyder cfsnyder force-pushed the wip-58274-bluestore-collection-list-bounds branch from cd20779 to 7b0404b Compare January 6, 2023 09:14
@cfsnyder cfsnyder force-pushed the wip-58274-bluestore-collection-list-bounds branch from 7b0404b to 1338a27 Compare January 12, 2023 14:22
@cfsnyder
Copy link
Contributor Author

@ifed01 I updated to fix those tests. Do they not run as part of the GH checks?

@cfsnyder cfsnyder requested a review from ifed01 January 12, 2023 16:53
it->lower_bound(start);
} else {
pend = temp ? coll_range_temp_end : end;
it->upper_bound(cur_range_start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be it->lower_bound(cur_range_start) instead?
Overwise we miss the first entry when start <= cur_range_start, don't we?
Looks like the original implementation has got a bug here as well,,,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look to me like an actual onode key could == cur_range_start because get_coll_range is only returning a key prefix and an actual key would necessarily have additional bytes for the namespace, oid, etc. That said, the same argument also supports the fact that it isn't meaningful to use upper_bound vs. lower_bound in this case, so I think I should simplify by removing the if/else and always doing:

it->lower_bound(low);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have this change and tag the PR with "needs_qa" afterwards.

it->lower_bound(start);
} else {
pend = temp ? coll_range_temp_end : end;
it->upper_bound(cur_range_start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@cfsnyder cfsnyder force-pushed the wip-58274-bluestore-collection-list-bounds branch from 1338a27 to 54a8cae Compare January 19, 2023 16:04
Copy link
Contributor

@aclamk aclamk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good improvement, but needs fixes.

@cfsnyder cfsnyder force-pushed the wip-58274-bluestore-collection-list-bounds branch from 54a8cae to 80cde1a Compare February 22, 2023 11:57
@ljflores
Copy link
Member

jenkins test make check

@ljflores
Copy link
Member

Hey @cfsnyder, we found a failure related to this PR in teuthology:

/a/yuriw-2023-02-13_22:35:30-rados-wip-yuri11-testing-2023-02-06-1424-distro-default-smithi/7173572

2023-02-15T18:07:25.878 INFO:tasks.workunit.client.0.smithi107.stdout:2023-02-15T18:07:17.736778+0000 osd.1 (osd.1) 6 : cluster [ERR] 1.0 deep-scrub 1 errors
2023-02-15T18:07:25.878 INFO:tasks.workunit.client.0.smithi107.stdout:2023-02-15T18:07:18.829538+0000 mgr.x (mgr.4100) 108 : cluster [DBG] pgmap v116: 1 pgs: 1 active+clean; 109 KiB data, 48 MiB used, 300 GiB / 300 GiB avail
2023-02-15T18:07:25.879 INFO:tasks.workunit.client.0.smithi107.stdout:2023-02-15T18:07:20.830056+0000 mgr.x (mgr.4100) 109 : cluster [DBG] pgmap v117: 1 pgs: 1 active+clean; 109 KiB data, 48 MiB used, 300 GiB / 300 GiB avail
2023-02-15T18:07:25.879 INFO:tasks.workunit.client.0.smithi107.stdout:2023-02-15T18:07:22.882002+0000 mon.a (mon.0) 450 : cluster [ERR] Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
2023-02-15T18:07:25.879 INFO:tasks.workunit.client.0.smithi107.stdout:2023-02-15T18:07:22.882076+0000 mon.a (mon.0) 451 : cluster [ERR] Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2023-02-15T18:07:25.879 INFO:tasks.workunit.client.0.smithi107.stdout:2023-02-15T18:07:22.830625+0000 mgr.x (mgr.4100) 110 : cluster [DBG] pgmap v118: 1 pgs: 1 active+clean+inconsistent; 109 KiB data, 48 MiB used, 300 GiB / 300 GiB avail
2023-02-15T18:07:25.879 INFO:tasks.workunit.client.0.smithi107.stdout:2023-02-15T18:07:24.831185+0000 mgr.x (mgr.4100) 111 : cluster [DBG] pgmap v119: 1 pgs: 1 active+clean+inconsistent; 109 KiB data, 48 MiB used, 300 GiB / 300 GiB avail
2023-02-15T18:07:25.892 INFO:tasks.workunit.client.0.smithi107.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/pg-split-merge.sh:84: TEST_a_merge_empty:  ceph pg ls
2023-02-15T18:07:26.291 INFO:tasks.workunit.client.0.smithi107.stdout:PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES   OMAP_BYTES*  OMAP_KEYS*  LOG  LOG_DUPS  STATE                      SINCE  VERSION  REPORTED  UP         ACTING     SCRUB_STAMP                      DEEP_SCRUB_STAMP                 LAST_SCRUB_DURATION  SCRUB_SCHEDULING
2023-02-15T18:07:26.291 INFO:tasks.workunit.client.0.smithi107.stdout:1.0       11         0          0        0  111309            0           0    0        11  active+clean+inconsistent     8s     23'7    43:108  [1,0,2]p1  [1,0,2]p1  2023-02-15T18:07:17.736806+0000  2023-02-15T18:07:17.736806+0000                    1  periodic scrub scheduled @ 2023-02-16T23:29:18.565994+0000
2023-02-15T18:07:26.291 INFO:tasks.workunit.client.0.smithi107.stdout:
2023-02-15T18:07:26.292 INFO:tasks.workunit.client.0.smithi107.stdout:* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.
2023-02-15T18:07:26.310 INFO:tasks.workunit.client.0.smithi107.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/pg-split-merge.sh:85: TEST_a_merge_empty:  grep ' active.clean '
2023-02-15T18:07:26.310 INFO:tasks.workunit.client.0.smithi107.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/pg-split-merge.sh:85: TEST_a_merge_empty:  ceph pg ls
2023-02-15T18:07:26.711 INFO:tasks.workunit.client.0.smithi107.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/pg-split-merge.sh:85: TEST_a_merge_empty:  return 1

The failure can be reproduced locally with:

ninja ceph-objectstore-tool vstart
../qa/run-standalone.sh "pg-split-merge.sh TEST_a_merge_empty" |& tee out.log

@cfsnyder
Copy link
Contributor Author

cfsnyder commented Mar 1, 2023

Thanks @ljflores , I'll look into this

@aclamk
Copy link
Contributor

aclamk commented May 25, 2023

@cfsnyder ping?

@cfsnyder
Copy link
Contributor Author

Thanks for the ping @aclamk. I do intend to get back to this and wrap it up, I've just been preoccupied with more pressing matters lately. I'll do my best to fit it in in the next week or so.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jul 26, 2023
@github-actions github-actions bot added the stale label Oct 24, 2023
@ifed01
Copy link
Contributor

ifed01 commented Oct 25, 2023

jenkins test make check

@github-actions github-actions bot removed the stale label Oct 25, 2023
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Dec 24, 2023
@github-actions
Copy link

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Jan 23, 2024
@markhpc markhpc reopened this Feb 1, 2024
@markhpc
Copy link
Member

markhpc commented Feb 1, 2024

We still want this likely!

@ljflores
Copy link
Member

ljflores commented Feb 1, 2024

@cfsnyder looks like the blocker was that it failed a teuthology test. Can you take another look and/or let us know if you need another QA run?

@github-actions github-actions bot removed the stale label Feb 1, 2024
@cfsnyder cfsnyder force-pushed the wip-58274-bluestore-collection-list-bounds branch from 80cde1a to 7054fb9 Compare March 25, 2024 18:29
@ifed01 ifed01 self-requested a review March 26, 2024 12:13
@ifed01
Copy link
Contributor

ifed01 commented Mar 26, 2024

jenkins test make check

@ifed01 ifed01 requested a review from aclamk March 26, 2024 12:14
dout(20) << __func__ << " reached max " << max << dendl;
*pnext = it->oid();
return 0;
ls->push_back(it->oid());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good that most of dout were removed.
But I think the one like
dout(20) << __func__ << " oid " << it->oid() << dendl;
would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, just added that log line. thanks for the review!

Copy link
Contributor

@aclamk aclamk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!
But please add oid dout.

@cfsnyder cfsnyder force-pushed the wip-58274-bluestore-collection-list-bounds branch from 7054fb9 to b6cb356 Compare April 9, 2024 13:25
…list()

When Bluestore::_collection_list() is called during PG removal, rocksdb
often has to iterate over many rocksdb deletion tombstones from recenently
deleted onode keys. This change bounds the rocksdb iteration when possible,
to avoid high latencies caused by iteration over many contiguous deletion
tombstones.

Fixes: https://tracker.ceph.com/issues/58274
Signed-off-by: Cory Snyder <csnyder@iland.com>
@cfsnyder cfsnyder force-pushed the wip-58274-bluestore-collection-list-bounds branch from b6cb356 to a6bcb57 Compare April 9, 2024 13:28
@yuriw
Copy link
Contributor

yuriw commented Apr 17, 2024

This PR is under test in https://tracker.ceph.com/issues/65560.

@amathuria
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants