BlueStore:NCB:Bug-Fix for recovery code with shared blobs by benhanokh · Pull Request #44563 · ceph/ceph

benhanokh · 2022-01-12T20:56:45Z

Fixes: https://tracker.ceph.com/issues/53678
Signed-off-by: gbenhano@redhat.com

Replaced the BitmapAllocator used by the NCB recovery code with a simple bitmap that allows bits to be set multiple times without any adverse effect (shared blobs report the same allocation multiple times).
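
The property the description relies on is that setting a bit is idempotent. A minimal sketch of that idea follows; `ToyBitmap`, `set`, and `count_set` are hypothetical names chosen for illustration and are not the `SimpleBitmap` interface added by this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a simple bitmap; NOT the Ceph SimpleBitmap class.
class ToyBitmap {
public:
  explicit ToyBitmap(uint64_t num_bits)
    : m_num_bits(num_bits), m_words((num_bits + 63) / 64, 0) {}

  // Mark bits [offset, offset + length) as allocated.
  // OR-ing an already-set bit changes nothing, so reporting the same
  // extent more than once (as shared blobs do) is harmless.
  bool set(uint64_t offset, uint64_t length) {
    if (offset > m_num_bits || length > m_num_bits - offset) {
      return false;  // range does not fit in the bitmap
    }
    for (uint64_t i = offset; i < offset + length; ++i) {
      m_words[i / 64] |= (uint64_t{1} << (i % 64));
    }
    return true;
  }

  // Count the bits currently set (clears the lowest set bit per iteration).
  uint64_t count_set() const {
    uint64_t n = 0;
    for (uint64_t w : m_words) {
      while (w) { w &= w - 1; ++n; }
    }
    return n;
  }

private:
  uint64_t m_num_bits;
  std::vector<uint64_t> m_words;
};
```

With this sketch, calling `set(10, 4)` twice leaves exactly four bits set, whereas an allocator that treats a repeated allocation as an error would reject the second call.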

ronen-fr · 2022-01-17T12:13:37Z

src/os/bluestore/BlueStore.cc

+{
+  //dout(10) << "offset=" << offset << ", length=" <<length<< ", min_alloc_size=" <<min_alloc_size
+  //	   << ", min_alloc_size_mask=" << min_alloc_size_mask << dendl;
+#if 0


Better not to leave the unused code there.

I changed code that is used elsewhere because I don't think this case can ever happen (and if it does, the old code will break), but I still want to keep a reference to the previous code.

I really think this function needs cleanup.
I changed the semantics here and want people to be able to see it.

It isn't a good idea to leave dead code around - either '#if 0' or commented-out.
If you want to explain a change from a previous version - you can add relevant text in a comment, explaining
what was changed and why.

ronen-fr · 2022-01-17T12:23:26Z

src/os/bluestore/BlueStore.cc

-    derr << "****failed create_bitmap_allocator()" << dendl;
+  utime_t       start = ceph_clock_now();
+  SimpleBitmap *sbmap = create_simple_bitmap_allocator(cct, path, bdev->get_size(), min_alloc_size);
+  if (sbmap == nullptr) {


This cannot happen. A ctor never fails in this way
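
The remark above matches standard C++ behaviour: a plain `new` expression reports allocation failure by throwing `std::bad_alloc`, not by returning a null pointer. The sketch below illustrates the difference, assuming the factory allocates with plain `new` (the body of `create_simple_bitmap_allocator` is not shown in this thread). `ToyBitmap`, `make_throwing`, and `make_nothrow` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical type standing in for any heap-allocated class.
struct ToyBitmap {
  explicit ToyBitmap(uint64_t bits) : num_bits(bits) {}
  uint64_t num_bits;
};

// Plain `new`: on allocation failure this throws std::bad_alloc, so the
// returned pointer is never null and a later nullptr check is dead code.
ToyBitmap* make_throwing(uint64_t bits) {
  return new ToyBitmap(bits);
}

// `new (std::nothrow)`: returns nullptr on allocation failure, so a
// nullptr check by the caller is meaningful only with this form.
ToyBitmap* make_nothrow(uint64_t bits) {
  return new (std::nothrow) ToyBitmap(bits);
}
```

If the factory can fail for reasons other than allocation, returning an error code or throwing are the usual alternatives to a null check.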

ronen-fr · 2022-01-17T12:25:08Z

src/os/bluestore/BlueStore.cc

+}
+
 //---------------------------------------------------------
 int BlueStore::read_allocation_from_drive_on_startup()


what are the various failure values?

ronen-fr · 2022-01-17T12:25:34Z

src/os/bluestore/BlueStore.cc

-  if (allocator) {
-    dout(5) << "bitmap-allocator=" << allocator << dendl;
-  } else {
+  SimpleBitmap *sbmap = create_simple_bitmap_allocator(cct, path, bdev->get_size(), min_alloc_size);


benhanokh · 2022-01-24T15:56:35Z

@neha-ojha can you please check the failures
http://pulpito.front.sepia.ceph.com/benhanokh-2022-01-23_06:48:03-rados-WIP_GBH_NCB_new_alloc_map_A1-distro-basic-smithi/

neha-ojha · 2022-01-24T16:44:20Z

@benhanokh the make check failures in https://jenkins.ceph.com/job/ceph-pull-requests/88970/ seem related, PTAL

benhanokh · 2022-01-27T18:50:10Z

@neha-ojha can you please check the failures ?
I saw nothing I could attribute to my code
thx
http://pulpito.front.sepia.ceph.com/benhanokh-2022-01-26_21:12:05-rados-WIP_GBH_NCB_new_alloc_map_A6-distro-basic-smithi/

ronen-fr · 2022-01-27T19:25:22Z

benhanokh-2022-01-26_21:12:05-rados-WIP_GBH_NCB_new_alloc_map_A6-distro-basic-smithi/6642395
is a data corruption. It might be a bug in the scrubber code, but it might well be a bug introduced here.

ronen-fr · 2022-01-30T11:40:10Z

Appears in master, too. So - scrub testing issue. Not related to this PR.

aclamk

Good!

neha-ojha · 2022-02-03T17:44:11Z

@benhanokh The reason the api tests are failing is https://jenkins.ceph.com/job/ceph-api/31284/consoleFull#-4369474192a811ea2-3e7b-466b-84b4-d13df7e35809

../src/os/bluestore/bluestore_tool.cc: In function ‘int main(int, char**)’:
../src/os/bluestore/bluestore_tool.cc:575:73: error: no matching function for call to ‘BlueStore::read_allocation_from_drive_for_bluestore_tool(bool)’
  575 |     int r = bluestore.read_allocation_from_drive_for_bluestore_tool(true);
      |                                                                         ^
In file included from ../src/os/bluestore/bluestore_tool.cc:22:
../src/os/bluestore/BlueStore.h:3655:8: note: candidate: ‘int BlueStore::read_allocation_from_drive_for_bluestore_tool()’
 3655 |   int  read_allocation_from_drive_for_bluestore_tool();
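
The compiler output shows a call site still passing a `bool` to a method whose declaration no longer takes arguments. A reduced sketch of that mismatch and the call-site fix follows; `ToyStore`, `read_allocation_for_tool`, and `run_tool` are hypothetical names, not the BlueStore code.

```cpp
#include <cassert>

// Hypothetical miniature of the mismatch; ToyStore is NOT the BlueStore class.
struct ToyStore {
  // The declaration takes no arguments, as in the "candidate" note above.
  int read_allocation_for_tool() { return 0; }
};

int run_tool(ToyStore& store) {
  // return store.read_allocation_for_tool(true);  // error: no matching function
  return store.read_allocation_for_tool();          // call updated to match
}
```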

neha-ojha · 2022-02-03T22:19:32Z

@benhanokh your last commit fixes the api test failure! Could you please either squash it into the previous commits, or add a Signed-off-by to this commit and update the commit title to something like "os/bluestore/bluestore_tool.cc: update read_allocation_from_drive_for_bluestore_tool call"? I think we should be good to merge after that.

Replaces the BitmapAllocator used by NCB Recovery code with a dedicated SimpleBitmap. The SimpleBitmap allows for bits to be set multiple times without any adverse effect. This is needed because shared-blobs will report the same allocation multiple times.
Fixes: https://tracker.ceph.com/issues/53678
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>

neha-ojha · 2022-02-04T17:32:30Z

@benhanokh please prepare a quincy backport for this, thanks!

…ies and destage allocations to disk before killing the OSD 1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager()) 2) skip service.prepare_to_stop() which can take as much as 10 seconds 3) skip debug options in fast-shutdown 4) set_state(STATE_STOPPING) which will stop accepting new tasks to this OSD 5) clear op_shardedwq queues, this is safe since we didn't started processing them 6) stop timer 7) drain osd_op_tp (no new items will be added) 8) now we can safely call umount which will close_db/bluefs and will destage allocation to disk 9) skip _shutdown_cache() when we are in the middle of a fast-shutdown 10) increase debug level on fast-shutdown 11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests 12) disable fsck-on-umount when running fast-shutdown 13) add an option to increase debug level at fast-shutdown umount() 14) set a time limit to fast-shutdown 15) Bug-Fix BlueStore::pool_statfs don't access db after it was removed 16) Fix error message for qfsck (error was caused by PR ceph#44563) Fixes: https://tracker.ceph.com/issues/53266 Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>

…ies and destage allocations to disk before killing the OSD 1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager()) 2) skip service.prepare_to_stop() which can take as much as 10 seconds 3) skip debug options in fast-shutdown 4) set_state(STATE_STOPPING) which will stop accepting new tasks to this OSD 5) clear op_shardedwq queues, this is safe since we didn't started processing them 6) stop timer 7) drain osd_op_tp (no new items will be added) 8) now we can safely call umount which will close_db/bluefs and will destage allocation to disk 9) skip _shutdown_cache() when we are in the middle of a fast-shutdown 10) increase debug level on fast-shutdown 11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests 12) disable fsck-on-umount when running fast-shutdown 13) add an option to increase debug level at fast-shutdown umount() 14) set a time limit to fast-shutdown 15) Bug-Fix BlueStore::pool_statfs don't access db after it was removed 16) Fix error message for qfsck (error was caused by PR ceph#44563) 17) add a step to scrub allocation file after each teuthology test Fixes: https://tracker.ceph.com/issues/53266 Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>

…ies and destage allocations to disk before killing the OSD 1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager()) 2) skip service.prepare_to_stop() which can take as much as 10 seconds 3) skip debug options in fast-shutdown 4) set_state(STATE_STOPPING) which will stop accepting new tasks to this OSD 5) clear op_shardedwq queues, this is safe since we didn't started processing them 6) stop timer 7) drain osd_op_tp (no new items will be added) 8) now we can safely call umount which will close_db/bluefs and will destage allocation to disk 9) skip _shutdown_cache() when we are in the middle of a fast-shutdown 10) increase debug level on fast-shutdown 11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests 12) disable fsck-on-umount when running fast-shutdown 13) add an option to increase debug level at fast-shutdown umount() 14) set a time limit to fast-shutdown 15) Bug-Fix BlueStore::pool_statfs don't access db after it was removed 16) Fix error message for qfsck (error was caused by PR ceph#44563) 17) add a step to scrub allocation file after each teuthology test 18) make shutdown-timeout configurable Fixes: https://tracker.ceph.com/issues/53266 Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>

…ies and destage allocations to disk before killing the OSD 1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager()) 2) skip service.prepare_to_stop() which can take as much as 10 seconds 3) skip debug options in fast-shutdown 4) set_state(STATE_STOPPING) which will stop accepting new tasks to this OSD 5) clear op_shardedwq queues, this is safe since we didn't started processing them 6) stop timer 7) drain osd_op_tp (no new items will be added) 8) now we can safely call umount which will close_db/bluefs and will destage allocation to disk 9) skip _shutdown_cache() when we are in the middle of a fast-shutdown 10) increase debug level on fast-shutdown 11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests 12) disable fsck-on-umount when running fast-shutdown 13) add an option to increase debug level at fast-shutdown umount() 14) set a time limit to fast-shutdown 15) Bug-Fix BlueStore::pool_statfs don't access db after it was removed 16) Fix error message for qfsck (error was caused by PR ceph#44563) 17) make shutdown-timeout configurable Fixes: https://tracker.ceph.com/issues/53266 Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>

…ies and destage allocations to disk before killing the OSD 1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager()) 2) skip service.prepare_to_stop() which can take as much as 10 seconds 3) skip debug options in fast-shutdown 4) set_state(STATE_STOPPING) which will stop accepting new tasks to this OSD 5) clear op_shardedwq queues, this is safe since we didn't started processing them 6) stop timer 7) drain osd_op_tp (no new items will be added) 8) now we can safely call umount which will close_db/bluefs and will destage allocation to disk 9) skip _shutdown_cache() when we are in the middle of a fast-shutdown 10) increase debug level on fast-shutdown 11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests 12) disable fsck-on-umount when running fast-shutdown 13) add an option to increase debug level at fast-shutdown umount() 14) set a time limit to fast-shutdown 15) Bug-Fix BlueStore::pool_statfs don't access db after it was removed 16) Fix error message for qfsck (error was caused by PR ceph#44563) 17) make shutdown-timeout configurable 18) Fix BlueFS handling of SYNC compaction 19) Fix SimpleBitmap init to ignore incomplete block at the end of the bdev (when bdev-size in unaligned on block-size) 20) Force consisnt view of BlueFS by calling bluefs->compact_log() and then bluefs->sync_metadata() Fixes: https://tracker.ceph.com/issues/53266 Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>

quiesce all activities and destage allocations to disk before killing the OSD 1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager()) 2) skip service.prepare_to_stop() which can take as much as 10 seconds 3) skip debug options in fast-shutdown 4) set_state(STATE_STOPPING) which will stop accepting new tasks to this OSD 5) clear op_shardedwq queues, this is safe since we didn't started processing them 6) stop timer 7) drain osd_op_tp (no new items will be added) 8) now we can safely call umount which will close_db/bluefs and will destage allocation to disk 9) skip _shutdown_cache() when we are in the middle of a fast-shutdown 10) increase debug level on fast-shutdown 11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests 12) disable fsck-on-umount when running fast-shutdown 13) add an option to increase debug level at fast-shutdown umount() 14) set a time limit to fast-shutdown 15) Bug-Fix BlueStore::pool_statfs don't access db after it was removed 16) Fix error message for qfsck (error was caused by PR ceph#44563) 17) make shutdown-timeout configurable Fixes: https://tracker.ceph.com/issues/53266 Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>

quiesce all activities and destage allocations to disk before killing the OSD 1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager()) 2) skip service.prepare_to_stop() which can take as much as 10 seconds 3) skip debug options in fast-shutdown 4) set_state(STATE_STOPPING) which will stop accepting new tasks to this OSD 5) clear op_shardedwq queues, this is safe since we didn't started processing them 6) stop timer 7) drain osd_op_tp (no new items will be added) 8) now we can safely call umount which will close_db/bluefs and will destage allocation to disk 9) skip _shutdown_cache() when we are in the middle of a fast-shutdown 10) increase debug level on fast-shutdown 11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests 12) disable fsck-on-umount when running fast-shutdown 13) add an option to increase debug level at fast-shutdown umount() 14) set a time limit to fast-shutdown 15) Bug-Fix BlueStore::pool_statfs don't access db after it was removed 16) Fix error message for qfsck (error was caused by PR ceph#44563) 17) make shutdown-timeout configurable Fixes: https://tracker.ceph.com/issues/53266 Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com> (cherry picked from commit 9b2a64a)

benhanokh requested a review from aclamk January 12, 2022 20:56

benhanokh self-assigned this Jan 12, 2022

github-actions bot added bluestore build/ops core labels Jan 12, 2022

ifed01 reviewed Jan 12, 2022

src/os/bluestore/BlueStore.cc

aclamk reviewed Jan 14, 2022

src/os/bluestore/simple_bitmap.h

aclamk reviewed Jan 14, 2022

src/os/bluestore/simple_bitmap.cc (outdated)

benhanokh force-pushed the NCB_new_alloc_map branch from 8477ef0 to 1fedba6 Compare January 16, 2022 10:37

benhanokh requested a review from a team as a code owner January 17, 2022 10:45

github-actions bot added the crimson label Jan 17, 2022

ronen-fr reviewed Jan 17, 2022

benhanokh force-pushed the NCB_new_alloc_map branch from b3d05a9 to ef3517e Compare January 19, 2022 19:23

github-actions bot added the tests label Jan 19, 2022

benhanokh force-pushed the NCB_new_alloc_map branch 3 times, most recently from cf4627f to c916905 Compare January 22, 2022 07:22

benhanokh force-pushed the NCB_new_alloc_map branch 2 times, most recently from eff20b1 to 7ae7e5b Compare January 26, 2022 13:11

benhanokh requested a review from aclamk January 26, 2022 13:15