quincy: OSD::Modify OSD Fast-Shutdown to work safely i.e. quiesce all activit… by benhanokh · Pull Request #45342 · ceph/ceph

benhanokh · 2022-03-10T18:59:27Z

backport tracker: https://tracker.ceph.com/issues/54523

backport of #44913
parent tracker: https://tracker.ceph.com/issues/53266

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh

Quiesce all activities and destage allocations to disk before killing the OSD:

1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager())
2) skip service.prepare_to_stop(), which can take as much as 10 seconds
3) skip debug options in fast-shutdown
4) set_state(STATE_STOPPING), which will stop accepting new tasks to this OSD
5) clear op_shardedwq queues; this is safe since we haven't started processing them
6) stop timer
7) drain osd_op_tp (no new items will be added)
8) now we can safely call umount, which will close_db/bluefs and will destage allocation to disk
9) skip _shutdown_cache() when we are in the middle of a fast-shutdown
10) increase debug level on fast-shutdown
11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests
12) disable fsck-on-umount when running fast-shutdown
13) add an option to increase debug level at fast-shutdown umount()
14) set a time limit to fast-shutdown
15) Bug-Fix: BlueStore::pool_statfs must not access the db after it was removed
16) Fix error message for qfsck (error was caused by PR ceph#44563)
17) make shutdown-timeout configurable

Fixes: https://tracker.ceph.com/issues/53266
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
(cherry picked from commit 9b2a64a)
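The ordering in the commit message above (stop accepting work, clear queued ops, stop the timer, drain in-flight work, then umount) can be illustrated with a small single-threaded toy model. This is only a sketch of the ordering: `ToyOsd` and all of its members are hypothetical names made up for illustration, not Ceph code or Ceph APIs, and draining a real thread pool is reduced here to a counter.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical toy model of the shutdown ordering; not Ceph code.
enum class State { Active, Stopping };

struct ToyOsd {
  State state = State::Active;
  std::deque<int> queued;            // ops accepted but not yet started
  int in_flight = 0;                 // ops a worker has already started
  bool timer_running = true;
  bool allocations_on_disk = false;  // set once the final "umount" ran
  std::vector<std::string> trace;    // records the order of the steps

  // New work is rejected once the state is Stopping.
  bool enqueue(int op) {
    if (state == State::Stopping) return false;
    queued.push_back(op);
    return true;
  }

  // A worker picks up one queued op.
  void start_one() {
    if (!queued.empty()) {
      queued.pop_front();
      ++in_flight;
    }
  }

  void fast_shutdown() {
    state = State::Stopping;            // stop accepting new tasks
    trace.push_back("stop_accepting");
    queued.clear();                     // safe: these ops were never started
    trace.push_back("clear_queue");
    timer_running = false;              // stop the timer
    trace.push_back("stop_timer");
    while (in_flight > 0) --in_flight;  // stands in for draining the thread pool
    trace.push_back("drain");
    // Only a fully quiesced store may be unmounted.
    assert(queued.empty() && in_flight == 0 && !timer_running);
    allocations_on_disk = true;         // stands in for umount destaging allocations
    trace.push_back("umount");
  }
};
```

The point of the model is the invariant checked just before the last step: nothing is queued, nothing is in flight, and no timer can fire, so the state written out at umount cannot be changed behind its back.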

Fixes problem with sync compaction (_rewrite_log_and_layout_sync). log_seq was not updated after compacting the log. This caused _replay to stop right after the first transaction:

... 20 bluefs _replay 0x0: op_dir_create sharding
... 20 bluefs _replay 0x0: op_dir_link sharding/def to 21
... 20 bluefs _replay 0x0: op_jump_seq 1025
... 10 bluefs _read h 0x555557c46400 0x1000~1000 from file(ino 1 size 0x1000 mtime 0.000000 allocated 410000 alloc_commit 410000 extents [1:0x1540000~410000])
... 20 bluefs _read left 0xff000 len 0x1000
... 20 bluefs _read got 4096
... 10 bluefs _replay 0x1000: stop: seq 1025 != expected 1026

This is a product of the bluefs fine-grained locks refactor.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 2f8e370)
Conflicts: src/test/objectstore/test_bluefs.cc
(cherry picked from commit 4fd98ce)
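The "seq 1025 != expected 1026" line suggests why a stale sequence number cuts replay short. The sketch below is a toy model, not BlueFS code: it assumes replay accepts a transaction only when its sequence number equals the next expected value and stops at the first mismatch. The function name `replay_count` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (assumption, not BlueFS code): replay consumes transactions
// while each sequence number matches the expected one, and stops at the
// first mismatch. Returns how many transactions were replayed.
std::size_t replay_count(const std::vector<std::uint64_t>& seqs,
                         std::uint64_t first_expected) {
  std::uint64_t expected = first_expected;
  std::size_t replayed = 0;
  for (std::uint64_t seq : seqs) {
    if (seq != expected) break;  // "stop: seq X != expected Y"
    ++expected;
    ++replayed;
  }
  return replayed;
}
```

With the values from the log, a writer that kept a stale counter emits 1025 where 1026 is expected, so `replay_count({1025, 1026}, 1026)` replays nothing, while a writer that advanced its counter after compaction emits 1026 and 1027 and both are replayed.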

Close window for possibility to capture allocator state and bluefs state that are not in sync. Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com> (cherry picked from commit 8d05255)

idryomov · 2022-03-14T17:17:14Z

jenkins test windows

yuriw · 2022-03-16T20:43:37Z

by @neha-ojha

rados approved, failures tracked in:

https://tracker.ceph.com/issues/52948
https://tracker.ceph.com/issues/52124
https://tracker.ceph.com/issues/54029
https://tracker.ceph.com/issues/43915

benhanokh and others added 3 commits March 10, 2022 20:59

os/bluestore: Fix problem with allocation desync

b89e8a0


benhanokh added this to the quincy milestone Mar 10, 2022

benhanokh added the core label Mar 10, 2022

github-actions bot added bluestore common labels Mar 10, 2022

benhanokh requested a review from aclamk March 10, 2022 19:00

benhanokh mentioned this pull request Mar 10, 2022

OSD::Modify OSD Fast-Shutdown to work safely i.e. quiesce all activit… #44913

Merged

benhanokh requested a review from jdurgin March 11, 2022 07:59

jdurgin approved these changes Mar 11, 2022

jdurgin added the needs-qa label Mar 11, 2022

aclamk approved these changes Mar 11, 2022

yuriw added the wip-yuri2-testing label Mar 14, 2022

yuriw merged commit bf57e16 into ceph:quincy Mar 16, 2022

aclamk mentioned this pull request Mar 23, 2022

quincy: os/bluestore/bluefs: Improve unittest for compaction #45600

Merged

yuriw merged 3 commits into ceph:quincy from benhanokh:wip-54523-quincy
