OSD::Modify OSD Fast-Shutdown to work safely i.e. quiesce all activit… #44913
yuriw merged 3 commits into ceph:master
Looking pretty good; the teuthology piece could be a separate PR.
Two tests show corruption in the allocation file. I need to understand what those tests did and why we ended up with a corrupted allocation file, but for now we should stop the merge :-(
It seems that the problem is an old race-condition in NCB code unrelated to safe-fast-shutdown. The problem might be a race in the way we free up space on BlueFS on compaction.
@benhanokh please see this failure https://pulpito.ceph.com/nojha-2022-02-23_17:58:44-rados-GBH_safe_shutdown_v2_basecode_sanity_check_disabled_2-distro-basic-smithi/6702922/
quiesce all activities and destage allocations to disk before killing the OSD
1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager())
2) skip service.prepare_to_stop() which can take as much as 10 seconds
3) skip debug options in fast-shutdown
4) set_state(STATE_STOPPING) which will stop accepting new tasks to this OSD
5) clear op_shardedwq queues; this is safe since we haven't started processing them
6) stop timer
7) drain osd_op_tp (no new items will be added)
8) now we can safely call umount which will close_db/bluefs and will destage allocation to disk
9) skip _shutdown_cache() when we are in the middle of a fast-shutdown
10) increase debug level on fast-shutdown
11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests
12) disable fsck-on-umount when running fast-shutdown
13) add an option to increase debug level at fast-shutdown umount()
14) set a time limit to fast-shutdown
15) Bug-Fix BlueStore::pool_statfs don't access db after it was removed
16) Fix error message for qfsck (error was caused by PR ceph#44563)
17) make shutdown-timeout configurable
Fixes: https://tracker.ceph.com/issues/53266
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
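
The ordering in steps 4 to 8 of the commit message above can be illustrated with a small mock. This is only a sketch under stated assumptions: `MockOsd` and all of its members are hypothetical stand-ins, not Ceph's real classes or API, and "draining" is simulated by zeroing a counter.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for the OSD; none of these names are Ceph's real API.
enum class State { Active, Stopping };

struct MockOsd {
  State state = State::Active;
  std::deque<int> queued_ops;        // ops accepted but not yet started
  int in_flight_ops = 0;             // ops already being processed
  bool timer_running = true;
  bool allocations_destaged = false;
  std::vector<std::string> trace;    // records the order of the steps

  // Refuses new work once the OSD is stopping.
  bool enqueue(int op) {
    if (state == State::Stopping) return false;
    queued_ops.push_back(op);
    return true;
  }

  void fast_shutdown() {
    // Step 4: stop accepting new tasks.
    state = State::Stopping;
    trace.push_back("stopping");
    // Step 5: drop queued ops; safe because none of them has started.
    queued_ops.clear();
    trace.push_back("clear_queues");
    // Step 6: stop the timer.
    timer_running = false;
    trace.push_back("stop_timer");
    // Step 7: drain work already in flight (simulated here).
    in_flight_ops = 0;
    trace.push_back("drain");
    // Step 8: only now is it safe to unmount and destage allocations.
    allocations_destaged = true;
    trace.push_back("umount");
  }
};
```

The point of the ordering is that the unmount step runs only after nothing can still modify allocation state; in the mock, `trace` makes that order observable.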
Fixes problem with sync compaction (_rewrite_log_and_layout_sync).
log_seq was not updated after compacting the log, which caused _replay to stop right after the first transaction:
    ... 20 bluefs _replay 0x0: op_dir_create sharding
    ... 20 bluefs _replay 0x0: op_dir_link sharding/def to 21
    ... 20 bluefs _replay 0x0: op_jump_seq 1025
    ... 10 bluefs _read h 0x555557c46400 0x1000~1000 from file(ino 1 size 0x1000 mtime 0.000000 allocated 410000 alloc_commit 410000 extents [1:0x1540000~410000])
    ... 20 bluefs _read left 0xff000 len 0x1000
    ... 20 bluefs _read got 4096
    ... 10 bluefs _replay 0x1000: stop: seq 1025 != expected 1026
This is a product of the bluefs fine-grained locks refactor.
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 2f8e370)
Conflicts: src/test/objectstore/test_bluefs.cc
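
The "seq 1025 != expected 1026" stop in the log above can be reproduced with a simplified model of a sequence-checked log. This is a hedged sketch, not BlueFS code: `Txn` and `replay` are hypothetical, and the sequence numbers are chosen only to mirror the log excerpt.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified log record; not the real BlueFS transaction type.
struct Txn {
  uint64_t seq;      // sequence number stamped by the writer
  uint64_t jump_to;  // non-zero: record carries a "jump seq" to this value
};

// Replays records in order and returns how many were accepted.
// Replay stops at the first record whose seq is not log_seq + 1.
std::size_t replay(const std::vector<Txn>& log) {
  uint64_t log_seq = 0;
  std::size_t accepted = 0;
  for (const Txn& t : log) {
    if (t.seq != log_seq + 1) {
      break;  // corresponds to "stop: seq X != expected Y"
    }
    // A jump record moves the reader's counter forward; otherwise advance by one.
    log_seq = t.jump_to != 0 ? t.jump_to : t.seq;
    ++accepted;
  }
  return accepted;
}
```

In this model, a writer that stamps the record after the jump with 1025 (its counter was not advanced after compaction) loses everything past the first record on replay, while a writer that stamps 1026 replays fully.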
Close the window in which allocator state and bluefs state could be captured while not in sync.
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
    _close_db_and_around();
    if (cct->_conf->bluestore_fsck_on_umount) {
      // disable fsck on fast-shutdown
"skip" is a better word than "disable" here
    hb_back_server_messenger->shutdown();

    utime_t duration = ceph_clock_now() - start_time_func;
    dout(0) << "Slow Shutdown duration:" << duration << " seconds" << dendl;
Proposal: How about calling it "Full" or "Orderly" instead of "Slow"?
    // vstart overwrites osd_fast_shutdown value in the conf file -> force the value here!
    //cct->_conf->osd_fast_shutdown = true;

    dout(0) << "Fast Shutdown: - cct->_conf->osd_fast_shutdown = "
Need to improve output, show something like:
"Shutdown: Fast, null-fm=true"
    int OSD::shutdown()
    {
      // vstart overwrites osd_fast_shutdown value in the conf file -> force the value here!
Suspicious comments. Is there something missing in this PR?
If this is for testing purposes, then it should be a separate commit.
    // Debugging
    if (cct->_conf.get_val<bool>("osd_debug_shutdown")) {
      // Disabled debugging during fast-shutdown
We do not disable any debugging here....
Failures, unrelated.
jenkins test windows
PR was backported to quincy - #45342
OSD::Modify OSD Fast-Shutdown to work safely i.e. quiesce all activities and destage allocations to disk before killing the OSD
Fixes: https://tracker.ceph.com/issues/53266
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>