bluefs: bluefs alloc unit should only be shrink #57015
Conversation
Force-pushed: c75118d to f1372de
This PR is under test in https://tracker.ceph.com/issues/65797.
2024-05-16T02:27:59.753 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:162: TEST_bluestore: kill 121010
2024-05-16T02:27:59.753 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:162: TEST_bluestore: sleep 1
2024-05-16T02:28:00.754 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:162: TEST_bluestore: kill 121010
2024-05-16T02:28:00.755 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh: line 162: kill: (121010) - No such process
2024-05-16T02:28:00.755 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:163: TEST_bluestore: ceph osd down 3
2024-05-16T02:28:01.275 INFO:tasks.workunit.client.0.smithi104.stderr:osd.3 is already down.
2024-05-16T02:28:01.291 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:166: TEST_bluestore: ceph-bluestore-tool --path td/osd-bluefs-volume-ops/0 fsck
2024-05-16T02:28:02.596 INFO:tasks.workunit.client.0.smithi104.stdout:fsck success
2024-05-16T02:28:02.617 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:168: TEST_bluestore: dd if=/dev/zero of=td/osd-bluefs-volume-ops/0/wal count=512 bs=1M
2024-05-16T02:28:02.922 INFO:tasks.workunit.client.0.smithi104.stderr:512+0 records in
2024-05-16T02:28:02.922 INFO:tasks.workunit.client.0.smithi104.stderr:512+0 records out
2024-05-16T02:28:02.922 INFO:tasks.workunit.client.0.smithi104.stderr:536870912 bytes (537 MB, 512 MiB) copied, 0.302297 s, 1.8 GB/s
2024-05-16T02:28:02.922 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:169: TEST_bluestore: ceph-bluestore-tool --path td/osd-bluefs-volume-ops/0 --dev-target td/osd-bluefs-volume-ops/0/wal --command bluefs-bdev-new-wal
2024-05-16T02:28:02.945 INFO:tasks.workunit.client.0.smithi104.stdout:inferring bluefs devices from bluestore path
2024-05-16T02:28:07.054 INFO:tasks.workunit.client.0.smithi104.stderr:*** Caught signal (Segmentation fault) **
2024-05-16T02:28:07.054 INFO:tasks.workunit.client.0.smithi104.stderr: in thread 7f08e70feac0 thread_name:ceph-bluestore-
2024-05-16T02:28:07.056 INFO:tasks.workunit.client.0.smithi104.stderr: ceph version 19.0.0-3728-g6cd0f801 (6cd0f8013e2dea00c3a29a0d8b10656b132d7c80) squid (dev)
2024-05-16T02:28:07.056 INFO:tasks.workunit.client.0.smithi104.stderr: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f08e8303520]
2024-05-16T02:28:07.056 INFO:tasks.workunit.client.0.smithi104.stderr: 2: (BlueFS::_write_super(int)+0xd5) [0x55b54ce483a5]
2024-05-16T02:28:07.056 INFO:tasks.workunit.client.0.smithi104.stderr: 3: (BlueFS::_init_alloc()+0x3aa) [0x55b54ce49d3a]
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 4: (BlueFS::mount()+0xd7) [0x55b54ce4c477]
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 5: (BlueStore::add_new_bluefs_device(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x45b) [0x55b54cec6c6b]
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 6: main()
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 7: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f08e82ead90]
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 8: __libc_start_main()
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 9: _start()
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr:2024-05-16T02:28:07.075+0000 7f08e70feac0 -1 *** Caught signal (Segmentation fault) **
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: in thread 7f08e70feac0 thread_name:ceph-bluestore-
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr:
@liangmingyuanneo, Looks related
@Matan-B Thanks, I will check it. Could I get the OSD log?
Changing the alloc unit is already forbidden for bluestore; moreover, increasing it should also be forbidden in bluefs. Otherwise it can lead to a coredump or corrupted data. To explain using the Bitmap Allocator, the problem shows up in two ways: a) in BitmapAllocator::init_rm_free(offset, length), (offset + length) should be bigger than offs, but when get_min_alloc_size() grows, this can no longer be guaranteed; b) even if init_rm_free() happens to succeed, then during rocksdb compaction, when release() is called for a small extent, a larger extent may be released to the Bitmap. As a result, rocksdb data is corrupted and the osd cannot be booted again. Signed-off-by: Mingyuan Liang <liangmingyuan@baidu.com>
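Failure mode (b) can be sketched with a toy model (hypothetical names, not actual Ceph code): a coarse bitmap allocator must round a released range outward to its unit boundaries, so releasing an extent that was allocated under a smaller unit frees neighboring bytes that are still in use.

```python
# Toy model of a bitmap allocator's release path. If BlueFS data written with
# a 4 KiB alloc unit is later released through an allocator configured with a
# 64 KiB unit, the release rounds out to unit boundaries and frees bytes that
# still belong to live rocksdb files. All names here are illustrative.

UNIT_OLD = 4096    # unit in effect when the extent was allocated
UNIT_NEW = 65536   # larger unit after a (forbidden) increase

def release_range(offset, length, unit):
    """Round [offset, offset + length) outward to unit boundaries, the way a
    coarse bitmap must, and return the range actually marked free."""
    start = (offset // unit) * unit              # round start down
    end = -(-(offset + length) // unit) * unit   # round end up
    return start, end

# An extent allocated under the old, smaller unit:
off, ln = 4096, 8192                 # [4096, 12288), aligned to 4 KiB

# Released with the same unit: exact, nothing extra is freed.
assert release_range(off, ln, UNIT_OLD) == (4096, 12288)

# Released after the unit grew: the whole first 64 KiB unit is freed,
# including bytes [0, 4096) and [12288, 65536) that were never released.
start, end = release_range(off, ln, UNIT_NEW)
assert (start, end) == (0, 65536)
print(f"extent of {ln} B over-freed by {(end - start) - ln} B")
```

Those over-freed bytes can then be handed out again while rocksdb still owns them, which matches the corruption described above.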
Force-pushed: f1372de to eed0312
@ifed01 When a new WAL is added, there may be no new DB device for the OSD. Please review again at your convenience, thank you.
_write_super(BDEV_DB);
}
...
alloc_size_changed = false;
IMO it would be better to move this under the if statement above.
Sorry for the late reply. Yes, I also thought about moving it into the if statement. But because the NEWWAL branch may not be executed, the original _write_super() is still needed before the NEWWAL branch, so I placed _write_super() after the if statement. Admittedly, this makes the code hard to read. If you think moving it into the if statement is better, I will do it.
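The ordering concern above can be sketched with a toy model (hypothetical names, not actual BlueFS code): the superblock rewrite must happen whenever the alloc size changed, even on the path where the new-WAL branch is skipped, so the flag reset cannot simply move inside that branch.

```python
# Toy sketch of the control flow under discussion. "mount" here is only an
# illustration of why the super write and the flag reset must run on both
# paths, whether or not a new WAL is being set up.

def mount(alloc_size_changed, adding_new_wal, log):
    if adding_new_wal:
        log.append("setup NEWWAL")       # this branch may not execute at all
    if alloc_size_changed:
        log.append("write_super")        # must happen on both paths
        alloc_size_changed = False       # reset only after the super is persisted
    return alloc_size_changed, log

# No new WAL: the super still gets rewritten and the flag cleared.
flag, log = mount(True, False, [])
assert log == ["write_super"] and flag is False

# New WAL: setup runs first, then the same super write.
flag, log = mount(True, True, [])
assert log == ["setup NEWWAL", "write_super"]
```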
jenkins test make check
jenkins test api
jenkins test windows
jenkins test windows
Additionally this locks the tail of DB/WAL volumes which is unaligned to the configured (not minimal!) BlueFS allocation unit. Effectively replaces changes from ceph#57015. Fixes: https://tracker.ceph.com/issues/68772 Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Additionally this locks the tail of DB/WAL volumes which is unaligned to the configured (not minimal!) BlueFS allocation unit. Effectively replaces changes from ceph#57015. Fixes: https://tracker.ceph.com/issues/68772 Signed-off-by: Igor Fedotov <igor.fedotov@croit.io> (cherry picked from commit effaa68)
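The "locked tail" in the commit message above is simple arithmetic: whatever part of the volume size is not a multiple of the configured allocation unit can never be handed out whole, so it is reserved up front. A hedged sketch (toy names, not Ceph code):

```python
# Toy computation of the volume tail that is unaligned to the configured
# BlueFS allocation unit, i.e. the region the fix reserves ("locks") so the
# allocator never hands it out. Purely illustrative arithmetic.

def unaligned_tail(volume_size, alloc_unit):
    """Split a volume into the allocatable, unit-aligned prefix and the
    locked tail that does not fill a whole allocation unit."""
    usable = (volume_size // alloc_unit) * alloc_unit  # round size down
    return usable, volume_size - usable                # (usable, locked tail)

# A WAL volume of 1 GiB + 12 KiB with a 64 KiB configured unit:
usable, tail = unaligned_tail(1024 * 1024 * 1024 + 12288, 65536)
assert tail == 12288          # the 12 KiB tail is locked, never allocated
assert usable % 65536 == 0    # everything handed out stays unit-aligned
```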
Changing the alloc unit is already forbidden for bluestore; moreover, increasing it should also be forbidden in bluefs. Otherwise it can lead to a coredump or corrupted data. To explain using the Bitmap Allocator, the problem shows up in two ways:
a) in BitmapAllocator::init_rm_free(offset, length), (offset + length) should be bigger than offs, but when get_min_alloc_size() grows, this can no longer be guaranteed;
b) even if init_rm_free() happens to succeed, then during rocksdb compaction, when release() is called for a small extent, a larger extent may be released to the Bitmap. As a result, rocksdb data is corrupted and the osd cannot be booted again.
https://tracker.ceph.com/issues/65600
Signed-off-by: Mingyuan Liang liangmingyuan@baidu.com
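Failure mode (a) can also be sketched with a toy model (hypothetical names, not the actual BitmapAllocator code): a conservative "remove from free set" must round the start up and the end down to unit boundaries, so an extent recorded under a small unit can collapse into an empty or inverted range once the unit grows, tripping the allocator's sanity check.

```python
# Toy model of the boundary condition described in (a). Rounding the start up
# and the end down is what a conservative init_rm_free-style call must do;
# with a grown unit the rounded range can invert, so end > offs no longer
# holds and the real code's assertion fires.

def init_rm_free_bounds(offset, length, unit):
    offs = -(-offset // unit) * unit           # round start up to unit
    end = ((offset + length) // unit) * unit   # round end down to unit
    return offs, end

# Extent recorded with a 4 KiB unit: the invariant holds.
offs, end = init_rm_free_bounds(4096, 8192, 4096)
assert end > offs                  # (4096, 12288)

# Same extent replayed after the unit grew to 64 KiB: the range inverts.
offs, end = init_rm_free_bounds(4096, 8192, 65536)
assert not end > offs              # (65536, 0) — the sanity check fails
```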