
bluefs: bluefs alloc unit should only shrink #57015

Merged

yuriw merged 1 commit into ceph:main from liangmingyuanneo:wip-bluefs-max-alloc-size on Jun 5, 2024

Conversation

@liangmingyuanneo
Contributor

The alloc unit is already forbidden from changing for BlueStore; likewise, it must be forbidden from increasing in BlueFS. Otherwise it can lead to a coredump or corrupted data. Taking the bitmap allocator as an example, the problem shows up in two ways:
a) In BitmapAllocator::init_rm_free(offset, length), (offset + length) must be greater than the rounded offset offs. When get_min_alloc_size() grows, this can no longer be guaranteed.
b) Even if init_rm_free() happens to succeed, then during RocksDB compaction, when release() is called, releasing a small extent may cause larger extents to be released to the bitmap. As a result, the RocksDB data is corrupted and the OSD cannot boot again.

https://tracker.ceph.com/issues/65600

Signed-off-by: Mingyuan Liang liangmingyuan@baidu.com

@liangmingyuanneo force-pushed the wip-bluefs-max-alloc-size branch 2 times, most recently from c75118d to f1372de on April 25, 2024 at 13:46
@yuriw
Contributor

yuriw commented May 3, 2024

This PR is under test in https://tracker.ceph.com/issues/65797.

Contributor

@Matan-B left a comment

2024-05-16T02:27:59.753 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:162: TEST_bluestore:  kill 121010
2024-05-16T02:27:59.753 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:162: TEST_bluestore:  sleep 1
2024-05-16T02:28:00.754 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:162: TEST_bluestore:  kill 121010
2024-05-16T02:28:00.755 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh: line 162: kill: (121010) - No such process
2024-05-16T02:28:00.755 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:163: TEST_bluestore:  ceph osd down 3
2024-05-16T02:28:01.275 INFO:tasks.workunit.client.0.smithi104.stderr:osd.3 is already down.
2024-05-16T02:28:01.291 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:166: TEST_bluestore:  ceph-bluestore-tool --path td/osd-bluefs-volume-ops/0 fsck
2024-05-16T02:28:02.596 INFO:tasks.workunit.client.0.smithi104.stdout:fsck success
2024-05-16T02:28:02.617 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:168: TEST_bluestore:  dd if=/dev/zero of=td/osd-bluefs-volume-ops/0/wal count=512 bs=1M
2024-05-16T02:28:02.922 INFO:tasks.workunit.client.0.smithi104.stderr:512+0 records in
2024-05-16T02:28:02.922 INFO:tasks.workunit.client.0.smithi104.stderr:512+0 records out
2024-05-16T02:28:02.922 INFO:tasks.workunit.client.0.smithi104.stderr:536870912 bytes (537 MB, 512 MiB) copied, 0.302297 s, 1.8 GB/s
2024-05-16T02:28:02.922 INFO:tasks.workunit.client.0.smithi104.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-bluefs-volume-ops.sh:169: TEST_bluestore:  ceph-bluestore-tool --path td/osd-bluefs-volume-ops/0 --dev-target td/osd-bluefs-volume-ops/0/wal --command bluefs-bdev-new-wal
2024-05-16T02:28:02.945 INFO:tasks.workunit.client.0.smithi104.stdout:inferring bluefs devices from bluestore path
2024-05-16T02:28:07.054 INFO:tasks.workunit.client.0.smithi104.stderr:*** Caught signal (Segmentation fault) **
2024-05-16T02:28:07.054 INFO:tasks.workunit.client.0.smithi104.stderr: in thread 7f08e70feac0 thread_name:ceph-bluestore-
2024-05-16T02:28:07.056 INFO:tasks.workunit.client.0.smithi104.stderr: ceph version 19.0.0-3728-g6cd0f801 (6cd0f8013e2dea00c3a29a0d8b10656b132d7c80) squid (dev)
2024-05-16T02:28:07.056 INFO:tasks.workunit.client.0.smithi104.stderr: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f08e8303520]
2024-05-16T02:28:07.056 INFO:tasks.workunit.client.0.smithi104.stderr: 2: (BlueFS::_write_super(int)+0xd5) [0x55b54ce483a5]
2024-05-16T02:28:07.056 INFO:tasks.workunit.client.0.smithi104.stderr: 3: (BlueFS::_init_alloc()+0x3aa) [0x55b54ce49d3a]
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 4: (BlueFS::mount()+0xd7) [0x55b54ce4c477]
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 5: (BlueStore::add_new_bluefs_device(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x45b) [0x55b54cec6c6b]
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 6: main()
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 7: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f08e82ead90]
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 8: __libc_start_main()
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: 9: _start()
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr:2024-05-16T02:28:07.075+0000 7f08e70feac0 -1 *** Caught signal (Segmentation fault) **
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr: in thread 7f08e70feac0 thread_name:ceph-bluestore-
2024-05-16T02:28:07.057 INFO:tasks.workunit.client.0.smithi104.stderr:

https://pulpito.ceph.com/yuriw-2024-05-15_21:09:29-rados-wip-yuri5-testing-2024-05-15-0804-distro-default-smithi/7707982

@liangmingyuanneo, Looks related

@liangmingyuanneo
Contributor Author

liangmingyuanneo commented May 24, 2024

@Matan-B Thanks, I will check it. Could I get the OSD log?

The alloc unit is already forbidden from changing for BlueStore; likewise,
it must be forbidden from increasing in BlueFS. Otherwise it can lead to a
coredump or corrupted data. Taking the bitmap allocator as an example, the
problem shows up in two ways:
a) In BitmapAllocator::init_rm_free(offset, length),
(offset + length) must be greater than the rounded offset offs. When
get_min_alloc_size() grows, this can no longer be guaranteed.
b) Even if init_rm_free() happens to succeed, then during RocksDB
compaction, when release() is called, releasing a small extent may cause
larger extents to be released to the bitmap. As a result, the RocksDB data
is corrupted and the OSD cannot boot again.

Signed-off-by: Mingyuan Liang <liangmingyuan@baidu.com>
@liangmingyuanneo force-pushed the wip-bluefs-max-alloc-size branch from f1372de to eed0312 on May 24, 2024 at 06:40
@liangmingyuanneo
Contributor Author

liangmingyuanneo commented May 24, 2024

@ifed01 When a new WAL is added, there may be no new DB device for the OSD. Please review again at your convenience, thank you.

Contributor

@ifed01 left a comment

LGTM

_write_super(BDEV_DB);
}

alloc_size_changed = false;
Contributor

IMO it's better to move this under the if statement above.

Contributor Author

Sorry for the late reply. Yes, I also thought about moving it under the if statement. But because the NEWWAL branch may not be executed, the original _write_super() is still needed before the NEWWAL branch, so I placed _write_super() after the if statement. Admittedly, this makes the code harder to read. If you think moving it inside the if statement is better, I will do it.

@ifed01
Contributor

ifed01 commented May 24, 2024

jenkins test make check

@ifed01
Contributor

ifed01 commented May 24, 2024

jenkins test api

@ifed01
Contributor

ifed01 commented May 24, 2024

jenkins test windows

@Matan-B Matan-B self-requested a review May 27, 2024 08:20
@Matan-B Matan-B dismissed their stale review May 27, 2024 08:21

addressed

@Matan-B Matan-B removed their request for review May 27, 2024 08:21
@aclamk
Contributor

aclamk commented Jun 4, 2024

jenkins test windows

@yuriw yuriw merged commit fa83b90 into ceph:main Jun 5, 2024
ifed01 added a commit to ifed01/ceph that referenced this pull request Mar 7, 2025
Additionally, this locks the tail of DB/WAL volumes which is unaligned to the configured (not minimal!) BlueFS allocation unit.

Effectively replaces changes from
ceph#57015

Fixes: https://tracker.ceph.com/issues/68772

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>

6 participants