os/bluestore: Actually wait until completion in write_sync #26909
tchaikov merged 1 commit into ceph:master
This function is only used for RocksDB WAL writes, so it must sync data. This fixes ceph#18338 and thus makes it possible to actually set `bluefs_preextend_wal_files` to true, gaining +100% single-thread write IOPS in disk-bound (HDD or bad SSD) setups. To my knowledge it doesn't hurt performance in other cases. Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`. Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD while running `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.
Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-off-by: Vitaliy Filippov <vitalif@yourcmc.ru>
Haven't looked at this closely yet, but a quick clarification: `bluefs_preextend_wal_files` is only useful in making writes faster right after the bluestore/OSD is created. Once you've written a little bit of data, we start recycling rocksdb log files, and this option has no effect. So there isn't much value in making it faster.
Another case is right after compaction. Anyway, isn't it better for performance to be consistent regardless of log reuse than for it to vary?
I've just checked - yes, it seems to reuse log files. However, I observe a VERY strange behaviour: extra io_submit's and fdatasync's go away when I'm running
Could you test it yourself with a single OSD and
jenkins retest this please (no log)
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
retest this please
test failures tracked by
@liewegas I just marked the corresponding ticket with "backport=mimic,nautilus"; please revert the change if I am wrong.
I see it wasn't backported to Nautilus 14.2.2. Is that OK?
yeah, just didn't get backported in time |