Skip to content

osd/PrimaryLogPG: do not use approx_size() for log trimming#18338

Merged
tchaikov merged 1 commit intoceph:masterfrom
xiexingguo:wip-pg
Oct 18, 2017
Merged

osd/PrimaryLogPG: do not use approx_size() for log trimming#18338
tchaikov merged 1 commit intoceph:masterfrom
xiexingguo:wip-pg

Conversation

@xiexingguo
Copy link
Copy Markdown
Member

@xiexingguo xiexingguo commented Oct 17, 2017

There might be holes on log versions, thus the approx_size()
should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of accessing violation (though it's rare)
while searching for the oldest log entry to keep in the log list later.

Fix the above problem by counting the precise number of current
log entries instead.

Signed-off-by: xie xingguo xie.xingguo@zte.com.cn

There might be holes on log versions, thus the approx_size()
should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of accessing violation
while searching for the oldest log entry to keep in the log list later.

Fix the above problem by counting the precise number of current
log entries instead.

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
@xiexingguo xiexingguo requested a review from liewegas October 17, 2017 11:07
Copy link
Copy Markdown
Member

@liewegas liewegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we used approx_size() because std::list::size() used to be O(n) instead of O(1).

@xiexingguo
Copy link
Copy Markdown
Member Author

I think we used approx_size() because std::list::size() used to be O(n) instead of O(1).

Yeah. It's constant now!(C++11)

@tchaikov tchaikov merged commit 9adab8b into ceph:master Oct 18, 2017
@xiexingguo xiexingguo deleted the wip-pg branch October 18, 2017 03:00
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jul 30, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of accessing violation
while searching for the oldest log entry to keep in the log list later.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jul 30, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jul 30, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Jul 30, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Aug 20, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Sep 20, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Sep 20, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
xiexingguo added a commit to xiexingguo/ceph that referenced this pull request Sep 20, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
neha-ojha pushed a commit to neha-ojha/ceph that referenced this pull request Sep 27, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 3654d56)

Conflicts:
	src/osd/PrimaryLogPG.cc: trivial resolution
neha-ojha pushed a commit to neha-ojha/ceph that referenced this pull request Sep 27, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 3654d56)

Conflicts:
	src/osd/PrimaryLogPG.cc: trivial resolution
xiexingguo added a commit to ceph/ceph-ci that referenced this pull request Oct 13, 2018
In ceph/ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph/ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 3654d56)

Conflicts:
	src/osd/PrimaryLogPG.cc
ivancich pushed a commit to ivancich/ceph-fork that referenced this pull request Oct 29, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 3654d56)

Conflicts:
	src/osd/PrimaryLogPG.cc: trivial resolution
(cherry picked from commit 85a029a)

Resolves: rhbz#1608060
ivancich pushed a commit to ivancich/ceph-fork that referenced this pull request Nov 30, 2018
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 3654d56)

Conflicts:
	src/osd/PrimaryLogPG.cc: trivial resolution
(cherry picked from commit 85a029a)

Resolves: rhbz#1608060
neha-ojha pushed a commit to neha-ojha/ceph that referenced this pull request Jan 10, 2019
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 3654d56)

Conflicts:
	src/osd/PrimaryLogPG.cc: trivial resolution
neha-ojha pushed a commit to neha-ojha/ceph that referenced this pull request Jan 18, 2019
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 3654d56)

Conflicts:
	src/osd/PrimaryLogPG.cc: trivial resolution
ivancich pushed a commit to ivancich/ceph-fork that referenced this pull request Feb 12, 2019
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 3654d56)

Conflicts:
	src/osd/PrimaryLogPG.cc: trivial resolution
(cherry picked from commit 54b04ba)

Resolves: rhbz#1608060
vitalif added a commit to vitalif/ceph that referenced this pull request Mar 8, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-Off-By: Vitaliy Filippov <vitalif@yourcmc.ru>
vitalif added a commit to vitalif/ceph that referenced this pull request Mar 8, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-Off-By: Vitaliy Filippov <vitalif@yourcmc.ru>
vitalif added a commit to vitalif/ceph that referenced this pull request Mar 8, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-Off-By: Vitaliy Filippov <vitalif@yourcmc.ru>
vitalif added a commit to vitalif/ceph that referenced this pull request Mar 9, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-off-by: Vitaliy Filippov <vitalif@yourcmc.ru>
vitalif added a commit to vitalif/ceph that referenced this pull request Mar 9, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-off-by: Vitaliy Filippov <vitalif@yourcmc.ru>
vitalif added a commit to vitalif/ceph that referenced this pull request Mar 12, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-off-by: Vitaliy Filippov <vitalif@yourcmc.ru>
thmour pushed a commit to thmour/ceph that referenced this pull request Jun 7, 2019
In ceph#21580 I set a trap to catch some wired
and random segmentfaults and in a recent QA run I was able to observe it was
successfully triggered by one of the test case, see:

```
http://qa-proxy.ceph.com/teuthology/xxg-2018-07-30_05:25:06-rados-wip-hb-peers-distro-basic-smithi/2837916/teuthology.log
```

The root cause is that there might be holes on log versions, thus the
approx_size() method should (almost) always overestimate the actual number of log entries.
As a result, we might be at the risk of overtrimming log entries.

ceph#18338 reveals a probably easier way
to fix the above problem but unfortunately it also can cause big performance regression
and hence comes this pr..

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
(cherry picked from commit 3654d56)

Conflicts:
	src/osd/PrimaryLogPG.cc: trivial resolution
smithfarm pushed a commit to smithfarm/ceph that referenced this pull request Jun 15, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-off-by: Vitaliy Filippov <vitalif@yourcmc.ru>
(cherry picked from commit c703cf9)
smithfarm pushed a commit to smithfarm/ceph that referenced this pull request Jun 15, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-off-by: Vitaliy Filippov <vitalif@yourcmc.ru>
(cherry picked from commit c703cf9)

Conflicts:
	src/os/bluestore/KernelDevice.cc
- mimic has a single variable "fd_buffered" where master has an array "fd_buffereds"
k0ste pushed a commit to k0ste/ceph that referenced this pull request Aug 9, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-off-by: Vitaliy Filippov <vitalif@yourcmc.ru>
(cherry picked from commit c703cf9)

Conflicts:
- path: src/os/bluestore/KernelDevice.cc
  comment: luminous has a single variable "fd_buffered" where master has an array "fd_buffereds"
k0ste pushed a commit to k0ste/ceph that referenced this pull request Aug 9, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-off-by: Vitaliy Filippov <vitalif@yourcmc.ru>
(cherry picked from commit c703cf9)

Conflicts:
- path: src/os/bluestore/KernelDevice.cc
  comment: luminous has a single variable "fd_buffered" where master has an array "fd_buffereds"
k0ste pushed a commit to k0ste/ceph that referenced this pull request Aug 9, 2019
This function is only used by RocksDB WAL writing so it must sync data.

This fixes ceph#18338 and thus allows to actually set `bluefs_preextend_wal_files`
to true, gaining +100% single-thread write iops in disk-bound (HDD or bad SSD) setups.
To my knowledge it doesn't hurt performance in other cases.
Test it yourself on any HDD with `fio -ioengine=rbd -direct=1 -bs=4k -iodepth=1`.

Issue ceph#18338 is easily reproduced without this patch by issuing a `kill -9` to the OSD
while doing `fio -ioengine=rbd -direct=1 -bs=4M -iodepth=16`.

Fixes: https://tracker.ceph.com/issues/18338 https://tracker.ceph.com/issues/38559
Signed-off-by: Vitaliy Filippov <vitalif@yourcmc.ru>
(cherry picked from commit c703cf9)

Conflicts:
- path: src/os/bluestore/KernelDevice.cc
  comment: luminous has a single variable "fd_buffered" where master has an array "fd_buffereds"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants