os/bluestore: Fix problem with deferred writes replay#60753
os/bluestore: Fix problem with deferred writes replay#60753
Conversation
602b98c to
deef599
Compare
src/os/bluestore/BlueStore.cc
Outdated
| if (debug_deferred_replay_end) debug_deferred_replay_end(); | ||
| if (fake_ch) { | ||
| new_coll_map.clear(); | ||
| bdev->aio_submit(&ioctx); |
There was a problem hiding this comment.
Are we sure that it's safe enough to try to accumulate all the deferred writes in a single IO/DB txc batch? Couldn't it go beyond some bounds?
There was a problem hiding this comment.
It is limited by amount of deferred writes pending. It is limited by amount of deferred writes BlueStore queued before crash.
Plus ones that are stale.
I will take a look.
There was a problem hiding this comment.
Added batching of bdev ops.
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
Added debug hooks for: - init_alloc - deferred start/stop/operation This created framework for specific unit tests. These functions are as name suggests debug level, only for unittests. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Two possible cases of deferred write related corruption are replicated. 1. The case when L entries remain. 2. The case when deferred overwrites newly created BlueFS files Provides unittest for: https://tracker.ceph.com/issues/68060 Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Modify _deferred_replay to execute it directly, without involving BlueStore state machine. In result, kv-sync thread is not necessary. RocksDB L entries (deferred writes) are removed directly. Fixes: https://tracker.ceph.com/issues/68060, part responsible for stale L entries. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
deef599 to
3e49cd4
Compare
|
jenkins test make check |
|
jenkins test make check arm64 |
| }; | ||
| } | ||
|
|
||
| // small deferred writes over object |
There was a problem hiding this comment.
The comment isn't correct now, is it?
There was a problem hiding this comment.
Doch, it checks out.
16x IO by 8 bytes and then one 64K write.
There was a problem hiding this comment.
I meant we don't overwrite the previous object any more
src/os/bluestore/BlueStore.cc
Outdated
| } | ||
| } | ||
| } else { | ||
| delete deferred_txn; |
There was a problem hiding this comment.
Looks like deferred_txn is leaking if not getting here.
Previously TransContext cared about that...
There was a problem hiding this comment.
In fact we don't need it to be allocated from heap any more.
There was a problem hiding this comment.
No longer allocated.
Modify _deferred_replay to separate: - applying IO to the disk - DB transaction to remove keys Changed _open_db_and_around. It now calls _deferred_replay. Adapted callers, including fsck. Fixed: https://tracker.ceph.com/issues/68060, original report. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
3e49cd4 to
f2b33bc
Compare
|
jenkins test make check arm64 |
|
jenkins test make check |
1 similar comment
|
jenkins test make check |
|
jenkins test make check arm64 |
Add code to print times; this is for make check arm64. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
|
jenkins test make check arm64 |
2 similar comments
|
jenkins test make check arm64 |
|
jenkins test make check arm64 |
| bool read_only, | ||
| bool to_repair, | ||
| bool apply_deferred, | ||
| bool remove_deferred) |
There was a problem hiding this comment.
Not a must for this specific PR, but can we create types for these 4 boolean values? having a function accept 4 boolean params is an invitation for problems...
|
Please note - this issue is affecting multiple CI runs, and is a real nuisance... |
|
Could this possibly break Crimson w/ Bluestore? See dead jobs, all due to unable to monut Bluestore: |
Fixes known problems with deferred write replay.
Based on unittest for deferred writes #60732
Fixes: https://tracker.ceph.com/issues/68060
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e