-
Notifications
You must be signed in to change notification settings - Fork 962
Description
BUG REPORT
Describe the bug
A prod server crashed because of the segfault in the RocksDB.
Unfortunately, the crash dump is lost. Logs point to org.rocksdb.WriteBatch::delete called from org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers
It is hard to pinpoint the issue / match it to a specific rocksDB bug without the crash dump. I cannot repro the problem in unit test and even if I repro it I won't know if that's the exact problem.
So far the crash happened only one time, roughly the timing and code correlate with upgrade to a (internal) version (BK 4.14.x uses rocksdb 6.16.4) with change bringing the use of range deletion w/rocksDB #3653
After some research I have a gut feeling that the problem is related to fix of "a bug in iterator refresh which could segfault for DeleteRange users" facebook/rocksdb#10739
This should be included into RocksDB 7.8.0, I do not see it in 6.x versions. Instead i see 6.29.0 has "Added API warning against using Iterator::Refresh() together with DB::DeleteRange(), which are incompatible and have always risked causing the refreshed iterator to return incorrect results."
With that said, we have the following options:
- do nothing, hope the problem is extremely rare. Collect more info if/when it reoccurs.
- revert Bring back deleteRange for RocksDB to improve location delete performance #3653 cc @hangc0276 - do you have any perf test results that show how much this PR improved performance to help decide why we may want to not revert this?
- upgrade RocksDB to 7.8.0+. Upgrade to 7.x as attempted at Issue 3567: Upgrade rocksdb version to avoid checksum mismatch error #3568 but will need more work for backwards compat tests (at least) assuming there is no data incompatibility. I see some changes around dropping some data format options that may affect downgrade, so there is a risk.
- Upgrade to the RocksDB 6.29.5. It sounds like option 1 with extra steps but there are multiple fixes between 6.16.4 (or even 6.29.4.1 used by BK 4.16) and 6.29.5 that might reduce chances of the problem to surface, e.g.:
Fixed a bug caused by race among flush, incoming writes and taking snapshots. Queries to snapshots created with these race condition can return incorrect result, e.g. resurfacing deleted data.
Fixed a bug that DisableManualCompaction may assert when disable an unscheduled manual compaction.
Fixed a bug that Iterator::Refresh() reads stale keys after DeleteRange() performed.
Fixed a race condition when disable and re-enable manual compaction.
Fix a race condition when cancel manual compaction with DisableManualCompaction. Also DB close can cancel the manual compaction thread.
Fixed a data race on versions_ between DBImpl::ResumeImpl() and threads waiting for recovery to complete (#9496)
Fixed a read-after-free bug in DB::GetMergeOperands().
Fix a data loss bug for 2PC write-committed transaction caused by concurrent transaction commit and memtable switch
Fixed a major bug in which batched MultiGet could return old values for keys deleted by DeleteRange when memtable Bloom filter is enabled
To Reproduce
cannot repro
Expected behavior
no segfault