RocksDB: segfault in org.rocksdb.WriteBatch::delete called from org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers #3734

@dlg99

BUG REPORT

Describe the bug

A production server crashed with a segfault in RocksDB.
Unfortunately, the crash dump was lost. The logs point to org.rocksdb.WriteBatch::delete, called from org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers.

Without the crash dump it is hard to pinpoint the issue or match it to a specific RocksDB bug. I cannot reproduce the problem in a unit test, and even if I could, I would not know whether it is the same problem.

So far the crash has happened only once. The timing and the code path roughly correlate with an upgrade to an internal version (BK 4.14.x, which uses RocksDB 6.16.4) that includes the change introducing range deletion with RocksDB, #3653.
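For readers unfamiliar with the change, the difference between per-key deletion and range deletion can be modeled roughly as below. This is only an in-memory sketch: a `TreeMap` stands in for the RocksDB key space, the class `RangeDeleteSketch` and its methods are hypothetical names, and the 16-byte (ledgerId, entryId) key layout is an assumption made for illustration. It does not call the RocksDB API and cannot reproduce the segfault.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Illustrative model only: a sorted in-memory map stands in for the RocksDB key space.
class RangeDeleteSketch {
    // Assumed key layout: 8-byte big-endian ledgerId followed by 8-byte big-endian entryId.
    static byte[] key(long ledgerId, long entryId) {
        return ByteBuffer.allocate(16).putLong(ledgerId).putLong(entryId).array();
    }

    // Keys are ordered as unsigned bytes, lexicographically.
    static TreeMap<byte[], Long> newIndex() {
        Comparator<byte[]> bytewise = Arrays::compareUnsigned;
        return new TreeMap<>(bytewise);
    }

    // Per-key deletion: find every key of the ledger, then issue one delete per key.
    // Assumes 0 <= ledgerId < Long.MAX_VALUE so that ledgerId + 1 is a valid upper bound.
    static int deletePerKey(TreeMap<byte[], Long> index, long ledgerId) {
        List<byte[]> keys = new ArrayList<>(
                index.subMap(key(ledgerId, 0), true, key(ledgerId + 1, 0), false).keySet());
        for (byte[] k : keys) {
            index.remove(k);
        }
        return keys.size();
    }

    // Range deletion: a single operation covering the half-open range [begin, end).
    static void deleteRange(TreeMap<byte[], Long> index, long ledgerId) {
        index.subMap(key(ledgerId, 0), true, key(ledgerId + 1, 0), false).clear();
    }

    public static void main(String[] args) {
        TreeMap<byte[], Long> index = newIndex();
        for (long ledger = 1; ledger <= 3; ledger++) {
            for (long entry = 0; entry < 5; entry++) {
                index.put(key(ledger, entry), entry);
            }
        }
        deleteRange(index, 2);
        System.out.println("entries left: " + index.size());
    }
}
```

Both methods leave the map in the same state; the range variant simply avoids enumerating the keys first, which is the performance motivation cited for #3653.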

After some research, I suspect the problem is related to the fix for "a bug in iterator refresh which could segfault for DeleteRange users" (facebook/rocksdb#10739).
That fix should be included in RocksDB 7.8.0; I do not see it in the 6.x versions. Instead, I see that 6.29.0 has "Added API warning against using Iterator::Refresh() together with DB::DeleteRange(), which are incompatible and have always risked causing the refreshed iterator to return incorrect results."

With that said, we have the following options:

  1. Do nothing and hope the problem is extremely rare. Collect more information if and when it recurs.
  2. Revert #3653 ("Bring back deleteRange for RocksDB to improve location delete performance"). cc @hangc0276: do you have any perf test results showing how much this PR improved performance, to help decide whether we should keep it rather than revert?
  3. Upgrade RocksDB to 7.8.0+. An upgrade to 7.x was attempted in #3568 ("Issue 3567: Upgrade rocksdb version to avoid checksum mismatch error") but will need more work, at least for the backwards-compatibility tests, assuming there is no data incompatibility. I see some changes that drop data format options and may affect downgrade, so there is a risk.
  4. Upgrade to RocksDB 6.29.5. This sounds like option 1 with extra steps, but there are multiple fixes between 6.16.4 (or even 6.29.4.1, used by BK 4.16) and 6.29.5 that might reduce the chance of the problem surfacing, e.g.:
     - Fixed a bug caused by race among flush, incoming writes and taking snapshots. Queries to snapshots created with these race condition can return incorrect result, e.g. resurfacing deleted data.
     - Fixed a bug that DisableManualCompaction may assert when disable an unscheduled manual compaction.
     - Fixed a bug that Iterator::Refresh() reads stale keys after DeleteRange() performed.
     - Fixed a race condition when disable and re-enable manual compaction.
     - Fix a race condition when cancel manual compaction with DisableManualCompaction. Also DB close can cancel the manual compaction thread.
     - Fixed a data race on versions_ between DBImpl::ResumeImpl() and threads waiting for recovery to complete (#9496)
     - Fixed a read-after-free bug in DB::GetMergeOperands().
     - Fix a data loss bug for 2PC write-committed transaction caused by concurrent transaction commit and memtable switch
     - Fixed a major bug in which batched MultiGet could return old values for keys deleted by DeleteRange when memtable Bloom filter is enabled
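When weighing options 3 and 4, it can help to check mechanically which side of a candidate fix a given RocksDB version falls on. The helper below is a minimal sketch for comparing dotted version strings; `RocksDbVersionSketch` is a hypothetical name, and the 7.8.0 threshold used in the example is only the reading of the release notes given above, not a confirmed fix version.

```java
// Hypothetical helper for comparing dotted version strings such as "6.29.4.1" and "7.8.0".
class RocksDbVersionSketch {
    // Compares numerically, component by component; missing components count as 0.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True if 'version' is the same as or newer than 'minimum'.
    static boolean atLeast(String version, String minimum) {
        return compare(version, minimum) >= 0;
    }

    public static void main(String[] args) {
        // Assumed threshold: the release believed to carry the iterator refresh fix.
        String assumedFixVersion = "7.8.0";
        for (String v : new String[] {"6.16.4", "6.29.4.1", "6.29.5", "7.8.0"}) {
            System.out.println(v + " at or above " + assumedFixVersion + ": "
                    + atLeast(v, assumedFixVersion));
        }
    }
}
```

Numeric comparison matters here: a plain string comparison would order "6.9.0" after "6.29.5".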

To Reproduce

cannot repro

Expected behavior

no segfault
