RocksDB: segfault in org.rocksdb.WriteBatch::delete called from org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers

**BUG REPORT**

***Describe the bug***

A production server crashed because of a segfault in RocksDB.
Unfortunately, the crash dump was lost. The logs point to org.rocksdb.WriteBatch::delete, called from org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex#removeOffsetFromDeletedLedgers.

Without the crash dump it is hard to pinpoint the issue or match it to a specific RocksDB bug. I cannot reproduce the problem in a unit test, and even if I could, I would not know whether it is the same problem.

So far the crash has happened only once. The timing and the code path roughly correlate with an upgrade to an internal version (BK 4.14.x uses RocksDB 6.16.4) that includes the change introducing range deletion with RocksDB: https://github.com/apache/bookkeeper/pull/3653

After some research, my gut feeling is that the problem is related to the fix for "a bug in iterator refresh which could segfault for DeleteRange users": https://github.com/facebook/rocksdb/pull/10739
This fix should be included in RocksDB 7.8.0; I do not see it in any 6.x version. Instead, I see that 6.29.0 has "Added API warning against using Iterator::Refresh() together with DB::DeleteRange(), which are incompatible and have always risked causing the refreshed iterator to return incorrect results."

With that said, we have the following options:

1. Do nothing and hope the problem is extremely rare. Collect more info if/when it recurs.
2. Revert https://github.com/apache/bookkeeper/pull/3653. cc @hangc0276: do you have any perf test results showing how much this PR improved performance, to help decide whether we should keep it rather than revert it?
3. Upgrade RocksDB to 7.8.0+. An upgrade to 7.x was attempted in https://github.com/apache/bookkeeper/pull/3568, but it will need more work, at least for the backwards compat tests, assuming there is no data incompatibility. I see some changes around dropping some data format options that may affect downgrade, so there is a risk.
4. Upgrade to RocksDB 6.29.5. It sounds like option 1 with extra steps, but there are multiple fixes between 6.16.4 (or even 6.29.4.1, used by BK 4.16) and 6.29.5 that might reduce the chance of the problem surfacing, e.g.:

```
Fixed a bug caused by race among flush, incoming writes and taking snapshots. Queries to snapshots created with these race condition can return incorrect result, e.g. resurfacing deleted data.
Fixed a bug that DisableManualCompaction may assert when disable an unscheduled manual compaction.
Fixed a bug that Iterator::Refresh() reads stale keys after DeleteRange() performed.
Fixed a race condition when disable and re-enable manual compaction.
Fix a race condition when cancel manual compaction with DisableManualCompaction. Also DB close can cancel the manual compaction thread.
Fixed a data race on versions_ between DBImpl::ResumeImpl() and threads waiting for recovery to complete (#9496)
Fixed a read-after-free bug in DB::GetMergeOperands().

Fix a data loss bug for 2PC write-committed transaction caused by concurrent transaction commit and memtable switch 

Fixed a major bug in which batched MultiGet could return old values for keys deleted by DeleteRange when memtable Bloom filter is enabled
```
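To make the version reasoning behind options 3 and 4 concrete, here is a minimal sketch that compares RocksDB version strings against the release expected to carry the iterator refresh fix. The class and method names are hypothetical, and the 7.8.0 threshold is only my reading of the changelog as described above, not something confirmed against the RocksDB sources.

```java
import java.util.Arrays;

public class RocksDbVersionCheck {
    // Assumption: first release containing facebook/rocksdb#10739,
    // based on the changelog reading in this report.
    static final String FIRST_FIXED = "7.8.0";

    // Split a dotted version such as "6.29.4.1" into numeric components.
    static int[] parse(String version) {
        return Arrays.stream(version.split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Compare two dotted versions; missing components count as 0.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True if the version is at or above the assumed first fixed release.
    static boolean hasIteratorRefreshFix(String version) {
        return compare(version, FIRST_FIXED) >= 0;
    }

    public static void main(String[] args) {
        // Versions mentioned in this report.
        for (String v : new String[] {"6.16.4", "6.29.4.1", "6.29.5", "7.8.0"}) {
            System.out.println(v + " -> " + hasIteratorRefreshFix(v));
        }
    }
}
```

Under this assumption none of the 6.x versions mentioned here would pass the check, which is why option 4 can only reduce the chance of hitting related bugs rather than pick up this specific fix.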

***To Reproduce***

cannot repro

***Expected behavior***

no segfault

