Skip to content

Conversation

@hangc0276
Copy link
Contributor

Motivation

When there are a lot of ledgers to be deleted, the EntyLocationIndex in RocksDB will delete a large number of entries at one time.

Nov 11, 2022 @ 05:49:39.336     05:49:39.336 [db-storage-cleanup-10-1] INFO  org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex - Deleted indexes for 0 entries from 16074 ledgers in 625.242 seconds
Nov 11, 2022 @ 10:18:08.281     10:18:08.281 [db-storage-cleanup-10-1] INFO  org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex - Deleted indexes for 0 entries from 13376 ledgers in 248.66 seconds
Nov 11, 2022 @ 14:27:06.949     14:27:06.949 [db-storage-cleanup-10-1] INFO  org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex - Deleted indexes for 0 entries from 11798 ledgers in 265.118 seconds
Nov 11, 2022 @ 19:45:40.593     19:45:40.593 [db-storage-cleanup-10-1] INFO  org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex - Deleted indexes for 0 entries from 15546 ledgers in 1133.236 seconds

The RocksDB location index deletes operation will cause RocksDB busy, and impact the read latency to a few seconds.

Currently, we delete RocksDB entries in batch, and the batch size is hard-coded. We can not change it.

Changes

Make the RocksDB delete entries in batch configurable.

if (deletedEntriesInBatch > deleteEntriesBatchSize) {
batch.flush();
batch.clear();
deletedEntriesInBatch = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nov 11, 2022 @ 05:49:39.336     05:49:39.336 [db-storage-cleanup-10-1] INFO  org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex - Deleted indexes for 0 entries from 16074 ledgers in 625.242 seconds

deletedEntries=0, does it mean that the condition of deletedEntriesInBatch > DELETE_ENTRIES_BATCH_SIZE has not entered yet, will adjusting the size of DELETE_ENTRIES_BATCH_SIZE affect this delay? It doesn't look like the read latency to a few seconds is caused by this situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deletedEntries=0 is a bug in the log, we have fixed it in the master branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I made a mistake. I have see it #3283

@wenbingshen
Copy link
Member

@hangc0276 I'm not very clear about the current bookkeeper configuration management, including whether we need to add instructions for this new config in the docs?

Personally, I think this configuration is helpful for performance. Should it be explained to users in the docs?

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eolivelli eolivelli merged commit a3401a2 into apache:master Nov 17, 2022
hangc0276 added a commit that referenced this pull request Nov 18, 2022
…ance (#3653)

### Motivation
The entry log location index deletion is deleted in batches one by one currently, and it will have low performance. Refer to: #3646

Matteo has introduced deleteRange API a few years ago, but rollback due to RocksDB delete ranges bug. #1620.  The RocksDB bug has been addressed since 5.18.0 https://github.com/facebook/rocksdb/blob/main/HISTORY.md#5180-2018-11-30. We can bring the `deleteRange` API back to improve the entry log location deletion performance.

### Changes
Bring `deleteRange` API back for entry log location deletion.
hangc0276 added a commit that referenced this pull request Nov 21, 2022
…ance (#3653)

The entry log location index deletion is deleted in batches one by one currently, and it will have low performance. Refer to: #3646

Matteo has introduced deleteRange API a few years ago, but rollback due to RocksDB delete ranges bug. #1620.  The RocksDB bug has been addressed since 5.18.0 https://github.com/facebook/rocksdb/blob/main/HISTORY.md#5180-2018-11-30. We can bring the `deleteRange` API back to improve the entry log location deletion performance.

Bring `deleteRange` API back for entry log location deletion.

(cherry picked from commit 696919c)
zhaohaidao pushed a commit to zhaohaidao/bookkeeper that referenced this pull request Nov 21, 2022
zhaohaidao pushed a commit to zhaohaidao/bookkeeper that referenced this pull request Nov 21, 2022
…ance (apache#3653)

### Motivation
The entry log location index deletion is deleted in batches one by one currently, and it will have low performance. Refer to: apache#3646

Matteo has introduced deleteRange API a few years ago, but rollback due to RocksDB delete ranges bug. apache#1620.  The RocksDB bug has been addressed since 5.18.0 https://github.com/facebook/rocksdb/blob/main/HISTORY.md#5180-2018-11-30. We can bring the `deleteRange` API back to improve the entry log location deletion performance.

### Changes
Bring `deleteRange` API back for entry log location deletion.
nicoloboschi pushed a commit to datastax/bookkeeper that referenced this pull request Jan 11, 2023
…ance (apache#3653)

The entry log location index deletion is deleted in batches one by one currently, and it will have low performance. Refer to: apache#3646

Matteo has introduced deleteRange API a few years ago, but rollback due to RocksDB delete ranges bug. apache#1620.  The RocksDB bug has been addressed since 5.18.0 https://github.com/facebook/rocksdb/blob/main/HISTORY.md#5180-2018-11-30. We can bring the `deleteRange` API back to improve the entry log location deletion performance.

Bring `deleteRange` API back for entry log location deletion.

(cherry picked from commit 696919c)
(cherry picked from commit e159510)
yaalsn pushed a commit to yaalsn/bookkeeper that referenced this pull request Jan 30, 2023
yaalsn pushed a commit to yaalsn/bookkeeper that referenced this pull request Jan 30, 2023
…ance (apache#3653)

### Motivation
The entry log location index deletion is deleted in batches one by one currently, and it will have low performance. Refer to: apache#3646

Matteo has introduced deleteRange API a few years ago, but rollback due to RocksDB delete ranges bug. apache#1620.  The RocksDB bug has been addressed since 5.18.0 https://github.com/facebook/rocksdb/blob/main/HISTORY.md#5180-2018-11-30. We can bring the `deleteRange` API back to improve the entry log location deletion performance.

### Changes
Bring `deleteRange` API back for entry log location deletion.
hangc0276 added a commit that referenced this pull request Feb 8, 2023
### Motivations
This PR is to resolve the issue #3734 in branch-4.14 by following this suggestion. #3734 (comment)

### Modifications
1. Revert #3653
2. Bring #3646 to branch-4.14 to make delete entries batch size configurable.
nicoloboschi pushed a commit to datastax/bookkeeper that referenced this pull request Mar 13, 2023
…e#3768)

### Motivations
This PR is to resolve the issue apache#3734 in branch-4.14 by following this suggestion. apache#3734 (comment)

### Modifications
1. Revert apache#3653
2. Bring apache#3646 to branch-4.14 to make delete entries batch size configurable.

(cherry picked from commit e56d6d6)
nicoloboschi pushed a commit to datastax/bookkeeper that referenced this pull request Mar 13, 2023
…e#3768)

### Motivations
This PR is to resolve the issue apache#3734 in branch-4.14 by following this suggestion. apache#3734 (comment)

### Modifications
1. Revert apache#3653
2. Bring apache#3646 to branch-4.14 to make delete entries batch size configurable.

(cherry picked from commit e56d6d6)
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
…ance (apache#3653)

### Motivation
The entry log location index deletion is deleted in batches one by one currently, and it will have low performance. Refer to: apache#3646

Matteo has introduced deleteRange API a few years ago, but rollback due to RocksDB delete ranges bug. apache#1620.  The RocksDB bug has been addressed since 5.18.0 https://github.com/facebook/rocksdb/blob/main/HISTORY.md#5180-2018-11-30. We can bring the `deleteRange` API back to improve the entry log location deletion performance.

### Changes
Bring `deleteRange` API back for entry log location deletion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants