Fixed retention of keys in compaction #11287
Merged
codelipenghui merged 2 commits into apache:master on Jul 13, 2021
Conversation
Member
Thanks for your contribution. For this PR, do we need to update docs?
Contributor
codelipenghui
left a comment
The change looks good to me, I just left some minor comments about the test.
pulsar-broker/src/test/java/org/apache/pulsar/compaction/CompactionRetentionTest.java
```java
 * Compaction should retain expired keys in the compacted view
 */
@Test
public void testCompaction() throws Exception {
```
Contributor
For the test, I think we need to verify that the raw data of the topic has been deleted, because the issue only happens after the raw data has been deleted.
```java
assertTrue(expectedKeys.contains(msg.getKey()), "Unexpected key: " + msg.getKey());
```

```java
System.err.println("Received: " + msg.getKey());
```
Contributor
Is there any special reason for using System.err here?
codelipenghui
approved these changes
Jul 13, 2021
codelipenghui
pushed a commit
that referenced
this pull request
Jul 14, 2021
This change fixes a few issues in the compaction mechanism:

* When a reader is created reading from the "earliest" message, it should read the compacted data and then continue from the next message.
* When the compaction consumer starts, it shouldn't seek to the beginning. This causes two issues:
  * Rescanning of the topic each time compaction runs
  * Keys that are being dropped from the topic also get dropped from the compacted view, while in fact they should remain there until explicitly deleted (with an empty message for the key).

The main source of the problem is that when creating a cursor on the "earliest" message, the cursor is automatically adjusted to the earliest message available to read. This confuses the read-compacted check, because it may conclude that the reader/consumer is already ahead of the compaction horizon.

Introduced an `isFirstRead` flag to make sure we double-check the start message id and use `MessageId.earliest` instead of the earliest message available to read on the topic. After the first read, the positioning will be fine.

(cherry picked from commit feb4ff1)
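The cursor-positioning problem and the `isFirstRead`-style fix described above can be sketched as follows. This is a simplified, hypothetical model, not actual Pulsar broker code: positions are plain longs, and the class, field, and method names are illustrative assumptions.

```java
// Hypothetical sketch (NOT actual Pulsar broker code) of why the cursor's
// automatic adjustment broke read-compacted, and how a first-read check
// fixes it. Positions are modeled as plain longs for illustration.
class FirstReadSketch {
    static final long EARLIEST = -1L;     // stand-in for MessageId.earliest
    final long compactionHorizon = 100L;  // last position covered by compaction
    final long earliestAvailable = 150L;  // earliest raw message still retained
    private boolean isFirstRead = true;

    // Returns true if this read should be served from the compacted ledger.
    boolean readFromCompacted(long requestedStart) {
        long effectiveStart = requestedStart;
        if (requestedStart == EARLIEST && !isFirstRead) {
            // After the first read the cursor's adjustment to the earliest
            // available raw message is harmless, so apply it.
            effectiveStart = earliestAvailable;
        }
        // On the first read, keep EARLIEST as-is instead of the adjusted
        // position (150 > horizon 100), so the reader is not mistakenly
        // treated as already past the compaction horizon.
        isFirstRead = false;
        return effectiveStart <= compactionHorizon;
    }
}
```

With these example numbers, the first `EARLIEST` read is served from the compacted ledger, while without the first-read check the adjusted position (150) would have appeared to be past the horizon (100) and the compacted data would have been skipped.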
Technoboy-
added a commit
to Technoboy-/pulsar
that referenced
this pull request
Jul 20, 2021
Technoboy-
added a commit
to Technoboy-/pulsar
that referenced
this pull request
Jul 21, 2021
Technoboy-
added a commit
to Technoboy-/pulsar
that referenced
this pull request
Jul 23, 2021
codelipenghui
pushed a commit
that referenced
this pull request
Jul 23, 2021
codelipenghui
added a commit
to codelipenghui/incubator-pulsar
that referenced
this pull request
Jan 5, 2022
### Motivation

Fix the reader skipping the remaining compacted data when the topic has been unloaded. apache#11287 fixed the data-skip issue for a reader's first read from the earliest position. But if the reader has consumed some, though not all, of the messages from the compacted ledger, the start position will no longer be `earliest`; the broker will rewind the reader's cursor to the next valid position of the original topic, so the remaining messages in the compacted ledger are skipped. Here are the logs from the broker:

```
10:44:36.035 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] INFO org.apache.pulsar.broker.service.BrokerService - Created topic persistent://xxx/product-full-prod/5126 - dedup is disabled
10:44:36.035 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] INFO org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://xxx/product-full-prod/5126][xxx] Creating non-durable subscription at msg id 181759:14:-1:-1
10:44:36.035 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] INFO org.apache.bookkeeper.mledger.impl.NonDurableCursorImpl - [xxx/product-full-prod/persistent/5126] Created non-durable cursor read-position=221199:0 mark-delete-position=181759:13
10:44:36.035 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [xxx/product-full-prod/persistent/5126] Opened new cursor: NonDurableCursorImpl{ledger=xxx/product-full-prod/persistent/5126, ackPos=181759:13, readPos=221199:0}
10:44:36.035 [bookkeeper-ml-scheduler-OrderedScheduler-4-0] INFO org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [xxx/product-full-prod/persistent/5126-xxx] Rewind from 221199:0 to 221199:0
```

There are many compacted messages after `181759:13`, but the broker will not dispatch them to the reader. The issue can also be reproduced by the unit test added in this PR.

### Modification

If the cursor has `readCompacted = true`, rewind to the next message after the mark-delete position, so that the reader can continue reading data from the compacted ledger.

### Verification

A new test verifies that the reader gets all the compacted and non-compacted messages from the topic during topic unloading.
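The rewind change can be sketched with a simplified model. This is hypothetical illustration code, not the broker's actual cursor implementation: positions are plain longs (the broker uses ledgerId:entryId pairs), and the names are assumptions.

```java
// Hypothetical sketch (NOT actual Pulsar code) of the rewind fix: with
// readCompacted=true, the cursor rewinds to just after the mark-delete
// position instead of the next valid position of the raw topic, so the
// remaining compacted entries are still read.
class RewindSketch {
    final long markDeletePosition;    // e.g. the "181759:13" in the logs above
    final long nextValidRawPosition;  // first readable position of the raw topic

    RewindSketch(long markDelete, long nextValidRaw) {
        this.markDeletePosition = markDelete;
        this.nextValidRawPosition = nextValidRaw;
    }

    // Where the cursor's read position ends up after a rewind.
    long rewind(boolean readCompacted) {
        if (readCompacted) {
            // After the fix: continue right after the mark-delete position,
            // which may still lie inside the compacted ledger.
            return markDeletePosition + 1;
        }
        // For readCompacted=false (and before the fix, always): jump to the
        // next valid raw position, skipping any remaining compacted entries.
        return nextValidRawPosition;
    }
}
```

Using the numbers from the logs above, a read-compacted cursor with mark-delete position 13 resumes at 14 instead of jumping to the raw topic's next position.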
codelipenghui
added a commit
to codelipenghui/incubator-pulsar
that referenced
this pull request
Jan 7, 2022
codelipenghui
added a commit
that referenced
this pull request
Jan 7, 2022
…g. (#13629)
codelipenghui
added a commit
that referenced
this pull request
Jan 10, 2022
…g. (#13629) (cherry picked from commit 07f131f)
eolivelli
pushed a commit
to eolivelli/pulsar
that referenced
this pull request
Feb 25, 2022
(cherry picked from commit feb4ff1) (cherry picked from commit 9c39726)
bharanic-dev
pushed a commit
to bharanic-dev/pulsar
that referenced
this pull request
Mar 18, 2022
Motivation

This change fixes a few issues in the compaction mechanism:

* When a reader is created reading from the "earliest" message, it should read the compacted data and then continue from the next message.
* When the compaction consumer starts, it shouldn't seek to the beginning; doing so rescans the topic on every compaction run and causes dropped keys to disappear from the compacted view before being explicitly deleted.

The main source of the problem is that when creating a cursor on the "earliest" message, the cursor is automatically adjusted to the earliest message available to read. This confuses the read-compacted check, because it may conclude that the reader/consumer is already ahead of the compaction horizon.

Modifications

Introduced an `isFirstRead` flag to make sure we double-check the start message id and use `MessageId.earliest` instead of the earliest available message to read on the topic. After the first read, the positioning will be fine.